Problem description
I am working in a Jupyter notebook and have a pandas dataframe "data":
Question_ID | Customer_ID | Answer
1           | 234         | Data is very important to use because ...
2           | 234         | We value data since we need it ...
I want to go through the text in the "Answer" column and get the three words before and after the word "data". In this scenario I would get "is very important" for the first row, and "We value" and "since we need" for the second.
Is there a good way to do this within a pandas dataframe? So far I have only found solutions where "Answer" would be its own file run through Python code (without a pandas dataframe). While I realize that I need to use the NLTK library, I haven't used it before, so I don't know what the best approach would be. (A great example was "Extracting a word and its prior 10 word context to a dataframe in Python".)
Recommended answer
This might work:
import re
import pandas as pd

df = pd.read_csv('data.csv')

for value in df.Answer.values:
    # Split the text on the keyword, which also removes "data" itself
    non_data = re.split('Data|data', value)
    # Skip empty pieces (e.g. when the text starts with "Data")
    terms_list = [term for term in non_data if len(term) > 0]
    # Keep the first three words of each piece; note that for the text
    # before "data" this is the first three words, not the last three
    substrs = [term.split()[0:3] for term in terms_list]
    # Join the words back into substrings
    result = [' '.join(term) for term in substrs]
    print(result)
Output:
['is very important']
['We value', 'since we need']
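The answer above takes the first three words of every piece produced by the split. That matches the expected output here only because fewer than four words precede "data" in each row. Below is a minimal sketch of a window-based variant that takes the last three words before and the first three words after each occurrence. It rests on a few assumptions: the helper name context_words and the new "Context" column are hypothetical, words are tokenized with a simple \w+ regex rather than NLTK, and the dataframe is built inline from the two example rows instead of being read from data.csv.

```python
import re
import pandas as pd

def context_words(text, keyword="data", window=3):
    """Return up to `window` words before and after each occurrence of `keyword`."""
    words = re.findall(r"\w+", text)  # simple tokenizer; drops punctuation such as "..."
    contexts = []
    for i, word in enumerate(words):
        if word.lower() == keyword:  # case-insensitive, whole-word match
            before = " ".join(words[max(0, i - window):i])
            after = " ".join(words[i + 1:i + 1 + window])
            # Keep only non-empty sides (e.g. nothing precedes a leading "Data")
            contexts.extend(part for part in (before, after) if part)
    return contexts

# Example rows copied from the question
df = pd.DataFrame({
    "Question_ID": [1, 2],
    "Customer_ID": [234, 234],
    "Answer": [
        "Data is very important to use because ...",
        "We value data since we need it ...",
    ],
})

# Apply the helper row by row; each cell of "Context" holds a list of strings
df["Context"] = df["Answer"].apply(context_words)
print(df["Context"].tolist())
```

Because the match is on whole words, a word such as "database" is not treated as an occurrence of "data", unlike a plain split on the substring.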