I have a large DataFrame of text, and I want to start by training an LDA model on it. So I do:
from gensim import corpora
from gensim.models import LdaMulticore

# One token list per tweet
doc_clean = df['tweet_tokenized'].tolist()
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
lda = LdaMulticore(doc_term_matrix, id2word=dictionary, num_topics=50)
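Now that I have a trained lda, I want to iterate through df row by row and store, for each row, the probability that it belongs to a given topic in the corresponding column. So first I create 50 zero columns:
for i in range(50):
    col_name = 'tweet_topic_' + str(i)
    df[col_name] = 0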
Then I loop over the rows with iterrows() and update the values with the at method:
for row_index, row in df.iterrows():
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_' + str(topic[0])
        df.at[row_index, col_name] = topic[1]
But it does not work as expected: none of the values in the 50 columns above change, and they all stay at zero.
Any idea how I can fix this?
Update:
I added row = row.copy() and replaced at with loc, and now it works. So this is the working code:
for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_' + str(topic[0])
        df.loc[row_index, col_name] = topic[1]
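A possible explanation for why switching from at to loc helped, which the post itself does not confirm: df[col_name] = 0 creates integer columns, and some older pandas versions stored a float written through at into an integer column as a truncated integer, so a probability such as 0.02 would end up as 0. Under that assumption, a minimal sketch is to create the columns with the float literal 0.0 so that probabilities fit the column dtype. The two-row DataFrame and the value 0.25 are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the real DataFrame (hypothetical data).
df = pd.DataFrame({"tweet_tokenized": [["a", "b"], ["c"]]})

num_topics = 50
for i in range(num_topics):
    # A float literal gives a float64 column instead of int64.
    df["tweet_topic_" + str(i)] = 0.0

# Writing a probability with .at keeps its fractional part,
# because the column already has a float dtype.
df.at[0, "tweet_topic_3"] = 0.25
```

With float columns in place, the original at loop should no longer need to upcast anything, whichever accessor is used.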
Best answer
I was able to solve it using the explanation in the following post:
Updating value in iterrow for pandas
for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_' + str(topic[0])
        df.loc[row_index, col_name] = topic[1]
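As an alternative to writing one cell at a time, the same result can be sketched by filling a NumPy matrix and attaching all topic columns in a single step. This is only an illustration, not the method from the post: fake_topic_distribution is a hypothetical stand-in for lda[dictionary.doc2bow(tokens)], which returns a list of (topic_id, probability) pairs and may leave out low-probability topics. NUM_TOPICS is reduced to 5 to keep the example small.

```python
import numpy as np
import pandas as pd

NUM_TOPICS = 5  # the post uses 50; smaller here for readability


def fake_topic_distribution(tokens):
    """Hypothetical stand-in for lda[dictionary.doc2bow(tokens)].

    Returns a sparse list of (topic_id, probability) pairs.
    """
    topic_id = len(tokens) % NUM_TOPICS
    return [(topic_id, 0.75), ((topic_id + 1) % NUM_TOPICS, 0.25)]


df = pd.DataFrame({"tweet_tokenized": [["a"], ["b", "c"], ["d", "e", "f"]]})

# Dense float matrix: one row per document, one column per topic.
# Topics missing from the sparse result stay at 0.0.
topic_matrix = np.zeros((len(df), NUM_TOPICS))
for row_pos, tokens in enumerate(df["tweet_tokenized"]):
    for topic_id, prob in fake_topic_distribution(tokens):
        topic_matrix[row_pos, topic_id] = prob

# Attach all topic columns at once, aligned on the original index.
topic_cols = ["tweet_topic_" + str(i) for i in range(NUM_TOPICS)]
topic_df = pd.DataFrame(topic_matrix, index=df.index, columns=topic_cols)
df = pd.concat([df, topic_df], axis=1)
```

Building the matrix first keeps the columns as floats from the start and avoids repeated per-cell assignment into the DataFrame, which tends to be slow on large inputs.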
Regarding "python - Pandas .at not working and the DataFrame not changing", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53599629/