I have a large DataFrame of text, and I first want to train an LDA model on it. So I do:

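from gensim import corpora
from gensim.models import LdaMulticore

# Each entry of df['tweet_tokenized'] is a list of tokens
doc_clean = df['tweet_tokenized'].tolist()
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
lda = LdaMulticore(doc_term_matrix, id2word=dictionary, num_topics=50)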


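Now that I have a trained LDA model, I want to iterate through df row by row and store, for each row, its probability of belonging to each topic in the corresponding column. So first I create 50 columns filled with zeros: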

for i in range(50):
    col_name = 'tweet_topic_'+str(i)
    df[col_name] = 0


Then I loop over the rows with iterrows() and update the values with the at accessor:

for row_index, row in df.iterrows():
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.at[row_index,col_name] = topic[1]


But it does not work: none of the values in the 50 columns above change, and they all stay at zero.

Any idea how I can fix this?

Update:
I added row = row.copy() and replaced at with loc, and now it works fine.

So this is the working code:

for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.loc[row_index,col_name] = topic[1]
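
A possible reason the switch from at to loc helped, offered as an assumption since the thread does not confirm it: `df[col_name] = 0` creates integer columns, and in some pandas versions a float written through at is cast to the column's existing dtype, so a probability below 1 is truncated to 0, whereas loc upcasts the column to float. This behavior varies across pandas versions. The sketch below only shows the dtype difference between initializing with `0` and with `0.0`; the column names `topic_int` and `topic_float` are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"tweet_tokenized": [["a", "b"], ["c"]]})

# Integer literal -> integer column, as in the question's setup loop.
df["topic_int"] = 0
# Float literal -> float column, which can hold probabilities directly.
df["topic_float"] = 0.0

print(df.dtypes)

# Writing a probability into the float column needs no dtype change,
# so the value is stored as given.
df.at[0, "topic_float"] = 0.37
print(df["topic_float"].tolist())
```

If this is indeed the cause, initializing the topic columns with `0.0` instead of `0` may be enough to make the original at-based loop work.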

Best answer

Using the explanation in the following post, I was able to solve it:

Updating value in iterrow for pandas

for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.loc[row_index,col_name] = topic[1]
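
As an alternative to updating df cell by cell, the topic probabilities can be collected in a float array first and attached to df in one step. This is only a sketch under assumptions: `topic_frame` is a hypothetical helper, and `fake_infer` is a stand-in for something like `lambda tokens: lda[dictionary.doc2bow(tokens)]`, used here so the example runs without a trained model. It also assumes the `tweet_topic_*` columns have not been created on df yet.

```python
import numpy as np
import pandas as pd

NUM_TOPICS = 50


def topic_frame(docs, infer, index, num_topics=NUM_TOPICS):
    """Return a float DataFrame: one row per document, one column per topic."""
    probs = np.zeros((len(docs), num_topics), dtype=float)
    for i, doc in enumerate(docs):
        # infer(doc) is expected to yield (topic_id, probability) pairs,
        # the same shape of result the loop above reads from lda[new_doc].
        for topic_id, prob in infer(doc):
            probs[i, topic_id] = prob
    cols = [f"tweet_topic_{k}" for k in range(num_topics)]
    return pd.DataFrame(probs, index=index, columns=cols)


def fake_infer(tokens):
    # Hypothetical stand-in for the trained model: one topic per document.
    return [(len(tokens) % NUM_TOPICS, 0.75)]


df = pd.DataFrame({"tweet_tokenized": [["good", "day"], ["rain"], []]})
topics = topic_frame(df["tweet_tokenized"], fake_infer, df.index)

# Attach all 50 topic columns at once; they are float from the start.
df = df.join(topics)
```

Because the array is created with a float dtype, the integer-versus-float column question does not arise, and topics the model omits for a document simply stay at 0.0.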

This question, "python - Pandas .at not working and the DataFrame is not changed", comes from Stack Overflow: https://stackoverflow.com/questions/53599629/
