This post shows how to add a numeric column to a pandas DataFrame based on another text column.
Problem description
I have this DataFrame:
import pandas as pd
df = pd.DataFrame([['137', 'earn'], ['158', 'earn'], ['144', 'ship'], ['111', 'trade'], ['132', 'trade']], columns=['value', 'topic'])
print(df)
  value  topic
0   137   earn
1   158   earn
2   144   ship
3   111  trade
4   132  trade
And I want an additional numeric column like this:
  value  topic  topic_id
0   137   earn         0
1   158   earn         0
2   144   ship         1
3   111  trade         2
4   132  trade         2
So basically I want to generate a column that encodes a string column as numeric values. I implemented this solution:
import numpy as np

# Map each distinct topic (np.unique returns them sorted) to an integer id
topics_dict = {}
topics = np.unique(df['topic']).tolist()
for i in range(len(topics)):
    topics_dict[topics[i]] = i
df['topic_id'] = [topics_dict[l] for l in df['topic']]
However, I am quite sure there is a more elegant, more idiomatic pandas way to solve this, but I couldn't find anything on Google or SO. I read about pandas' get_dummies, but that creates a separate column for each distinct value in the original column.
Thanks for any help or pointers!
Recommended answer
You can convert the column to the category dtype and take its integer codes:
In [63]: df['topic'].astype('category').cat.codes
Out[63]:
0    0
1    0
2    1
3    2
4    2
dtype: int8
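The snippet above only displays the codes. As a minimal sketch, assuming the same df as in the question, the codes can be assigned to the new column and the code-to-label mapping kept for decoding later. The names topic_cat and id_to_topic are illustrative and not part of the original answer, and pd.factorize is shown as an alternative rather than as the answer's method.

```python
import pandas as pd

df = pd.DataFrame(
    [['137', 'earn'], ['158', 'earn'], ['144', 'ship'],
     ['111', 'trade'], ['132', 'trade']],
    columns=['value', 'topic'],
)

# Convert once so the codes and the categories come from the same object.
topic_cat = df['topic'].astype('category')
df['topic_id'] = topic_cat.cat.codes

# Hypothetical helper mapping: integer code -> original label.
id_to_topic = dict(enumerate(topic_cat.cat.categories))

# Alternative: pd.factorize returns the codes and the unique labels together.
# Without sort=True it numbers labels in order of first appearance.
codes, uniques = pd.factorize(df['topic'], sort=True)

print(df)
print(id_to_topic)
```

With string labels the categories are ordered lexically, so both approaches produce the same ids here. Missing values in the column receive the code -1 in either case.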