How to do pd.get_dummies or another way?
Problem description
Actually, my problem is based on the:
So the data should be:
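import io
import pandas as pd

# Two or more spaces separate the id from the source, so "Cash 1" keeps its single space.
t = """
AV4MdG6Ihowv-SKBN_nB    DTP,FOOD
AV4Mc2vNhowv-SKBN_Rn    Cash 1,FOOD
AV4MeisikOpWpLdepWy6    DTP,Bar
AV4MeRh6howv-SKBOBOn    Cash 1,FOOD
AV4Mezwchowv-SKBOB_S    DTOT,Bar
AV4MeB7yhowv-SKBOA5b    DTP,Bar
"""
# A regex separator needs the python parsing engine; the raw string keeps \s intact.
data_vec = pd.read_csv(io.StringIO(t), sep=r'\s{2,}', names=['id', 'source'], engine='python')
data_vec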
This is data_vec:
                     id       source
0  AV4MdG6Ihowv-SKBN_nB     DTP,FOOD
1  AV4Mc2vNhowv-SKBN_Rn  Cash 1,FOOD
2  AV4MeisikOpWpLdepWy6      DTP,Bar
3  AV4MeRh6howv-SKBOBOn  Cash 1,FOOD
4  AV4Mezwchowv-SKBOB_S     DTOT,Bar
5  AV4MeB7yhowv-SKBOA5b      DTP,Bar
I would like a result like the following (in other words, how do I vectorize multiple tags or categories?):
                    _id  source_Cash 1  source_DTOT  source_DTP  Food  Bar
0  AV4MdG6Ihowv-SKBN_nB              0            0           1     1    0
1  AV4Mc2vNhowv-SKBN_Rn              1            0           0     1    0
2  AV4MeisikOpWpLdepWy6              0            0           1     0    1
3  AV4MeRh6howv-SKBOBOn              1            0           0     1    0
4  AV4Mezwchowv-SKBOB_S              0            1           0     0    1
5  AV4MeB7yhowv-SKBOA5b              0            0           1     0    1
If this is a duplicate, let me know and I will delete it!
Recommended answer
A bit of str.get_dummies and str.split magic, inspired by Scott Boston and improved (from the original version) thanks to JohnE.
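# One 0/1 indicator column per comma-separated label, indexed by id.
df = data_vec.set_index('id').source.str.get_dummies(',')
# Keep only the first word of each label, lower-cased ("Cash 1" -> "cash").
df.columns = df.columns.str.split().str[0].str.lower()
df = df.add_prefix('source_').reset_index()
print(df)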
                     id  source_bar  source_cash  source_dtot  source_dtp  source_food
0  AV4MdG6Ihowv-SKBN_nB           0            0            0           1            1
1  AV4Mc2vNhowv-SKBN_Rn           0            1            0           0            1
2  AV4MeisikOpWpLdepWy6           1            0            0           1            0
3  AV4MeRh6howv-SKBOBOn           0            1            0           0            1
4  AV4Mezwchowv-SKBOB_S           1            0            1           0            0
5  AV4MeB7yhowv-SKBOA5b           1            0            0           1            0
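The answer above prefixes every label with source_ and lower-cases it, which differs slightly from the layout requested in the question. If the payment labels should keep the source_ prefix while the category labels stay bare, one possible alternative is to split the column first and call pd.get_dummies on each part. This is only a sketch, not part of the original answer: it assumes every source value contains exactly one comma, and the names parts, payment, category and result are made up for illustration.

```python
import io
import pandas as pd

raw = """
AV4MdG6Ihowv-SKBN_nB    DTP,FOOD
AV4Mc2vNhowv-SKBN_Rn    Cash 1,FOOD
AV4MeisikOpWpLdepWy6    DTP,Bar
AV4MeRh6howv-SKBOBOn    Cash 1,FOOD
AV4Mezwchowv-SKBOB_S    DTOT,Bar
AV4MeB7yhowv-SKBOA5b    DTP,Bar
"""
data_vec = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}",
                       names=["id", "source"], engine="python")

# Assumption: each value looks like "<payment>,<category>" with a single comma.
parts = data_vec["source"].str.split(",", n=1, expand=True)
parts.columns = ["payment", "category"]  # hypothetical column names

# Encode each part separately: payment columns get the "source" prefix,
# category columns keep their bare label.
payment = pd.get_dummies(parts["payment"], prefix="source", dtype=int)
category = pd.get_dummies(parts["category"], dtype=int)

result = pd.concat([data_vec[["id"]], payment, category], axis=1)
print(result)
```

The resulting columns are id, source_Cash 1, source_DTOT, source_DTP, Bar and FOOD, so the labels keep their original spelling and case. The str.get_dummies approach in the answer remains the more general choice when a row can carry any number of labels.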