Problem description
If I read in the following data, pandas appends .1 or .2 to the names of the repeated columns. Here is the data:
import io
import pandas as pd

dfff=io.StringIO("""address,phone,name,website,type,address,phone,name,website,type,address,phone,name,type
123 APPLE STREET,555-5555,APPLE STORE,APPLE.COM,BUSINESS,456 peach ave,777-7777,PEACH STORE,PEACH.COM,BUSINESS,789 banana rd,999-9999,banana store,BUSINESS""")
dfff.seek(0)
newdf2=pd.read_csv(dfff)
Here is the output; pandas renames the duplicate columns by adding a .1 or .2 suffix.
newdf2
# address phone name website type address.1 phone.1 name.1 website.1 type.1 address.2 phone.2 name.2 type.2
#0 123 APPLE STREET 555-5555 APPLE STORE APPLE.COM BUSINESS 456 peach ave 777-7777 PEACH STORE PEACH.COM BUSINESS 789 banana rd 999-9999 banana store BUSINESS
How do I combine the like columns into separate rows, to get this output (since there is no website.2, that value would be NaN, 0, or blank):
#            address     phone          name    website      type
#0  123 APPLE STREET  555-5555   APPLE STORE  APPLE.COM  BUSINESS
#1     456 peach ave  777-7777   PEACH STORE  PEACH.COM  BUSINESS
#2     789 banana rd  999-9999  banana store        NaN  BUSINESS
Now, I don't really know where to start, but I tried stacking the data. That works as expected, but unstacking just brings back the original data:
newdf2.stack().to_frame()
# 0
#0 address 123 APPLE STREET
# phone 555-5555
# name APPLE STORE
# website APPLE.COM
# type BUSINESS
# address.1 456 peach ave
# phone.1 777-7777
# name.1 PEACH STORE
# website.1 PEACH.COM
# type.1 BUSINESS
# address.2 789 banana rd
# phone.2 999-9999
# name.2 banana store
# type.2 BUSINESS
I'm thinking there must be a way to stack, remove the .1 and .2 suffixes from the column names, and unstack into the format I want? Or maybe there is another way?
Recommended answer
You can use wide_to_long. It expects every column to end in a numeric suffix, so first add .0 to the unsuffixed columns.
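# start from the DataFrame read in the question (newdf2)
df = newdf2.copy()
# give the first group an explicit .0 suffix so every column looks like stub.N
df.columns = [f'{x}.0' if '.' not in x else x for x in df.columns]
df['id'] = df.index
df = pd.wide_to_long(df, stubnames=['address', 'phone', 'name', 'website', 'type'], i='id', j='row', sep='.')
df.reset_index(drop=True)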
Out[1]:
            address     phone          name    website      type
0  123 APPLE STREET  555-5555   APPLE STORE  APPLE.COM  BUSINESS
1     456 peach ave  777-7777   PEACH STORE  PEACH.COM  BUSINESS
2     789 banana rd  999-9999  banana store        NaN  BUSINESS
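
If you would rather not depend on wide_to_long, a different approach is to split the columns by their numeric suffix and concatenate the pieces. The following is only a minimal sketch, not part of the answer above: the helper name split_repeated_columns is made up for illustration, and it assumes that the only dots in the column names are the .1, .2 suffixes that read_csv adds to duplicates.

```python
import io

import pandas as pd

csv_text = """address,phone,name,website,type,address,phone,name,website,type,address,phone,name,type
123 APPLE STREET,555-5555,APPLE STORE,APPLE.COM,BUSINESS,456 peach ave,777-7777,PEACH STORE,PEACH.COM,BUSINESS,789 banana rd,999-9999,banana store,BUSINESS"""


def split_repeated_columns(df):
    """Hypothetical helper: turn address, address.1, address.2, ... into rows.

    Assumes a dot in a column name only ever introduces the numeric suffix
    that read_csv adds to duplicated names.
    """
    groups = {}  # group number -> {original column name: name without suffix}
    for col in df.columns:
        stub, _, suffix = col.partition(".")
        group = int(suffix) if suffix else 0  # unsuffixed columns form group 0
        groups.setdefault(group, {})[col] = stub
    # One small frame per group, with the suffixes stripped from the names
    parts = [
        df[list(mapping)].rename(columns=mapping)
        for _, mapping in sorted(groups.items())
    ]
    # Columns missing from a group (website in group 2) are filled with NaN
    return pd.concat(parts, ignore_index=True)


newdf2 = pd.read_csv(io.StringIO(csv_text))
result = split_repeated_columns(newdf2)
print(result)
```

With more than one input row, this sketch returns all rows of group 0 first, then group 1, and so on, so sort afterwards if you need the rows of each original record kept together.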