Problem description
I'm working with a large CSV file, and the next-to-last column holds a string of text that I want to split on a specific delimiter. Is there a simple way to do this using pandas or plain Python?
CustNum CustomerName ItemQty Item Seatblocks ItemExt
32363 McCartney, Paul 3 F04 2:218:10:4,6 60
31316 Lennon, John 25 F01 1:13:36:1,12 1:13:37:1,13 300
I want to split by the space (' ') and then the colon (':') in the Seatblocks column, but each cell would result in a different number of columns. I have a function to rearrange the columns so the Seatblocks column is at the end of the sheet, but I'm not sure what to do from there. I can do it in Excel with the built-in text-to-columns function and a quick macro, but my dataset has too many records for Excel to handle.
Ultimately, I want to take records such as John Lennon's and create multiple lines, with the info from each set of seats on a separate line.
Recommended answer
This splits the Seatblocks by space and gives each its own row.
In [43]: df
Out[43]:
CustNum CustomerName ItemQty Item Seatblocks ItemExt
0 32363 McCartney, Paul 3 F04 2:218:10:4,6 60
1 31316 Lennon, John 25 F01 1:13:36:1,12 1:13:37:1,13 300
In [44]: s = df['Seatblocks'].str.split(' ').apply(Series).stack()  # assumes `from pandas import Series`
In [45]: s.index = s.index.droplevel(-1) # to line up with df's index
In [46]: s.name = 'Seatblocks' # needs a name to join
In [47]: s
Out[47]:
0 2:218:10:4,6
1 1:13:36:1,12
1 1:13:37:1,13
Name: Seatblocks, dtype: object
In [48]: del df['Seatblocks']
In [49]: df.join(s)
Out[49]:
CustNum CustomerName ItemQty Item ItemExt Seatblocks
0 32363 McCartney, Paul 3 F04 60 2:218:10:4,6
1 31316 Lennon, John 25 F01 300 1:13:36:1,12
1 31316 Lennon, John 25 F01 300 1:13:37:1,13
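On newer pandas versions (0.25 and later, where `DataFrame.explode` is available), the same one-row-per-seat-block reshaping can probably be written more compactly. The following is only a sketch: the frame is rebuilt by hand from the two sample records in the question, and the variable name `exploded` is introduced here for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's two records (an assumption
# about the real file's dtypes and layout).
df = pd.DataFrame({
    "CustNum": [32363, 31316],
    "CustomerName": ["McCartney, Paul", "Lennon, John"],
    "ItemQty": [3, 25],
    "Item": ["F04", "F01"],
    "Seatblocks": ["2:218:10:4,6", "1:13:36:1,12 1:13:37:1,13"],
    "ItemExt": [60, 300],
})

# Split each cell on spaces into a list, then give every list element its own row.
# Rows that came from the same record keep the same index label.
exploded = df.assign(Seatblocks=df["Seatblocks"].str.split(" ")).explode("Seatblocks")
print(exploded)
```

Unlike the join-based version, this keeps Seatblocks in its original column position, and the other columns are repeated for each seat block.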
Or, to put each colon-separated string in its own column:
In [50]: df.join(s.apply(lambda x: Series(x.split(':'))))
Out[50]:
CustNum CustomerName ItemQty Item ItemExt 0 1 2 3
0 32363 McCartney, Paul 3 F04 60 2 218 10 4,6
1 31316 Lennon, John 25 F01 300 1 13 36 1,12
1 31316 Lennon, John 25 F01 300 1 13 37 1,13
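The colon split can also be sketched with `Series.str.split(..., expand=True)`, which returns one column per field and pads shorter rows with missing values, so cells with differing numbers of fields should not be a problem. This is a self-contained illustration rather than a definitive implementation: the sample frame is rebuilt by hand from the question, and the `part_` column prefix and the names `exploded`, `parts`, and `result` are hypothetical.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's two records (an assumption).
df = pd.DataFrame({
    "CustNum": [32363, 31316],
    "CustomerName": ["McCartney, Paul", "Lennon, John"],
    "ItemQty": [3, 25],
    "Item": ["F04", "F01"],
    "Seatblocks": ["2:218:10:4,6", "1:13:36:1,12 1:13:37:1,13"],
    "ItemExt": [60, 300],
})

# One row per space-separated seat block; reset the index so every row has a
# unique label before the pieces are joined back together.
exploded = (
    df.assign(Seatblocks=df["Seatblocks"].str.split(" "))
      .explode("Seatblocks")
      .reset_index(drop=True)
)

# One column per colon-separated field; "part_" is a hypothetical prefix.
parts = exploded["Seatblocks"].str.split(":", expand=True).add_prefix("part_")

result = exploded.drop(columns="Seatblocks").join(parts)
print(result)
```

The split fields come back as strings, so a conversion step (for example `pd.to_numeric` on the numeric-looking columns) may be wanted afterwards.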
This is a little ugly, but maybe someone will chime in with a prettier solution.