Splitting strings in a pandas DataFrame into separate rows

Problem description

I have a dataframe of text strings which essentially represents one or many journeys per row. I'm trying to split the legs of the journey so I can see them individually. The example input dataframe looks as follows:

Update:

import pandas as pd

df_input = pd.DataFrame([{'var1':'A/A1', 'var2':'x/y/z', 'var3':'abc1'},
                         {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'},
                         {'var1':'c', 'var2':'zz', 'var3':'abcd'}])

   var1   var2  var3
0  A/A1  x/y/z  abc1
1     B  xx/yy  abc2
2     c     zz  abcd

The output I'm trying to get should look as follows. So for the first example, the journey legs are A to A1 then A1 to x then x to y and then y to z. If there is also a way to add an additional column indicating the journey leg number (1,2,3 etc.) that'll be very helpful. var3 has no importance here, but I've just included it to show that there are other columns which get repeated when the rows are split.

df_output = pd.DataFrame([{'var1': 'A', 'var2': 'A1', 'var3':'abc1'},
                          {'var1': 'A1', 'var2': 'x', 'var3':'abc1'},
                          {'var1': 'x', 'var2': 'y', 'var3':'abc1'},
                          {'var1': 'y', 'var2': 'z', 'var3':'abc1'},
                          {'var1': 'B', 'var2': 'xx', 'var3':'abc2'},
                          {'var1': 'xx', 'var2': 'yy', 'var3':'abc2'},
                          {'var1': 'c', 'var2': 'zz', 'var3':'abcd'}])

  var1 var2  var3
0    A   A1  abc1
1   A1    x  abc1
2    x    y  abc1
3    y    z  abc1
4    B   xx  abc2
5   xx   yy  abc2
6    c   zz  abcd

Can anyone help?

Thanks

Recommended answer

Try using explode. This splits only var2; a slash in var1 (as in 'A/A1') is left as it is:

df = df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
df
   var1 var2  var3
0  A/A1    x  abc1
0  A/A1    y  abc1
0  A/A1    z  abc1
1     B   xx  abc2
1     B   yy  abc2
2     c   zz  abcd

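Then groupby + shift, so that each leg starts where the previous one ended. With the input above, the first leg still starts at 'A/A1', because var1 itself was never split:

df.var1 = df.groupby(level=0).var2.shift().fillna(df.var1)
df
   var1 var2  var3
0  A/A1    x  abc1
0     x    y  abc1
0     y    z  abc1
1     B   xx  abc2
1    xx   yy  abc2
2     c   zz  abcd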
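
The question's updated input also has a slash inside var1 ('A/A1') and asks for a leg number. The following is only a sketch of one way to cover both, not part of the answer above: it assumes var1 and var2 together form a single path of stops separated by '/', and the names stops, legs and the column leg are made up for illustration.

```python
import pandas as pd

df_input = pd.DataFrame([{'var1': 'A/A1', 'var2': 'x/y/z', 'var3': 'abc1'},
                         {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                         {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# Assumption: var1 + '/' + var2 is the full journey, one stop per segment.
stops = (df_input['var1'] + '/' + df_input['var2']).str.split('/')

# One row per stop; the original row label is repeated in the index.
df = df_input.assign(var2=stops).explode('var2')

# Each leg starts at the previous stop of the same original row.
df['var1'] = df.groupby(level=0)['var2'].shift()

legs = (
    df.dropna(subset=['var1'])  # the first stop of a journey has no predecessor
      .assign(leg=lambda d: d.groupby(level=0).cumcount() + 1)  # 1, 2, 3, ... per journey
      .reset_index(drop=True)
)
print(legs)
```

On the sample input, the var1, var2 and var3 columns of legs should match df_output from the question, with leg counting 1 to 4 for the first journey, 1 to 2 for the second, and 1 for the third.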
