Problem description
I have a pandas DataFrame with several columns (word, start time, stop time, speaker). I want to combine the values in the 'word' column for as long as the value in the 'speaker' column does not change. For each combined row, I also want to keep the 'start' value of the first word and the 'stop' value of the last word.
I currently have:
word start stop speaker
0 but 2.72 2.85 2
1 that's 2.85 3.09 2
2 alright 3.09 3.47 2
3 we'll 8.43 8.69 1
4 have 8.69 8.97 1
5 to 8.97 9.07 1
6 okay 9.19 10.01 2
7 sure 10.02 11.01 2
8 what? 11.02 12.00 1
However, I would like to turn this into:
word start stop speaker
0 but that's alright 2.72 3.47 2
1 we'll have to 8.43 9.07 1
2 okay sure 9.19 11.01 2
3 what? 11.02 12.00 1
Recommended answer
We'll use GroupBy.agg with a dict of aggfuncs:
(df.groupby('speaker', as_index=False, sort=False)
.agg({'word': ' '.join, 'start': 'min', 'stop': 'max',}))
speaker word start stop
0 2 but that's alright okay sure 2.72 11.01
1 1 we'll have to what? 8.43 12.00
Grouping by "speaker" alone merges every row that belongs to a speaker, wherever it appears in the frame. To group by consecutive occurrences instead, use the shifting cumsum trick, then use the result as a second grouper alongside "speaker":
gp1 = df['speaker'].ne(df['speaker'].shift()).cumsum()
(df.groupby(['speaker', gp1], as_index=False, sort=False)
.agg({'word': ' '.join, 'start': 'min', 'stop': 'max',}))
speaker word start stop
0 2 but that's alright 2.72 3.47
1 1 we'll have to 8.43 9.07
2 2 okay sure 9.19 11.01
3 1 what? 11.02 12.00
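To make the shifting cumsum trick easier to follow, here is a minimal self-contained sketch that rebuilds the question's data and shows the intermediate values. The names `changed`, `block` and `merged` are made up for illustration, and the sketch uses named aggregation with a single helper grouper rather than the exact call above, so treat it as one possible variant, not the answer's own code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "word": ["but", "that's", "alright", "we'll", "have", "to",
             "okay", "sure", "what?"],
    "start": [2.72, 2.85, 3.09, 8.43, 8.69, 8.97, 9.19, 10.02, 11.02],
    "stop": [2.85, 3.09, 3.47, 8.69, 8.97, 9.07, 10.01, 11.01, 12.00],
    "speaker": [2, 2, 2, 1, 1, 1, 2, 2, 1],
})

# True wherever the speaker differs from the previous row.
# The first row is always True because shift() leaves NaN there.
changed = df["speaker"].ne(df["speaker"].shift())

# A running total of those changes gives one id per consecutive run:
# 1 1 1 2 2 2 3 3 4
block = changed.cumsum().rename("block")  # "block" is a hypothetical name

# Aggregate each run: join the words, keep the earliest start,
# the latest stop, and the (constant) speaker of the run.
merged = (
    df.groupby(block, sort=False)
      .agg(word=("word", " ".join),
           start=("start", "min"),
           stop=("stop", "max"),
           speaker=("speaker", "first"))
      .reset_index(drop=True)
)
print(merged)
```

Because every row in a run has the same speaker, grouping on the run id alone is enough here; the speaker is recovered with 'first'. Giving the helper Series its own name also avoids having two groupers both called "speaker".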