I have a DataFrame like this:

df = pd.DataFrame()

         text  secFlag
0        book        1
1    headings        1
2     chapter        1
3         one        1
4        page        0
5         one        0
6        text        0
7     chapter        1
8         two        1
9        page        0
10        two        0
11       text        0
12       page        0
13      three        0
10       text        0
11    chapter        1
12      three        1
13  something        0
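The df = pd.DataFrame() line above is only a placeholder. Here is a minimal sketch that rebuilds the sample data so the snippets in this thread can be run. It keeps the index exactly as printed, including the repeated labels 10 to 13. If those are a copy-paste slip, drop the index= argument and pandas will use a default 0..17 index.

```python
import pandas as pd

# Rebuild the sample frame from the listing above.
# The index copies the printed labels, including the repeated 10-13.
df = pd.DataFrame(
    {
        "text": ["book", "headings", "chapter", "one", "page", "one", "text",
                 "chapter", "two", "page", "two", "text", "page", "three",
                 "text", "chapter", "three", "something"],
        "secFlag": [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    },
    index=list(range(14)) + [10, 11, 12, 13],
)
print(df)
```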


I want to compute a cumulative sum so that I can label all the pages that belong to a particular chapter with a running index number.

**Desired output**


         text  secFlag  chapter
0        book        1        1
1    headings        1        1
2     chapter        1        2
3         one        1        2
4        page        0        2
5         one        0        2
6        text        0        2
7     chapter        1        3
8         two        1        3
9        page        0        3
10        two        0        3
11       text        0        3
12       page        0        3
13      three        0        3
10       text        0        3
11    chapter        1        4
12      three        1        4
13  something        0        4


This is what I tried:

df['chapter'] = ((df['secFlag'].shift(-1) == 1)).cumsum()


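However, this doesn't give me the desired output, because the counter increments every time the section flag is 1 instead of once per heading. Note that the words are part of the text, and a chapter heading usually consists of more than one word.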
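To see why the attempt falls short, here is what it returns on the secFlag column alone (a default integer index is assumed for brevity). The counter goes up on every row whose next row is flagged, so a two-word heading such as "chapter two" raises it by two rather than one.

```python
import pandas as pd

# secFlag column from the question, in row order.
flags = pd.Series([1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
                  name="secFlag")

# The attempt from the question: compare each row with the NEXT row's flag
# (shift(-1)) and count every True; the last row compares against NaN.
attempt = (flags.shift(-1) == 1).cumsum()
print(attempt.tolist())
# [1, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 5, 5, 6, 7, 7, 7]
```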

Could you suggest a simple way to do this?
Thanks

Best answer

If a new chapter should start at the first 1 of each run of consecutive secFlag values, the solution is:

df['chapter'] = ((df['secFlag'] == 1) & (df['secFlag'] != df['secFlag'].shift())).cumsum()
print (df)
         text  secFlag  chapter
0        book        1        1
1    headings        1        1
2     chapter        1        1
3         one        1        1
4        page        0        1
5         one        0        1
6        text        0        1
7     chapter        1        2
8         two        1        2
9        page        0        2
10        two        0        2
11       text        0        2
12       page        0        2
13      three        0        2
10       text        0        2
11    chapter        1        3
12      three        1        3
13  something        0        3
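The same run-start test can be wrapped in a small reusable function. This is a sketch: the name number_runs is hypothetical, and it uses shift(fill_value=0) so the first row is compared against an explicit 0 instead of relying on NaN != 1 being true. For 0/1 flags the result matches the chapter column above.

```python
import pandas as pd


def number_runs(flags: pd.Series) -> pd.Series:
    """Label each run of consecutive 1s with a running number (hypothetical helper).

    A run starts where the flag is 1 and the previous flag is not 1;
    the "previous" value of the first row is taken to be 0.
    """
    starts = flags.eq(1) & flags.shift(fill_value=0).ne(1)
    # Every row after a run start (up to the next one) shares its number.
    return starts.cumsum()


flags = pd.Series([1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0])
print(number_runs(flags).tolist())
# [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3]
```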


Details:

a = (df['secFlag'] == 1)
b = (df['secFlag'] != df['secFlag'].shift())
c = a & b
d = c.cumsum()

print (pd.concat([df,a,b,c,d],
                 axis=1,
                 keys=('orig','==1','!=shifted','chained by &','cumsum')))
         orig             ==1 !=shifted chained by &  cumsum
         text secFlag secFlag   secFlag      secFlag secFlag
0        book       1    True      True         True       1
1    headings       1    True     False        False       1
2     chapter       1    True     False        False       1
3         one       1    True     False        False       1
4        page       0   False      True        False       1
5         one       0   False     False        False       1
6        text       0   False     False        False       1
7     chapter       1    True      True         True       2
8         two       1    True     False        False       2
9        page       0   False      True        False       2
10        two       0   False     False        False       2
11       text       0   False     False        False       2
12       page       0   False     False        False       2
13      three       0   False     False        False       2
10       text       0   False     False        False       2
11    chapter       1    True      True         True       3
12      three       1    True     False        False       3
13  something       0   False      True        False       3
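Rows 0 to 3 form a single run of 1s, so secFlag alone cannot separate "book headings" from "chapter one": the answer labels all four rows 1, whereas the desired output in the question puts them in chapters 1 and 2. If, and this is an assumption not stated in the question, every chapter heading begins with the literal word "chapter", a new label can also be started on that word. The sketch below (default integer index assumed) then reproduces the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["book", "headings", "chapter", "one", "page", "one", "text",
             "chapter", "two", "page", "two", "text", "page", "three",
             "text", "chapter", "three", "something"],
    "secFlag": [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
})

flags = df["secFlag"]
# Start of a run of 1s (same test as the answer, with an explicit 0 before row 0).
run_start = flags.eq(1) & flags.shift(fill_value=0).ne(1)
# ASSUMPTION: a flagged row whose text is exactly "chapter" also starts a chapter.
keyword_start = flags.eq(1) & df["text"].eq("chapter")
df["chapter"] = (run_start | keyword_start).cumsum()
print(df["chapter"].tolist())
# [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4]
```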

For the original question, "python - Using cumsum to find unique chapters", see Stack Overflow: https://stackoverflow.com/questions/51683802/

10-10 21:16