I have the following pandas DataFrame:

import pandas as pd

data = {"first_name": ["Alexander", "Alan", "Heather", "Marion", "Amy", "John"],
        "last_name": ["Miller", "Jacobson", ".", "Milner", "Cooze", "Smith"],
        "age": [42, 52, 36, 24, 73, 19],
        "marriage_status": [0, 0, 1, 1, 0, 1]}

df = pd.DataFrame(data)
df

  age first_name last_name  marriage_status
0   42  Alexander    Miller                0
1   52       Alan  Jacobson                0
2   36    Heather         .                1
3   24     Marion    Milner                1
4   73        Amy     Cooze                0
5   19       John     Smith                1
....

marriage_status is a column of binary data (0s and 1s). For each 1, I would also like to set the row immediately before it to 1. In this example, the dataframe would become:
  age first_name last_name  marriage_status
0   42  Alexander    Miller                0
1   52       Alan  Jacobson                1   # this changed to 1
2   36    Heather         .                1
3   24     Marion    Milner                1
4   73        Amy     Cooze                1   # this changed to 1
5   19       John     Smith                1
....

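In other words, this column contains consecutive "groups" of 1s, and I want the row just before each group to be 1 instead of 0. How can I do this?
My idea was to write a for loop, but that is not a pandas-based solution. One could also try enumerate(), but I would need to set the preceding value to 1, and I'm not sure how that would work.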
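For comparison, the loop-based approach mentioned above might look roughly like this sketch. The helper name mark_preceding_rows_loop is hypothetical, and the frame is trimmed to two columns for brevity. It reads from the original values and writes into a copy, so each change depends only on the input, never on an earlier update.

```python
import pandas as pd


def mark_preceding_rows_loop(status: pd.Series) -> pd.Series:
    """Hypothetical loop-based helper: set the row before every 1 to 1."""
    values = status.tolist()
    result = values.copy()  # read from the original values, write into a copy
    for i, value in enumerate(values):
        # A 1 at position i also marks position i - 1, if such a row exists.
        if value == 1 and i > 0:
            result[i - 1] = 1
    # Keep the original index and name so the Series can be assigned back.
    return pd.Series(result, index=status.index, name=status.name)


data = {
    "first_name": ["Alexander", "Alan", "Heather", "Marion", "Amy", "John"],
    "marriage_status": [0, 0, 1, 1, 0, 1],
}
df = pd.DataFrame(data)
df["marriage_status"] = mark_preceding_rows_loop(df["marriage_status"])
print(df["marriage_status"].tolist())  # [0, 1, 1, 1, 1, 1]
```

This produces the desired column, but it runs a Python-level loop over every row, which is exactly what a vectorized pandas solution avoids.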

Best answer

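We can use the or operator | together with shift(-1), which lines each row up with the value in the row below it. The result is 1 whenever the current row or the next row is 1, so a 0 directly followed by a 1 becomes 1; it is 0 only when both the current row and the next row are 0.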

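# OR each value with the value in the next row; the last row has no next row,
# so fill_value=0 is used there, which also keeps both operands as integers
df["marriage_status"] = (
    df["marriage_status"] | df["marriage_status"].shift(-1, fill_value=0)
)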

df

   age first_name last_name  marriage_status
0   42  Alexander    Miller                0
1   52       Alan  Jacobson                1
2   36    Heather         .                1
3   24     Marion    Milner                1
4   73        Amy     Cooze                1
5   19       John     Smith                1
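A self-contained sketch of the same idea, split into steps so the intermediate shifted column is visible. The variable names next_status and marked are illustrative, and it assumes pandas 0.24 or later, where Series.shift accepts fill_value.

```python
import pandas as pd

data = {
    "first_name": ["Alexander", "Alan", "Heather", "Marion", "Amy", "John"],
    "last_name": ["Miller", "Jacobson", ".", "Milner", "Cooze", "Smith"],
    "age": [42, 52, 36, 24, 73, 19],
    "marriage_status": [0, 0, 1, 1, 0, 1],
}
df = pd.DataFrame(data)

status = df["marriage_status"]
# shift(-1) moves every value up one row, so row i now holds row i + 1's value.
# fill_value=0 fills the last row (it has no successor) and keeps an integer dtype.
next_status = status.shift(-1, fill_value=0)

# Element-wise bitwise OR of two integer Series: 1 if this row or the next row is 1.
marked = status | next_status

# Show the input, the shifted column and the result side by side.
print(pd.DataFrame({"status": status, "next": next_status, "marked": marked}))
print(marked.tolist())  # [0, 1, 1, 1, 1, 1]

df["marriage_status"] = marked
```

Leaving the NaN that a plain shift(-1) puts in the last row would mix integers with floats; filling it with 0 keeps both sides integer, so | stays a plain bitwise OR on 0/1 values and no cast back to int is needed.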

10-04 22:27