python - Select rows based on a specific pattern in a Pandas DataFrame

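I have a CSV file that I read into a pandas DataFrame. There are two specific columns, "Notes" and "ActivityType", that I want to use as criteria. If the "Notes" column contains the string "Morning exercise" or "Morning workout", and/or the "ActivityType" column contains any string value (most of its cells are empty, and I don't want the empty ones counted), then I want to create a new column "MorningExercise" that holds 1 if either condition is met and 0 if neither is.
I have been using the code below to create a new column and insert 1 or 0 depending on whether the text condition in the "Notes" column is met, but I haven't figured out how to also put a 1 when the "ActivityType" column contains any string value.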

JoinedTables['MorningExercise'] = JoinedTables['Notes'].str.contains(('Morning workout' or 'Morning exercise'), case=False, na=False).astype(int)
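Note that the `or` in the call above is evaluated by Python before pandas sees anything: `or` returns its first truthy operand, so the argument collapses to the single string 'Morning workout' and 'Morning exercise' is never searched for. A quick check in plain Python:

```python
# Python's `or` returns the first truthy operand; a non-empty string is truthy,
# so this expression is just 'Morning workout'.
pattern = ('Morning workout' or 'Morning exercise')
print(pattern)  # Morning workout
```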

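For the "ActivityType" column, I think pd.notnull() should be used as the criterion.
What I really need is a way in Python to check whether either condition is met in a row, and to enter 1 in the new column if so, 0 otherwise.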
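For the "ActivityType" side, pd.notnull() does give a per-row boolean. A minimal sketch on a small hypothetical DataFrame (standing in for the CSV, where read_csv parses empty fields as NaN by default):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; empty CSV cells would arrive as NaN.
df = pd.DataFrame({
    "Notes": ["Morning workout", "Lunch", None],
    "ActivityType": [np.nan, "Run", np.nan],
})

# True where ActivityType holds a value, False where it is NaN/None.
has_activity = pd.notnull(df["ActivityType"])
print(has_activity.astype(int).tolist())  # [0, 1, 0]
```

Cells that contain only whitespace would still count as not-null here; that case is not covered by this sketch.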

Best answer

You need to build a regex pattern to use with str.contains:

regex = r'Morning\s*(?:workout|exercise)'
JoinedTables['MorningExercise'] = \
       JoinedTables['Notes'].str.contains(regex, case=False, na=False).astype(int)
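The snippet above only covers the "Notes" condition. One way to also honour "ActivityType", sketched here under the assumption that "either condition" means a logical OR, is to combine the two boolean Series with the element-wise `|` operator (not Python's `or`) before casting to int. The CSV text below is hypothetical sample data, not the asker's file:

```python
import io

import pandas as pd

# Hypothetical CSV; empty ActivityType fields are parsed as NaN by read_csv.
csv_text = """Notes,ActivityType
Morning workout,
morning exercise before work,
Lunch with team,Run
Evening walk,
"""
JoinedTables = pd.read_csv(io.StringIO(csv_text))

regex = r'Morning\s*(?:workout|exercise)'

# Condition 1: Notes matches the pattern (NaN notes count as no match).
notes_match = JoinedTables['Notes'].str.contains(regex, case=False, na=False)
# Condition 2: ActivityType has any value.
has_activity = JoinedTables['ActivityType'].notnull()

# Element-wise OR, then True/False -> 1/0.
JoinedTables['MorningExercise'] = (notes_match | has_activity).astype(int)
print(JoinedTables['MorningExercise'].tolist())  # [1, 1, 1, 0]
```

If both conditions had to hold instead, `&` would replace `|`.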

Details

Morning       # match "Morning"
\s*           # 0 or more whitespace chars
(?:           # open non-capturing group
workout       # match "workout"
|             # OR operator
exercise      # match "exercise"
)

The pattern looks for "Morning", followed by optional whitespace, then either "workout" or "exercise".
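To see what the pattern does and does not accept, here is a small check with Python's re module on a few made-up strings; re.IGNORECASE plays the role of case=False, and re.search mirrors how str.contains looks for the pattern anywhere in the cell:

```python
import re

regex = r'Morning\s*(?:workout|exercise)'

samples = ["Morning workout", "morning exercise", "Morningworkout",
           "Morning   exercise", "Morning run", "Evening workout"]

# True if the pattern occurs anywhere in the string, ignoring case.
results = [bool(re.search(regex, s, flags=re.IGNORECASE)) for s in samples]
print(results)  # [True, True, True, True, False, False]
```

Because of `\s*`, "Morningworkout" (no space) and runs of several spaces also match.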

