Background
I have the following df:
import pandas as pd

df = pd.DataFrame({'Text': ['\n[SPORTS FAN]\nHere',
                            'Nothing here',
                            '\n[BASEBALL]\nTHIS SOUNDS right',
                            '\n[SPORTS FAN]\nLikes sports',
                            'Nothing is here',
                            '\n[NOT SPORTS]\nTHIS SOUNDS good',
                            '\n[SPORTS FAN]\nReally Big big fan',
                            '\n[BASEBALL]\nRARELY IS a fan'
                            ],
                   'P_ID': [1, 2, 3, 4, 5, 6, 7, 8],
                   'P_Name': ['J J SMITH',
                              'J J SMITH',
                              'J J SMITH',
                              'J J SMITH',
                              'MARY HYDER',
                              'MARY HYDER',
                              'MARY HYDER',
                              'MARY HYDER']
                   })
Output
   P_ID  P_Name      Text
0  1     J J SMITH   \n[SPORTS FAN]\nHere
1  2     J J SMITH   Nothing here
2  3     J J SMITH   \n[BASEBALL]\nTHIS SOUNDS right
3  4     J J SMITH   \n[SPORTS FAN]\nLikes sports
4  5     MARY HYDER  Nothing is here
5  6     MARY HYDER  \n[NOT SPORTS]\nTHIS SOUNDS good
6  7     MARY HYDER  \n[SPORTS FAN]\nReally Big big fan
7  8     MARY HYDER  \n[BASEBALL]\nRARELY IS a fan
Goal
Keep only the rows whose Text starts with '\n[SPORTS FAN]\n' or '\n[BASEBALL]\n'.
Desired output
   P_ID  P_Name      Text
0  1     J J SMITH   \n[SPORTS FAN]\nHere
2  3     J J SMITH   \n[BASEBALL]\nTHIS SOUNDS right
3  4     J J SMITH   \n[SPORTS FAN]\nLikes sports
6  7     MARY HYDER  \n[SPORTS FAN]\nReally Big big fan
7  8     MARY HYDER  \n[BASEBALL]\nRARELY IS a fan
Question
How do I get the desired output?
Best answer
Try this:
df_new = df.loc[df['Text'].str.startswith('\n[SPORTS FAN]') | df['Text'].str.startswith('\n[BASEBALL]')]
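No regex is needed: str.startswith compares literal strings, so the square brackets in the prefixes do not have to be escaped.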
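If the list of prefixes grows, chaining one startswith call per prefix with | gets unwieldy. The following is a minimal sketch, not part of the original answer, that keeps the prefixes in a tuple and relies on Python's built-in str.startswith accepting a tuple. The names prefixes, mask, and df_regex are hypothetical, and the prefixes include the trailing '\n' as written in the goal. For comparison it also builds the regex equivalent, which shows why the literal approach is simpler: each prefix has to be passed through re.escape first.

```python
import re
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'Text': ['\n[SPORTS FAN]\nHere',
             'Nothing here',
             '\n[BASEBALL]\nTHIS SOUNDS right',
             '\n[SPORTS FAN]\nLikes sports',
             'Nothing is here',
             '\n[NOT SPORTS]\nTHIS SOUNDS good',
             '\n[SPORTS FAN]\nReally Big big fan',
             '\n[BASEBALL]\nRARELY IS a fan'],
    'P_ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'P_Name': ['J J SMITH'] * 4 + ['MARY HYDER'] * 4,
})

# Hypothetical name: the prefixes to keep, collected in one place.
prefixes = ('\n[SPORTS FAN]\n', '\n[BASEBALL]\n')

# Python's built-in str.startswith accepts a tuple of prefixes,
# so one apply call covers every prefix. Assumes Text has no missing values.
mask = df['Text'].apply(lambda s: s.startswith(prefixes))
df_new = df.loc[mask]

# Regex equivalent, for comparison only: '[' and ']' are special in a
# pattern, so each prefix is escaped before joining with '|'.
pattern = '|'.join(re.escape(p) for p in prefixes)
df_regex = df.loc[df['Text'].str.match(pattern)]

print(df_new)
```

Both df_new and df_regex select the same rows as the one-liner in the answer. The apply version assumes every Text value is a string; a column with missing values would need them filled or dropped first.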