I'm preprocessing text for classification and importing my dataset like this:
dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 2)
dataset
In the terminal this prints:
   lyrics,classification
0 I should have known better with a girl like yo...
1 You can shake an apple off an apple tree\nShak...
2 It's been a hard day's night\nAnd I've been wo...
3 Michelle, ma belle\nThese are words that go to...
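However, when I check the dataset variable in Spyder's Variable Explorer, it has only one column instead of the two I need. Since the lyrics themselves contain commas, using "," as the delimiter won't work.
How can I fix the dataframe above so that it has:
1) a lyrics column, and
2) a classification column,
with the corresponding data in each row?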
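A minimal reproduction of the symptom, assuming (a guess, not confirmed by the question) that lyrics.csv is really comma-separated with unquoted commas inside the lyrics. The two rows are hypothetical stand-ins built from the lyrics shown above, io.StringIO replaces the real file, and quoting=2 is left out. Because the data contains no tab characters, delimiter='\t' puts each whole line, header included, into a single column:

```python
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv: comma-separated, lyrics contain raw commas
raw = (
    "lyrics,classification\n"
    "Michelle, ma belle,0\n"
    "It's been a hard day's night,0\n"
)

# No tabs anywhere, so every line lands in one column
dataset = pd.read_csv(io.StringIO(raw), delimiter="\t")

print(dataset.columns.tolist())  # ['lyrics,classification']
print(dataset.shape)             # (2, 1)
```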
Best answer
If your lyrics don't contain commas themselves (they very likely do), you can use read_csv with delimiter=','.
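A sketch of that case, using hypothetical in-memory samples in place of lyrics.csv. The second sample also shows standard CSV behavior worth checking for: if the lyrics that contain commas are wrapped in double quotes, read_csv's default quotechar='"' keeps each one in a single field, so delimiter=',' still works:

```python
import io

import pandas as pd

# Hypothetical sample where no lyric contains a comma
no_commas = (
    "lyrics,classification\n"
    "You can shake an apple off an apple tree,0\n"
    "It's been a hard day's night,0\n"
)
df_plain = pd.read_csv(io.StringIO(no_commas), delimiter=",")

# Hypothetical sample where the comma-containing lyric is double-quoted
quoted = (
    "lyrics,classification\n"
    '"Michelle, ma belle",0\n'
    "It's been a hard day's night,0\n"
)
df_quoted = pd.read_csv(io.StringIO(quoted), delimiter=",")

print(df_plain.columns.tolist())   # ['lyrics', 'classification']
print(df_quoted.loc[0, "lyrics"])  # Michelle, ma belle
```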
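Otherwise, you can use str.rsplit with n=1, which splits only on the last comma in each row and leaves any commas inside the lyrics intact:
dataset.iloc[:, 0].str.rsplit(',', n=1, expand=True)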
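Why n=1 matters, sketched on a hypothetical two-row Series: without it, rsplit splits on every comma, so a lyric containing commas spills into extra columns; with n=1 only the last comma (the one before the label) is used:

```python
import pandas as pd

# Hypothetical one-column data: "<lyrics>,<label>" with commas inside the lyrics
s = pd.Series(["Michelle, ma belle,0", "It's been a hard day's night,0"])

# Default n=-1 splits on every comma: the first row yields three pieces
every_comma = s.str.rsplit(",", expand=True)

# n=1 splits only on the last comma: exactly two pieces per row
last_comma = s.str.rsplit(",", n=1, expand=True)

print(every_comma.shape)            # (2, 3)
print(last_comma.iloc[0].tolist())  # ['Michelle, ma belle', '0']
```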
For example, starting from a one-column frame:
df
lyrics,classification
0 I should have known better with a girl like yo...
1 You can shake an...,0
2 It's been a hard day's night...,0
df = df.iloc[:, 0].str.rsplit(',', n=1, expand=True)  # split on the last comma only
df.columns = ['lyrics', 'classification']
df
lyrics classification
0 I should have known better with a girl like yo... 0
1 You can shake an... 0
2 It's been a hard day's night... 0
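Putting the steps together, a sketch that assumes the same hypothetical comma-separated layout as before (io.StringIO stands in for lyrics.csv, and quoting=2 is omitted). It reuses the mis-read header "lyrics,classification" for the column names and converts the label, which rsplit returns as a string, to an integer; the astype(int) step assumes every label is a whole number:

```python
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv
raw = (
    "lyrics,classification\n"
    "Michelle, ma belle,0\n"
    "It's been a hard day's night,0\n"
)

# Read each line into a single column (there are no tabs in the data)
dataset = pd.read_csv(io.StringIO(raw), delimiter="\t")
header = dataset.columns[0]  # 'lyrics,classification'

# Split on the last comma only, so commas inside the lyrics survive
df = dataset[header].str.rsplit(",", n=1, expand=True)
df.columns = header.split(",")  # ['lyrics', 'classification']

# rsplit yields strings; convert the label to integers
df["classification"] = df["classification"].astype(int)

print(df)
```

If some row had no comma at all, its classification cell would be empty after the split and astype(int) would fail, so such rows would need to be inspected or dropped first.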
A similar question, "python - Split a dataframe column into two columns based on a delimiter", can be found on Stack Overflow: https://stackoverflow.com/questions/46165775/