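I have a DataFrame, test,
with a column that contains a complex mix of words, characters, and numbers. I need to extract the hyphen-separated words at the start of each value into a new column.
I'm not a regex expert and have spent far too much time fighting with it. Thanks for your help!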
import pandas as pd

test = pd.DataFrame({
    'id': ['1', '2', '3', '4'],
    'category': ['worda-wordb-1234.ds.er89.',
                 'worda-4567.we.77-ty', 'wordc-wordd-5698/de/', 'wordc-2356/rt/']
})
Expected output:
  id  category                   sub_category
0  1  worda-wordb-1234.ds.er89.  worda-wordb
1  2  worda-4567.we.77-ty        worda
2  3  wordc-wordd-5698/de/       wordc-wordd
3  4  wordc-2356/rt/             wordc
Best answer
Use str.extract. The greedy (.*) captures everything before the last hyphen that is followed by digits:
test['sub_category'] = test['category'].str.extract(r'(.*)-\d+', expand=False)
  id  category                   sub_category
0 1 worda-wordb-1234.ds.er89. worda-wordb
1 2 worda-4567.we.77-ty worda
2 3 wordc-wordd-5698/de/ wordc-wordd
3 4 wordc-2356/rt/ wordc
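A self-contained sketch for checking the answer end to end. It reproduces the answer's pattern and also shows an alternative pattern that is an assumption of this edit, not part of the original answer: it anchors at the start of the string and keeps only runs of letters joined by hyphens. Both give the expected output on the sample data. The column name sub_category_alt is hypothetical, used only for comparison.

```python
import pandas as pd

# Sample data from the question
test = pd.DataFrame({
    'id': ['1', '2', '3', '4'],
    'category': ['worda-wordb-1234.ds.er89.',
                 'worda-4567.we.77-ty',
                 'wordc-wordd-5698/de/',
                 'wordc-2356/rt/'],
})

# Answer's pattern: greedy (.*) keeps everything before the LAST "-<digits>".
# expand=False returns a Series (one capture group) instead of a DataFrame.
test['sub_category'] = test['category'].str.extract(r'(.*)-\d+', expand=False)

# Alternative (assumption, not from the answer): anchor at the start and take
# letter-only words joined by hyphens, stopping at the first non-letter part.
test['sub_category_alt'] = test['category'].str.extract(
    r'^([A-Za-z]+(?:-[A-Za-z]+)*)', expand=False)

print(test)
```

The two patterns differ only when a later "-<digits>" appears after more words: for 'worda-12-wordb-34' the greedy pattern returns 'worda-12-wordb', while the anchored pattern returns 'worda'.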