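I have a DataFrame, test, with a column containing a complex pattern of words, characters and numbers. I need to extract the hyphen-separated words into a new column.
I'm no regex expert and have spent far too much time fighting with it. Thanks for your help!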

import pandas as pd

test = pd.DataFrame({
    'id': ['1', '2', '3', '4'],
    'category': ['worda-wordb-1234.ds.er89.', 'worda-4567.we.77-ty',
                 'wordc-wordd-5698/de/', 'wordc-2356/rt/'],
})

Desired output:
    id  category                    sub_category
0   1   worda-wordb-1234.ds.er89.   worda-wordb
1   2   worda-4567.we.77-ty         worda
2   3   wordc-wordd-5698/de/        wordc-wordd
3   4   wordc-2356/rt/              wordc

Best answer

Use str.extract:

test['sub_category'] = test['category'].str.extract(r'(.*)-\d+', expand=False)

    id  category                    sub_category
0   1   worda-wordb-1234.ds.er89.   worda-wordb
1   2   worda-4567.we.77-ty         worda
2   3   wordc-wordd-5698/de/        wordc-wordd
3   4   wordc-2356/rt/              wordc
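For anyone who wants to run this end to end, here is a self-contained sketch of the answer with the pattern annotated. The pattern is written as a raw string (r'...') so Python does not treat \d as an escape sequence and warn about it, and expand=False is passed so str.extract returns a Series rather than a one-column DataFrame. The leading_words column is a hypothetical alternative, not part of the answer: it assumes a "word" consists of letters only and anchors the match at the start of the string.

```python
import pandas as pd

# Same sample frame as in the question.
test = pd.DataFrame({
    'id': ['1', '2', '3', '4'],
    'category': ['worda-wordb-1234.ds.er89.', 'worda-4567.we.77-ty',
                 'wordc-wordd-5698/de/', 'wordc-2356/rt/'],
})

# Pattern from the answer:
#   (.*)   greedy group: it first consumes the whole string, then backtracks
#   -\d+   until it sits just before a hyphen that is followed by digits.
# Because .* is greedy, the group ends before the LAST "-<digit>" in the string.
# expand=False returns a Series for a single capture group.
test['sub_category'] = test['category'].str.extract(r'(.*)-\d+', expand=False)

# Hypothetical stricter alternative (assumption: a "word" is letters only):
# capture the run of alphabetic words joined by hyphens at the start.
test['leading_words'] = test['category'].str.extract(
    r'^([A-Za-z]+(?:-[A-Za-z]+)*)', expand=False)

print(test)
```

On these four rows both columns come out the same. They diverge on a string such as 'worda-4567.we-77': the greedy (.*) stops before the last hyphen followed by digits and returns 'worda-4567.we', while the anchored pattern returns 'worda'.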

10-06 11:22