I have a .txt file (scraped from a website as preformatted text) whose data looks like this:
B, NICKOLAS       CT144531X   D1026      JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL   JA-15-0050  D0015      JUDGE EDWARD A ROBERTS
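I want to remove all the extra whitespace between the columns (it is a varying number of spaces, not tabs). I then want to replace it with a delimiter (a tab or a pipe, since the data itself contains commas), like this: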
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
From looking around, the best options seem to be splitting with a regular expression or with shlex.
Best answer
import re

# Sample input: columns are separated by runs of two or more spaces
s = """B, NICKOLAS       CT144531X   D1026      JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL   JA-15-0050  D0015      JUDGE EDWARD A ROBERTS
"""
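# Update: replace every run of two or more spaces that sits between
# two non-space characters with a single pipe
re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s)
In [71]: print(re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s))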
B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
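One caveat with the pattern above: because it consumes the non-space character on each side of the gap, a column that is only one character wide can be skipped (for example, "A  B  C" comes out as "A|B  C"). A line-by-line split on runs of two or more spaces avoids that. The following is only a sketch, not part of the original answer: the function name `delimit_columns` and the exact spacing in `sample` are made up for illustration, and it assumes that no column value itself contains two consecutive spaces.

```python
import re


def delimit_columns(text, delimiter="|"):
    """Join the columns of each line with `delimiter`.

    Columns are assumed to be separated by runs of two or more spaces;
    single spaces inside a value (e.g. "ANDREWS VS BALL") are kept.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r" {2,}", line)
        rows.append(delimiter.join(fields))
    return "\n".join(rows)


# Hypothetical sample; the real file's column widths may differ.
sample = (
    "B, NICKOLAS       CT144531X   D1026      JUDGE ANNIE WHITE JOHNSON\n"
    "ANDREWS VS BALL   JA-15-0050  D0015      JUDGE EDWARD A ROBERTS\n"
)

print(delimit_columns(sample))
```

Passing `delimiter="\t"` gives tab-separated output instead. If a value in the real data can contain a double space, this approach will split it, and slicing each line at fixed column positions would probably be the safer route.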