I have a .txt file (scraped from a website as preformatted text) in which the data looks like this:

B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS

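I want to remove all the extra whitespace between the columns (it is actually a varying number of spaces, not tabs). Then I want to replace each gap with a delimiter (a tab or a pipe, since the data itself contains commas), like this: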
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
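One way to express "drop the variable spacing, then re-join with a delimiter" is to split each line on runs of two or more spaces and join the pieces with `|`. This is a minimal sketch, assuming every column boundary is at least two spaces wide, fields never contain two consecutive spaces, and no column is ever empty; `COLUMN_GAP` and `to_pipe_delimited` are hypothetical names chosen for illustration.

```python
import re

# Assumption: two or more consecutive spaces mark a column boundary, while
# single spaces (e.g. inside "JUDGE EDWARD A ROBERTS") belong to the field.
COLUMN_GAP = re.compile(r" {2,}")


def to_pipe_delimited(text: str, sep: str = "|") -> str:
    """Hypothetical helper: turn space-aligned columns into sep-delimited rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()  # drop leading/trailing padding on the line
        if not line:
            continue  # skip blank lines
        rows.append(sep.join(COLUMN_GAP.split(line)))
    return "\n".join(rows)


sample = (
    "B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON\n"
    "ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS\n"
)
print(to_pipe_delimited(sample))
# B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON
# ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
```

Passing `sep="\t"` gives tab-delimited output instead. For a file, the same function can be applied to the result of `open(path).read()`, where `path` is whatever your scraped file is called.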

From looking around, the best options seem to be splitting with a regular expression or with shlex. Two similar questions:
  • Python Regular expression must strip whitespace except between quotes
  • Remove white spaces from dict : Python

Best answer:

    import re

    s = """B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON
    ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS
    """

    # Replace each run of 2+ spaces that sits between two non-space characters
    # with "|". Single spaces inside a field (e.g. "JUDGE EDWARD A ROBERTS") are kept.
    print(re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s))
    # B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON
    # ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
    

    10-04 10:41
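One caveat with the pattern above: `(\S)\ {2,}(\S)` consumes the first character of the next field, so a one-character field sitting between two gaps is skipped (`"A  B  C"` becomes `"A|B  C"`). A sketch of a variant, assuming the same "two or more spaces" rule, uses a lookbehind and lookahead so only the spaces themselves are matched and replaced; `GAP_BETWEEN_FIELDS` and `pipe_columns` are hypothetical names.

```python
import re

# Match only the run of 2+ spaces; the neighbouring non-space characters are
# checked by the lookbehind/lookahead but not consumed, so they stay in place
# and remain available to the next match.
GAP_BETWEEN_FIELDS = re.compile(r"(?<=\S) {2,}(?=\S)")


def pipe_columns(text: str) -> str:
    """Hypothetical helper: replace every inter-column gap with '|'."""
    return GAP_BETWEEN_FIELDS.sub("|", text)


original = re.compile(r"(\S)\ {2,}(\S)(\n?)")
row = "A  B  C"
print(original.sub(r"\1|\2\3", row))  # A|B  C
print(pipe_columns(row))              # A|B|C
```

On the sample data from the question both patterns produce the same output; they differ only when a field is a single character. Like the original, this variant leaves trailing spaces at the end of a line untouched, because no non-space character follows them.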