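I have a seemingly simple problem that I can't solve. Given a string containing a DOI, I need to remove trailing characters for as long as the last character is punctuation, stopping once the last character is a letter or digit.
For example, if the string is: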

sampleDoi = "10.1097/JHM-D-18-00044.',"

I need the following output:
"10.1097/JHM-D-18-00044"

That is, the trailing .', is removed.
To do this, I wrote the following script:
import string

invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i - 1
    else:
        print (a)
        break

However, this produces 10.1097/JHM-D-18-00, whereas I want it to produce 10.1097/JHM-D-18-00044. Why are the trailing digits being removed?

Best answer

Corrected code:

import string

invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i  # No-op: the index must stay at -1, so this line can be removed altogether.
    else:
        print (a)
        break

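This gives the desired output while keeping the original code essentially the same. The original bug is that i is decremented while a is also being shortened, so each pass cuts off more characters than the one before (1, then 2, then 3), and the third pass removes the . together with two digits. Keeping i at -1 removes exactly one trailing character per pass.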
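
If keeping the loop is not a requirement, the same result can probably be had with str.rstrip, which accepts a string of characters to strip from the right-hand end. The following is only a minimal sketch, not part of the original answer: the function name strip_trailing_punctuation and the constant INVALID_CHARS are made up for illustration, and it assumes the same character set as the question (all of string.punctuation except the underscore).

```python
import string

# Assumption: same character set as the question's invalidChars,
# i.e. all ASCII punctuation except the underscore.
INVALID_CHARS = string.punctuation.replace("_", "")


def strip_trailing_punctuation(doi: str) -> str:
    """Strip characters in INVALID_CHARS from the right end of doi.

    Stripping stops at the first character (from the right) that is
    not in INVALID_CHARS, so punctuation inside the DOI is untouched.
    """
    return doi.rstrip(INVALID_CHARS)


sample_doi = "10.1097/JHM-D-18-00044.',"
cleaned = strip_trailing_punctuation(sample_doi)
print(cleaned)  # 10.1097/JHM-D-18-00044
```

Because rstrip only looks at the end of the string, the dots, slash, and hyphens inside the DOI are left alone. Note that a trailing underscore would not be stripped, which matches the character set used in the question.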
