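I'm trying to create an algorithm that iterates over a list of strings, joins strings together when a certain condition is met, and then skips ahead by the number of strings it joined, so that parts of the same joined string are not counted twice.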

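I understand that i = i + x or i += x does not change how far each iteration of the loop advances, so I'm looking for another way to skip several iterations of the loop variable.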
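Here is a minimal demonstration of the behaviour described above. Rebinding the loop variable inside the body of a for loop does not change which value range() produces next, so the loop still visits every index.

```python
# Each pass of a for loop takes the next value from range(5),
# regardless of what the body did to i.
seen = []
for i in range(5):
    seen.append(i)
    i += 2  # rebinds i for the rest of this pass only

print(seen)  # [0, 1, 2, 3, 4]
```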

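Background: I'm trying to build a named entity recognition algorithm for news articles. I tokenize the text ('Prime Minister Jacinda Ardern is from New Zealand') into ('Prime', 'Minister', 'Jacinda', 'Ardern', 'is', ...) and run the NLTK POS tagging algorithm on it, which gives: ... (('Jacinda', 'NNP'), ('Ardern', 'NNP'), ('is', 'VBZ') ... I then combine words whenever the following word is also 'NNP' (a proper noun).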

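The goal is to count 'Prime Minister Jacinda Ardern' as 1 string rather than 4, and then skip as many loop iterations as there were words joined, so that the next strings are not 'Minister Jacinda Ardern' and 'Jacinda Ardern'.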

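Context:
'text' is a list of (word, tag) tuples created by tokenizing my article and then POS-tagging it, in the format: [...('She', 'PRP'), ('said', 'VBD'), ('the', 'DT'), ('roughly', 'RB'), ('25-minute', 'JJ'), ('meeting', 'NN')...]
'NNP' = proper noun, i.e. the name of a place, person, organization, etc.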

for i in range(len(text)):
    print(i)

    # initialising wordcounter as a variable
    wordcounter = 0

    # if text[i] is a Proper Noun, make namedEnt = the word,
    # then increase wordcounter by 1
    if text[i][1] == 'NNP':
        namedEnt = text[i][0]
        wordcounter += 1

        # while the next word in text is also a Proper Noun,
        # increase wordcounter by 1 and initialise j to 1
        while text[i + wordcounter][1] == 'NNP':
            wordcounter += 1
            j = 1

            # while j is less than wordcounter, join text[i+j] to
            # namedEnt and increase j by 1; when that is no longer
            # the case, append namedEnt to a namedEntity list
            while j < wordcounter:
                namedEnt = ' '.join([namedEnt, text[i+j][0]])
                j += 1
            InitialNamedEntity.append(namedEnt)

        i += wordcounter


If I print(i) at the start of the loop, it goes up by 1 each time. When I print a Counter of the NamedEntity list built from the namedEnts, I get:
 (...'New Zealand': 7, 'Zealand': 7, 'United': 4, 'Prime Minister Minister Jacinda Minister Jacinda Ardern': 3...)

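So not only am I getting double counts such as 'New Zealand' and 'Zealand', but also strange results such as 'Prime Minister Minister Jacinda Minister Jacinda Ardern'.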
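The following sketch shows where the strange strings come from. It replays the question's loop body on a hand-written tagged version of the example sentence. The tags are an assumption rather than real NLTK output, and replay_question_loop is a hypothetical helper. namedEnt is never reset inside the while loop, so each pass joins words onto a string that already contains them. The append also sits inside the while loop, so every intermediate prefix gets recorded. Starting again at index 1 then produces the overlapping 'Minister ...' entities as well.

```python
def replay_question_loop(text, i):
    """Run the question's loop body once for start index i and return
    everything it would append (hypothetical helper, bugs kept on purpose)."""
    appended = []
    wordcounter = 0
    if text[i][1] == 'NNP':
        namedEnt = text[i][0]
        wordcounter += 1
        while text[i + wordcounter][1] == 'NNP':
            wordcounter += 1
            j = 1
            # namedEnt still holds the previous pass's string, so words are re-joined
            while j < wordcounter:
                namedEnt = ' '.join([namedEnt, text[i + j][0]])
                j += 1
            # appending inside the while loop records every partial run too
            appended.append(namedEnt)
    return appended


# Hand-written tags for the example sentence (an assumption, not real tagger output)
tagged = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ'), ('from', 'IN'),
          ('New', 'NNP'), ('Zealand', 'NNP'), ('.', '.')]

print(replay_question_loop(tagged, 0))
# ['Prime Minister', 'Prime Minister Minister Jacinda',
#  'Prime Minister Minister Jacinda Minister Jacinda Ardern']
print(replay_question_loop(tagged, 1))
# ['Minister Jacinda', 'Minister Jacinda Jacinda Ardern']
```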

The result I want is ('New Zealand': 7, 'United States': 4, 'Prime Minister Jacinda Ardern': 3)

Any help would be greatly appreciated. Cheers

Best answer

If you need to adjust how i is incremented, don't use a for loop, because it always sets i to the next value in the range. Use a while loop instead:

i = 0
while i < len(text):
    ...
    i += wordcounter
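Below is a fuller sketch of the while-loop approach. It assumes text is a list of (word, tag) tuples as in the question, and the function name extract_named_entities and the sample data are hypothetical. It goes beyond the answer's fragment in three ways. First, i always advances by at least 1, because i += wordcounter on its own would never move past a non-NNP token, where wordcounter stays 0. Second, each run is joined once, after it has been counted. Third, the lookahead is bounds-checked, so a proper noun at the very end of the list does not raise IndexError.

```python
from collections import Counter


def extract_named_entities(text):
    """Join each run of consecutive 'NNP' tokens into a single string.

    text is a list of (word, tag) tuples, e.g. [('She', 'PRP'), ...].
    """
    entities = []
    i = 0
    while i < len(text):
        if text[i][1] == 'NNP':
            # Count the consecutive proper nouns starting at i, stopping
            # at the end of the list instead of raising IndexError.
            wordcounter = 1
            while i + wordcounter < len(text) and text[i + wordcounter][1] == 'NNP':
                wordcounter += 1
            # Join the whole run once and record it once.
            entities.append(' '.join(word for word, _ in text[i:i + wordcounter]))
            # Skip every word that was just joined.
            i += wordcounter
        else:
            # Always advance, otherwise a non-NNP token would stall the loop.
            i += 1
    return entities


# Hand-written sample tags (an assumption, not real tagger output)
tagged = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ'), ('from', 'IN'),
          ('New', 'NNP'), ('Zealand', 'NNP')]

entities = extract_named_entities(tagged)
print(entities)  # ['Prime Minister Jacinda Ardern', 'New Zealand']
print(Counter(entities)['New Zealand'])  # 1
```

Passing the returned list to collections.Counter counts only whole entities, which matches the shape of the result the question asks for.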

Regarding python - Adjusting the iteration amount in a Python loop, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58478220/

10-12 20:09