My text contains some keywords, each followed by sentences:

var data = "Name The United States of America (USA), commonly referred to as the United States (U.S.) or America, is a federal republic composed of 50 states, the federal district of Washington, D.C., five major territories, and various possessions. **About** 48 contiguous states and Washington, D.C., are in central North America between Canada and Mexico. The state of Alaska is in the northwestern part of North America and the state of Hawaii is an archipelago in the mid-Pacific. The territories are scattered **about** the Pacific Ocean and the Caribbean Sea. At 3.8 million square miles and with over 320 million people, the country is the world's third largest by total area and the third most populous. It is one of the world's most ethnically diverse and multicultural nations, the product of large-scale immigration from many countries. Life The geography and climate of the United States are also extremely diverse, and the country is home to **about** a wide variety of wildlife. Rest USA is a diversified nation and Niagara is world famous.";


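The text above contains 4 keywords: Name, About, Life and Rest. I want to split the text that follows each keyword into a separate string array. The keywords always appear in the same order in the text. So far I have tried the following code: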

var name = [];
var about = [];
var life = [];
function transform_report(data) {
    var keywords = ["Name", "About", "Life", "Rest"];
    var output_data = "Event ";
    var keyword_index = 0;
    var input_data = data.toString();
    var pos = -1;
    for (var i = 0; i < keywords.length; i++) {
        pos = input_data.indexOf(keywords[i]);
        if (pos != -1) {
            keyword_index = i;
            break;
        }
    }

    while (pos != -1) {
        output_data += keywords[keyword_index] + " : ";
        pos += keywords[keyword_index].length;
        var index = keyword_index;
        keyword_index = find_next_keyword(keywords, keyword_index, input_data, pos);
        var end_pos = input_data.indexOf(keywords[keyword_index]);
        var output_text = input_data.slice(pos, end_pos).replace(/:/, '');

        output_data += output_text.trim() + "\n";
        if (keywords[index] === "Name") {
            name.push(output_text.trim());
        }
        if ((keywords[index] === "About")) {
            about.push(output_text.trim());
        }
        if ((keywords[index] === "Life")) {
            life.push(output_text.trim());
        }
        pos = end_pos;
    }
    return output_data;
}

function find_next_keyword(keywords, index, input_data, pos) {
    var orig_index = index;
    var min_pos = input_data.length;
    var min_index = index;
    if (index == keywords.length - 1)
        return -1;
    for (var i = 0; i < keywords.length; i++) {
        if (i == orig_index)
            continue;
        var keyword = keywords[i];
        var next_keyword_pos = input_data.indexOf(keyword, pos);
        if (next_keyword_pos != -1 && next_keyword_pos < min_pos) {
            min_pos = next_keyword_pos;
            min_index = i;
        }
    }
    return min_index;
}


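The code above works when each keyword occurs only once in the data. In this case, however, the keyword "About" also occurs as an ordinary word inside the text, and those occurrences should stay in the about array and the life array. The output should be: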

name array contains :
The United States of America (USA), commonly referred to as the United States (U.S.) or America, is a federal republic composed of 50 states, the federal district of Washington, D.C., five major territories, and various possessions.

about array contains: 48 contiguous states and Washington, D.C., are in central North America between Canada and Mexico. The state of Alaska is in the northwestern part of North America and the state of Hawaii is an archipelago in the mid-Pacific. The territories are scattered about the Pacific Ocean and the Caribbean Sea. At 3.8 million square miles and with over 320 million people, the country is the world's third largest by total area and the third most populous. It is one of the world's most ethnically diverse and multicultural nations, the product of large-scale immigration from many countries.

life array contains:The geography and climate of the United States are also extremely diverse, and the country is home to about a wide variety of wildlife.


But because the keyword is also a common word, I cannot get the desired output. Is there a way to do this in JavaScript? Many thanks.

Best answer

Considering your condition:


  "... The keywords always appear in the same order in the text."


The "main goal" can be achieved with the following approach, which uses the String.split, String.replace, String.substring and Array.indexOf functions:

// data is the initial string(text)
var splitted = data.split(/\.\s/),  // splitting sentences
    keywords = ["Name", "About", "Life", "Rest"],
    currentKeyword = "",  // the last active keyword
    keysObject = {'name' : [], 'about' : [], 'life' : [], 'rest' : []};

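splitted.forEach(function(v){
    // first word of the sentence, with markup such as "**" stripped
    var first = v.substring(0, v.indexOf(" ")).replace(/\W/g, "");
    if (keywords.indexOf(first) !== -1) {
        // the sentence opens a new section: store the text after the keyword
        keysObject[first.toLowerCase()].push(v.substring(v.indexOf(" ") + 1));
        currentKeyword = first.toLowerCase();
    } else if (currentKeyword) {
        // ordinary sentence: it belongs to the last active keyword
        keysObject[currentKeyword].push(v);
    }
});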

console.log(JSON.stringify(keysObject, null, 4));


Output:

{
    "name": [
        "The United States of America (USA), commonly referred to as the United States (U.S.) or America, is a federal republic composed of 50 states, the federal district of Washington, D.C., five major territories, and various possessions"
    ],
    "about": [
        "48 contiguous states and Washington, D.C., are in central North America between Canada and Mexico",
        "The state of Alaska is in the northwestern part of North America and the state of Hawaii is an archipelago in the mid-Pacific",
        "The territories are scattered **about** the Pacific Ocean and the Caribbean Sea",
        "At 3.8 million square miles and with over 320 million people, the country is the world's third largest by total area and the third most populous",
        "It is one of the world's most ethnically diverse and multicultural nations, the product of large-scale immigration from many countries"
    ],
    "life": [
        "The geography and climate of the United States are also extremely diverse, and the country is home to **about** a wide variety of wildlife"
    ],
    "rest": [
        "USA is a diversified nation and Niagara is world famous."
    ]
}
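
The same idea can be wrapped in a reusable function. What follows is only a minimal sketch, not part of the original answer: the names `groupByKeyword`, `joinSections` and the `sample` text are hypothetical, and it assumes that a keyword counts only when it is the first word of a sentence and that sentences end with a period followed by whitespace. It also shows one way to join each section back into a single string, which is closer to the output format asked for in the question.

```javascript
// Hypothetical helper (not from the original answer): groups sentences under
// the keyword that most recently opened a sentence.
function groupByKeyword(text, keywords) {
    var sections = {};
    keywords.forEach(function (k) { sections[k.toLowerCase()] = []; });

    var current = null; // key of the last keyword seen at a sentence start
    text.split(/\.\s+/).forEach(function (sentence) {
        var space = sentence.indexOf(" ");
        // First token of the sentence, with punctuation such as "**" stripped.
        var first = (space === -1 ? sentence : sentence.substring(0, space))
            .replace(/\W/g, "");
        if (keywords.indexOf(first) !== -1) {
            current = first.toLowerCase();
            sentence = space === -1 ? "" : sentence.substring(space + 1);
        }
        // Text before the first keyword has no section, so it is skipped.
        if (current !== null && sentence !== "") {
            sections[current].push(sentence);
        }
    });
    return sections;
}

// Hypothetical helper: joins each section back into one string per keyword.
function joinSections(sections) {
    var joined = {};
    Object.keys(sections).forEach(function (key) {
        // Re-add the ". " separator that split() consumed.
        joined[key] = sections[key].join(". ");
    });
    return joined;
}

// Made-up sample text, used only to illustrate the call.
var sample = "Name Ada Lovelace. About She wrote about engines. " +
    "She liked math. Life Born in London.";
var grouped = groupByKeyword(sample, ["Name", "About", "Life", "Rest"]);
console.log(JSON.stringify(joinSections(grouped), null, 4));
```

Because the comparison is case-sensitive and only looks at the first word of each sentence, a lowercase "about" in the middle of a sentence is never treated as a keyword. The sketch would still misclassify an ordinary sentence that happens to begin with one of the keywords, and the period at the end of every section except the last piece of the text is dropped by the split.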

09-25 18:29