Problem description
Suppose I have a long string containing newlines and tabs:
var x = "This is a long string.\n\t This is another one on next line.";
So how can I split this string into tokens using a regular expression?
I don't want to use .split(' ') because I want to learn JavaScript's regular expressions.
A more complicated string could look like this:
var y = "This @is a #long $string. Alright, lets split this.";
Now I want to extract only the valid words from these strings, without special characters or punctuation, i.e. I want these:
var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];
var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];
Recommended answer
Here is a jsfiddle example of what you asked for: http://jsfiddle.net/ayezutov/BjXw5/1/
Basically, the code is simple:
var y = "This @is a #long $string. Alright, lets split this.";
var regex = /[^\s]+/g; // one or more non-whitespace characters; the g flag returns every match, not just the first
var match = y.match(regex) || []; // match() returns null when nothing matches
for (var i = 0; i < match.length; i++)
{
    document.write(match[i]);
    document.write('<br>');
}
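To make the behaviour of this first pattern visible, here is a minimal sketch that assumes any JavaScript console (Node.js or a browser), with console.log replacing document.write. It wraps the same pattern in a hypothetical helper named tokenize, written with the shorthand \S, which is equivalent to [^\s]. Running it on both strings from the question shows that whitespace (including the newline and tab) is removed, but punctuation and the @ # $ prefixes stay attached to the words.

```javascript
// Hypothetical helper: return every run of non-whitespace characters.
// /\S+/g is equivalent to /[^\s]+/g used in the answer.
function tokenize(text) {
  // match() with a global regex returns an array of all matches,
  // or null when there are none, so fall back to an empty array.
  return text.match(/\S+/g) || [];
}

var x = "This is a long string.\n\t This is another one on next line.";
var y = "This @is a #long $string. Alright, lets split this.";

console.log(JSON.stringify(tokenize(x)));
// ["This","is","a","long","string.","This","is","another","one","on","next","line."]
console.log(JSON.stringify(tokenize(y)));
// ["This","@is","a","#long","$string.","Alright,","lets","split","this."]
```

This is why the pattern alone does not yet produce the xwords and ywords arrays from the question.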
UPDATE: you can also expand the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/
var regex = /[^\s\.,!?]+/g;
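As a quick check of what the expanded class removes, here is a sketch that runs it on the string y from the question (printing with console.log, an assumption in place of document.write). The trailing period and comma are gone, but @, # and $ are not in the separator list, so they remain part of the tokens; every new special character would have to be added to the class by hand.

```javascript
var y = "This @is a #long $string. Alright, lets split this.";

// Whitespace and . , ! ? all count as separators.
var regex = /[^\s\.,!?]+/g;
var tokens = y.match(regex) || []; // null-safe: no matches gives []

console.log(JSON.stringify(tokens));
// ["This","@is","a","#long","$string","Alright","lets","split","this"]
```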
UPDATE 2: to keep only word characters (\w matches letters, digits and the underscore): http://jsfiddle.net/ayezutov/BjXw5/3/
var regex = /\w+/g;
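A short sketch confirming that /\w+/g on the two strings from the question yields exactly the xwords and ywords arrays the question asks for (again assuming console.log for output). Because \w is shorthand for [A-Za-z0-9_], everything else, including spaces, tabs, newlines and @ # $ . , acts as a separator.

```javascript
var x = "This is a long string.\n\t This is another one on next line.";
var y = "This @is a #long $string. Alright, lets split this.";

// \w+ matches one or more characters from [A-Za-z0-9_].
var xwords = x.match(/\w+/g) || [];
var ywords = y.match(/\w+/g) || [];

console.log(JSON.stringify(xwords));
// ["This","is","a","long","string","This","is","another","one","on","next","line"]
console.log(JSON.stringify(ywords));
// ["This","is","a","long","string","Alright","lets","split","this"]
```

Keep in mind that digits and underscores also count as word characters, so a token like "abc_123" stays whole, while the apostrophe does not, so "let's" would come back as "let" and "s".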