Preventing tokens from containing a space in Stanford CoreNLP

Is there an option in Stanford CoreNLP's tokenizer to prevent tokens from containing a space?

For example, if the sentence is "my phone is 617 1555-6644", the substring "617 1555" should be split into two different tokens.

I am aware of the option normalizeSpace, which controls whether spaces inside tokens (such as phone numbers or fractions) are turned into U+00A0 (non-breaking space), but I don't want tokens to contain any space at all, including a non-breaking space.
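
To make the desired behavior concrete, here is a minimal sketch of a post-processing step, assuming the tokens are already available as plain strings (for instance, collected from whatever tokenizer output you have). It does not use any CoreNLP API and is not a built-in tokenizer option. The class name TokenSpaceSplitter and the method splitOnSpaces are hypothetical names chosen for illustration. Each token is split on any run of whitespace or Unicode space separators, which includes U+00A0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of CoreNLP): re-splits tokens that contain spaces.
public class TokenSpaceSplitter {

    // \s covers ASCII whitespace; \p{Z} covers Unicode separators,
    // including the non-breaking space U+00A0.
    private static final Pattern SPACE = Pattern.compile("[\\s\\p{Z}]+");

    // Returns a new list in which no token contains a space of any kind.
    public static List<String> splitOnSpaces(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            for (String piece : SPACE.split(token)) {
                // split() can yield an empty leading piece when the token
                // starts with a space, so skip empties.
                if (!piece.isEmpty()) {
                    result.add(piece);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: a tokenizer kept the phone number as one token,
        // with the inner space normalized to U+00A0.
        List<String> tokens = Arrays.asList("my", "phone", "is", "617\u00A01555-6644");
        System.out.println(splitOnSpaces(tokens));
        // prints [my, phone, is, 617, 1555-6644]
    }
}
```

Note that this sketch works on strings only. If you need character offsets or other per-token annotations, they would have to be recomputed for the pieces, which is not shown here.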