Question
How can I split a text into an array of sentences?
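Example text:
Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End
Should output: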
0 => Fry me a Beaver.
1 => Fry me a Beaver!
2 => Fry me a Beaver?
3 => Fry me Beaver no. 4?!
4 => Fry me many Beavers...
5 => End
I tried some solutions that I found on SO through search, but they all fail, especially at the 4th sentence.
/(?<=[!?.])./
/\.|\?|!/
/((?<=[a-z0-9)][.?!])|(?<=[a-z0-9][.?!]\"))(\s|\r\n)(?=\"?[A-Z])/
/(?<=[.!?]|[.!?][\'"])\s+/ // <- closest one
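For comparison, here is a minimal sketch (assuming PHP 7+ with the bundled PCRE extension) that runs the "closest" pattern on the sample text. The lookbehind keeps each punctuation mark attached to its sentence, but nothing checks what comes after the whitespace, so it also splits after the abbreviation "no.". That is exactly where the 4th sentence breaks.

```php
<?php
// Sample text from the question.
$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';

// The "closest" attempt: split on whitespace preceded by . ! or ?
// (optionally followed by a closing quote). The next character is never checked.
$parts = preg_split('/(?<=[.!?]|[.!?][\'"])\s+/', $str);

// Prints 7 elements: "Fry me Beaver no." and "4?!" end up separated.
print_r($parts);
```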
Recommended answer
Since you want to "split" sentences, why are you trying to match them?
For this case, let's use preg_split().
Code:
$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';
$sentences = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $str);
print_r($sentences);
Output:
Array
(
[0] => Fry me a Beaver.
[1] => Fry me a Beaver!
[2] => Fry me a Beaver?
[3] => Fry me Beaver no. 4?!
[4] => Fry me many Beavers...
[5] => End
)
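If you need this in more than one place, a small wrapper can tidy up the edges. The sketch below is an illustration, not part of the original answer: the function name `split_sentences` is hypothetical. It trims the input so leading or trailing whitespace does not stay glued to the first or last sentence, and it passes PREG_SPLIT_NO_EMPTY so an empty string yields an empty array instead of `['']`.

```php
<?php
// Hypothetical helper wrapping the answer's pattern (name is illustrative only).
function split_sentences(string $text): array
{
    // trim() strips outer whitespace that would otherwise stay attached
    // to the first or last element; PREG_SPLIT_NO_EMPTY turns '' into [].
    $parts = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on a PCRE error; fall back to an empty array.
    return $parts === false ? [] : $parts;
}

// Newlines and repeated spaces also count as \s, so they split correctly too.
print_r(split_sentences("  Fry me a Beaver.\nFry me a Beaver!  End  "));
print_r(split_sentences(''));
```

The first call should give three sentences with no stray whitespace, and the second an empty array.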
Explanation:
To put it simply, we split on runs of whitespace (\s+) and require two things:
- (?<=[.?!]) Positive lookbehind assertion: the whitespace must be preceded by a period, question mark, or exclamation mark.
- (?=[a-z]) Positive lookahead assertion: the whitespace must be followed by a letter. This is a workaround for the "no. 4" problem, because the space after "no." is followed by a digit. Thanks to the i modifier, [a-z] also matches uppercase letters.
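To see what each part contributes, this sketch (assuming PHP 7+ with PCRE) runs three variants on the same text. Both assertions are zero-width, so only the whitespace is consumed and the punctuation stays at the end of each piece. Dropping the lookahead brings back the "no. 4" split, and dropping the i modifier stops [a-z] from matching the uppercase "F" and "E" that start every sentence here, so nothing is split at all.

```php
<?php
$str = 'Fry me a Beaver. Fry me a Beaver! Fry me a Beaver? Fry me Beaver no. 4?! Fry me many Beavers... End';

// Lookbehind only: also splits between "no." and "4?!".
$noLookahead = preg_split('/(?<=[.?!])\s+/', $str);

// The answer's pattern: the lookahead requires a letter; /i lets [a-z] match "F" and "E".
$withLookahead = preg_split('/(?<=[.?!])\s+(?=[a-z])/i', $str);

// Same lookahead without /i: every sentence starts with an uppercase letter,
// so [a-z] never matches and the string comes back as a single element.
$caseSensitive = preg_split('/(?<=[.?!])\s+(?=[a-z])/', $str);

echo count($noLookahead), ' ', count($withLookahead), ' ', count($caseSensitive), PHP_EOL; // 7 6 1
```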