Problem description
This function takes a string, text, and returns a list of lists of strings: one inner list of words for each sentence in text.
Sentences are separated by one of the strings ".", "?", or "!". We ignore the possibility of other punctuation separating sentences, so 'Mr.X' turns into two sentences and 'don't' becomes two words.
For example, if the text is
Hello, Jack. How is it going? Not bad; pretty good, actually... Very very
good, in fact.
the function returns:
[['hello', 'jack'],
['how', 'is', 'it', 'going'],
['not', 'bad', 'pretty', 'good', 'actually'],
['very', 'very', 'good', 'in', 'fact']]
The most confusing part is how to make the function detect the characters , . ! ? and how to build a list of lists that contains the words of each sentence. Thank you.
Recommended answer
This sounds very much like a homework problem to me, so I'll provide general tips instead of exact code.
A string has a split(sep) method. You can use it to split your string on a specific separator. However, split takes only one separator at a time, so you will have to use a loop and perform the split multiple times, once for each of ".", "?", and "!".
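The loop-and-split idea could look roughly like the sketch below. The function name split_sentences is made up for illustration, and the sketch only cuts the text into sentence strings; breaking each sentence into words is left as a further step.

```python
def split_sentences(text):
    """Cut text into sentence strings using only str.split (hypothetical helper)."""
    # Start with the whole text as a single piece, then split every
    # piece again for each sentence terminator in turn.
    pieces = [text]
    for terminator in ".?!":
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(piece.split(terminator))
        pieces = next_pieces
    # Splitting leaves empty or whitespace-only pieces (for example
    # between the dots of "..."), so drop those and trim the rest.
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_sentences("Hello, Jack. How is it going? Not bad!"))
# ['Hello, Jack', 'How is it going', 'Not bad']
```

Each pass handles one terminator, which is why the loop is needed: str.split has no way to accept several separators at once.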
You could also use a regular expression to find matches (that would be a better solution). That would let you find all the sentences at once. Then you would iterate over the matches and split each one on spaces, while stripping out punctuation.
Here's an example of a regular expression you could use to get sentence groups all at once:
\s*([^.?!]+)\s*
The parentheses are a capture group, and the \s* in front of them keeps leading whitespace out of each captured sentence. (The trailing \s* has little effect in practice, because the greedy character class [^.?!]+ already consumes whitespace itself.) You can use re.findall() to get a list of all captured results, and then you can loop over these items and use re.split() and some conditional logic to append all the words to a new list.
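If you get stuck, here is one possible way to put those pieces together. Treat it as a sketch, not the definitive answer: the name get_sentence_lists is invented, and it assumes that a "word" is a run of ASCII letters, so digits and apostrophes act as separators just like spaces and commas do.

```python
import re


def get_sentence_lists(text):
    """Return one list of lowercase words per sentence (hypothetical helper)."""
    # Capture every run of characters that contains no sentence terminator.
    sentences = re.findall(r"\s*([^.?!]+)\s*", text)
    result = []
    for sentence in sentences:
        # Assumption: anything that is not an ASCII letter separates words.
        # re.split can return empty strings at the edges, so filter them out.
        words = [w.lower() for w in re.split(r"[^A-Za-z]+", sentence) if w]
        # Conditional logic: skip captures that held no words at all.
        if words:
            result.append(words)
    return result


text = "Hello, Jack. How is it going? Not bad; pretty good, actually... Very very\ngood, in fact."
print(get_sentence_lists(text))
# [['hello', 'jack'], ['how', 'is', 'it', 'going'],
#  ['not', 'bad', 'pretty', 'good', 'actually'],
#  ['very', 'very', 'good', 'in', 'fact']]
```

With this splitting rule, 'Mr.X' comes out as two sentences and 'don't' as two words, which matches the behaviour described in the question.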
Let me know how you get along with that, and if you have any other questions, please post the code you have so far.