Problem description
Here is an example of a piece of literature on which I would like to do a simple analysis. Note the different sections:
str = "Random info - at beginning-man. "+ ...
"Random info still continues. "+ ...
"CHAPTER 1. " + ...
"Random info in middle one, "+ ...
"Random info still continues. "+ ...
"1 This is sentence one of verse one, "+ ...
"This still sentence one of verse one. "+ ...
"2 This is sentence one of verse two. "+ ...
"This is sentence two of verse two. "+ ...
"3 This is sentence one of verse three; "+ ...
"this still sentence one of verse three. "+ ...
"CHAPTER 2. " + ...
"Random info in middle two. "+ ...
"Random info still continues. "+ ...
"1 This is sentence four? "+ ...
"2 This is sentence five, "+ ...
"3 this still sentence five but verse three! "+ ...
"Random info at end's end. "+ ...
"Random info still continues. ";
I'm interested in all the data that can be called "Random info in middle", which comes after a chapter name and before the first verse begins.
I would like to use the function "extractBetween" to extract the text found between "CHAPTER #" and "1" (the first verse).
I know how to use "extractBetween", but how can I determine the positions just after "CHAPTER #" and just before "1" (the first verse), for any number of chapters?
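The following is a minimal sketch of the position-based route, not taken from the answer. It assumes MATLAB R2016b or later, which provides string arrays and extractBetween.

The sketch works in three steps:
- regexp with the 'end' option returns the index of the last character of each "CHAPTER n. " header.
- regexp with the 'start' option returns the index of the whitespace before each "1 " verse marker.
- extractBetween then takes the inclusive range between those two positions.

The names hdrEnd, v1Start and info are illustrative. The pattern assumes that every first verse is written as a "1" with whitespace on both sides.

```matlab
% Example text from the question, written as one string scalar.
str = "Random info - at beginning-man. Random info still continues. " + ...
    "CHAPTER 1. Random info in middle one, Random info still continues. " + ...
    "1 This is sentence one of verse one, This still sentence one of verse one. " + ...
    "2 This is sentence one of verse two. This is sentence two of verse two. " + ...
    "3 This is sentence one of verse three; this still sentence one of verse three. " + ...
    "CHAPTER 2. Random info in middle two. Random info still continues. " + ...
    "1 This is sentence four? 2 This is sentence five, " + ...
    "3 this still sentence five but verse three! " + ...
    "Random info at end's end. Random info still continues. ";

% Index of the last character of each "CHAPTER n. " header (trailing whitespace included).
hdrEnd = regexp(str, 'CHAPTER \d+\.\s*', 'end');
% Index of the whitespace just before each standalone verse marker "1" (an assumption about the format).
v1Start = regexp(str, '\s1\s', 'start');

info = strings(numel(hdrEnd), 1);
for k = 1:numel(hdrEnd)
    % First verse-1 marker that comes after this chapter header (empty if none).
    v = v1Start(find(v1Start > hdrEnd(k), 1));
    if ~isempty(v)
        % extractBetween with numeric positions includes both end points.
        info(k) = extractBetween(str, hdrEnd(k) + 1, v - 1);
    end
end
disp(info)
```

Because the positions come from regexp, the loop works for any number of chapters. The whitespace around the header and the verse marker is excluded from each extracted piece.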
In the end I would like to get a result in which the random information for each chapter is collected in a table.
I've tried regexp() and findstr(), but without success. All help will be appreciated. Thanks!
Recommended answer
You can use a regular expression with regexp to match the text.
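% tokens{k} holds the two captured groups of match k: the chapter name and the text after it
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');
for k = 1:numel(tokens)
    % Brace indexing returns plain text whether str is a string or a char vector
    fprintf('%s\t%s\n', tokens{k}{1}, tokens{k}{2});
    % or, for string input such as str above: fprintf('%s\t%s\n', tokens{k});
end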
This prints:
CHAPTER 1 Random info in middle one, Random info still continues.
CHAPTER 2 Random info in middle two. Random info still continues.
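To get the table the question asks for, the same tokens can be stacked into a two-column array and wrapped in a MATLAB table. This is a sketch, not part of the original answer:
- The column names Chapter and RandomInfo are illustrative.
- strtrim removes the trailing space that (.*?)1 leaves on the captured text.

```matlab
% Example text from the question, written as one string scalar.
str = "Random info - at beginning-man. Random info still continues. " + ...
    "CHAPTER 1. Random info in middle one, Random info still continues. " + ...
    "1 This is sentence one of verse one, This still sentence one of verse one. " + ...
    "2 This is sentence one of verse two. This is sentence two of verse two. " + ...
    "3 This is sentence one of verse three; this still sentence one of verse three. " + ...
    "CHAPTER 2. Random info in middle two. Random info still continues. " + ...
    "1 This is sentence four? 2 This is sentence five, " + ...
    "3 this still sentence five but verse three! " + ...
    "Random info at end's end. Random info still continues. ";

% Same pattern as the answer; each cell of tok is a 1-by-2 array [chapter, text].
tok = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');
% Stack the matches into one row per chapter and trim the space left before "1".
M = strtrim(vertcat(tok{:}));
% Hypothetical column names, chosen for illustration.
T = table(M(:, 1), M(:, 2), 'VariableNames', {'Chapter', 'RandomInfo'});
disp(T)
```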
The regular expression (CHAPTER \d)\.\s*(.*?)1 breaks down as follows:
- (CHAPTER \d) matches "CHAPTER" followed by a single digit, and the () brackets around it capture the match in the tokens variable.
- \. matches the period.
- \s* matches any whitespace that may follow.
- (.*?)1 captures all text up to, but not including, the next 1 in the text. Note the question mark, which makes the match lazy; otherwise it would match all the text up to the last 1 in str.
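The answer's pattern has two limits:
- The lazy (.*?)1 stops at the first 1 anywhere in the text.
- (CHAPTER \d) accepts only single-digit chapters.

The sketch below uses a hypothetical sample, not from the question, whose random info contains a year. It compares the answer's pattern with a stricter variant that uses \d+ and stops only before a "1" with whitespace on both sides. The stricter pattern is an assumption about how verses are marked.

```matlab
% Hypothetical sample (not from the question): the random info contains a year.
s = "CHAPTER 3. Written in 1910. Notes follow. 1 First verse.";

% Pattern from the answer: (.*?)1 stops at the first "1" anywhere in the text.
naive = regexp(s, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');
% Stricter sketch: \d+ also accepts multi-digit chapters, and the lazy group
% stops only before a "1" that has whitespace on both sides (assumed verse marker).
strict = regexp(s, '(CHAPTER \d+)\.\s*(.*?)\s+1\s', 'tokens');

fprintf('naive:  "%s"\n', naive{1}{2});   % naive:  "Written in "
fprintf('strict: "%s"\n', strict{1}{2});  % strict: "Written in 1910. Notes follow."
```

On the question's own text, both patterns find the same two chapters. The stricter one differs only when the random info contains a 1 or a chapter number has more than one digit.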