MATLAB: Simple string analysis - finding positions

Problem description

Here I have an example of a piece of literature that I would like to do a simple analysis on. Notice the different sections:

str =   "Random info - at beginning-man. "+ ...
        "Random info still continues. "+ ...
        "CHAPTER 1. " + ...
        "Random info in middle one, "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence one of verse one, "+ ...
        "This still sentence one of verse one. "+ ...
        "2 This is sentence one of verse two. "+ ...
        "This is sentence two of verse two. "+ ...
        "3 This is sentence one of verse three; "+ ...
        "this still sentence one of verse three. "+ ...
        "CHAPTER 2. " + ...
        "Random info in middle two. "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence four? "+ ...
        "2 This is sentence five, "+ ...
        "3 this still sentence five but verse three!"+ ...
        "Random info at end's end."+ ...
        "Random info still continues. ";

I'm interested in all the data that can be called "Random info in middle", which comes after a chapter name and before the first verse.

I would like to use the function extractBetween to extract the information found between "CHAPTER #" and "1" (the first verse).

I know how to use extractBetween, but how can I determine the locations just before "CHAPTER #" and just after "1" (the first verse), for any number of chapters?

In the end, I would like a result in which the random information for each chapter is placed in a table.

I've tried regexp() and findstr(), but without success. All help will be appreciated. Thanks!

Recommended answer

You can use a regular expression with regexp to match the text.

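% 'tokens' returns the captured groups of each match; 'match' returns the full matched text.
[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');

% Print the chapter name and the text that follows it, separated by a tab.
for k = 1:numel(tokens)
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2));
    % or: fprintf('%s\t%s\n', tokens{k});
end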

This prints:

CHAPTER 1   Random info in middle one, Random info still continues.
CHAPTER 2   Random info in middle two. Random info still continues.
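
Since the question asks for a table rather than printed lines, the same tokens can also be collected into a table. The following is only a minimal sketch: the shortened sample text and the column names Chapter and Info are assumptions made for illustration, not part of the original question or answer.

```matlab
% Shortened sample text with the same layout as str in the question (assumed data).
str = "Intro. CHAPTER 1. Middle one. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Middle two. 1 Verse one. End.";

% Same pattern as in the answer; each match yields two tokens.
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the per-match tokens into an N-by-2 array. string() makes the result
% a string array whether the tokens come back as strings or as char vectors.
parts = string(vertcat(tokens{:}));

% Column names Chapter and Info are illustrative assumptions.
% strtrim removes the trailing space left in front of the verse number.
T = table(parts(:,1), strtrim(parts(:,2)), 'VariableNames', {'Chapter', 'Info'});
disp(T)
```

The table has one row per matched chapter, so the same code should work for any number of chapters as long as each one is matched by the pattern.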

Explanation of the regular expression (CHAPTER \d)\.\s*(.*?)1:

(CHAPTER \d) matches CHAPTER followed by a single digit (use \d+ if chapter numbers can have more than one digit), and the surrounding () parentheses capture the match in the tokens variable.
\. matches the period.
\s* matches any whitespace that follows.
(.*?)1 captures any text up to the next 1 in the text. Note the question mark, which makes the match lazy; without it, the match would run to the last 1 in str.
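
If the positions themselves are wanted, as the question asks, regexp can also return start and end indices that can be passed to extractBetween. The sketch below is one possible approach, not the answer author's method. It assumes that every chapter heading is followed by a standalone "1" marking its first verse, that no standalone "1" appears in the text in between, and it uses a shortened sample string and the hypothetical variable names headEnd, verseOne, and info.

```matlab
% Shortened sample text (assumed data); note the two-digit chapter number.
str = "Intro. CHAPTER 9. Middle nine. 1 Verse one. 2 Verse two. " + ...
      "CHAPTER 10. Middle ten. 1 Verse one. End.";

% Index of the last character of each "CHAPTER <number>. " heading.
% \d+ allows chapter numbers with more than one digit.
headEnd = regexp(str, 'CHAPTER \d+\.\s*', 'end');

% Start index of every standalone "1" (\< and \> are word boundaries),
% so the "1" inside "10" is not counted.
verseOne = regexp(str, '\<1\>', 'start');

info = strings(numel(headEnd), 1);
for k = 1:numel(headEnd)
    % Assumption: the first standalone "1" after the heading is verse one.
    stop = verseOne(find(verseOne > headEnd(k), 1));
    % With numeric positions, extractBetween includes both end points.
    info(k) = strtrim(extractBetween(str, headEnd(k) + 1, stop - 1));
end
disp(info)
```

Filtering verseOne by headEnd(k) skips any "1" that belongs to the heading itself, such as the one in "CHAPTER 1.".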
