Identifying substrings based on complex rules


Problem description

Assume I have text strings that look something like this:

A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3

Here I want to identify sequences of markers (A is a marker, I3 is a marker, etc.) that lead up to a subsequence consisting only of IX markers (i.e. I1, I2, or I3) that contains an I3. This IX subsequence can have a length of 1 (i.e. be a single I3 marker) or be of unlimited length, but it always needs to contain at least one I3 marker and can only contain IX markers. The sequence that leads up to the IX subsequence may include I1 and I2, but never I3.

In the string above I need to identify:

A-B-C-I1-I2-D-E-F

which leads up to the I1-I3 subsequence, which contains I3,

and

D-D-D-D

which leads up to the I1-I1-I2-I1-I1-I3-I3 subsequence, which contains at least one I3.

Here are a few additional examples:

A-B-I3-C-I3

From this string we should identify A-B, because it is followed by a subsequence of length 1 that contains I3, and also C, because it is likewise followed by a subsequence of length 1 that contains I3.

and:

I3-A-I3

Here A should be identified, because it is followed by a subsequence of length 1 that contains I3. The first I3 itself will not be identified, because we are only interested in sequences that are followed by a subsequence of IX markers containing I3.

How can I write a generic function/regex that accomplishes this task?
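To make the rule concrete, here is a minimal Python sketch of one possible token-based approach. It is not taken from any answer in the thread. It assumes markers are separated by `-`, that the IX markers are exactly I1, I2 and I3, and that trailing markers not followed by a qualifying IX run are ignored. The names `find_lead_ups` and `IX_MARKERS` are hypothetical, chosen only for illustration.

```python
from itertools import groupby

# Assumption: the full set of IX markers is I1, I2 and I3.
IX_MARKERS = {"I1", "I2", "I3"}


def find_lead_ups(text, sep="-"):
    """Return the marker sequences that lead up to an IX run containing I3."""
    tokens = text.split(sep)
    results = []
    pending = []  # markers collected since the last qualifying IX run

    # groupby splits the tokens into maximal runs of IX / non-IX markers.
    for is_ix, group in groupby(tokens, key=lambda t: t in IX_MARKERS):
        run = list(group)
        if is_ix and "I3" in run:
            # A qualifying IX run closes the current lead-up sequence.
            # An empty lead-up (e.g. a string starting with I3) is skipped.
            if pending:
                results.append(sep.join(pending))
            pending = []
        else:
            # Non-IX markers, or an IX run without I3, stay in the lead-up.
            pending.extend(run)
    return results


print(find_lead_ups("A-B-C-I1-I2-D-E-F-I1-I3-D-D-D-D-I1-I1-I2-I1-I1-I3-I3"))
print(find_lead_ups("A-B-I3-C-I3"))
print(find_lead_ups("I3-A-I3"))
```

For the three example strings in the question, this sketch yields A-B-C-I1-I2-D-E-F and D-D-D-D, then A-B and C, then A. Working on whole tokens rather than raw characters avoids accidental partial matches should marker names ever share prefixes.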


Recommended answer