C# - Remove text between delimiters in a string - regex?

Problem description

Consider the requirement to find a matching pair of delimiter characters and remove both the delimiters and every character between them.

Here is the set of delimiters:

[]  // square brackets
()  // parentheses
""  // double quotes
''  // single quotes

Here are some examples of strings that should match:


 Given                        Results In
 Hello "some" World           Hello World
 Give [Me Some] Purple        Give Purple
 Have Fifteen (Lunch Today)   Have Fifteen
 Have 'a good'day             Have day

Does Not Match:
 Hello "world
 Brown]co[w
 Cheese'factory

If the given string doesn't contain a matching pair of delimiters, it isn't modified. The input string may contain many matching pairs of delimiters. If two pairs of delimiters overlap (e.g. he[llo "worl]d"), that is an edge case we can ignore here.

The algorithm would look something like this:

  using System.Text.RegularExpressions;

  string myInput = "Give [Me Some] Purple (And More) Elephants";
  string pattern = "";  // placeholder: this is the pattern we need to find
  string output = Regex.Replace(myInput, pattern, string.Empty);

Question: How would you achieve this with C#? I am leaning towards a regex.

Bonus: Is there an easy way to keep those start and end delimiters in constants or in some kind of list? I'm looking for a solution that makes it easy to change the delimiters in case the business analysts come up with new sets.

Recommended answer

A simple regex would be:

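using System.Text.RegularExpressions;

string input = "Give [Me Some] Purple (And More) Elephants";
// Lazy .*? stops at the first closing delimiter, so "[a] b [c]" keeps " b "
string regex = @"(\[.*?\])|("".*?"")|('.*?')|(\(.*?\))";
string output = Regex.Replace(input, regex, "");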
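
A quick way to check a pattern like this against the question's examples is sketched below (C# 9+ top-level statements). Strip is a hypothetical helper, and collapsing the doubled spaces left behind is an assumption based on the question's expected results: Regex.Replace on its own turns Hello "some" World into "Hello  World" with two spaces. The last two lines show why the lazy .*? matters: with a greedy .*, two pairs of the same delimiter are swallowed as one long match.

```csharp
using System;
using System.Text.RegularExpressions;

// Lazy (.*?) version: each quantifier stops at the first closing delimiter.
string lazy = @"(\[.*?\])|("".*?"")|('.*?')|(\(.*?\))";
// Greedy (.*) version, kept only for comparison.
string greedy = @"(\[.*\])|("".*"")|('.*')|(\(.*\))";

// Hypothetical helper: removes delimited text, then collapses runs of spaces
// (the whitespace clean-up is an assumption, not part of the original answer).
string Strip(string input, string pattern) =>
    Regex.Replace(Regex.Replace(input, pattern, ""), @" {2,}", " ").Trim();

string[] shouldChange = { "Hello \"some\" World", "Give [Me Some] Purple", "Have Fifteen (Lunch Today)", "Have 'a good'day" };
string[] shouldNotChange = { "Hello \"world", "Brown]co[w", "Cheese'factory" };

foreach (string s in shouldChange)
    Console.WriteLine($"{s,-28} -> {Strip(s, lazy)}");
foreach (string s in shouldNotChange)
    Console.WriteLine($"{s,-28} -> {Strip(s, lazy)}");

// Two pairs of the same delimiter: greedy .* deletes everything between the outer ones.
string twoPairs = "Give [Me] Purple [And More] Elephants";
Console.WriteLine(Strip(twoPairs, greedy)); // Give Elephants
Console.WriteLine(Strip(twoPairs, lazy));   // Give Purple Elephants
```

Unmatched delimiters (Hello "world, Brown]co[w, Cheese'factory) come back unchanged, because every alternative needs both its opening and its closing character.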

If you want to build up the regex in a custom way, you just need to build each of its parts:

('.*?')  // example of the single-quote check
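
Building one part from an opening and closing delimiter could look like the sketch below. MakePart is a hypothetical name; Regex.Escape is the standard .NET method and is needed because [ and ( are regex metacharacters. Note that Regex.Escape leaves ] unescaped, which is fine because ] is literal outside a character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds one alternation part such as ('.*?').
// Regex.Escape protects metacharacters like [ and ( that would otherwise break the pattern.
string MakePart(string open, string close) =>
    "(" + Regex.Escape(open) + ".*?" + Regex.Escape(close) + ")";

Console.WriteLine(MakePart("'", "'"));  // ('.*?')
Console.WriteLine(MakePart("[", "]"));  // (\[.*?])
Console.WriteLine(MakePart("(", ")"));  // (\(.*?\))
```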

Then join the individual parts with an OR (the | alternation operator in regex), as in my original example. Once the regex string is built, run it once. The key is to combine everything into a single regex: running many separate regex matches on each item, and then repeating that over a large number of items, will probably degrade performance noticeably.
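
A minimal sketch of "build once, run once" follows, assuming the parts are already available as strings. The Regex object is constructed a single time and reused for every item; RegexOptions.Compiled is an optional choice that trades a higher construction cost for faster matching, which only pays off when the same pattern is applied many times.

```csharp
using System;
using System.Text.RegularExpressions;

// Parts joined with | into one pattern.
string[] parts = { @"(\[.*?\])", @"("".*?"")", @"('.*?')", @"(\(.*?\))" };
string pattern = string.Join("|", parts);

// Construct the Regex once and reuse it for every item,
// instead of running one regex per delimiter per item.
var stripper = new Regex(pattern, RegexOptions.Compiled);

string[] items = { "Give [Me Some] Purple (And More) Elephants", "Hello \"some\" World", "Cheese'factory" };
foreach (string item in items)
    Console.WriteLine(stripper.Replace(item, ""));
```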

In my first example, the built-up pattern would replace the hard-coded regex string:

string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";  // placeholder for the joined pattern
string output = Regex.Replace(input, regex, "");

I'm sure someone will post a neat LINQ expression that builds the regex from an array of delimiter objects, or something along those lines.
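
One possible shape for that LINQ expression is sketched below. It assumes named tuples stand in for the "delimiter objects" (a real application might load them from configuration instead), which also addresses the bonus question: adding a new pair means adding one entry to the array.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Delimiter pairs as data (tuples stand in for the delimiter objects).
(string Open, string Close)[] delimiters =
{
    ("[", "]"), ("(", ")"), ("\"", "\""), ("'", "'"),
};

// LINQ: escape each pair, wrap it as (open.*?close), and join the parts with |.
string pattern = string.Join("|",
    delimiters.Select(d => $"({Regex.Escape(d.Open)}.*?{Regex.Escape(d.Close)})"));

Console.WriteLine(pattern);
Console.WriteLine(Regex.Replace("Give [Me Some] Purple (And More) Elephants", pattern, ""));
```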

08-28 17:05