A correct UTF-8 regex for CamelCase (WikiWord) in Perl


Question

An earlier question asked about a CamelCase regex. Combining that with tchrist's post, I'm wondering what the correct UTF-8 CamelCase regex is.

Starting with brian d foy's regex:

/
    \b          # start at word boundary
    [A-Z]       # start with upper
    [a-zA-Z]*   # followed by any alpha

    (?:  # non-capturing grouping for alternation precedence
       [a-z][a-zA-Z]*[A-Z]   # next bit is lower, any zero or more, ending with upper
          |                     # or
       [A-Z][a-zA-Z]*[a-z]   # next bit is upper, any zero or more, ending with lower
    )

    [a-zA-Z]*   # anything that's left
    \b          # end at word
/x

and modifying it to:

/
    \b          # start at word boundary
    \p{Uppercase_Letter}     # start with upper
    \p{Alphabetic}*          # followed by any alpha

    (?:  # non-capturing grouping for alternation precedence
       \p{Lowercase_Letter}[a-zA-Z]*\p{Uppercase_Letter}   ### next bit is lower, any zero or more, ending with upper
          |                  # or
       \p{Uppercase_Letter}[a-zA-Z]*\p{Lowercase_Letter}   ### next bit is upper, any zero or more, ending with lower
    )

    \p{Alphabetic}*          # anything that's left
    \b          # end at word
/x

I have a problem with the lines marked '###', which still contain the ASCII-only [a-zA-Z].

In addition, how should the regex be modified if digits and the underscore are treated as equivalent to lowercase letters, so that W2X3 is a valid CamelCase word?

Updated (following ysth's comment):

In what follows, "any" means "uppercase or lowercase or digit or underscore".

The regex should match CamelWord and CaW:

starts with an uppercase letter
optional any
a lowercase letter or digit or underscore
optional any
an uppercase letter
optional any
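
Read literally, the six steps above map onto a pattern along the following lines. This is only a sketch of that description, not something taken from the original posts: the names $any and $wiki are hypothetical, "uppercase" and "lowercase" are assumed to mean \p{Upper} and \p{Lower}, and "digit" is assumed to mean \p{Nd}.

```perl
use strict;
use warnings;
use utf8;                              # this file contains literal non-ASCII text
binmode STDOUT, ':encoding(UTF-8)';

# "any" as defined above: uppercase, lowercase, digit, or underscore.
my $any = qr/[\p{Upper}\p{Lower}\p{Nd}_]/;

# Hypothetical pattern following the six steps one by one.
my $wiki = qr{
    \b
    \p{Upper}               # starts with an uppercase letter
    $any*                   # optional any
    [\p{Lower}\p{Nd}_]      # a lowercase letter or digit or underscore
    $any*                   # optional any
    \p{Upper}               # an uppercase letter
    $any*                   # optional any
    \b
}x;

for my $word (qw(CamelWord CaW W2X3 Word WORD)) {
    printf "%-10s %s\n", $word, ($word =~ /\A$wiki\z/ ? 'match' : 'no match');
}
```

Under these assumptions CamelWord, CaW and W2X3 match, while Word and WORD do not, because each lacks either the second uppercase character or the lowercase-like character in between.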

Please do not mark this as a duplicate, because it is not. The original question (and its answers) considered only ASCII.

Recommended answer

I really can't tell what you're trying to do, but this should be closer to what your original intent seems to have been. I still can't tell what you mean to do with it, though.

m{
    \b
    \p{Upper}      #  start with uppercase code point (NOT LETTER)

    \w*            #  optional ident chars

    # note that upper and lower are not related to letters
    (?:  \p{Lower} \w* \p{Upper}
      |  \p{Upper} \w* \p{Lower}
    )

    \w*

    \b
}x
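
To see how the pattern above behaves, it can be stored with qr// and tried against a few words. The following is a minimal sketch: the variable name $camel and the sample words are illustrative choices, not part of the original answer.

```perl
use strict;
use warnings;
use utf8;                              # this file contains literal non-ASCII text
binmode STDOUT, ':encoding(UTF-8)';

# The pattern from the answer, unchanged, compiled once.
my $camel = qr{
    \b
    \p{Upper}      #  start with uppercase code point (NOT LETTER)
    \w*            #  optional ident chars
    (?:  \p{Lower} \w* \p{Upper}
      |  \p{Upper} \w* \p{Lower}
    )
    \w*
    \b
}x;

# Anchor with \A and \z so the whole word must match.
for my $word (qw(CamelCase ÉcoleNormale CaW W2X3 Word lowercase)) {
    printf "%-14s %s\n", $word, ($word =~ /\A$camel\z/ ? 'match' : 'no match');
}
```

CamelCase, ÉcoleNormale and CaW match. W2X3 does not, because both branches of the alternation require at least one \p{Lower} character, so this pattern by itself does not treat digits as lowercase. Word and lowercase fail as well: the first has only one uppercase character and the second has none.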

Never use [a-z]. And in fact, don't use \p{Lowercase_Letter} or \p{Ll}, since those are not the same as the more desirable and more correct \p{Lowercase} and \p{Lower}.
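
The difference shows up on code points that carry the Lowercase property without being in the Lowercase_Letter general category. A small sketch, with code points picked here purely for illustration (they are not from the original answer):

```perl
use strict;
use warnings;

# U+0061 LATIN SMALL LETTER A          general category Ll
# U+2170 SMALL ROMAN NUMERAL ONE       general category Nl, but Lowercase
# U+24D0 CIRCLED LATIN SMALL LETTER A  general category So, but Lowercase
for my $cp (0x0061, 0x2170, 0x24D0) {
    my $ch = chr $cp;
    printf "U+%04X  Lowercase=%d  Lowercase_Letter=%d\n",
        $cp,
        ($ch =~ /\p{Lowercase}/        ? 1 : 0),
        ($ch =~ /\p{Lowercase_Letter}/ ? 1 : 0);
}
```

All three code points satisfy \p{Lowercase}, but only U+0061 satisfies \p{Lowercase_Letter}, so a pattern built on \p{Ll} would miss the other two.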

Remember that \w is really just

[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]
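
One way to check this expansion on a given Perl is to compare \w with the spelled-out class code point by code point. The sketch below does that for U+0000 through U+2FFF. The helper name disagreements and the chosen range are assumptions made for illustration. The script reports rather than asserts, because the exact definition of \w may vary slightly between Perl and Unicode versions (join-control characters are one possible difference).

```perl
use strict;
use warnings;

# The expanded class quoted above, spelled out as its own pattern.
my $expanded = qr/[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]/;

# Return the code points in [$from, $to] where \w and the expanded class disagree.
sub disagreements {
    my ($from, $to) = @_;
    my @diff;
    for my $cp ($from .. $to) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # skip surrogates
        my $ch = chr $cp;
        my $w  = $ch =~ /\w/u     ? 1 : 0;         # /u forces Unicode rules below U+0100
        my $e  = $ch =~ $expanded ? 1 : 0;
        push @diff, $cp if $w != $e;
    }
    return @diff;
}

my @diff = disagreements(0x0000, 0x2FFF);
printf "%d disagreement(s) between U+0000 and U+2FFF\n", scalar @diff;
printf "U+%04X\n", $_ for @diff;
```

The /u modifier matters here: without it, a bare /\w/ applied to a character below U+0100 may fall back to non-Unicode rules and produce spurious differences.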
