PHP + vim: बंगलौर (Bangalore) has a break before the last character र

Question

I used http://translate.google.com/#en|hi|Bangalore to get the Hindi for Bangalore, which gave बंगलौर.

But when I pasted it into vim, there was a break before the last character र.
I am using preg_replace with the regex pattern /[^\p{L}\p{Nd}\p{Mn}_]/u to match words, but it treats the last character as a separate word.

My input string is मैनेजमेंट, बंगलौर, and I expect the output to be मैनेजमेंट बंगलौर after the preg_replace:

$cleanedString = preg_replace('/[^\p{L}\p{Nd}\p{Mn}_]/u', ' ', $name);

But the output I am getting is मैनेजमेंट बंगल र . What am I doing wrong here? I guess the problem starts from how vim handled the text I pasted.

Recommended answer

Try this regex: "/[^\p{L}\p{Nd}\p{Mn}\p{Mc}_]/u"

The vowel sign ौ (U+094C) in लौ takes extra horizontal space, unlike the vowel sign ै (U+0948) in मै. The Unicode class \p{Mn} matches only non-spacing marks, so ौ falls outside the allowed set and gets replaced. Use \p{Mc} to match spacing marks, or use \p{M} to match all combining marks: "/[^\p{L}\p{Nd}\p{M}_]/u"
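As a rough check of the explanation above, the sketch below runs the question's input through both the original pattern and the \p{M} variant. The variable names $before and $after are illustrative only, and the script assumes the file is saved as UTF-8 and that PHP's PCRE has Unicode property support.

```php
<?php
// Input string from the question.
$name = 'मैनेजमेंट, बंगलौर';

// Original pattern: \p{Mc} is not allowed, so the spacing vowel sign ौ (U+094C)
// is treated as a non-word character and replaced with a space.
$before = preg_replace('/[^\p{L}\p{Nd}\p{Mn}_]/u', ' ', $name);

// Pattern from the answer: \p{M} covers Mn, Mc, and Me, so ौ is kept.
$after = preg_replace('/[^\p{L}\p{Nd}\p{M}_]/u', ' ', $name);

echo $before, PHP_EOL;
echo $after, PHP_EOL;
```

With the second pattern the word बंगलौर should come through intact, while the first pattern splits it before र. Note that the comma and the following space are each replaced by a space, so collapsing repeated spaces would be a separate step.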

From regular-expressions.info/unicode:

  • \p{Mn} or \p{Non_Spacing_Mark}: a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts, etc.).
  • \p{Mc} or \p{Spacing_Combining_Mark}: a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages).
  • \p{Me} or \p{Enclosing_Mark}: a character that encloses the character it is combined with (circle, square, keycap, etc.).
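
To see which of these categories each code point in बंगलौर falls into, a small sketch along the following lines may help. The helper markCategory is hypothetical (it is not part of PHP or of the original answer); it simply tests one character against each mark class with preg_match.

```php
<?php
// Hypothetical helper: report which mark category (if any) a single character belongs to.
function markCategory(string $ch): string
{
    foreach (['Mn', 'Mc', 'Me'] as $category) {
        if (preg_match('/^\p{' . $category . '}$/u', $ch) === 1) {
            return $category;
        }
    }
    return 'other'; // letters, digits, punctuation, etc.
}

// Split the word from the question into individual code points.
$chars = preg_split('//u', 'बंगलौर', -1, PREG_SPLIT_NO_EMPTY);
$categories = array_map('markCategory', $chars);

foreach ($chars as $i => $ch) {
    echo $ch, ' => ', $categories[$i], PHP_EOL;
}
```

The anusvara ं should be reported as Mn and the vowel sign ौ as Mc, which is why a character class that allows only \p{Mn} drops ौ.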

