For example:

str = 'sdf344asfasf天地方益3権sdfsdf'

Add () around the Chinese and Japanese characters:
strAfterConvert = 'sdf344asfasf(天地方益)3(権)sdfsdf'

Best answer

As a start, you can check whether the character is in one of the following Unicode blocks:

  • Unicode Block 'CJK Unified Ideographs': U+4E00 to U+9FFF
  • Unicode Block 'CJK Unified Ideographs Extension A': U+3400 to U+4DBF
  • Unicode Block 'CJK Unified Ideographs Extension B': U+20000 to U+2A6DF
  • Unicode Block 'CJK Unified Ideographs Extension C': U+2A700 to U+2B73F
  • Unicode Block 'CJK Unified Ideographs Extension D': U+2B740 to U+2B81F
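
To see which block a given character falls into, it can help to print its code point and official name. The snippet below is a minimal sketch for Python 3; `describe` is a hypothetical helper name, not part of the original answer.

```python
import unicodedata

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    # unicodedata.name() takes a default that is returned for unnamed code points.
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))

# The CJK characters from the question's example string.
for ch in "天地方益権":
    print(ch, describe(ch))
```

All five sample characters have code points between U+4E00 and U+9FFF, so they belong to the basic 'CJK Unified Ideographs' block.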


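After that, all you need to do is iterate over the string, check whether each character is Chinese, Japanese or Korean (CJK), and append accordingly: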
    # -*- coding: utf-8 -*-
    # Written for Python 3, where str is already a Unicode string.
    ranges = [
      {"from": ord(u"\u3300"), "to": ord(u"\u33ff")},         # CJK Compatibility
      {"from": ord(u"\ufe30"), "to": ord(u"\ufe4f")},         # CJK Compatibility Forms
      {"from": ord(u"\uf900"), "to": ord(u"\ufaff")},         # CJK Compatibility Ideographs
      {"from": ord(u"\U0002F800"), "to": ord(u"\U0002fa1f")}, # CJK Compatibility Ideographs Supplement
      {"from": ord(u"\u3040"), "to": ord(u"\u309f")},         # Japanese Hiragana
      {"from": ord(u"\u30a0"), "to": ord(u"\u30ff")},         # Japanese Katakana
      {"from": ord(u"\u2e80"), "to": ord(u"\u2eff")},         # CJK Radicals Supplement
      {"from": ord(u"\u4e00"), "to": ord(u"\u9fff")},         # CJK Unified Ideographs
      {"from": ord(u"\u3400"), "to": ord(u"\u4dbf")},         # Extension A
      {"from": ord(u"\U00020000"), "to": ord(u"\U0002a6df")}, # Extension B
      {"from": ord(u"\U0002a700"), "to": ord(u"\U0002b73f")}, # Extension C
      {"from": ord(u"\U0002b740"), "to": ord(u"\U0002b81f")}, # Extension D
      {"from": ord(u"\U0002b820"), "to": ord(u"\U0002ceaf")}  # Extension E, included as of Unicode 8.0
    ]

    def is_cjk(char):
      # True if the character's code point lies inside any of the ranges above.
      return any(r["from"] <= ord(char) <= r["to"] for r in ranges)

    def cjk_spans(string):
      # Yield (start, end) index pairs for every maximal run of CJK characters.
      i = 0
      while i < len(string):
        if is_cjk(string[i]):
          start = i
          # The bounds check keeps a run at the very end of the string from raising IndexError.
          while i < len(string) and is_cjk(string[i]): i += 1
          yield start, i
        else:
          i += 1

    string = "sdf344asfasf天地方益3権sdfsdf"
    result, last = "", 0
    for start, end in cjk_spans(string):
      # Copy the text before the run, then the run itself wrapped in parentheses.
      # Working by position means a run that occurs more than once is wrapped exactly once.
      result += string[last:start] + "(" + string[start:end] + ")"
      last = end
    result += string[last:]
    print(result)
    

The above prints:
    sdf344asfasf(天地方益)3(権)sdfsdf
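
The same wrapping can also be expressed with a regular expression, which finds each run in a single pass. This is a sketch rather than part of the original answer: it assumes Python 3.3 or later (where the `re` module understands `\uXXXX` and `\UXXXXXXXX` escapes), and the names `CJK_RANGES`, `CJK_RUN` and `wrap_cjk_runs` are made up for illustration.

```python
import re

# Inclusive (first, last) code point pairs, covering the same blocks as the ranges list.
CJK_RANGES = [
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0x3300, 0x33FF),    # CJK Compatibility
    (0x3400, 0x4DBF),    # Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0xFE30, 0xFE4F),    # CJK Compatibility Forms
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2B73F),  # Extension C
    (0x2B740, 0x2B81F),  # Extension D
    (0x2B820, 0x2CEAF),  # Extension E
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]

# Build one character class from \UXXXXXXXX escapes, followed by + to match whole runs.
CJK_RUN = re.compile(
    "[" + "".join("\\U%08X-\\U%08X" % pair for pair in CJK_RANGES) + "]+"
)

def wrap_cjk_runs(text, left="(", right=")"):
    """Surround every maximal run of CJK characters with left and right."""
    return CJK_RUN.sub(lambda m: left + m.group(0) + right, text)

print(wrap_cjk_runs("sdf344asfasf天地方益3権sdfsdf"))
```

Because the substitution works on match positions rather than on substring values, a run that appears several times in the input is wrapped once per occurrence.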
    

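To be future-proof, you may want to keep an eye out for CJK Unified Ideographs Extension E. It will ship with Unicode 8.0, which is scheduled for release in June 2015. I have added it to the ranges, but you should not include it until Unicode 8.0 is released.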
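
One way to make that decision at run time is to look at the version of the Unicode database bundled with the interpreter, which Python exposes as `unicodedata.unidata_version`. The following is only a sketch: `EXTENSION_E`, `unicode_at_least` and `optional_ranges` are hypothetical names, and the plain code point comparison in `is_cjk` works regardless of this version, so the check matters mainly if you want the range list to match what your interpreter considers assigned.

```python
import unicodedata

# Extension E, as listed in the ranges; it was introduced in Unicode 8.0.
EXTENSION_E = (0x2B820, 0x2CEAF)

def unicode_at_least(major, minor=0):
    """Compare the interpreter's Unicode database version with (major, minor)."""
    parts = unicodedata.unidata_version.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

def optional_ranges():
    """Return the version-dependent ranges that are safe to include (hypothetical helper)."""
    return [EXTENSION_E] if unicode_at_least(8) else []

print(unicodedata.unidata_version, optional_ranges())
```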

[Edit]

Added CJK compatibility ideographs, Japanese Kana, and CJK radicals.

Source: "How to find Chinese or Japanese characters in a string with Python?", a similar question found on Stack Overflow: https://stackoverflow.com/questions/30069846/
