For example, given:
str = 'sdf344asfasf天地方益3権sdfsdf'
wrap each run of Chinese or Japanese characters in "()" to get:
strAfterConvert = 'sdf344asfasf(天地方益)3(権)sdfsdf'
Best answer
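First, check whether a character falls within one of the CJK-related Unicode blocks (the ranges list in the code below).
After that, all you need to do is iterate over the string, check whether each character is Chinese, Japanese or Korean (CJK), and build the output accordingly: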
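# Python 3: str is already Unicode, so no .decode("utf-8") is needed.

# Inclusive code point ranges of the Unicode blocks treated as CJK.
ranges = [
    {"from": ord("\u3300"), "to": ord("\u33ff")},          # CJK Compatibility
    {"from": ord("\ufe30"), "to": ord("\ufe4f")},          # CJK Compatibility Forms
    {"from": ord("\uf900"), "to": ord("\ufaff")},          # CJK Compatibility Ideographs
    {"from": ord("\U0002F800"), "to": ord("\U0002fa1f")},  # CJK Compatibility Ideographs Supplement
    {"from": ord("\u3040"), "to": ord("\u309f")},          # Japanese Hiragana
    {"from": ord("\u30a0"), "to": ord("\u30ff")},          # Japanese Katakana
    {"from": ord("\u2e80"), "to": ord("\u2eff")},          # CJK Radicals Supplement
    {"from": ord("\u4e00"), "to": ord("\u9fff")},          # CJK Unified Ideographs
    {"from": ord("\u3400"), "to": ord("\u4dbf")},          # CJK Unified Ideographs Extension A
    {"from": ord("\U00020000"), "to": ord("\U0002a6df")},  # CJK Unified Ideographs Extension B
    {"from": ord("\U0002a700"), "to": ord("\U0002b73f")},  # CJK Unified Ideographs Extension C
    {"from": ord("\U0002b740"), "to": ord("\U0002b81f")},  # CJK Unified Ideographs Extension D
    {"from": ord("\U0002b820"), "to": ord("\U0002ceaf")},  # Extension E, included as of Unicode 8.0
]

def is_cjk(char):
    # "r" rather than "range" so the built-in range() is not shadowed.
    return any(r["from"] <= ord(char) <= r["to"] for r in ranges)

def cjk_segments(string):
    """Split string into maximal runs, yielding (is_cjk_run, text) pairs."""
    i = 0
    while i < len(string):
        start = i
        run_is_cjk = is_cjk(string[i])
        # The bound check stops a run that reaches the end of the string
        # from raising IndexError.
        while i < len(string) and is_cjk(string[i]) == run_is_cjk:
            i += 1
        yield run_is_cjk, string[start:i]

string = "sdf344asfasf天地方益3権sdfsdf"
# Rebuild the string in one pass; calling str.replace once per substring
# would wrap a repeated CJK run more than once.
string = "".join("(" + text + ")" if cjk else text
                 for cjk, text in cjk_segments(string))
print(string)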
The above prints:
sdf344asfasf(天地方益)3(権)sdfsdf
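The same idea can also be expressed with a regular expression. This is an alternative, not the answer's method: a minimal sketch, assuming Python 3, that turns the answer's ranges into a single character class and lets re.sub wrap each run of CJK characters. The names CJK_RANGES, CJK_CLASS, CJK_RUN and wrap_cjk are hypothetical.

```python
import re

# (start, end) code point pairs copied from the answer's ranges list.
CJK_RANGES = [
    (0x3300, 0x33FF), (0xFE30, 0xFE4F), (0xF900, 0xFAFF), (0x2F800, 0x2FA1F),
    (0x3040, 0x309F), (0x30A0, 0x30FF), (0x2E80, 0x2EFF), (0x4E00, 0x9FFF),
    (0x3400, 0x4DBF), (0x20000, 0x2A6DF), (0x2A700, 0x2B73F),
    (0x2B740, 0x2B81F), (0x2B820, 0x2CEAF),
]

# Character class built from \UXXXXXXXX escapes, e.g. [\U00003300-\U000033ff...].
CJK_CLASS = "[" + "".join(
    "\\U%08x-\\U%08x" % (start, end) for start, end in CJK_RANGES
) + "]"

# One or more consecutive CJK characters.
CJK_RUN = re.compile(CJK_CLASS + "+")

def wrap_cjk(text):
    """Wrap every maximal run of CJK characters in parentheses."""
    return CJK_RUN.sub(lambda m: "(" + m.group(0) + ")", text)

print(wrap_cjk("sdf344asfasf天地方益3権sdfsdf"))
```

Because re.sub visits each match exactly once, a CJK run that appears several times in the string is wrapped once per occurrence, never twice.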
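To be future-proof, you may want to keep an eye on CJK Unified Ideographs Extension E. At the time of writing, it was due to ship with Unicode 8.0, which was scheduled for release in June 2015. I have already added it to the ranges, but you should not include it until Unicode 8.0 is released.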
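One way to follow that advice automatically is to add the Extension E range only when the running Python's Unicode database is version 8.0 or later. This is a sketch under assumptions: it treats unicodedata.unidata_version as a stand-in for "Unicode 8.0 is released", and BASE_RANGES, EXTENSION_E, unicode_version and build_ranges are hypothetical names, with BASE_RANGES an abbreviated subset of the answer's ranges.

```python
import unicodedata

# Hypothetical, abbreviated subset of the answer's ranges (inclusive code points).
BASE_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]
EXTENSION_E = (0x2B820, 0x2CEAF)  # CJK Unified Ideographs Extension E (Unicode 8.0)

def unicode_version():
    """Parse unicodedata.unidata_version, e.g. '8.0.0', into a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))

def build_ranges():
    """Return the CJK ranges, adding Extension E only for Unicode 8.0 or later."""
    ranges = list(BASE_RANGES)
    if unicode_version() >= (8, 0, 0):
        ranges.append(EXTENSION_E)
    return ranges

print(unicodedata.unidata_version, build_ranges())
```

Note that this reflects the Unicode data bundled with the interpreter, not what fonts or other software on the system can display.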
[Edit]
Added CJK compatibility ideographs, Japanese Kana and CJK radicals.
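To check which of these blocks (including the added Kana, radical and compatibility blocks) a character falls into, the ranges can be kept as (start, end, name) tuples sorted by start and searched with bisect. A minimal sketch, assuming Python 3; CJK_BLOCKS and cjk_block are hypothetical names, and the labels are Unicode block names rather than anything defined in the answer.

```python
from bisect import bisect_right

# Same ranges as the answer, sorted by start code point, with Unicode block names.
CJK_BLOCKS = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3300, 0x33FF, "CJK Compatibility"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0xFE30, 0xFE4F, "CJK Compatibility Forms"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]
_STARTS = [start for start, _, _ in CJK_BLOCKS]

def cjk_block(char):
    """Return the name of the CJK block containing char, or None."""
    cp = ord(char)
    idx = bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if idx >= 0 and cp <= CJK_BLOCKS[idx][1]:
        return CJK_BLOCKS[idx][2]
    return None

for ch in "天あア\u2e80\uf900a":
    print(repr(ch), cjk_block(ch))
```

Because the ranges do not overlap, bisect_right narrows the search to a single candidate block in O(log n) steps, whereas is_cjk checks every range.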
For the related Stack Overflow question "python - How to find Chinese or Japanese characters in a string with Python?", see: https://stackoverflow.com/questions/30069846/