Problem description
I just started working on text clustering in Japanese with Python 2. However, when I created a dictionary from these Japanese words/terms, the dictionary keys show up as Unicode escape sequences instead of Japanese. The code is as follows:
```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# load data; the file is CP932 (Windows Japanese) encoded
allWrdMat10 = pd.read_csv("../../data/allWrdMat10.csv.gz",
                          encoding='CP932')
## Set X as CSR Sparse Matrix
X = np.array(allWrdMat10)
X = sp.csr_matrix(X)
## create dictionary: term -> column position
dict_index = {t: i for i, t in enumerate(allWrdMat10.columns)}
freqrank = np.array(dict_index.values()).argsort()
X_transform = X[:, freqrank < 1000].transpose().toarray()
```
The output of allWrdMat10.columns is still Japanese:
```
Index([u'?', u'.', u'・', u'%', u'0', u'1', u'10月', u'11月', u'12月', u'1つ',
       ...
       u'瀋陽', u'疆', u'盧', u'籠', u'絆', u'胚', u'諫早', u'趙', u'鉉', u'鎔基'],
      dtype='object', length=8655)
```
However, dict_index.keys() gives the following:
```
[u'\u77ed\u9283',
 u'\u5efa\u3066',
 u'\u4f0a',
 u'\u5e73\u5b89',
 u'\u6025\u9a30',
 u'\u897f\u65e5\u672c',
 u'\u5e03\u9663',
 ...]
```
Is there any way to keep the Japanese words/terms in the dictionary keys? Or is there any way to convert the Unicode escapes back to Japanese words/terms? Thanks.
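The escape sequences are most likely only how Python 2 displays the list: showing a list calls repr() on each element, and in Python 2 repr() of a unicode object escapes every non-ASCII character. The keys themselves are still the Japanese strings. Below is a minimal sketch, assuming a hypothetical three-term sample copied from the dict_index.keys() output; it is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical sample: three keys written with the same escape
# sequences that appear in the dict_index.keys() output.
terms = [u'\u77ed\u9283', u'\u5efa\u3066', u'\u897f\u65e5\u672c']

# Same comprehension pattern as in the question: term -> column position.
dict_index = {t: i for i, t in enumerate(terms)}

# The escaped literal and the Japanese literal are the same string,
# so a lookup with the Japanese spelling works.
assert u'\u77ed\u9283' == u'短銃'
assert u'短銃' in dict_index

# Printing each key on its own shows the characters, not escapes
# (assuming the terminal encoding can display Japanese).
ordered = sorted(dict_index, key=dict_index.get)
for key in ordered:
    print(key)

# json.dumps with ensure_ascii=False renders the whole list readably.
print(json.dumps(ordered, ensure_ascii=False))
```

On Python 3 the list itself already prints readably, because repr() there keeps printable non-ASCII characters.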
Recommended answer
You did not prefix the string with u, which is needed in Python 2. Even better, use from __future__ import unicode_literals.
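A minimal sketch of what from __future__ import unicode_literals changes, assuming the source file is saved as UTF-8 (the coding line is required for non-ASCII literals in Python 2 source). On Python 3 the import is accepted but has no effect, because every str literal is already Unicode. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# With unicode_literals, an unprefixed literal is a unicode object in
# Python 2, so it behaves exactly like the u-prefixed spelling.
plain = '西日本'
prefixed = u'西日本'

# Both spellings give the same three-character text.
assert plain == prefixed
assert len(plain) == 3

# Without the import, Python 2 would make '西日本' a UTF-8 byte string
# of length 9 (3 bytes per character) instead.
print(type(plain).__name__, len(plain))  # "unicode 3" on Python 2, "str 3" on Python 3
```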