Python 2.7: detecting emoji in text

Question

I'd like to be able to detect emoji in text and look up their names.

I've had no luck using the unicodedata module, and I suspect that I'm not understanding the UTF-8 conventions.

I'd guess that I need to load my doc as UTF-8, then break the unicode "strings" into unicode symbols, iterate over these, and look them up.
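For reference, the plan described above could be sketched roughly as follows. This assumes Python 3, where a str is already a sequence of code points. The helper name `describe_characters` is illustrative and not from the original post, and filtering on non-ASCII characters is only a crude stand-in for real emoji detection. The second argument to `unicodedata.name` is a default that is returned instead of raising ValueError when the character has no name in the interpreter's database.

```python
import unicodedata


def describe_characters(text):
    """Return (character, name) pairs for every non-ASCII character in text.

    Characters missing from this interpreter's Unicode database get the
    placeholder name 'UNKNOWN' instead of raising ValueError.
    """
    return [
        (ch, unicodedata.name(ch, 'UNKNOWN'))
        for ch in text
        if ord(ch) > 127
    ]


sample = 'A man tried to get into my car\U0001f648'
print(describe_characters(sample))
```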

# new example loaded using pandas with encoding UTF-8
test = u'A man tried to get into my car\U0001f648'

type(test)  # unicode

import unicodedata as uni
uni.name(test[0])
Out[89]: 'LATIN CAPITAL LETTER A'

uni.name(test[-3])
Out[90]: 'LATIN SMALL LETTER R'

uni.name(test[-1])
ValueError                                Traceback (most recent call last)
<ipython-input-105-417c561246c2> in <module>()
----> 1 uni.name(test[-1])
ValueError: no such name

# just to be clear
uni.name(u'\U0001f648')
ValueError: no such name

I looked up the unicode symbol via Google and it's a legit symbol. Perhaps the unicodedata module isn't very comprehensive...?

I'm considering making my own lookup table from here. Interested in other ideas... this one seems doable.
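A hand-made lookup table of that kind might look something like the sketch below. The table contents, the name `FALLBACK_NAMES`, and the helper `name_with_fallback` are all hypothetical and only illustrate the idea. It also assumes each character is a single code point, as in Python 3.

```python
import unicodedata

# Hypothetical hand-made table: code point -> official Unicode name.
FALLBACK_NAMES = {
    0x1F648: 'SEE-NO-EVIL MONKEY',
    0x1F649: 'HEAR-NO-EVIL MONKEY',
    0x1F64A: 'SPEAK-NO-EVIL MONKEY',
}


def name_with_fallback(ch, table=FALLBACK_NAMES):
    """Look up ch in unicodedata first, then in the hand-made table.

    Returns None if neither source knows the character.
    """
    name = unicodedata.name(ch, None)
    if name is None:
        name = table.get(ord(ch))
    return name
```

The built-in database is consulted first, so the table only matters for characters the interpreter does not know.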

Recommended answer

My problem was in using Python 2.7 for the unicodedata module. Using Conda, I created a Python 3.3 environment, and now unicodedata works as expected and I've given up on all the weird hacks I was working on.

# using python 3.3
import unicodedata as uni

In [2]: uni.name('\U0001f648')
Out[2]: 'SEE-NO-EVIL MONKEY'
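A likely reason the interpreter version matters is the age of its bundled Unicode database. As far as I know, Python 2.7 ships Unicode 5.2.0 data, while U+1F648 was added in Unicode 6.0. It is worth confirming those numbers against the documentation for your own interpreter. A small check using `unicodedata.unidata_version` could look like this, where the helper name `knows_unicode_6` is illustrative:

```python
import unicodedata


def knows_unicode_6():
    """Return True if this interpreter's database is Unicode 6.0 or newer."""
    major, minor = unicodedata.unidata_version.split('.')[:2]
    return (int(major), int(minor)) >= (6, 0)


print(unicodedata.unidata_version)
print(knows_unicode_6())
```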

Thanks to Mark Ransom for pointing out that I originally had Mojibake from not correctly importing my data. Thanks again for your help.
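To illustrate what Mojibake looks like in this context, here is a minimal Python 3 sketch. The post does not say which wrong encoding was involved, so the use of latin-1 here is purely an assumption for demonstration.

```python
emoji = '\U0001f648'

# Encode correctly as UTF-8, then decode with the wrong codec.
raw = emoji.encode('utf-8')
garbled = raw.decode('latin-1')
print(len(garbled))  # 4 separate characters instead of one emoji

# Reversing the wrong step recovers the original character.
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired == emoji)
```

When the data is garbled like this, looking up the individual characters gives names for unrelated Latin-1 symbols rather than for the emoji, so fixing the import encoding has to come first.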

