This article covers how to remove non-UTF-8 characters from an SQLite database.

Question

I have an sqlite db that has some crazy ascii characters in it and I would like to remove them, but I have no idea how to go about doing it. I googled some stuff and found some people saying to use REGEXP with mysql, but that threw an error saying REGEXP wasn't recognized.

This is the error I get:

sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'table_name' with text ...

Thanks for any help.

Recommended answer

Well, if you really want to shoehorn a rich Unicode string into a plain ASCII string (and don't mind some goofs), you could use this:

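import unicodedata as ud

def shoehorn_unicode_into_ascii(s):
    # NFKD splits accented letters into base letter + combining mark; encoding
    # with 'ignore' then drops the marks, but also drops characters that have
    # no ASCII decomposition at all, such as ß and curly quotes (‘ ’ “ ”).
    # The final decode returns text rather than bytes on Python 3.
    return ud.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')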
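
To make the "goofs" concrete, here is a small self-contained sketch of the same NFKD-then-ASCII technique applied to a few sample strings. The helper name `to_ascii` and the sample inputs are made up for illustration and are not part of the original answer.

```python
import unicodedata

def to_ascii(s):
    # Hypothetical helper: decompose with NFKD, then drop every non-ASCII
    # code point (combining accents, but also anything undecomposable).
    return unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')

samples = ['café', 'Straße', '“quoted”', '\ufb01le']  # last one starts with the "fi" ligature
results = {s: to_ascii(s) for s in samples}

for original, flattened in results.items():
    print(repr(original), '->', repr(flattened))
# 'café'     -> 'cafe'    (accent stripped, base letter kept)
# 'Straße'   -> 'Strae'   (ß has no decomposition, so it vanishes)
# '“quoted”' -> 'quoted'  (curly quotes vanish as well)
# 'ﬁle'      -> 'file'    (compatibility ligature expands to "fi")
```

Accents and ligatures degrade gracefully, while characters without an ASCII decomposition are silently lost, which is the trade-off to weigh before using this on real data.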

For a more complete solution (with somewhat fewer goofs, but requiring a third-party module unidecode), see this answer.

Really, though, the best solution is to work with unicode data throughout your code as much as possible, and drop to an encoding only when necessary.
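
The answer does not show how to get past the decode error in the question, so the following is only a sketch of one possible approach, assuming Python 3's standard `sqlite3` module rather than SQLAlchemy. It reads TEXT values as raw bytes by setting `text_factory = bytes`, decodes them as UTF-8 while discarding invalid bytes, and writes back only the rows that changed. The function name `clean_text_column` and the `notes`/`body` table are hypothetical, and the table and column names are interpolated into the SQL directly, so they must come from trusted code, not user input.

```python
import sqlite3

def clean_text_column(conn, table, column):
    """Drop bytes that are not valid UTF-8 from table.column; return rows changed."""
    conn.text_factory = bytes  # hand TEXT back as raw bytes, no decoding attempted
    rows = conn.execute(f'SELECT rowid, "{column}" FROM "{table}"').fetchall()
    fixed = 0
    for rowid, raw in rows:
        if not isinstance(raw, bytes):
            continue  # NULL or numeric value, nothing to clean
        text = raw.decode('utf-8', errors='ignore')  # silently discards bad bytes
        if text.encode('utf-8') != raw:
            conn.execute(
                f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
                (text, rowid),
            )
            fixed += 1
    conn.commit()
    conn.text_factory = str  # restore the default decoding behaviour
    return fixed

# Demo on an in-memory database (hypothetical table and data).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE notes (body TEXT)')
# CAST(blob AS TEXT) stores the bytes as TEXT without validating them,
# which reproduces a column containing invalid UTF-8 (a lone Latin-1 byte).
conn.execute('INSERT INTO notes VALUES (CAST(? AS TEXT))', (b'caf\xe9 ok',))
conn.execute('INSERT INTO notes VALUES (?)', ('plain',))

fixed = clean_text_column(conn, 'notes', 'body')
cleaned = conn.execute('SELECT body FROM notes ORDER BY rowid').fetchall()
print(fixed, cleaned)
```

Using `errors='ignore'` loses the offending bytes for good. If the data was really written in another encoding such as Latin-1, decoding with that encoding instead would preserve the characters, so it is worth inspecting a few raw values and backing up the database before running anything like this.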
