python - Problem with UTF-8 encoding/decoding

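I am reading a UTF-8 encoded .csv file.
I want to create an index and rewrite the csv.
The index is built from a running number and the first letter of a word.
Python 2.7.10, Ubuntu Server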

#!/usr/bin/env python
# -*- coding: utf-8 -*-
counter = 0
tempDict = {}
with open(modifiedFile, "wb") as newFile:
    with open(originalFile, "r") as file:
        for row in file:
            myList = row.split(",")
            toId = str(myList[0])

            if toId not in tempDict:
                tempDict[toId] = counter
                myId = str(toId[0]) + str(counter)
                myList.append(myId)
                counter += 1
            else:
                myId = str(toId[0]) + str(tempDict[toId])
                myList.append(myId)

            # and then I write everything into the csv
            for i, j in enumerate(myList):
                if i < 6:
                    newFile.write(str(j).strip())
                    newFile.write(",")

                else:
                    newFile.write(str(j).strip())
                    newFile.write("\n")

The problem is the following.
When a word starts with a fancy letter, such as

C
É
A
...

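the ID I create starts with ? instead of the first letter of the word.
The strange part is that in the csv I create, the words with fancy letters are written correctly. There are no ? or other symbols that would indicate a wrong encoding.

Why is that?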

Best answer

In Python 2.x, strings are non-Unicode byte strings by default, and str() returns a non-Unicode string. Use unicode() instead.
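To see where the ? comes from, it may help to compare indexing a byte string with indexing a Unicode string. The snippet below is only a sketch: the sample word "École" is made up for illustration and does not come from the question. In UTF-8, "É" takes two bytes, so taking the first element of the byte string yields only the lead byte, which is not a valid character on its own. The full word still looks right in the output file because all of its bytes are written back unchanged.

```python
# Hypothetical sample word "École", written with an escape so the snippet
# does not depend on the encoding of the source file.
word = u"\u00c9cole"

# What the built-in open() yields in Python 2: UTF-8 encoded bytes.
raw = word.encode("utf-8")

first_byte = raw[:1]   # only the lead byte of the two-byte sequence for "É"
first_char = word[:1]  # the complete character "É"

# The lone lead byte is not valid UTF-8, so it cannot be decoded.
try:
    first_byte.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print(decodable)                # False
print(first_char == u"\u00c9")  # True
print(len(raw) - len(word))     # 1 (one extra byte for "É")
```

Slices (`[:1]`) are used instead of `[0]` so the snippet behaves the same on Python 2.7 and Python 3.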

In addition, you must open the file with codecs.open(), specifying the utf-8 encoding, instead of the built-in open().
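Putting both points together, the loop from the question could be reworked roughly as follows. This is a minimal sketch, not the asker's final code: the function name `add_index_column` and its parameter names are invented here, and it assumes a simple comma-separated file without quoted fields. It also writes every field of the row followed by the new ID, rather than reproducing the question's seven-column write logic.

```python
import codecs


def add_index_column(original_path, modified_path):
    """Append an ID (first character + running number) to every row.

    Hypothetical helper for illustration; assumes no quoted commas.
    """
    counter = 0
    seen = {}  # first-column value -> number assigned to it
    with codecs.open(original_path, "r", encoding="utf-8") as src:
        with codecs.open(modified_path, "w", encoding="utf-8") as dst:
            for row in src:
                if not row.strip():
                    continue  # skip blank lines
                fields = [field.strip() for field in row.split(u",")]
                key = fields[0]
                if key not in seen:
                    seen[key] = counter
                    counter += 1
                # key is a Unicode string, so key[:1] is a whole character,
                # not the first byte of a multi-byte UTF-8 sequence.
                my_id = u"%s%d" % (key[:1], seen[key])
                fields.append(my_id)
                dst.write(u",".join(fields) + u"\n")
```

Calling `add_index_column(originalFile, modifiedFile)` with the two paths from the question would then produce IDs that start with the real first letter, because `codecs.open()` decodes the input to Unicode on read and encodes it back to UTF-8 on write.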

Regarding python - problem with UTF-8 encoding/decoding, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41812892/