Is there a list of encodings that extend ASCII?

Problem description

I need to decide when (not) to convert a text file based on the known file encoding and the desired output encoding.

If the text is US-ASCII, I don't need to convert it if the output encoding is ASCII, UTF-8, Latin1, ... Obviously, I do need to convert a US-ASCII file to UTF-16 or UTF-32.

A list of standard encodings exists at http://www.iana.org/assignments/character-sets/character-sets.xml

A conversion is necessary if:

- the minimal character size is > 1 byte, or
- the first 128 code points (0 to 127) are not the same as US-ASCII.

I'd like to know: is there a similar list with details (byte length, ASCII compatibility) about the implementation of each encoding? I'd be happy with a list containing only the codecs supported by Qt5.

EDIT

I already found an answer to the question "Are all 8-bit or variable-width 8-bit codecs a superset of ASCII?" (in other words: can US-ASCII be interpreted as any 8-bit or variable-width 8-bit encoding?) here: "Character set that is not a superset of ASCII".

Instead, it would be helpful to know: is there a list of character sets which are supersets of ASCII?

This looks promising: "mime.charsets - list of character sets which are ASCII supersets", but I couldn't find an actual mime.charsets file.

Solution

An alternative approach is to decode the bytes 0x00 - 0x7F in the given encoding, and check that the characters match ASCII.
For example, in Python 3.x:

```python
def is_ascii_superset(encoding):
    for codepoint in range(128):
        if bytes([codepoint]).decode(encoding, 'ignore') != chr(codepoint):
            return False
    return True
```

This gives:

```python
>>> is_ascii_superset('US-ASCII')
True
>>> is_ascii_superset('windows-1252')
True
>>> is_ascii_superset('ISO-8859-15')
True
>>> is_ascii_superset('UTF-8')
True
>>> is_ascii_superset('UTF-16')
False
>>> is_ascii_superset('IBM500')  # a variant of EBCDIC
False
```

EDIT: Get US-ASCII compatibility for each encoding supported by your Qt version in C++:

```cpp
#include <QTextCodec>
#include <QMap>

typedef enum
{
    eQtCodecUndefined,
    eQtCodecAsciiIncompatible,
    eQtCodecAsciiCompatible,
} tQtCodecType;

QMap<QByteArray, tQtCodecType> QtCodecTypes()
{
    QMap<QByteArray, tQtCodecType> CodecTypes;

    // How to test Qt's interpretation of ASCII data?
    QList<QByteArray> available = QTextCodec::availableCodecs();

    // Reference codec is UTF-8 because Qt has no US-ASCII, but we only test
    // bytes 0-127 and UTF-8 is a superset of US-ASCII.
    QTextCodec *referenceCodec = QTextCodec::codecForName("UTF-8");
    if(referenceCodec == 0)
    {
        qDebug("Unable to get reference codec 'UTF-8'");
        return CodecTypes;
    }

    for(int i = 0; i < available.count(); i++)
    {
        const QByteArray name = available.at(i);
        QTextCodec *currCodec = QTextCodec::codecForName(name);
        if(currCodec == NULL)
        {
            qDebug("Unable to get codec for '%s'", qPrintable(QString(name)));
            CodecTypes.insert(name, eQtCodecUndefined);
            continue;
        }

        tQtCodecType type = eQtCodecAsciiCompatible;
        for(uchar j = 0; j < 128; j++) // UTF-8 == US-ASCII in the lower 7 bit
        {
            const char c = (char)j; // character to test < 2^8
            QString sRef, sTest;
            // convert character to UTF-16 (QString internal) assuming it is ASCII (via UTF-8)
            sRef = referenceCodec->toUnicode(&c, 1);
            // convert character to UTF-16 assuming it is of type [currCodec]
            sTest = currCodec->toUnicode(&c, 1);
            // compare both UTF-16 representations
            // -> if they are equal, these codecs are transparent for Qt
            if(sRef != sTest)
            {
                type = eQtCodecAsciiIncompatible;
                break;
            }
        }
        CodecTypes.insert(name, type);
    }
    return CodecTypes;
}
```
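The question's actual decision (does a US-ASCII file need converting for a given output encoding?) can be sketched in Python along the same lines as the answer. This is only a minimal sketch: the names `classify_encoding`, `ascii_needs_conversion` and `build_table` are hypothetical helpers, not part of the original answer, and the three-way result (True / False / None) is an assumption modelled on the compatible / incompatible / undefined enum of the Qt version. It decodes strictly instead of using `'ignore'`, so a byte that is invalid or incomplete on its own counts as incompatible.

```python
def classify_encoding(encoding):
    """Return True if bytes 0x00-0x7F decode exactly like US-ASCII,
    False if they do not, and None if Python cannot use the codec for text."""
    for codepoint in range(128):
        try:
            decoded = bytes([codepoint]).decode(encoding)  # strict decoding
        except LookupError:
            # Unknown codec name, or a codec that is not a text encoding.
            return None
        except UnicodeError:
            # The single byte is invalid or incomplete in this encoding.
            return False
        if decoded != chr(codepoint):
            return False
    return True


def ascii_needs_conversion(target_encoding):
    """Decide whether pure US-ASCII text must be converted for the target."""
    compatible = classify_encoding(target_encoding)
    if compatible is None:
        raise LookupError("unknown or non-text encoding: " + target_encoding)
    return not compatible


def build_table(names):
    """Map each encoding name to its classification (True, False or None)."""
    return {name: classify_encoding(name) for name in names}


if __name__ == "__main__":
    candidates = ["US-ASCII", "UTF-8", "ISO-8859-15", "UTF-16", "IBM500"]
    for name, compatible in build_table(candidates).items():
        print(name, compatible)
```

Note that this only covers the case from the question where the input is known to be pure US-ASCII. A file in some other 8-bit encoding still needs a real conversion whenever its bytes above 0x7F are interpreted differently by the output encoding.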