Problem Description

I need a C API for manipulating CSV data that can work with Unicode. I am aware of libcsv (sourceforge.net/projects/libcsv), but I don't think it will work with Unicode (please correct me if I'm wrong), because I don't see wchar_t being used anywhere.

Please advise.

Recommended Answer

It looks like libcsv does not use the C string functions to do its work, so it almost works out of the box, even though it knows nothing about multibyte or wide-character strings. It treats the input as an array of bytes with an explicit length. This might mostly work for certain wide character encodings that pad ASCII characters out to the full character width (so newline might be encoded as "\0\n" and space as "\0 "). You could also encode your wide data as UTF-8, which should make things a bit easier. But both approaches might founder on the way libcsv identifies space and line-terminator tokens: it expects you to tell it, one byte at a time, whether it is looking at a space or a terminator, which does not allow for multibyte space or terminator encodings. You could fix this by modifying the library to pass its space and terminator test functions a pointer into the string plus the length remaining in the string, which would be pretty straightforward.
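
To make the last point concrete, here is a minimal sketch in plain C of what a "pointer plus remaining length" space test could look like for UTF-8 data. It does not use or modify libcsv; the names space_len_utf8, field_len_utf8 and trim_utf8 are hypothetical helpers invented for this illustration, the set of recognised spaces (ASCII space, tab, U+00A0, U+3000) is an arbitrary assumption, and quoted fields are deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical helper (not a libcsv function): returns the byte length of
 * the space token starting at p, or 0 if p does not start with one.
 * Seeing the remaining length lets it recognise multibyte spaces. */
static size_t space_len_utf8(const unsigned char *p, size_t remaining)
{
    if (remaining >= 1 && (p[0] == ' ' || p[0] == '\t'))
        return 1;
    /* U+00A0 NO-BREAK SPACE is C2 A0 in UTF-8. */
    if (remaining >= 2 && p[0] == 0xC2 && p[1] == 0xA0)
        return 2;
    /* U+3000 IDEOGRAPHIC SPACE is E3 80 80 in UTF-8. */
    if (remaining >= 3 && p[0] == 0xE3 && p[1] == 0x80 && p[2] == 0x80)
        return 3;
    return 0;
}

/* Length in bytes of the first unquoted field in s[0..len).
 * A byte-wise scan is safe in UTF-8: every byte of a multibyte sequence
 * is >= 0x80, so it can never be mistaken for ',', '\r' or '\n'. */
static size_t field_len_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len && s[i] != ',' && s[i] != '\n' && s[i] != '\r')
        i++;
    return i;
}

/* Strips leading and trailing space tokens from a bytes-plus-length field
 * by adjusting the pointer and length in place (no copying). */
static void trim_utf8(const unsigned char **s, size_t *len)
{
    size_t n;

    /* Leading spaces: consume one token at a time from the front. */
    while ((n = space_len_utf8(*s, *len)) > 0) {
        *s += n;
        *len -= n;
    }

    /* Trailing spaces: look for a 1-, 2- or 3-byte token ending the field. */
    for (;;) {
        size_t k, cut = 0;
        for (k = 1; k <= 3 && k <= *len; k++) {
            if (space_len_utf8(*s + *len - k, k) == k) {
                cut = k;
                break;
            }
        }
        if (cut == 0)
            break;
        *len -= cut;
    }
}
```

For a record whose first field is U+3000, "café", U+00A0 (10 bytes in UTF-8), field_len_utf8 stops at the comma and trim_utf8 narrows the field to the 5 bytes of "café". As far as I know, the hooks libcsv actually offers, csv_set_space_func and csv_set_term_func, take a callback that receives a single unsigned char, so a multibyte test like space_len_utf8 could not be plugged in without the library change described above.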

08-22 21:58