Code for stripping diacritical marks with ICU
Question
Can somebody please provide some sample code to strip diacritical marks (i.e., replace characters having accents, umlauts, etc., with their unaccented equivalents, so that, e.g., every accented é becomes a plain ASCII e) from a UnicodeString using the ICU library in C++? E.g.:
UnicodeString strip_diacritics( UnicodeString const &s ) {
    UnicodeString result;
    // ...
    return result;
}
Assume s is already normalized.
Recommended answer
After more searching elsewhere:
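#include <string>
#include <unicode/normlzr.h>
#include <unicode/unistr.h>

UErrorCode status = U_ZERO_ERROR;
icu::UnicodeString result;
// 's16' is the UTF-16 string to have diacritics removed; NFKD splits each
// accented character into a base character plus combining marks.
// (Newer ICU releases mark Normalizer as deprecated in favor of Normalizer2.)
icu::Normalizer::normalize( s16, UNORM_NFKD, 0, result, status );
if ( U_FAILURE( status ) ) {
    // complain
}
// convert the normalized 'result' (not the original 's16') to UTF-8
std::string s8;
result.toUTF8String( s8 );
// keep only ASCII bytes; this drops the combining marks, and also any
// other non-ASCII character that NFKD does not reduce to ASCII
std::string buf8;
buf8.reserve( s8.length() );
for ( std::string::const_iterator i = s8.begin(); i != s8.end(); ++i ) {
    char const c = *i;
    if ( static_cast<unsigned char>( c ) < 0x80 )
        buf8.push_back( c );
}
// result is in buf8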
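To see what the byte-filtering step does on its own, here is a minimal sketch that needs only the standard library. It assumes its input is UTF-8 text that has already been decomposed (NFD or NFKD), so the ICU normalization call is left out. The name keep_ascii is a hypothetical helper chosen for illustration and is not part of ICU.

```cpp
#include <string>

// Hypothetical helper (not an ICU function): copies only 7-bit ASCII bytes.
// Assumes 'utf8' is UTF-8 text already normalized to NFD or NFKD.
std::string keep_ascii( std::string const &utf8 ) {
    std::string out;
    out.reserve( utf8.size() );
    for ( char const c : utf8 ) {
        // Every byte of a multi-byte UTF-8 sequence has its high bit set,
        // so this test drops whole non-ASCII characters, never parts of one.
        if ( static_cast<unsigned char>( c ) < 0x80 )
            out.push_back( c );
    }
    return out;
}
```

Given a decomposed "e" followed by U+0301 (UTF-8 bytes 0xCC 0x81), the helper keeps the "e" and drops the combining accent. A character with no decomposition, such as "ø" (bytes 0xC3 0xB8), is removed entirely rather than mapped to "o", so this approach may be too aggressive for text that contains more than accented Latin letters.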