How to check the charset of a string in Java

Problem description

In my application I get the user info from LDAP, and sometimes the full username arrives in the wrong charset. For example:

ТеÑÑ61 ТеÑÑовиÑ61

The name can also be in English or in Russian and be displayed correctly. If the username changes, it is updated in the database. Even if I change the value in the database, that won't solve the problem.

I can fix it before saving by doing this:

new String(incorrect.getBytes("ISO-8859-1"), "UTF-8");

However, if I apply it to a string that already contains Russian characters (for example, "Тест61 Тестович61"), I get something like "????61 ????????61".
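The question marks appear because ISO-8859-1 has no Cyrillic characters, so getBytes("ISO-8859-1") substitutes '?' for each one. A minimal sketch of a guarded version of the fix follows. It assumes the garbled names are always UTF-8 bytes that were misread as ISO-8859-1; the class and method names are hypothetical, and the sample strings are written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MojibakeFix {
    // Hypothetical helper: re-decode only when the round trip is safe.
    static String repairIfGarbled(String s) {
        // Strings with characters outside ISO-8859-1 (e.g. Cyrillic) cannot be
        // the product of this particular misdecoding, so leave them alone.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s;
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Strict decoding: malformed UTF-8 raises an exception instead of
            // being silently replaced.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return s; // genuine Latin-1 text such as an accented name: keep it
        }
    }

    public static void main(String[] args) {
        // "Тест61 Тестович61" written with Unicode escapes
        String good = "\u0422\u0435\u0441\u044261 \u0422\u0435\u0441\u0442\u043e\u0432\u0438\u044761";
        // Simulate the bug: UTF-8 bytes misread as ISO-8859-1.
        String garbled = new String(good.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(repairIfGarbled(garbled).equals(good));
        System.out.println(repairIfGarbled(good).equals(good));
    }
}
```

Plain ASCII names pass through unchanged, since their bytes are identical in both charsets. This remains a heuristic: a short Latin-1 string whose bytes happen to form valid UTF-8 would still be re-decoded.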

Can you suggest something that can determine the charset of a string?

Recommended answer

Strings in Java, AFAIK, do not retain their original encoding; they are always stored internally in some Unicode form. You want to detect the charset of the original stream/bytes, which is why I think your String.getBytes() call is too late.

Ideally, if you can get at the input stream you are reading from, you can run it through something like this: http://code.google.com/p/juniversalchardet/

There are plenty of other charset detectors out there as well.
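Where the raw bytes are available and a full detector is more than is needed, one simple option is to attempt a strict UTF-8 decode and fall back to another charset only if it fails. The sketch below uses only the JDK and is not the juniversalchardet API; the class name, the method name, and the choice of fallback charset are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ByteDecoding {
    // Hypothetical helper: try strict UTF-8 first, otherwise use the
    // caller-supplied fallback charset.
    static String decode(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: decode with the assumed legacy charset instead.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u0422\u0435\u0441\u0442".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "Jos\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Valid UTF-8 is rarely produced by accident from text in a legacy single-byte charset, so the strict check works as a reasonable first test. It cannot tell two single-byte charsets apart, which is where a statistical detector is useful.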

08-22 21:38