How to remove surrogate characters in Java?

Question

I am facing a situation where I get surrogate characters in text that I am saving to MySQL 5.1. As UTF-16 is not supported there, I want to remove these surrogate pairs manually with a Java method before saving the text to the database.

I have written the following method for now, and I am curious to know whether there is a more direct and optimal way to handle this.

Thanks in advance for your help.

public static String removeSurrogates(String query) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < query.length() - 1; i++) {
        char firstChar = query.charAt(i);
        char nextChar = query.charAt(i+1);
        if (Character.isSurrogatePair(firstChar, nextChar) == false) {
            sb.append(firstChar);
        } else {
            i++;
        }
    }
    if (Character.isHighSurrogate(query.charAt(query.length() - 1)) == false
            && Character.isLowSurrogate(query.charAt(query.length() - 1)) == false) {
        sb.append(query.charAt(query.length() - 1));
    }

    return sb.toString();
}


Recommended answer

Here are a few things:


A char value is a surrogate code unit if and only if it is either a low-surrogate code unit or a high-surrogate code unit.


  • Checking for pairs seems pointless; why not just remove all surrogates?

  • x == false is equivalent to !x

  • StringBuilder is better than StringBuffer in cases where you don't need synchronization (like a variable that never leaves local scope).

    I suggest this:

    public static String removeSurrogates(String query) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            // In Java 7 and later this condition can be written as !Character.isSurrogate(c)
            if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
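To see how the suggested method behaves, here is a small self-contained sketch. The class name `SurrogateDemo` and the sample strings are assumptions made for illustration; the method body repeats the logic above, appending the current character `c`.

```java
public class SurrogateDemo {
    // Same logic as the suggested method, repeated so the example compiles on its own.
    public static String removeSurrogates(String query) {
        StringBuilder sb = new StringBuilder(query.length());
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            // Keep only chars that are neither high nor low surrogates.
            if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String paired = "a\uD83D\uDE00b"; // U+1F600 stored as a surrogate pair
        String lone = "x\uD83Dy";         // an unpaired high surrogate
        System.out.println(removeSurrogates(paired));       // prints "ab"
        System.out.println(removeSurrogates(lone));         // prints "xy"
        System.out.println(removeSurrogates("").isEmpty()); // prints "true"
    }
}
```

Unlike the loop in the question, this version never reads `query.charAt(query.length() - 1)` unconditionally, so an empty string is returned unchanged instead of throwing, and unpaired surrogates in the middle of the text are dropped as well.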
    



    Breaking down the if statement

    You asked about this statement:

    if (!(Character.isHighSurrogate(c) || Character.isLowSurrogate(c))) {
        sb.append(c);
    }
    

    One way to understand it is to break each operation into its own function, so you can see that the combination does what you'd expect:

    static boolean isSurrogate(char c) {
        return Character.isHighSurrogate(c) || Character.isLowSurrogate(c);
    }
    
    static boolean isNotSurrogate(char c) {
        return !isSurrogate(c);
    }
    
    ...
    
    if (isNotSurrogate(c)) {
        sb.append(c);
    }
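On Java 7 or later, the hand-written `isSurrogate` helper can be replaced by the built-in `Character.isSurrogate(char)`, as the comment in the suggested method hints. A minimal sketch, assuming Java 7+ and using the hypothetical class name `SurrogateFilter`:

```java
public class SurrogateFilter {
    // Removes every surrogate code unit, whether paired or unpaired.
    public static String removeSurrogates(String query) {
        StringBuilder sb = new StringBuilder(query.length());
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            // Character.isSurrogate(c) is true for both high and low surrogates.
            if (!Character.isSurrogate(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The behavior is the same as the two-call version; the condition is just shorter to read.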
    
