Problem description

JLine is a Java library for intercepting user input at a console before the user presses Enter. It uses JNA or similar wizardry.

I'm doing a few experiments with it, and I'm getting encoding problems when I input more "exotic" Unicode characters. The OS here is Windows 10 and I'm using Cygwin. Also, this is in Groovy, but it should be obvious to Java people.

def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()
terminal.enterRawMode()
// NB the Terminal I get is class org.jline.terminal.impl.PosixSysTerminal
def reader = terminal.reader()

def bytes = [] // NB class ArrayList
int readInt = -1
while( readInt != 13 && readInt != 10 ) {
    readInt = reader.read()
    byte convertedByte = (byte)readInt
    // see what the binary looks like:
    String binaryString = String.format("%8s", Integer.toBinaryString( convertedByte & 0xFF)).replace(' ', '0')
    println "binary |$binaryString|"
    bytes << (byte)readInt // NB means "append to list"
    println ">>> read |$readInt| byte |$convertedByte|"
}
// strip final byte (13 or 10)
bytes = bytes[0..-2]
println "z bytes $bytes, class ${bytes.class.name}"
def response = new String( (byte[])bytes.toArray(), 'UTF-8' )
// to get proper out encoding for Cygwin I then need to do this (I have no idea why!)
def psOut = new PrintStream(System.out, true, 'UTF-8' )
psOut.print( "using PrintStream: |$response|" )
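A note on the loop above: in JLine 3, terminal.reader() returns a java.io.Reader subclass, so read() yields already-decoded UTF-16 chars rather than raw bytes, and (byte)readInt keeps only the low 8 bits of each char. A minimal Java sketch that accumulates the chars directly is shown here; the class and method names are hypothetical, and a StringReader stands in for the terminal reader. It only helps if the reader itself decodes with the right charset.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadLineSketch {
    /**
     * Reads chars until CR, LF or end of stream and returns them as a String.
     * A Reader already yields decoded UTF-16 chars, so they are appended as-is
     * instead of being cast to byte (which would keep only the low 8 bits).
     */
    static String readLine(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        // Any negative value (end of stream) also stops the loop.
        while ((c = reader.read()) >= 0 && c != '\r' && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in for the terminal's reader in this sketch.
        String line = readLine(new StringReader("\u1E83\u00E9\r"));
        System.out.println(line.equals("\u1E83\u00E9"));
    }
}
```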

This works fine with characters that take one byte in UTF-8, and letters like "é" (two bytes) are handled fine too. But it goes wrong with "ẃ" (three bytes):

ẃ --> Unicode U+1E83 
    UTF-8 HEX: 0xE1 0xBA 0x83 (e1ba83) 
    BINARY: 11100001:10111010:10000011
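The breakdown above can be reproduced in Java using the same %8s / Integer.toBinaryString formatting as the question's loop. A short sketch (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8Bits {
    /** Returns the UTF-8 bytes of s as 8-digit binary strings joined by ':'. */
    static String binary(String s) {
        List<String> parts = new ArrayList<>();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xFF maps the signed byte to 0..255 before formatting
            String bits = Integer.toBinaryString(b & 0xFF);
            parts.add(String.format("%8s", bits).replace(' ', '0'));
        }
        return String.join(":", parts);
    }

    public static void main(String[] args) {
        String w = "\u1E83"; // LATIN SMALL LETTER W WITH ACUTE
        System.out.printf("U+%04X%n", w.codePointAt(0)); // U+1E83
        System.out.println(binary(w));                  // 11100001:10111010:10000011
    }
}
```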

But the bytes it actually produces when you enter "ẃ" are 11100001:10111010:10010010 (0xE1 0xBA 0x92).

This decodes to U+1E92, a different character, "Ẓ" (Latin capital Z with dot below). And that is indeed what gets printed in the response string.
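Decoding both byte sequences as UTF-8 confirms this. Only the last continuation byte differs: its six payload bits go from 000011 (0x03) to 010010 (0x12), which moves the code point from U+1E83 to U+1E92. A small sketch (names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class DecodeObserved {
    /** Decodes the given unsigned byte values (0..255) as UTF-8. */
    static String decodeUtf8(int... unsigned) {
        byte[] bytes = new byte[unsigned.length];
        for (int i = 0; i < unsigned.length; i++) {
            bytes[i] = (byte) unsigned[i]; // keep the low 8 bits
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expected = decodeUtf8(0xE1, 0xBA, 0x83); // UTF-8 for "ẃ"
        String observed = decodeUtf8(0xE1, 0xBA, 0x92); // what the loop collected
        // prints: expected U+1E83, observed U+1E92
        System.out.printf("expected U+%04X, observed U+%04X%n",
                expected.codePointAt(0), observed.codePointAt(0));
    }
}
```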

Unfortunately, JLine hands you this reader, an instance of org.jline.utils.NonBlocking$NonBlockingInputStreamReader, so I don't really know how to investigate its encoding (I presume UTF-8) or how to modify it. Can anyone explain what the problem is?
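One hypothesis reproduces the exact bit pattern, although the answer below does not confirm it. Suppose the reader decodes the incoming UTF-8 bytes as windows-1252, a common default charset for Java on Western-locale Windows, and the (byte) cast then truncates each char. In windows-1252, 0xE1 and 0xBA decode to U+00E1 and U+00BA, which survive the cast. However, 0x83 decodes to U+0192 ("ƒ"), whose low byte is 0x92. Both bytes of "é" (0xC3 0xA9) map to U+00C3 and U+00A9 and also survive, which would explain why "é" works. The sketch below simulates this. The charset is an assumption and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252Hypothesis {
    // ASSUMPTION: the terminal reader decodes keyboard input as windows-1252.
    static final Charset ASSUMED_READER_CHARSET = Charset.forName("windows-1252");

    /** Simulates the question's loop: wrong-charset decode, then a (byte) cast per char. */
    static String simulate(String typed) {
        byte[] utf8 = typed.getBytes(StandardCharsets.UTF_8);  // what the console sends
        String chars = new String(utf8, ASSUMED_READER_CHARSET); // what read() would return
        byte[] truncated = new byte[chars.length()];
        for (int i = 0; i < chars.length(); i++) {
            truncated[i] = (byte) chars.charAt(i); // keeps only the low 8 bits
        }
        return new String(truncated, StandardCharsets.UTF_8);   // new String(..., 'UTF-8')
    }

    public static void main(String[] args) {
        System.out.println(simulate("\u00E9").equals("\u00E9"));          // true: é survives
        System.out.printf("U+%04X%n", simulate("\u1E83").codePointAt(0)); // U+1E92
    }
}
```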

Recommended answer

As far as I can tell, this relates to a Cygwin-specific problem, which I asked about and then answered myself a year ago.

There is a solution in my answer to the question I asked directly after this one. It (hopefully) handles Unicode input correctly, even outside the Basic Multilingual Plane, using JLine ... and a Cygwin console.
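The following is not necessarily the approach from the linked answer. It is one way to avoid the char-to-byte truncation altogether: collect raw bytes and decode them as UTF-8 once the line ends. JLine 3's Terminal also exposes a raw byte stream via input(), but verify that against your JLine version. Splitting at CR/LF is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or above. A character outside the BMP, such as U+1F600, arrives as four bytes and ends up as two Java chars (a surrogate pair). In this sketch, a ByteArrayInputStream stands in for the terminal and the names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class Utf8LineFromBytes {
    /** Reads raw bytes until CR, LF or end of stream, then decodes them as UTF-8. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        // InputStream.read() returns 0..255 or -1, so nothing is truncated here.
        while ((b = in.read()) >= 0 && b != '\r' && b != '\n') {
            buf.write(b);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // U+1F600 lies outside the BMP: 4 bytes in UTF-8 (F0 9F 98 80).
        byte[] typed = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80, '\r'};
        String line = readLine(new ByteArrayInputStream(typed));
        System.out.println(line.length());                         // 2 (surrogate pair)
        System.out.println(line.codePointCount(0, line.length())); // 1
        System.out.printf("U+%X%n", line.codePointAt(0));           // U+1F600
    }
}
```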

09-17 17:55