Unicode console I/O in Haskell on Windows

Problem description

It seems rather difficult to get console I/O to work with Unicode characters in Haskell under Windows.
Here is the tale of woe:

(Preliminary.) Before you even consider doing Unicode I/O in the console under Windows, you need to make sure that you're using a console font which can render the characters you want. The raster fonts (the default) have very poor coverage (and don't allow copy-pasting of characters they can't represent), and the TrueType options MS provides (Consolas, Lucida Console) have not-great coverage (though these will allow copy/pasting of characters they cannot represent). You might consider installing DejaVu Sans Mono (follow the instructions at the bottom here; you may have to reboot before it works). Until this is sorted, no apps will be able to do much Unicode I/O; not just Haskell.

Having done this, you will notice that some apps will be able to do console I/O under Windows. But getting it to work remains quite complicated. There are basically two ways to write to the console under Windows. (What follows is true for any language, not just Haskell; don't worry, Haskell will enter the picture in a bit!)

Option A is to use the usual C-library style byte-based I/O functions; the hope is that the OS will interpret these bytes according to some encoding which can encode all the weird and wonderful characters you want. For instance, using the equivalent technique on Mac OS X, where the standard system encoding is usually UTF-8, this works great; you send out UTF-8 output, you see pretty symbols.

On Windows, it works less well. The default encoding that Windows expects will generally not be an encoding covering all the Unicode symbols. So if you want to see pretty symbols this way, one way or another, you need to change the encoding. One possibility would be for your program to use the SetConsoleCP win32 command (SetConsoleOutputCP is its counterpart for the output code page). (So then you need to bind to the Win32 library.)
Or, if you'd rather not do that, you can expect your program's user to change the code page for you (they would then have to call the chcp command before they run your program).

Option B is to use the Unicode-aware win32 console API commands like WriteConsoleW. Here you send UTF-16 directly to Windows, which renders it happily: there's no danger of an encoding mismatch because Windows always expects UTF-16 with these functions.

Unfortunately, neither of these options works very well from Haskell. First, there are no libraries that I know of that use Option B, so that's not very easy. This leaves Option A. If you use Haskell's I/O library (putStrLn and so on), this is what the library will do. In modern versions of Haskell, it will carefully ask Windows what the current code page is, and output your strings in the proper encoding. There are two problems with this approach:

One is not a showstopper, but is annoying. As mentioned above, the default encoding will almost never encode the characters you want: you or the user need to change to an encoding which does. Thus your user needs to run chcp 65001 before they run your program (you may find it distasteful to force your users to do this). Or you need to bind to SetConsoleCP and do the equivalent inside your program (and then use hSetEncoding so that the Haskell libraries will send output using the new encoding), which means you need to wrap the relevant part of the win32 libraries to make them Haskell-visible.

Much more seriously, there is a bug in Windows (resolution: won't fix) which leads to a bug in Haskell, which means that if you have selected any code page like cp65001 which can cover all of Unicode, Haskell's I/O routines will malfunction and fail.
So essentially, even if you (or your user) set the encoding properly to some encoding which covers all the wonderful Unicode characters, and then 'do everything right' in telling Haskell to output things using that encoding, you still lose.

The bug listed above is still unresolved and listed as low priority; the basic conclusion there is that Option A (in my classification above) is unworkable and one needs to switch to Option B to get reliable results. It is not clear what the timeframe will be for this being resolved, as it looks like some considerable work.

The question is: in the meantime, can anyone suggest a workaround to allow the use of Unicode console I/O in Haskell under Windows?

See also this python bug tracker database entry, grappling with the same problem in Python 3 (fix proposed, but not yet accepted into the codebase), and this stackoverflow answer, giving a workaround for this problem in Python (based on 'option B' in my classification).

Solution

I thought I would answer my own question, and list as one possible answer the following, which is what I'm actually doing at the moment. It is quite possible that one can do better, which is why I'm asking the question! But I thought it would make sense to make the following available to people. It's basically a translation from Python to Haskell of this python workaround for the same issue.
It uses 'option B' mentioned in the question.

The basic idea is that you create a module IOUtil.hs, with the following content, which you can import into your code:

    {-# LANGUAGE ForeignFunctionInterface #-}
    {-# LANGUAGE CPP #-}
    {-# LANGUAGE NoImplicitPrelude #-}
    module IOUtil (
      IOUtil.interact,
      IOUtil.putChar, IOUtil.putStr, IOUtil.putStrLn, IOUtil.print,
      IOUtil.getChar, IOUtil.getLine, IOUtil.getContents, IOUtil.readIO,
      IOUtil.readLn,
      ePutChar, ePutStr, ePutStrLn, ePrint,
      trace, traceIO
      ) where

    -- Needed by both branches (for trace and the console-info globals);
    -- recent versions of base no longer re-export it from Foreign.
    import System.IO.Unsafe (unsafePerformIO)

    #ifdef mingw32_HOST_OS

    import System.Win32.Types (BOOL, HANDLE, DWORD, LPDWORD, LPWSTR, LPCWSTR, LPVOID)
    import Foreign.C.Types (CWchar)
    import Foreign
    import Prelude hiding (getContents, putStr, putStrLn) --(IO, Read, Show, String)
    --import qualified System.IO
    import qualified System.IO (getContents)
    import System.IO hiding (getContents, putStr, putStrLn)
    import Data.Char (ord)

    {- <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
       HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
       returns INVALID_HANDLE_VALUE, NULL, or a valid handle -}

    foreign import stdcall unsafe "GetStdHandle" win32GetStdHandle :: DWORD -> IO (HANDLE)

    std_OUTPUT_HANDLE = -11 :: DWORD  -- all DWORD arithmetic is performed modulo 2^n
    std_ERROR_HANDLE  = -12 :: DWORD

    {- <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
       DWORD WINAPI GetFileType(HANDLE hFile); -}

    foreign import stdcall unsafe "GetFileType" win32GetFileType :: HANDLE -> IO (DWORD)
    _FILE_TYPE_CHAR   = 0x0002 :: DWORD
    _FILE_TYPE_REMOTE = 0x8000 :: DWORD

    {- <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
       BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode); -}

    foreign import stdcall unsafe "GetConsoleMode" win32GetConsoleMode :: HANDLE -> LPDWORD -> IO (BOOL)
    _INVALID_HANDLE_VALUE = (intPtrToPtr $ -1) :: HANDLE

    -- A handle is a real console if it is a character device and
    -- GetConsoleMode succeeds on it (it fails for redirected output).
    is_a_console :: HANDLE -> IO (Bool)
    is_a_console handle
      = if (handle == _INVALID_HANDLE_VALUE) then return False
          else do ft <- win32GetFileType handle
                  if ((ft .&. complement _FILE_TYPE_REMOTE) /= _FILE_TYPE_CHAR) then return False
                    else do ptr <- malloc
                            cm  <- win32GetConsoleMode handle ptr
                            free ptr
                            return cm

    real_stdout :: IO (Bool)
    real_stdout = is_a_console =<< win32GetStdHandle std_OUTPUT_HANDLE

    real_stderr :: IO (Bool)
    real_stderr = is_a_console =<< win32GetStdHandle std_ERROR_HANDLE

    {- BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
                                 LPDWORD lpCharsWritten, LPVOID lpReserved); -}

    foreign import stdcall unsafe "WriteConsoleW" win32WriteConsoleW
      :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO (BOOL)

    -- Buffer size (in code units), UTF-16 buffer, out-parameter for the
    -- number of units written, and the console handle.
    data ConsoleInfo = ConsoleInfo Int (Ptr CWchar) (Ptr DWORD) HANDLE

    writeConsole :: ConsoleInfo -> [Char] -> IO ()
    writeConsole (ConsoleInfo bufsize buf written handle) string
      = let -- Copy characters into the buffer as UTF-16, flushing when full.
            fillbuf :: Int -> [Char] -> IO ()
            fillbuf i [] = emptybuf buf i []
            fillbuf i remain@(first:rest)
              | i + 1 < bufsize && ordf <= 0xffff = do pokeElemOff buf i asWord
                                                       fillbuf (i+1) rest
              | i + 1 < bufsize && ordf >  0xffff = do pokeElemOff buf i word1
                                                       pokeElemOff buf (i+1) word2
                                                       fillbuf (i+2) rest
              | otherwise                         = emptybuf buf i remain
              where ordf   = ord first
                    asWord = fromInteger (toInteger ordf) :: CWchar
                    -- Characters above U+FFFF become a surrogate pair.
                    sub    = ordf - 0x10000
                    word1' = ((shiftR sub 10) .&. 0x3ff) + 0xD800
                    word2' = (sub .&. 0x3FF)            + 0xDC00
                    word1  = fromInteger . toInteger $ word1'
                    word2  = fromInteger . toInteger $ word2'

            -- Write out nLeft code units starting at ptr, then carry on
            -- with whatever input remains.
            emptybuf :: (Ptr CWchar) -> Int -> [Char] -> IO ()
            emptybuf _ 0 []     = return ()
            emptybuf _ 0 remain = fillbuf 0 remain
            emptybuf ptr nLeft remain
              = do let nLeft'    = fromInteger . toInteger $ nLeft
                   ret          <- win32WriteConsoleW handle ptr nLeft' written nullPtr
                   nWritten     <- peek written
                   let nWritten' = fromInteger . toInteger $ nWritten
                   if ret && (nWritten > 0)
                      then emptybuf (ptr `plusPtr` (nWritten' * szWChar)) (nLeft - nWritten') remain
                      else fail "WriteConsoleW failed.\n"
        in  fillbuf 0 string

    szWChar = sizeOf (0 :: CWchar)

    -- Left: a real console, written to with WriteConsoleW.
    -- Right: redirected output, written to with the ordinary Handle.
    makeConsoleInfo :: DWORD -> Handle -> IO (Either ConsoleInfo Handle)
    makeConsoleInfo nStdHandle fallback
      = do handle     <- win32GetStdHandle nStdHandle
           is_console <- is_a_console handle
           let bufsize = 10000
           if not is_console then return $ Right fallback
             else do buf     <- mallocBytes (szWChar * bufsize)
                     written <- malloc
                     return . Left $ ConsoleInfo bufsize buf written handle

    {-# NOINLINE stdoutConsoleInfo #-}
    stdoutConsoleInfo :: Either ConsoleInfo Handle
    stdoutConsoleInfo = unsafePerformIO $ makeConsoleInfo std_OUTPUT_HANDLE stdout

    {-# NOINLINE stderrConsoleInfo #-}
    stderrConsoleInfo :: Either ConsoleInfo Handle
    stderrConsoleInfo = unsafePerformIO $ makeConsoleInfo std_ERROR_HANDLE stderr

    interact :: (String -> String) -> IO ()
    interact f = do s <- getContents
                    putStr (f s)

    conPutChar ci  = writeConsole ci . replicate 1
    conPutStr      = writeConsole
    conPutStrLn ci = writeConsole ci . ( ++ "\n")

    putChar :: Char -> IO ()
    putChar = (either conPutChar  hPutChar ) stdoutConsoleInfo

    putStr :: String -> IO ()
    putStr = (either conPutStr   hPutStr  ) stdoutConsoleInfo

    putStrLn :: String -> IO ()
    putStrLn = (either conPutStrLn hPutStrLn) stdoutConsoleInfo

    print :: Show a => a -> IO ()
    print = putStrLn . show

    getChar = System.IO.getChar
    getLine = System.IO.getLine
    getContents = System.IO.getContents

    readIO :: Read a => String -> IO a
    readIO = System.IO.readIO

    readLn :: Read a => IO a
    readLn = System.IO.readLn

    ePutChar :: Char -> IO ()
    ePutChar = (either conPutChar  hPutChar ) stderrConsoleInfo

    ePutStr :: String -> IO ()
    ePutStr = (either conPutStr   hPutStr  ) stderrConsoleInfo

    ePutStrLn :: String -> IO ()
    ePutStrLn = (either conPutStrLn hPutStrLn) stderrConsoleInfo

    ePrint :: Show a => a -> IO ()
    ePrint = ePutStrLn . show

    #else

    import qualified System.IO
    -- ($) and return are used by trace below, so they have to be imported
    -- explicitly under NoImplicitPrelude.
    import Prelude (IO, Read, Show, String, ($), return)

    interact = System.IO.interact
    putChar = System.IO.putChar
    putStr = System.IO.putStr
    putStrLn = System.IO.putStrLn
    getChar = System.IO.getChar
    getLine = System.IO.getLine
    getContents = System.IO.getContents
    ePutChar = System.IO.hPutChar System.IO.stderr
    ePutStr = System.IO.hPutStr System.IO.stderr
    ePutStrLn = System.IO.hPutStrLn System.IO.stderr

    print :: Show a => a -> IO ()
    print = System.IO.print

    readIO :: Read a => String -> IO a
    readIO = System.IO.readIO

    readLn :: Read a => IO a
    readLn = System.IO.readLn

    ePrint :: Show a => a -> IO ()
    ePrint = System.IO.hPrint System.IO.stderr

    #endif

    trace :: String -> a -> a
    trace string expr = unsafePerformIO $ do
        traceIO string
        return expr

    traceIO :: String -> IO ()
    traceIO = ePutStrLn

Then you use the I/O functions it contains instead of the standard-library ones. They will detect whether output is redirected; if not (i.e. if we're writing to a 'real' console) then we'll bypass the usual Haskell I/O functions and write directly to the win32 console using WriteConsoleW, the Unicode-aware win32 console function. On non-Windows platforms, conditional compilation means that the functions here just call the standard-library ones.

If you need to print to stderr, you should use (e.g.) ePutStrLn, not hPutStrLn stderr; we don't define a hPutStrLn. (Defining one is an exercise for the reader!)
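The hPutStrLn exercise mentioned above could be approached roughly as follows. This is only a sketch under assumptions: hPutStrLnWith is a hypothetical helper name that does not appear in IOUtil, and it takes the two console-aware writers as arguments so that it can be compiled and tried on its own. It relies on Handle having an Eq instance, and sends anything that is neither stdout nor stderr to the ordinary System.IO function.

```haskell
import qualified System.IO as IO

-- | Hypothetical helper (not part of IOUtil): route writes aimed at
-- stdout/stderr to the supplied writers, and everything else to the
-- standard-library hPutStrLn.
hPutStrLnWith
  :: (String -> IO ())   -- ^ writer to use when the handle is stdout
  -> (String -> IO ())   -- ^ writer to use when the handle is stderr
  -> IO.Handle
  -> String
  -> IO ()
hPutStrLnWith toStdout toStderr h s
  | h == IO.stdout = toStdout s
  | h == IO.stderr = toStderr s
  | otherwise      = IO.hPutStrLn h s
```

Inside IOUtil one might then define hPutStrLn = hPutStrLnWith putStrLn ePutStrLn and export it, taking care that the module's existing uses of the System.IO function are written qualified so the two names do not clash. Handles obtained some other way (for example a duplicate of stdout) would most likely not compare equal to stdout and so would take the fallback path.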
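The UTF-16 conversion that writeConsole performs while filling its buffer can also be read in isolation. The following is a minimal pure sketch of the same arithmetic, not the module's actual code: encodeUtf16Char and encodeUtf16 are names made up for illustration, and Word16 stands in for CWchar (which is 16 bits wide on Windows) so that the snippet runs on any platform.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- | Encode one character as one or two UTF-16 code units.
-- Code points up to U+FFFF are emitted as a single unit (lone surrogate
-- code points are passed through unchanged, as in the module above);
-- anything higher is split into a high/low surrogate pair.
encodeUtf16Char :: Char -> [Word16]
encodeUtf16Char c
  | n <= 0xFFFF = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n   = ord c
    sub = n - 0x10000                                  -- 20 significant bits
    hi  = fromIntegral (0xD800 + ((sub `shiftR` 10) .&. 0x3FF))  -- top 10 bits
    lo  = fromIntegral (0xDC00 + (sub .&. 0x3FF))                -- low 10 bits

-- | Encode a whole string.
encodeUtf16 :: String -> [Word16]
encodeUtf16 = concatMap encodeUtf16Char
```

For example, U+1F600 gives sub = 0xF600, whose top ten bits are 0x3D and whose low ten bits are 0x200, so the pair is 0xD83D, 0xDE00. The module does the same thing but pokes the units straight into a preallocated buffer, which is why it also has to check that two slots are free before writing a pair.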