Problem description
What I'm trying to achieve should be rather straightforward, although PowerShell is trying to make it hard.
I want to display the full paths of files, some of which have Arabic, Chinese, Japanese and Russian characters in their names.
I always get some undecipherable output, such as the one shown below.
The output seen in the console is being consumed as is by another script. The output contains ? instead of the actual characters.
The command executed is:
(Get-ChildItem -Recurse -Path "D:\test" -Include *unicode* | Get-ChildItem -Recurse).FullName
Is there any easy way to launch PowerShell (via the command line or in any fashion that can be written into a script) such that the output is displayed correctly?
P.S. I've gone through many similar questions on Stack Overflow, but none of them offer much beyond calling it a Windows Console Subsystem issue.
Recommended answer
The PowerShell Core (v6+) perspective (see the next section for Windows PowerShell), irrespective of character-rendering issues (also covered in the next section), with respect to communicating with external programs:
On Unix-like platforms, PowerShell Core uses UTF-8 by default (typically, these days, given that modern Unix-like platforms use UTF-8-based locales).
On Windows, it is the legacy system locale, via its OEM code page, that determines the default encoding in all consoles, including both Windows PowerShell and PowerShell Core console windows, though recent versions of Windows 10 now allow setting the system locale to code page 65001 (UTF-8); note that the feature is still in beta as of this writing, and using it has far-reaching consequences - see this answer.
If you do use that feature, PowerShell Core console windows will then automatically be UTF-8-aware, though in Windows PowerShell you'll still have to set $OutputEncoding to UTF-8 too (which in Core already defaults to UTF-8), as shown below.
Otherwise - notably on older Windows versions - you can use the same approach as detailed below for Windows PowerShell.
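Before changing anything, it can help to see which encodings a session is currently using. The following is a minimal sketch, not part of the original answer; the function name Get-SessionEncoding is hypothetical, and it only reads the three settings discussed in this answer.

```powershell
# Hypothetical helper: reports the encodings currently in effect in this session.
function Get-SessionEncoding {
    [pscustomobject]@{
        # Encoding PowerShell uses when piping data TO external programs.
        OutputEncodingPreference = $OutputEncoding.WebName
        # Encoding assumed when decoding output FROM external programs
        # (reflects the console's active output code page).
        ConsoleOutputCodePage    = [Console]::OutputEncoding.CodePage
        # Encoding assumed for console / stdin input.
        ConsoleInputCodePage     = [Console]::InputEncoding.CodePage
    }
}

Get-SessionEncoding
```

A value of 65001 for the two code-page properties indicates UTF-8; anything else typically reflects the system's OEM code page.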
Making your Windows PowerShell console window Unicode (UTF-8) aware:
Pick a TrueType (TT) font that supports the specific scripts (writing systems, alphabets) whose characters you want to display properly in the console:
Important: While all TrueType fonts support Unicode in principle, they usually only support a subset of all Unicode characters, namely those corresponding to specific scripts (writing systems), such as the Latin script, the Cyrillic (Russian) script, ...
In your particular case - if you must support Arabic as well as Chinese, Japanese and Russian characters - your only choice is SimSun-ExtB, which is available on Windows 10 only.
See Wikipedia for a list of what Windows fonts target what scripts (alphabets).
To change the font, click on the icon in the top-left corner of the window and select Properties, then change to the Fonts tab and select the TrueType font of interest.
See this SU answer by not2quibit for how to make additional fonts available.
Additionally, for proper communication with external programs:
The console window's code page must be switched to 65001, the UTF-8 code page. This is usually done with chcp 65001, which, however, cannot be used directly from within a PowerShell session; the PowerShell command below has the same effect.
Windows PowerShell must be instructed to use UTF-8 to communicate with external utilities too: when sending pipeline input to external programs, it uses the encoding in its $OutputEncoding preference variable (when decoding output from external programs, it is the encoding stored in [console]::OutputEncoding that is applied).
The following magic incantation in Windows PowerShell does this (as stated, this implicitly performs chcp 65001):
$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding =
New-Object System.Text.UTF8Encoding
To persist these settings, i.e., to make your future interactive PowerShell sessions UTF-8-aware by default, add the command above to your $PROFILE file.
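One way to add the command to the profile without duplicating it on repeated runs is sketched below. This is an illustration rather than part of the original answer: the function name Add-Utf8ProfileLine and its -Path parameter are hypothetical, and the sketch assumes that appending a single line to the end of the profile file is acceptable.

```powershell
# Hypothetical helper: appends the UTF-8 setup command to a profile file
# unless the file already contains it.
function Add-Utf8ProfileLine {
    param(
        # Defaults to the current user's profile for the current host.
        [string] $Path = $PROFILE
    )

    $line = '$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = New-Object System.Text.UTF8Encoding'

    # Create the profile file (and any missing parent folders) if needed.
    if (-not (Test-Path -LiteralPath $Path)) {
        New-Item -ItemType File -Path $Path -Force | Out-Null
    }

    # Get-Content -Raw returns $null for an empty file, so test for that first.
    $existing = Get-Content -LiteralPath $Path -Raw
    if (-not $existing -or -not $existing.Contains($line)) {
        Add-Content -LiteralPath $Path -Value $line
    }
}
```

Calling Add-Utf8ProfileLine with no arguments targets $PROFILE; the change takes effect in sessions started afterwards.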
Note: Recent versions of Windows 10 now allow setting the system locale to code page 65001 (UTF-8) (the feature is still in beta as of Windows 10 version 1903), which makes all console windows default to UTF-8, including Windows PowerShell's.
If you do use that feature, setting [console]::InputEncoding / [console]::OutputEncoding is then no longer strictly necessary, but you'll still have to set $OutputEncoding (which is not necessary in PowerShell Core, where $OutputEncoding already defaults to UTF-8).
Important:
These settings assume that any external utilities you communicate with expect UTF-8-encoded input and produce UTF-8 output.
CLIs written in Node.js, for instance, fulfill that criterion.
Python scripts - if written with UTF-8 support in mind - can handle UTF-8 too.
By contrast, these settings can break (older) utilities that expect only a single-byte encoding, as implied by the system's legacy OEM code page.
Up to Windows 8.1, this even included standard Windows utilities such as find.exe and findstr.exe, which have been fixed in Windows 10. See the bottom of this post for how to bypass this problem by switching to UTF-8 temporarily, on demand, for invoking a given utility.
These settings apply to external programs only and are unrelated to the encodings that PowerShell's cmdlets use on output:
See this answer for the default character encodings used by PowerShell cmdlets; in short: If you want cmdlets in Windows PowerShell to default to UTF-8 (which PowerShell [Core] v6+ does anyway), add $PSDefaultParameterValues['*:Encoding'] = 'utf8' to your $PROFILE, but note that this will affect all calls to cmdlets with an -Encoding parameter in your sessions, unless that parameter is used explicitly. Also note that in Windows PowerShell you'll invariably get UTF-8 files with a BOM; conversely, in PowerShell [Core] v6+, which defaults to BOM-less UTF-8 (both in the absence of -Encoding and with -Encoding utf8), you'd have to use 'utf8BOM' to get a BOM.
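The effect of that default can be checked by writing a small file and inspecting its bytes. The sketch below is an illustration only: it uses Out-File as the example cmdlet and a temporary file, and it removes the '*:Encoding' entry again at the end, which would also remove an entry you had set yourself.

```powershell
# Make cmdlets that have an -Encoding parameter default to UTF-8 in this session.
$PSDefaultParameterValues['*:Encoding'] = 'utf8'

# Write the euro sign (U+20AC) WITHOUT passing -Encoding explicitly.
$tmp = [System.IO.Path]::GetTempFileName()
[string][char]0x20AC | Out-File -FilePath $tmp

# Inspect the raw bytes: a UTF-8 BOM is EF BB BF; the euro sign is E2 82 AC.
$bytes  = [System.IO.File]::ReadAllBytes($tmp)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
$offset    = if ($hasBom) { 3 } else { 0 }
$euroBytes = $bytes[$offset..($offset + 2)]

"BOM present: $hasBom"
"Euro bytes : " + (($euroBytes | ForEach-Object { $_.ToString('X2') }) -join ' ')

# Clean up the temporary file and the session-wide default.
Remove-Item -LiteralPath $tmp
$PSDefaultParameterValues.Remove('*:Encoding')
```

Per the text above, the BOM line is expected to differ between Windows PowerShell and PowerShell [Core] v6+, while the euro bytes should be the same in both.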
While a TrueType font is active, the console-window buffer correctly preserves (non-ASCII) Unicode characters even if they don't render correctly; that is, even though they may appear generically as ?, so as to indicate lack of support by the current font, you can copy and paste such characters elsewhere without loss of information, as eryksun notes.
PowerShell is capable of outputting Unicode characters to the console even without having switched to code page 65001 first.
However, that by itself does not guarantee that other programs can handle such output correctly - see below.
When it comes to communicating with external programs via stdout (piping), PowerShell uses the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, which means that any non-ASCII characters are transliterated to literal ? characters, resulting in information loss. (By contrast, commendably, PowerShell Core (v6+) now consistently uses (BOM-less) UTF-8 as the default encoding.)
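The information loss described above can be reproduced without any external program by round-tripping a string through the .NET encoding classes. This is a minimal sketch; the helper name Convert-ViaEncoding is hypothetical and only stands in for what happens when PowerShell encodes pipeline input for an external program.

```powershell
# Hypothetical helper: encodes a string to bytes and decodes it again
# with the same encoding, mimicking a lossy or lossless round trip.
function Convert-ViaEncoding {
    param(
        [string] $Text,
        [System.Text.Encoding] $Encoding
    )
    $Encoding.GetString($Encoding.GetBytes($Text))
}

# 'a', the euro sign (U+20AC), 'b' - built from a code point so that the
# snippet does not depend on the encoding of the script file itself.
$text = 'a' + [char]0x20AC + 'b'

$viaAscii = Convert-ViaEncoding -Text $text -Encoding ([System.Text.Encoding]::ASCII)
$viaUtf8  = Convert-ViaEncoding -Text $text -Encoding ([System.Text.Encoding]::UTF8)

$viaAscii           # a?b  (the euro sign is replaced by a literal ?)
$viaUtf8 -eq $text  # True (UTF-8 preserves the character)
```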
By contrast, however, passing non-ASCII arguments (rather than stdout (piped) output) to external programs seems to require no special configuration (it is unclear to me why that works); e.g., the following Node.js command correctly returns €: 1 even with the default configuration:
node -pe "process.argv[1] + ': ' + process.argv[1].length" €
[Console]::OutputEncoding:
controls what character encoding is assumed when the console translates program output into console display characters.
also tells PowerShell what encoding to assume when capturing output from an external program.
The upshot is that if you need to capture output from a UTF-8-producing program, you need to set [Console]::OutputEncoding to UTF-8 as well; setting $OutputEncoding only covers the input (to the external program) aspect.
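To see why the decoding side matters, the sketch below decodes the UTF-8 bytes of a single character with the wrong encoding. It is an illustration under one assumption: ISO-8859-1 (code page 28591) is used as a stand-in for a single-byte legacy code page, because it is available in both PowerShell editions; an actual OEM code page such as 437 would produce different, but equally garbled, characters.

```powershell
# The euro sign (U+20AC) as a one-character string.
$euro = [string][char]0x20AC

# What a UTF-8-producing program would emit for it: 3 bytes (E2 82 AC).
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($euro)

# Decoding with a single-byte encoding (stand-in for a legacy code page)
# turns each byte into its own character.
$latin1     = [System.Text.Encoding]::GetEncoding(28591)
$misdecoded = $latin1.GetString($utf8Bytes)

# Decoding with UTF-8 recovers the original character.
$decoded = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

$misdecoded.Length  # 3
$decoded.Length     # 1
```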
[Console]::InputEncoding
sets the encoding for keyboard input into a console and also determines how PowerShell's CLI interprets data it receives via stdin (standard input).
If switching the console to UTF-8 for the entire session is not an option, you can do so temporarily, for a given call:
# Save the current settings and temporarily switch to UTF-8.
$oldOutputEncoding = $OutputEncoding; $oldConsoleEncoding = [Console]::OutputEncoding
$OutputEncoding = [Console]::OutputEncoding = New-Object System.Text.Utf8Encoding
# Call the UTF-8 program, using Node.js as an example.
# This should echo '€' (`U+20AC`) as-is and report the length as *1*.
$captured = '€' | node -pe "require('fs').readFileSync(0).toString().trim()"
$captured; $captured.Length
# Restore the previous settings.
$OutputEncoding = $oldOutputEncoding; [Console]::OutputEncoding = $oldConsoleEncoding
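If the temporary switch is needed in more than one place, the save/switch/restore steps above could be wrapped in a helper that restores the console setting even when the call fails. This is a sketch under assumptions, not part of the original answer: the function name Invoke-WithUtf8 is hypothetical, and it assumes the function is defined directly in the session or a script (not in a module), so that the script block it invokes sees the function-local $OutputEncoding.

```powershell
# Hypothetical helper: runs a script block with UTF-8 in effect for
# communication with external programs, then restores the console encoding.
function Invoke-WithUtf8 {
    param(
        [Parameter(Mandatory)]
        [scriptblock] $ScriptBlock
    )

    $oldConsoleEncoding = [Console]::OutputEncoding
    try {
        $utf8 = New-Object System.Text.UTF8Encoding
        # Function-local variable: it goes out of scope when the function
        # returns, so it needs no explicit restore.
        $OutputEncoding = $utf8
        [Console]::OutputEncoding = $utf8
        & $ScriptBlock
    }
    finally {
        # Always restore the console-wide setting, even on errors.
        [Console]::OutputEncoding = $oldConsoleEncoding
    }
}
```

A call might then look like Invoke-WithUtf8 { '€' | node -pe "require('fs').readFileSync(0).toString().trim()" }, reusing the Node.js example from above.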
Problem with older versions of Windows (pre-W10):
An active chcp value of 65001 breaking the console output of some external programs, and even batch files in general, in older versions of Windows may ultimately have stemmed from a bug in the WriteFile() Windows API function (as also used by the standard C library), which mistakenly reported the number of characters rather than bytes with code page 65001 in effect, as discussed in this blog post.
The resulting symptoms, according to a comment by bobince on this answer from 2008, are: "My understanding is that calls that return a number-of-bytes (such as fread/fwrite/etc) actually return a number-of-characters. This causes a wide variety of symptoms, such as incomplete input-reading, hangs in fflush, the broken batch files and so on."
eryksun suggests two alternatives to the native Windows console windows (conhost.exe), which provide better and faster Unicode character rendering, due to using the modern, GPU-accelerated DirectWrite/DirectX API instead of the "old GDI implementation [that] cannot handle complex scripts, non-BMP characters, or automatic fallback fonts."
Microsoft's own, open-source Windows Terminal, which is distributed and updated via the Microsoft Store in Windows 10 - see here for an introduction.
Long-established third-party alternative ConEmu, which has the advantage of working on older Windows versions too.