Question
I am trying to pipe the contents of a file to a simple ASCII symmetric encryption program I made. It reads input from STDIN and adds or subtracts a fixed value (224) to or from each byte of the input. For example, if the first byte is 4 and we want to encrypt, it becomes 228. If the result exceeds 255, the program wraps the value around with a modulo operation.
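The source of Crypt.exe is not shown, so the following is only a minimal Python sketch of the byte shift as described, assuming the wrap-around is modulo 256. The names KEY and shift are made up for illustration.

```python
KEY = 224  # shift value quoted in the question


def shift(data: bytes, encrypt: bool) -> bytes:
    """Add (encrypt) or subtract (decrypt) KEY from every byte, modulo 256."""
    delta = KEY if encrypt else -KEY
    return bytes((b + delta) % 256 for b in data)


plain = b"this is a test"

# The operation is symmetric: either order restores the input.
assert shift(shift(plain, encrypt=True), encrypt=False) == plain
assert shift(shift(plain, encrypt=False), encrypt=True) == plain

print(shift(bytes([4]), encrypt=True)[0])  # 228, as in the example above
```

Because the shift works on raw bytes, any layer between the two invocations that reinterprets those bytes as text can break the round trip.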
This is the output I get with cmd (test.txt contains "this is a test"):
type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
this is a test
It also works the other way around, so it is a symmetric encryption algorithm:
type .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
this is a test
However, the behaviour in PowerShell is different. When encrypting first, I get:
type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
this is a test_*
And that is what I get when decrypting first:
Maybe it is an encoding problem. Thanks in advance.
Recommended answer
tl;dr:
As of PowerShell 7.2, if you need raw byte handling and/or need to prevent PowerShell from situationally adding a trailing newline to your text data, avoid the PowerShell pipeline altogether.
- Future support for passing raw byte data between external programs and to-file redirections is the subject of GitHub issue #1908.
For raw byte handling, shell out to cmd with /c (on Windows; on Unix-like platforms / Unix-like Windows subsystems, use sh or bash with -c):
cmd /c 'type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt'
Use a similar technique to save raw byte output in a file - do not use PowerShell's > operator:
cmd /c 'someexe > file.bin'
Note that if you want to capture an external program's text output in a PowerShell variable, you need to make sure that [Console]::OutputEncoding matches your program's output character encoding (typically the active OEM code page), which should be true by default in this case; see the next section for details.
Generally, however, byte manipulation of text data is best avoided.
There are two separate problems, only one of which has a simple solution:
Problem 1: There is indeed a character encoding problem, as you suspected:
PowerShell invisibly inserts itself as an intermediary in pipelines, even when sending data to and receiving data from external programs: It converts data from and to .NET strings (System.String), which are sequences of UTF-16 code units.
- As an aside: Even when using only PowerShell-native commands, this means that reading input from files and saving them again can result in a different character encoding, because the information about the original character encoding is not preserved once (string) data has been read into memory, and on saving it is the cmdlets' default character encoding that is used; while this default encoding is consistently BOM-less UTF-8 in PowerShell (Core) 6+, it varies by cmdlet in Windows PowerShell - see this answer.
In order to send data to and receive data from external programs (such as Crypt.exe in your case), you need to match their character encoding; in your case, with a Windows console application that uses raw byte handling, the implied encoding is the system's active OEM code page.
On sending data, PowerShell uses the encoding stored in the $OutputEncoding preference variable to encode the data (which is invariably treated as text); this variable defaults to ASCII(!) in Windows PowerShell, and to (BOM-less) UTF-8 in PowerShell (Core).
The receiving end is covered by default: PowerShell uses [Console]::OutputEncoding (which itself reflects the code page reported by chcp) for decoding data received, and on Windows this by default reflects the active OEM code page, both in Windows PowerShell and PowerShell (Core).
To fix your primary problem, you therefore need to set $OutputEncoding to the active OEM code page:
# Make sure that PowerShell uses the OEM code page when sending
# data to `.\Crypt.exe`
$OutputEncoding = [Console]::OutputEncoding
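To see why the sending encoding has to match the receiving one, here is a small Python sketch (not PowerShell itself). It assumes the active OEM code page is 437, which is consistent with the φΩ characters that appear later in this answer; your system's code page may differ.

```python
# Two non-ASCII bytes as an external program might write them to stdout.
raw = bytes([0xED, 0xEA])

# Receiving side: decode with the OEM code page (assumed here to be cp437).
text = raw.decode("cp437")

# Sending side: what the next program receives depends on the encoding used.
as_ascii = text.encode("ascii", errors="replace")  # like the ASCII default
as_utf8 = text.encode("utf-8")                     # like the UTF-8 default
as_oem = text.encode("cp437")                      # OEM code page on both sides

print(as_ascii)      # b'??': the original bytes are lost
print(len(as_utf8))  # 4: two bytes became four
print(as_oem == raw) # True: only the matching encoding round-trips
```

With the ASCII default, every byte above 127 degrades to a question mark; with the UTF-8 default, each such byte turns into a multi-byte sequence. Either way, the second Crypt.exe call sees different bytes than the first one wrote.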
Problem 2: PowerShell invariably appends a trailing newline to data that doesn't already have one when piping data to external programs:
That is, "foo" | .\Crypt.exe doesn't send (the $OutputEncoding-encoded bytes representing) "foo" to .\Crypt.exe's stdin; it sends "foo`r`n" on Windows. In other words, a (platform-appropriate) newline sequence (CRLF on Windows) is automatically and invariably appended (unless the string already happens to have a trailing newline).
See GitHub issue #5974 and this answer.
In your specific case, the implicitly appended "`r`n" is also subject to the byte-value shifting, which means that the first Crypt.exe call transforms it to -*, causing another "`r`n" to be appended when the data is sent to the second Crypt.exe call.
The net result is an extra newline that is round-tripped (the intermediate -*), plus an encrypted newline that shows up as φΩ.
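The -* and φΩ values can be checked by hand. The Python sketch below models the decrypt-first pipeline under simplifying assumptions: the shift is modulo 256, $OutputEncoding already matches the OEM code page (assumed to be 437) so no bytes are lost in transit, and PowerShell's splitting of output into lines is ignored. The helper names shift and powershell_pipe are hypothetical.

```python
KEY = 224
CRLF = b"\r\n"


def shift(data: bytes, encrypt: bool) -> bytes:
    """Byte shift as described in the question (modulo 256 assumed)."""
    delta = KEY if encrypt else -KEY
    return bytes((b + delta) % 256 for b in data)


def powershell_pipe(data: bytes) -> bytes:
    """Simplified model: append CRLF unless the data already ends in a newline."""
    return data if data.endswith(b"\n") else data + CRLF


# First call: --decrypt. The appended CRLF (13, 10) becomes "-*" (45, 42).
stage1 = shift(powershell_pipe(b"this is a test"), encrypt=False)
print(stage1[-2:])  # b'-*'

# Second call: --encrypt. "-*" turns back into CRLF, and the newly
# appended CRLF is encrypted to the bytes 0xED 0xEA.
stage2 = shift(powershell_pipe(stage1), encrypt=True)
print(stage2)       # b'this is a test\r\n\xed\xea'

# Decoded with cp437, the two trailing bytes display as the characters
# quoted above.
print(stage2[-2:].decode("cp437"))
```

In this model the result carries four extra trailing bytes: the round-tripped CRLF and the encrypted CRLF.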
In short: If your input data had no trailing newline, you'll have to cut off the last 4 characters from the result (representing the round-tripped and the inadvertently encrypted newline sequences):
# Ensure that data sent to .\Crypt.exe is encoded with the OEM code page.
$OutputEncoding = [Console]::OutputEncoding
# Invoke the command and capture its output in variable $result.
# Note the use of the `Get-Content` cmdlet; in PowerShell, `type`
# is simply a built-in *alias* for it.
$result = Get-Content .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
# Remove the last 4 chars. and print the result.
$result.Substring(0, $result.Length - 4)
Given that calling cmd /c as shown at the top of the answer works too, that hardly seems worth it.
Unlike cmd (or POSIX-like shells such as bash):
- PowerShell doesn't support raw byte data in pipelines.
- When talking to external programs, it only knows text (whereas it passes .NET objects when talking to PowerShell's own commands, which is where much of its power comes from).
Specifically, this works as follows:
When you send data to an external program via the pipeline (to its stdin stream):
It is converted to text (strings) using the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
Caveat: If you assign an encoding with a BOM to $OutputEncoding, PowerShell (as of v7.0) will emit the BOM as part of the first line of output sent to an external program; therefore, for instance, do not use [System.Text.Encoding]::Utf8 (which emits a BOM) in Windows PowerShell, and use [System.Text.Utf8Encoding]::new($false) (which doesn't) instead.
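What a BOM means at the byte level can be illustrated in Python, which happens to offer a UTF-8 codec with a BOM ("utf-8-sig") and one without ("utf-8"). This is only an analogy for the two .NET encoding objects named above, not a reproduction of PowerShell's behavior.

```python
import codecs

text = "foo"

with_bom = text.encode("utf-8-sig")  # prepends the 3-byte UTF-8 BOM
without_bom = text.encode("utf-8")   # no BOM

print(with_bom.hex())     # efbbbf666f6f
print(without_bom.hex())  # 666f6f

# A program reading raw bytes from stdin would see three unexpected
# leading bytes (EF BB BF) in the first case.
assert with_bom == codecs.BOM_UTF8 + without_bom
```

For a byte-shifting program such as the one in the question, those three BOM bytes would be shifted like any other input and end up in the output.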
If the data is not captured or redirected by PowerShell, encoding problems may not always become apparent, namely if an external program is implemented in a way that uses the Windows Unicode console API to print to the display.
Something that isn't already text (a string) is stringified using PowerShell's default output formatting (the same format you see when you print to the console), with an important caveat:
- If the (last) input object already is a string that doesn't itself have a trailing newline, one is invariably appended (and even an existing trailing newline is replaced with the platform-native one, if different).
- This behavior can cause problems, as discussed in GitHub issue #5974 and also in this answer.
When you capture / redirect data from an external program (from its stdout stream), it is invariably decoded as lines of text (strings), based on the encoding specified in [Console]::OutputEncoding, which defaults to the active OEM code page on Windows (surprisingly, in both PowerShell editions, as of v7.0-preview6).
PowerShell-internally, text is represented using the .NET System.String type, which is based on UTF-16 code units (often loosely, but incorrectly, called "Unicode").
The above also applies:
- when piping data between external programs,
- when data is redirected to a file; that is, irrespective of the source of the data and its original character encoding, PowerShell uses its default encoding(s) when sending data to files; in Windows PowerShell, > produces UTF-16LE-encoded files (with BOM), whereas PowerShell (Core) sensibly defaults to BOM-less UTF-8 (consistently, across file-writing cmdlets).
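The effect of the two file-writing defaults can be sketched in Python by encoding the same text both ways. This only mimics the byte layout of the resulting files; it does not run PowerShell, and the sample text is an assumption.

```python
import codecs

text = "this is a test\r\n"

# Layout of a file written by > in Windows PowerShell: UTF-16LE with a BOM.
utf16_file = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Layout of a file written by > in PowerShell (Core): BOM-less UTF-8.
utf8_file = text.encode("utf-8")

print(len(utf16_file), len(utf8_file))  # 34 16
print(utf16_file[:4])                   # b'\xff\xfet\x00'
```

Neither layout is guaranteed to match the bytes an external program originally wrote, which is why the cmd /c redirection shown at the top is the safer route for binary output.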