Problem description
I want to use the redirect append >> or write > operators to write to a txt file, but when I do, the text ends up in a weird format: "\x00a\x00p...".
I can use Set-Content and Add-Content successfully. Why do they work as expected, while the >> and > redirect operators do not?
Here is the output, shown with PowerShell cat as well as a simple Python print.
rocket_brain> new-item test.txt
rocket_brain> "appended using add-content" | add-content test.txt
rocket_brain> cat test.txt
appended using add-content
But then if I use redirect append >>:
rocket_brain> "appended using redirect" >> test.txt
rocket_brain> cat test.txt
appended using add-content
a p p e n d e d u s i n g r e d i r e c t
Simple Python script: read_test.py
with open("test.txt", "r") as file:  # open test.txt in read mode
    data = file.readlines()  # read all lines into the list data
print(data)  # output the list, one item per input line
Using read_test.py, I see a difference in formatting:
rocket_brain> python read_test.py
['appended using add-content\n', 'a\x00p\x00p\x00e\x00n\x00d\x00e\x00d\x00 \x00u\x00s\x00i\x00n\x00g\x00 \x00r\x00e\x00d\x00i\x00r\x00e\x00c\x00t\x00\r\x00\n', '\x00']
NOTE: If I use only the redirect append >> (or write >) without first using Add-Content, the cat output looks normal (instead of spaced out), but the Python script then shows the \x00p format for every line (including lines written by any Add-Content command issued after starting with the > operators). When I open the file in Notepad (or VS etc.), the text always looks as expected. Using >> or > in cmd (instead of PS) also stores the text in the expected ASCII format.
Recommended answer
Note: The problem is ultimately that in Windows PowerShell different cmdlets / operators use different default encodings. This problem has been resolved in PowerShell Core (v6+), where BOM-less UTF-8 is used consistently.
- >> blindly applies Out-File's default encoding when appending to an existing file (in effect, > behaves like Out-File and >> like Out-File -Append). In Windows PowerShell that default is the encoding named Unicode, i.e., UTF-16LE, where most characters are encoded as 2-byte sequences, even those in the ASCII range; the latter have 0x0 (NUL) as the high byte.
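- As a result, unless the target file's existing content uses the same encoding, you end up with a mix of different encodings, which is what happened in your case.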
Add-Content, by contrast, does try to detect a file's existing encoding, but you used it on an empty file. In that case Set-Content's default encoding is applied, which in Windows PowerShell is the encoding named Default, i.e., your system's active ANSI code page.
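The mix of encodings described above can be reproduced without PowerShell. The following is a minimal Python sketch, assuming the active ANSI code page is Windows-1252 (cp1252); that choice is an assumption for illustration, since the actual code page depends on the system locale.

```python
# Bytes that Add-Content would write on an empty file, ASSUMING the
# active ANSI code page is Windows-1252 (this varies by system).
ansi_part = "appended using add-content\r\n".encode("cp1252")

# Bytes that >> appends in Windows PowerShell: UTF-16LE, two bytes per
# character, with 0x00 as the high byte for ASCII-range characters.
redirect_text = "appended using redirect\r\n"
utf16_part = redirect_text.encode("utf-16-le")

# The resulting file content is a mix of both encodings.
mixed = ansi_part + utf16_part

# Reading the whole file as a single-byte encoding exposes the NUL bytes.
as_ansi = mixed.decode("cp1252")
print(repr(as_ansi))

# Only the appended part decodes cleanly as UTF-16LE.
print(repr(utf16_part.decode("utf-16-le")))
```

Every character of the redirected line is followed by a NUL when the file is read as single-byte text, which matches the a\x00p\x00p\x00... pattern reported by read_test.py.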
Therefore, to match the single-byte ANSI encoding initially created by your Add-Content call when appending further content, use Out-File -Append -Encoding Default instead of >>, or simply keep using Add-Content.
- Alternatively, pick a different encoding with Add-Content -Encoding ... and match it in the Out-File -Append call. UTF-8 is generally the best choice, though note that when you create a UTF-8 file in Windows PowerShell, it will start with a BOM (a pseudo byte-order mark identifying the file as UTF-8), which Unix-like platforms typically do not expect.
In PowerShell v5.1+ you may also change the default encoding globally, including for > and >> (which isn't possible in earlier versions). To change to UTF-8, for instance, use:
$PSDefaultParameterValues['*:Encoding']='UTF8'
Aside from the different default encodings (in Windows PowerShell), it is important to note that Set-Content / Add-Content on the one hand and > / >> / Out-File [-Append] on the other behave fundamentally differently with non-string input:
In short: the former apply simple .ToString() formatting to the input objects, whereas the latter perform the same output formatting you would see in the console; see the linked answer for details.