Problem description
I am currently working on a small PowerShell script that splits a CSV file of translations into individual files, one per language. To do this, I use the Import-Csv cmdlet to import the source file, which has roughly this format:
ResourceId;GermanTranslation;EnglishTranslation;Check
0;Deutscher Text;English text;OK
1; mit Leerzeichen ; with spaces ;OK
The goal is to get a line-by-line representation of the translations in the format ResourceId|EnglishTranslation|. For this I have built the following script:
Set-Variable SOURCE_FILE -Option Constant -Value ".\sourceFile.csv"
Set-Variable RESULT_FILE -Option Constant -Value ".\resultFile.csv"
foreach ($row in (Import-Csv -Path $SOURCE_FILE -Delimiter ";")) {
    # The property name must match the CSV header exactly (ResourceId).
    Add-Content -Path $RESULT_FILE -Value ($row.ResourceId + "|" + $row.EnglishTranslation + "|")
}
Basically, everything works as desired, but when I examined the results, I noticed that the leading spaces of some fields were no longer present:
0|English text|
1|with spaces |
Unfortunately, I didn't find a parameter in the MS documentation that addresses this problem, so I was unsure at first. After that, I took a look at RFC 4180, which describes the CSV file format more or less exactly. It states that spaces are considered part of a field and should not be ignored. "Should" does not mean "must", so it may well be that there really is no option for this.
Is it possible to preserve the spaces without having to parse the whole file myself?
Recommended answer
Here is a solution that should be more robust (and probably faster) than replacing characters in the CSV file.
It uses the .NET TextFieldParser class from the Microsoft.VisualBasic assembly. The class has a TrimWhiteSpace property which, when set to $false, preserves any leading and trailing whitespace of each field, even if the field is not enclosed in double quotation marks.
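The effect of TrimWhiteSpace can be seen in isolation with a minimal sketch that parses one sample row from an in-memory string. The helper name Get-FirstRowFields is made up for this illustration, and loading the assembly by its short name is an assumption that should hold on Windows PowerShell 5.1 and PowerShell 7.

```powershell
# Load the Visual Basic assembly only if the type is not already available.
if (-not ('Microsoft.VisualBasic.FileIO.TextFieldParser' -as [type])) {
    Add-Type -AssemblyName 'Microsoft.VisualBasic'
}

# Hypothetical helper (not part of the answer): parse one line into fields.
function Get-FirstRowFields {
    param (
        [string] $Line,
        [bool]   $Trim
    )
    [System.IO.TextReader] $reader = [System.IO.StringReader]::new($Line)
    $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
    try {
        $parser.SetDelimiters(';')
        $parser.TrimWhiteSpace = $Trim
        # The leading comma returns the string[] as one object instead of unrolling it.
        , $parser.ReadFields()
    }
    finally {
        $parser.Close()
    }
}

$line      = '1; mit Leerzeichen ; with spaces ;OK'
$preserved = Get-FirstRowFields -Line $line -Trim $false
$trimmed   = Get-FirstRowFields -Line $line -Trim $true

# Brackets make the surrounding whitespace visible.
'[{0}]' -f $preserved[2]   # [ with spaces ]
'[{0}]' -f $trimmed[2]     # [with spaces]
```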
I've encapsulated the .NET code in a function named Import-CustomCsv. It exposes some additional TextFieldParser options through its parameters.
Function Import-CustomCsv {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipeline)] [String] $Path,
        [String[]] $Delimiter = ',',
        [String[]] $CommentTokens = '#',
        [switch] $TrimWhiteSpace,
        [switch] $HasFieldsEnclosedInQuotes
    )

    # Load Visual Basic assembly if necessary. Seems to be required only for PS 5.
    if (-not ([System.Management.Automation.PSTypeName]'Microsoft.VisualBasic.FileIO.TextFieldParser').Type) {
        Add-Type -AssemblyName 'Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
    }

    # Create a CSV parser
    $csvParser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $Path

    try {
        # Set CSV parser options
        $csvParser.SetDelimiters( $Delimiter )
        $csvParser.CommentTokens = $CommentTokens
        $csvParser.TrimWhiteSpace = $TrimWhiteSpace
        $csvParser.HasFieldsEnclosedInQuotes = $HasFieldsEnclosedInQuotes

        # Read the header
        $header = $csvParser.ReadFields()

        while( -not $csvParser.EndOfData ) {
            # Read current line fields, pointer moves to the next line.
            $fields = $csvParser.ReadFields()

            # Associate each field with its name from the header by storing it in an
            # ordered hashtable.
            $namedFields = [ordered]@{}
            for( $i = 0; $i -lt $header.Count; $i++ ) {
                $namedFields[ $header[ $i ] ] = $fields[ $i ]
            }

            # Finally convert fields to PSCustomObject and output (implicitly)
            [PSCustomObject] $namedFields
        }
    }
    finally {
        # Make sure to close the file even in case of an exception.
        $csvParser.Close()
    }
}
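One caveat applies to any .NET constructor that takes a file path, including the one used above: .NET resolves relative paths against the process working directory, which can differ from the current PowerShell location. A minimal sketch of resolving the path first with Convert-Path follows; the temporary directory and file contents are invented purely for illustration.

```powershell
# Illustrative setup: a throwaway directory containing a small CSV file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'sourceFile.csv') -Value 'ResourceId;EnglishTranslation', '1; with spaces '

Push-Location $dir
try {
    # Convert-Path resolves against the current PowerShell location and
    # returns an absolute file system path that can be handed to .NET.
    $fullPath = Convert-Path -LiteralPath (Join-Path '.' 'sourceFile.csv')
}
finally {
    Pop-Location
}

# Show the resolved absolute path, then clean up the throwaway directory.
$fullPath
Remove-Item -LiteralPath $dir -Recurse
```

The resolved value could then be passed on, for example as Import-CustomCsv $fullPath -Delimiter ';'.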
Usage example: Parse CSV, preserving whitespace:
Import-CustomCsv test.csv -Delimiter ';'
Output:
ResourceId GermanTranslation EnglishTranslation Check
---------- ----------------- ------------------ -----
0          Deutscher Text    English text       OK
1           mit Leerzeichen   with spaces       OK
Usage example: Parse CSV, trimming whitespace (like Import-Csv):
Import-CustomCsv test.csv -Delimiter ';' -TrimWhiteSpace
Output:
ResourceId GermanTranslation EnglishTranslation Check
---------- ----------------- ------------------ -----
0          Deutscher Text    English text       OK
1          mit Leerzeichen   with spaces        OK
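To tie this back to the question, the parsed fields can be joined into the ResourceId|EnglishTranslation| format. The sketch below uses TextFieldParser directly on an in-memory copy of the sample data, so it runs on its own without the function above; the variable names are illustrative only.

```powershell
# Load the Visual Basic assembly only if the type is not already available.
if (-not ('Microsoft.VisualBasic.FileIO.TextFieldParser' -as [type])) {
    Add-Type -AssemblyName 'Microsoft.VisualBasic'
}

# Sample data from the question, held in memory instead of a file.
$csvText = @'
ResourceId;GermanTranslation;EnglishTranslation;Check
0;Deutscher Text;English text;OK
1; mit Leerzeichen ; with spaces ;OK
'@

[System.IO.TextReader] $reader = [System.IO.StringReader]::new($csvText)
$parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
try {
    $parser.SetDelimiters(';')
    $parser.TrimWhiteSpace = $false

    # Locate the wanted columns by name in the header row.
    $header   = $parser.ReadFields()
    $idIndex  = [array]::IndexOf($header, 'ResourceId')
    $engIndex = [array]::IndexOf($header, 'EnglishTranslation')

    # Collect one "id|text|" line per data row.
    $lines = while (-not $parser.EndOfData) {
        $fields = $parser.ReadFields()
        $fields[$idIndex] + '|' + $fields[$engIndex] + '|'
    }
}
finally {
    $parser.Close()
}

$lines
# 0|English text|
# 1| with spaces |
```

With a real file, the same lines could be written in one call using Set-Content instead of appending row by row.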
Note:
The two examples above keep any field-enclosing double quotation marks in the output. To remove double quotation marks around fields, pass the -HasFieldsEnclosedInQuotes parameter.
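The underlying HasFieldsEnclosedInQuotes property affects more than the quotation marks themselves: it also decides whether a delimiter inside a quoted field splits that field. A small sketch, using a made-up sample row and a hypothetical helper named Get-QuotedSample, shows both behaviours side by side.

```powershell
# Load the Visual Basic assembly only if the type is not already available.
if (-not ('Microsoft.VisualBasic.FileIO.TextFieldParser' -as [type])) {
    Add-Type -AssemblyName 'Microsoft.VisualBasic'
}

# Hypothetical helper: parse one invented row with quote handling on or off.
function Get-QuotedSample {
    param ([bool] $HandleQuotes)

    [System.IO.TextReader] $reader = [System.IO.StringReader]::new('2;" quoted; text ";OK')
    $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
    try {
        $parser.SetDelimiters(';')
        $parser.TrimWhiteSpace = $false
        $parser.HasFieldsEnclosedInQuotes = $HandleQuotes
        # The leading comma returns the string[] as one object instead of unrolling it.
        , $parser.ReadFields()
    }
    finally {
        $parser.Close()
    }
}

$withQuotes    = Get-QuotedSample -HandleQuotes $true
$withoutQuotes = Get-QuotedSample -HandleQuotes $false

$withQuotes.Count              # 3: the quoted field stays in one piece
'[{0}]' -f $withQuotes[1]      # [ quoted; text ]
$withoutQuotes.Count           # 4: the ';' inside the quotes splits the field
'[{0}]' -f $withoutQuotes[1]   # [" quoted]
```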