How can I preserve whitespace after importing a CSV with a PowerShell script?


Problem description

I am currently working on a small PowerShell script that is supposed to split a CSV file with translations into individual files, one per language. For this I use the Import-Csv cmdlet to import the source file, which has roughly this format:

ResourceId;GermanTranslation;EnglishTranslation;Check
0;Deutscher Text;English text;OK
1;  mit Leerzeichen  ;  with spaces  ;OK

The goal is to get a line-by-line representation of the translations in the format ResourceId|EnglishTranslation|. For this I have built the following script:

Set-Variable SOURCE_FILE -Option Constant -Value ".\sourceFile.csv"
Set-Variable RESULT_FILE -Option Constant -Value ".\resultFile.csv"

foreach ($row in (Import-Csv -Path $SOURCE_FILE -Delimiter ";")) {
    # Property names must match the CSV header exactly (ResourceId, not RessourceId).
    Add-Content -Path $RESULT_FILE -Value ($row.ResourceId + "|" + $row.EnglishTranslation + "|")
}

Basically, everything works as desired, but when I examined the results, I noticed that the leading spaces of some fields were no longer present:

0|English text|
1|with spaces  |
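The observation can be reproduced without a file. The following is a minimal sketch, assuming that ConvertFrom-Csv parses unquoted fields the same way Import-Csv does; the sample lines are taken from the source file above, and the variable names are only illustrative.

```powershell
# Header and one data row from the question, supplied as in-memory lines.
$lines = @(
    'ResourceId;GermanTranslation;EnglishTranslation;Check'
    '1;  mit Leerzeichen  ;  with spaces  ;OK'
)

# Parse with the built-in CSV cmdlet, using the same delimiter as the script.
$row = $lines | ConvertFrom-Csv -Delimiter ';'

# Brackets make any surviving leading or trailing whitespace visible.
'[{0}]' -f $row.EnglishTranslation
```

If the bracketed value starts directly with the first letter, the built-in parser has dropped the leading spaces of the unquoted field.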

Unfortunately, I didn't find a parameter in the Microsoft documentation that addresses this problem, so I was unsure at first. I then took a look at RFC 4180, which describes the CSV file format more or less precisely. It states that spaces should be considered part of a field and not be ignored. "Should" is not "must", so it may well be that there really is no option for this.

Is there a way to preserve the spaces without having to parse the whole file on my own?

Recommended answer

Here is a solution that should be more robust (and probably faster) than replacing characters in the CSV file.

It uses the .NET TextFieldParser class from the Microsoft.VisualBasic assembly. The class has a TrimWhiteSpace property which, when set to $false, preserves any leading and trailing whitespace of each field, even if the field is not enclosed in double quotation marks.
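The effect of TrimWhiteSpace can be seen in isolation with a minimal sketch like the following. It feeds a single line from the question to the parser through a StringReader instead of a file; the helper name Read-SampleFields is hypothetical and exists only for this illustration.

```powershell
# Load the assembly if the type is not already available (typically Windows PowerShell 5.x).
if (-not ('Microsoft.VisualBasic.FileIO.TextFieldParser' -as [type])) {
    Add-Type -AssemblyName Microsoft.VisualBasic
}

# Hypothetical helper: parse one line of text and return its fields as an array.
function Read-SampleFields {
    param([string] $Text, [bool] $Trim)

    $reader = [System.IO.StringReader]::new($Text)
    $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
    try {
        $parser.SetDelimiters(';')
        $parser.TrimWhiteSpace = $Trim
        # The unary comma keeps the array intact when it is written to the pipeline.
        , $parser.ReadFields()
    }
    finally {
        $parser.Close()
    }
}

$line    = '1;  mit Leerzeichen  ;  with spaces  ;OK'
$kept    = Read-SampleFields -Text $line -Trim $false
$trimmed = Read-SampleFields -Text $line -Trim $true

'[{0}]' -f $kept[2]      # [  with spaces  ]
'[{0}]' -f $trimmed[2]   # [with spaces]
```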

I've encapsulated the .NET code in a function named Import-CustomCsv. It exposes some additional TextFieldParser options through its parameters.

Function Import-CustomCsv {
   [CmdletBinding()]
   param (
      [Parameter(Mandatory, ValueFromPipeline)] [String] $Path,
      [String[]] $Delimiter = ',',
      [String[]] $CommentTokens = '#',
      [switch] $TrimWhiteSpace,
      [switch] $HasFieldsEnclosedInQuotes
   )

   # Load Visual Basic assembly if necessary. Seems to be required only for PS 5.
   if (-not ([System.Management.Automation.PSTypeName]'Microsoft.VisualBasic.FileIO.TextFieldParser').Type) {
      Add-Type -AssemblyName 'Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
   }

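   # Create a CSV parser. Resolve the path first, because .NET resolves relative
   # paths against the process working directory, not the PowerShell location.
   $csvParser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser (Convert-Path -LiteralPath $Path)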

   try {
      # Set CSV parser options
      $csvParser.SetDelimiters( $Delimiter )
      $csvParser.CommentTokens = $CommentTokens
      $csvParser.TrimWhiteSpace = $TrimWhiteSpace
      $csvParser.HasFieldsEnclosedInQuotes = $HasFieldsEnclosedInQuotes

      # Read the header
      $header = $csvParser.ReadFields()

      while( -not $csvParser.EndOfData ) {
         # Read current line fields, pointer moves to the next line.
         $fields = $csvParser.ReadFields()

         # Associate each field with its name from the header by storing it in an
         # ordered hashtable.
         $namedFields = [ordered]@{}
         for( $i = 0; $i -lt $header.Count; $i++ ) {
            $namedFields[ $header[ $i ] ] = $fields[ $i ]
         }

         # Finally convert fields to PSCustomObject and output (implicitly)
         [PSCustomObject] $namedFields
      }
   }
   finally {
      # Make sure to close the file even in case of an exception.
      $csvParser.Close()
   }
}

Usage example: Parse CSV, preserving whitespace:

Import-CustomCsv test.csv -Delimiter ';'

Output:

ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK

Usage example: Parse CSV, trimming whitespace (like Import-Csv):

Import-CustomCsv test.csv -Delimiter ';' -TrimWhiteSpace

Output:

ResourceId GermanTranslation EnglishTranslation Check
---------- ----------------- ------------------ -----
0          Deutscher Text    English text       OK
1          mit Leerzeichen   with spaces        OK

Note:

The two examples above keep any double quotation marks that enclose a field in the output. To remove the double quotation marks around fields, pass the -HasFieldsEnclosedInQuotes parameter.
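What the quote handling changes can be sketched with a line that contains quoted fields. This is only an illustration: the sample line and the helper name Read-QuotedSample are made up for this purpose and are not part of the original answer.

```powershell
# Load the assembly if the type is not already available (typically Windows PowerShell 5.x).
if (-not ('Microsoft.VisualBasic.FileIO.TextFieldParser' -as [type])) {
    Add-Type -AssemblyName Microsoft.VisualBasic
}

# Hypothetical helper: parse one line with quote handling switched on or off.
function Read-QuotedSample {
    param([string] $Text, [bool] $Quoted)

    $reader = [System.IO.StringReader]::new($Text)
    $parser = [Microsoft.VisualBasic.FileIO.TextFieldParser]::new($reader)
    try {
        $parser.SetDelimiters(';')
        $parser.TrimWhiteSpace = $false
        $parser.HasFieldsEnclosedInQuotes = $Quoted
        , $parser.ReadFields()
    }
    finally {
        $parser.Close()
    }
}

# Made-up sample: the second field contains the delimiter, the third has padding.
$line = '1;"semi;colon";"  padded  "'

$withQuotes    = Read-QuotedSample -Text $line -Quoted $true
$withoutQuotes = Read-QuotedSample -Text $line -Quoted $false

$withQuotes.Count       # 3: quotes are removed and the embedded ';' stays inside the field
$withoutQuotes.Count    # 4: quotes are kept as literal characters and the ';' splits the field
```

With quote handling enabled, a delimiter inside a quoted field no longer splits it, which is usually what you want when the translations themselves may contain semicolons.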
