Problem description
Based on the suggestion from this thread, I used PowerShell to do the UTF-8 conversion. Now I am running into another problem: I have a very large file, around 18 GB, which I am trying to convert on a machine with around 50 GB of free RAM, but the conversion process eats up all the RAM and the encoding fails. Is there a way to limit the RAM usage or to do the conversion in chunks?
Using PowerShell to write a file in UTF-8 without the BOM
BTW, below is the exact code:
foreach ($file in ls -name $Path\CM*.csv)
{
    $file_content = Get-Content "$Path\$file";
    [System.IO.File]::WriteAllLines("$Path\$file", $file_content);
    echo "encoding done : $file"
}
Recommended answer
You can use a StreamReader and a StreamWriter to do the conversion.
By default, the StreamWriter writes UTF-8 without a BOM.
This involves a lot of disk I/O but is lean on memory, because the file is processed one line at a time.
Bear in mind that .NET needs full absolute paths.
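On the absolute-path point: .NET resolves relative paths against the process working directory, which does not necessarily follow PowerShell's current location. A minimal sketch of one way to turn a relative path into a full one before handing it to .NET. The helper name `Resolve-FullPath` and the relative file names are made up for illustration:

```powershell
# Hypothetical helper: resolve a path against PowerShell's current location.
# The target does not have to exist yet, so it also works for output files.
function Resolve-FullPath {
    param([Parameter(Mandatory)][string]$Path)
    $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
}

# Placeholder file names, not taken from the question
$sourceFile      = Resolve-FullPath '.\Blah.txt'
$destinationFile = Resolve-FullPath '.\out.txt'
```

For a file that already exists, `Convert-Path` gives the same result.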
$sourceFile = 'D:\Test\Blah.txt' # enter your own in- and output files here
$destinationFile = 'D:\Test\out.txt'
# the reader skips a leading BOM; the writer defaults to UTF-8 without a BOM
$reader = [System.IO.StreamReader]::new($sourceFile, [System.Text.Encoding]::UTF8)
$writer = [System.IO.StreamWriter]::new($destinationFile)
# copy one line at a time so the whole file is never held in memory
while ($null -ne ($line = $reader.ReadLine())) {
    $writer.WriteLine($line)
}
# clean up
$writer.Flush()
$reader.Dispose()
$writer.Dispose()
The above code adds a final newline to the output file. If that is unwanted, do this instead:
$sourceFile = 'D:\Test\Blah.txt'
$destinationFile = 'D:\Test\out.txt'
$reader = [System.IO.StreamReader]::new($sourceFile, [System.Text.Encoding]::UTF8)
$writer = [System.IO.StreamWriter]::new($destinationFile)
while ($null -ne ($line = $reader.ReadLine())) {
    if ($reader.EndOfStream) {
        $writer.Write($line)
    }
    else {
        $writer.WriteLine($line)
    }
}
# clean up
$writer.Flush()
$reader.Dispose()
$writer.Dispose()
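The question converts each file in place and asks about working in chunks. The following is a rough sketch of how the same StreamReader/StreamWriter idea could be applied that way, not a tested production script. It copies fixed-size character blocks instead of lines, so line endings and the presence or absence of a final newline are carried over unchanged. The function name `Convert-ToUtf8NoBom`, the `-BufferChars` parameter, and the `.tmp` suffix are assumptions made for this example; it also assumes there is enough free disk space for a second copy of the file.

```powershell
function Convert-ToUtf8NoBom {
    param(
        [Parameter(Mandatory)][string]$LiteralPath,
        [int]$BufferChars = 1MB   # chunk size in characters; an arbitrary choice
    )

    $fullPath  = Convert-Path -LiteralPath $LiteralPath   # .NET needs an absolute path
    $tempPath  = "$fullPath.tmp"                          # assumed not to exist already
    $utf8NoBom = [System.Text.UTF8Encoding]::new($false)  # $false = do not emit a BOM

    # The reader skips a leading BOM if one is present
    $reader = [System.IO.StreamReader]::new($fullPath, [System.Text.Encoding]::UTF8)
    try {
        $writer = [System.IO.StreamWriter]::new($tempPath, $false, $utf8NoBom)
        try {
            $buffer = [char[]]::new($BufferChars)
            # Read() returns the number of characters read, 0 at end of file
            while (($read = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
                $writer.Write($buffer, 0, $read)
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }

    # Replace the original only after the copy has finished
    [System.IO.File]::Delete($fullPath)
    [System.IO.File]::Move($tempPath, $fullPath)
}
```

With the question's `$Path` variable, a call along the lines of `Get-ChildItem -LiteralPath $Path -Filter 'CM*.csv' | ForEach-Object { Convert-ToUtf8NoBom -LiteralPath $_.FullName }` would process each file in turn. Writing to a temporary file first is needed because the same file cannot be streamed from and overwritten at the same time.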