Problem description
We receive some files that have been concatenated by another party. As a result, UTF-8 BOM sequences appear in the
middle of these files.
Is there a way we can detect these 3 bytes (0xEF 0xBB 0xBF) and remove them? I've seen plenty of examples of how to
remove the BOM from the start of a file, but none for the middle.
Recommended answer
Assuming that your file is small enough to hold in memory, and that you have a Replace extension method for
replacing subsequences (LINQ does not ship one, so Enumerable.Replace here is a custom method), you could use:
var bytes = File.ReadAllBytes(filePath);
var bom = new byte[] { 0xEF, 0xBB, 0xBF };
var empty = Enumerable.Empty<byte>();
bytes = bytes.Replace(bom, empty).ToArray();
File.WriteAllBytes(filePath, bytes);
Here is a simple (inefficient) implementation of the Replace
extension method:
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class EnumerableExtensions
{
    public static IEnumerable<TSource> Replace<TSource>(
        this IEnumerable<TSource> source,
        IEnumerable<TSource> match,
        IEnumerable<TSource> replacement)
    {
        return Replace(source, match, replacement, EqualityComparer<TSource>.Default);
    }

    public static IEnumerable<TSource> Replace<TSource>(
        this IEnumerable<TSource> source,
        IEnumerable<TSource> match,
        IEnumerable<TSource> replacement,
        IEqualityComparer<TSource> comparer)
    {
        int sLength = source.Count();
        int mLength = match.Count();
        if (sLength < mLength || mLength == 0)
            return source;

        // Collect every index at which the match sequence occurs.
        int[] matchIndexes = (
            from sIndex in Enumerable.Range(0, sLength - mLength + 1)
            where source.Skip(sIndex).Take(mLength).SequenceEqual(match, comparer)
            select sIndex
        ).ToArray();

        var result = new List<TSource>();
        int sPosition = 0;
        foreach (int mPosition in matchIndexes)
        {
            // Ignore a match that overlaps one that has already been replaced.
            if (mPosition < sPosition)
                continue;

            // Copy the elements before the match, then the replacement.
            var sPart = source.Skip(sPosition).Take(mPosition - sPosition);
            result.AddRange(sPart);
            result.AddRange(replacement);
            sPosition = mPosition + mLength;
        }

        // Copy whatever follows the last match.
        var sLastPart = source.Skip(sPosition).Take(sLength - sPosition);
        result.AddRange(sLastPart);
        return result;
    }
}
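
If the file is too large to hold in memory, or the repeated Skip/Take scans above are too slow, a single-pass streaming filter is one possible alternative. The sketch below is not part of the original answer: the class name BomStripper and the methods CopyWithoutBoms and StripFile are hypothetical names chosen for illustration. It relies on the fact that in valid UTF-8 the byte sequence EF BB BF can only be the encoding of U+FEFF, and it assumes you want every such sequence removed, including any that were intended as zero-width no-break spaces.

```csharp
using System.IO;

// Hypothetical helper, not from the original answer.
public static class BomStripper
{
    // UTF-8 encoding of U+FEFF.
    private static readonly byte[] Bom = { 0xEF, 0xBB, 0xBF };

    // Copies input to output, dropping every EF BB BF sequence.
    public static void CopyWithoutBoms(Stream input, Stream output)
    {
        int matched = 0; // how many leading BOM bytes are currently held back
        int b;
        while ((b = input.ReadByte()) != -1)
        {
            if (b == Bom[matched])
            {
                matched++;
                if (matched == Bom.Length)
                    matched = 0; // complete BOM seen: discard it
                continue;
            }

            // Mismatch: the held-back bytes were ordinary data, so emit them.
            output.Write(Bom, 0, matched);
            matched = 0;

            if (b == Bom[0])
                matched = 1; // the current byte may begin a new BOM
            else
                output.WriteByte((byte)b);
        }

        // Emit an incomplete BOM prefix left at the end of the input.
        output.Write(Bom, 0, matched);
    }

    // Assumed usage: write the cleaned data to a separate output file.
    public static void StripFile(string inputPath, string outputPath)
    {
        using (var input = File.OpenRead(inputPath))
        using (var output = File.Create(outputPath))
        {
            CopyWithoutBoms(input, output);
        }
    }
}
```

Writing to a separate output path avoids reading and truncating the same file at once; the original file can be replaced afterwards if needed.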