Problem description
I've got a small JavaScript application that parses files the user drops into the browser. Recently I've discovered an issue with some non-English characters. The files that are dropped here use the Windows-1252 character set, so characters such as ñ are actually coming through as Ã± and I must convert them all to the proper characters.
For example, I get SeÃ±or, which should be Señor in Spanish.
I've found an extremely useful website that lists these characters alongside the counterparts I need to convert them to.
I've condensed that down into two JavaScript arrays:
var toReplace = ["Ã€", "Ã\u0081", "Ã‚", "Ãƒ", "Ã„", "Ã…", "Ã†", "Ã‡", "Ãˆ", "Ã‰", "ÃŠ", "Ã‹", "ÃŒ", "Ã\u008D", "ÃŽ", "Ã\u008F", "Ã\u0090", "Ã‘", "Ã’", "Ã“", "Ã”", "Ã•", "Ã–", "Ã—", "Ã˜", "Ã™", "Ãš", "Ã›", "Ãœ", "Ã\u009D", "Ãž", "ÃŸ", "Ã\u00A0", "Ã¡", "Ã¢", "Ã£", "Ã¤", "Ã¥", "Ã¦", "Ã§", "Ã¨", "Ã©", "Ãª", "Ã«", "Ã¬", "Ã\u00AD", "Ã®", "Ã¯", "Ã°", "Ã±", "Ã²", "Ã³", "Ã´", "Ãµ", "Ã¶", "Ã·", "Ã¸", "Ã¹", "Ãº", "Ã»", "Ã¼", "Ã½", "Ã¾", "Ã¿"];
var replaceWith = ["À", "Á", "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "Ï", "Ð", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö", "×", "Ø", "Ù", "Ú", "Û", "Ü", "Ý", "Þ", "ß", "à", "á", "â", "ã", "ä", "å", "æ", "ç", "è", "é", "ê", "ë", "ì", "í", "î", "ï", "ð", "ñ", "ò", "ó", "ô", "õ", "ö", "÷", "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ"];
What is the most efficient way to replace every character in a paragraph that appears in toReplace with its counterpart (same index) in replaceWith?
I'm hoping this won't be too loop-heavy, since it's not uncommon to drop over 100 files into this application, which already does some heavy looping and parsing.
Perhaps there is a better way to do this instead of keeping these characters in arrays?
EDIT - I just realized I might need to replace with the Unicode equivalent instead. Here's an array of the Unicode characters in the same order:
var unicodeReplaceWith= ["\u00C0", "\u00C1", "\u00C2", "\u00C3", "\u00C4", "\u00C5", "\u00C6", "\u00C7", "\u00C8", "\u00C9", "\u00CA", "\u00CB", "\u00CC", "\u00CD", "\u00CE", "\u00CF", "\u00D0", "\u00D1", "\u00D2", "\u00D3", "\u00D4", "\u00D5", "\u00D6", "\u00D7", "\u00D8", "\u00D9", "\u00DA", "\u00DB", "\u00DC", "\u00DD", "\u00DE", "\u00DF", "\u00E0", "\u00E1", "\u00E2", "\u00E3", "\u00E4", "\u00E5", "\u00E6", "\u00E7", "\u00E8", "\u00E9", "\u00EA", "\u00EB", "\u00EC", "\u00ED", "\u00EE", "\u00EF", "\u00F0", "\u00F1", "\u00F2", "\u00F3", "\u00F4", "\u00F5", "\u00F6", "\u00F7", "\u00F8", "\u00F9", "\u00FA", "\u00FB", "\u00FC", "\u00FD", "\u00FE", "\u00FF"];
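As a quick check on the EDIT above, here is a minimal sketch (the names literal, escaped and same are made up for illustration). It assumes the script file itself is read as UTF-8. Under that assumption a \u escape and the literal character produce the same string, so either array should work as the replacement list; the escaped form simply stays intact even if the script is later saved or served in the wrong encoding.

```javascript
// Hypothetical sample: four characters taken from replaceWith,
// written once as literals and once as \u escapes.
var literal = ["À", "Ñ", "ñ", "ÿ"];
var escaped = ["\u00C0", "\u00D1", "\u00F1", "\u00FF"];

// Compare the two arrays element by element.
var same = literal.every(function (ch, i) {
    return ch === escaped[i];
});

console.log(same);
```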
Recommended answer
I don't know much about speed in JavaScript, or why this can't be configured correctly on the server, but here's one way to do it.
First we turn everything into an object, so we can look up translations.
var map = {};
for (var i = 0; i < toReplace.length; i++) {
    // Key: the garbled sequence; value: the character it should become.
    map[toReplace[i]] = replaceWith[i];
}
Then we join our keys into a regular expression:
var expression = new RegExp(toReplace.join("|"), "g");
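Joining the keys with "|" works as long as no key contains a regular expression metacharacter and no key is a prefix of another key listed before it. If either could happen with your data, a slightly more defensive sketch might look like the following. The helpers escapeRegExp and buildExpression are hypothetical names and are not part of the original answer.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one global alternation from a list of keys.
// Longer keys are listed first so a short key cannot shadow a longer one.
function buildExpression(keys) {
    var sorted = keys.slice().sort(function (a, b) {
        return b.length - a.length;
    });
    return new RegExp(sorted.map(escapeRegExp).join("|"), "g");
}

// Sample keys chosen only to show both problems: "Ã" is a prefix of "Ã±",
// and "a|b" contains a metacharacter.
var sample = buildExpression(["Ã", "Ã±", "a|b"]);
var out = "Ã± Ã a|b".replace(sample, function (m) {
    return "[" + m + "]";
});

console.log(out); // [Ã±] [Ã] [a|b]
```

Alternation tries its branches from left to right, so without the sort a bare "Ã" listed first would match before "Ã±" ever got a chance.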
In the replace function, we can substitute matches for results. This is as simple as looking them up in our map.
function doReplace(source) {
    // Each match m is one of the garbled sequences; swap it for its mapped character.
    return source.replace(expression, function(m) {
        return map[m];
    });
}
var result = doReplace("SeÃ±or");
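To try the whole approach in one place, here is a self-contained sketch. It is not the asker's exact arrays: it rebuilds the same 64 pairs (U+00C0 through U+00FF) programmatically. It assumes the garbling comes from UTF-8 bytes being decoded as Windows-1252, and that the five byte values Windows-1252 leaves undefined come through as the same-numbered control characters, as browsers do. CP1252_HIGH, keys and fixed are names introduced here for illustration.

```javascript
// Windows-1252 decoding of bytes 0x80..0x9F as Unicode code points.
// Undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to themselves.
// Bytes 0xA0..0xBF also decode to the same-numbered code point.
var CP1252_HIGH = [
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
];

var map = {};
var keys = [];
for (var cp = 0xC0; cp <= 0xFF; cp++) {
    // In UTF-8 these characters are two bytes: 0xC3, then 0x80..0xBF.
    var second = cp - 0x40;
    var seen = second < 0xA0 ? CP1252_HIGH[second - 0x80] : second;
    // 0xC3 decodes to "Ã" (U+00C3) in Windows-1252.
    var broken = "\u00C3" + String.fromCharCode(seen);
    map[broken] = String.fromCharCode(cp);
    keys.push(broken);
}

// None of the generated keys contains a regex metacharacter, so a plain join is safe here.
var expression = new RegExp(keys.join("|"), "g");

function doReplace(source) {
    return source.replace(expression, function (m) {
        return map[m];
    });
}

var fixed = doReplace("Se\u00C3\u00B1or"); // input is "SeÃ±or"
console.log(fixed); // Señor
```

The regular expression is compiled once and reused, so each file costs a single pass over its text rather than one pass per character pair.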