Problem description
I am trying to point iconv at a directory so that all files in it are converted to UTF-8, regardless of their current encoding.
I am using this script, but you have to specify which encoding you are converting FROM. How can I make it autodetect the current encoding?
dir_iconv.sh
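#!/bin/bash
# Convert every regular file in a directory from one charset to another.
ICONVBIN='/usr/bin/iconv' # path to iconv binary
if [ $# -lt 3 ]
then
    echo "Usage: $0 dir from_charset to_charset" >&2
    exit 1
fi
for f in "$1"/*
do
    if test -f "$f"
    then
        echo -e "\nConverting $f"
        # Keep the original as $f.old and write the converted copy to $f
        /bin/mv "$f" "$f.old"
        "$ICONVBIN" -f "$2" -t "$3" "$f.old" > "$f"
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done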
Terminal command
sudo convert/dir_iconv.sh convert/books CURRENT_ENCODING utf8
Recommended answer
Maybe you are looking for enca:
Currently it supports Belarusian, Bulgarian, Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, Ukrainian, Chinese, and some multibyte encodings independently of language.
Note that, in general, autodetection of the current encoding is difficult (the same byte sequence can be valid text in multiple encodings). enca uses heuristics based on the language you tell it to detect (to limit the number of candidate encodings). You can use enconv to convert text files to a single encoding.
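As a rough illustration of how the detection step could be plugged into the directory loop from the question, here is a minimal sketch. It assumes enca and iconv are both installed and on the PATH. The function names detect_charset and convert_dir are made up for this example, and the default language hint "none" is an assumption you will probably want to replace with the actual language of your files (for example "russian" or "czech"), since enca's guesses depend on it.

```shell
#!/bin/bash
# Sketch only: convert every regular file in a directory to UTF-8,
# guessing each file's source encoding with enca.
# detect_charset and convert_dir are hypothetical helper names.

# Print the detected encoding of file $1 as an iconv-style name.
# $2 is the language hint handed to enca via -L.
detect_charset() {
    enca -L "$2" -i "$1"
}

# convert_dir DIR [LANG]: convert each regular file in DIR to UTF-8 in place.
convert_dir() {
    local dir=$1 lang=${2:-none} f charset
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Treat a failed detection or an empty name as "unrecognized".
        if ! charset=$(detect_charset "$f" "$lang") || [ -z "$charset" ]; then
            echo "Skipping $f - encoding not recognized" >&2
            continue
        fi
        # Convert into a temporary file and replace the original only on success.
        if iconv -f "$charset" -t UTF-8 "$f" > "$f.tmp"; then
            mv "$f.tmp" "$f"
        else
            rm -f "$f.tmp"
            echo "Failed to convert $f from $charset" >&2
        fi
    done
}
```

With these functions sourced, something like `convert_dir convert/books russian` would process the directory from the question. Writing to a temporary file first means a wrong guess that iconv rejects leaves the original untouched, but a wrong guess that iconv accepts will still produce garbage, so keep a backup. enca can also do the conversion itself via its -x option (for example `enca -L russian -x UTF-8 somefile`), which may be simpler if you do not need iconv specifically.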