iconv: converting any encoding to UTF-8


Problem description

I am trying to point iconv at a directory and have all files in it converted to UTF-8, regardless of their current encoding.

I am using this script, but you have to specify which encoding you are converting FROM. How can I make it autodetect the current encoding?

dir_iconv.sh

#!/bin/bash
# Usage: dir_iconv.sh dir from_charset to_charset

ICONVBIN='/usr/bin/iconv' # path to iconv binary

if [ $# -lt 3 ]
then
    echo "Usage: $0 dir from_charset to_charset" >&2
    exit 1
fi

for f in "$1"/*
do
    if test -f "$f"
    then
        echo -e "\nConverting $f"
        # Keep the original as $f.old and write the converted copy to $f.
        /bin/mv -- "$f" "$f.old"
        "$ICONVBIN" -f "$2" -t "$3" "$f.old" > "$f"
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done

Terminal command

sudo convert/dir_iconv.sh convert/books CURRENT_ENCODING utf8

Recommended answer

Maybe you are looking for enca:

Currently it supports Belarusian, Bulgarian, Croatian, Czech, Estonian, Hungarian, Latvian, Lithuanian, Polish, Russian, Slovak, Slovene, Ukrainian, Chinese, and some multibyte encodings independently of language.

Note that, in general, autodetecting the current encoding is difficult: the same byte sequence can be valid text in several encodings. enca uses heuristics based on the language you tell it to expect, which limits the number of candidate encodings. You can use enconv to convert text files to a single encoding.
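For the directory case from the question, a minimal sketch of the autodetecting loop might look like the following. It swaps in file(1) as the detector instead of enca, purely because file is more commonly preinstalled; that choice, the function names detect_encoding and convert_dir_to_utf8, and the list of skipped values are assumptions of this sketch, not part of the original answer. The detection is still a heuristic guess and can be wrong, so keep backups.

```shell
#!/bin/bash
# Sketch: convert every regular file in a directory to UTF-8,
# guessing the source encoding with file(1). Function names are
# hypothetical; the guess is heuristic and may be wrong.

detect_encoding() {
    # Prints a charset name such as "iso-8859-1", "utf-8" or "binary".
    file -b --mime-encoding -- "$1"
}

convert_dir_to_utf8() {
    local dir=$1 f enc tmp
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # only regular files
        enc=$(detect_encoding "$f")
        case $enc in
            utf-8|us-ascii)
                continue ;;                  # already valid UTF-8
            binary|unknown-8bit|'')
                echo "Skipping $f (detected: $enc)" >&2
                continue ;;
        esac
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if iconv -f "$enc" -t UTF-8 "$f" > "$tmp"; then
            cat "$tmp" > "$f"                # overwrite in place, keeping permissions
        else
            echo "Failed to convert $f from $enc" >&2
        fi
        rm -f -- "$tmp"
    done
}

# Usage: convert_dir_to_utf8 convert/books
```

The file is only overwritten after iconv succeeds, so a failed conversion leaves the original untouched. If enca is installed, something like `enconv -L russian -x UTF-8 somefile` should detect and convert in one step for a known language; check `man enca` for the exact options on your system.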
