Problem description
If I use wget to download this page:
wget http://www.aqr.com/ResearchDetails.htm -O page.html
and then attempt to view the page in less, less reports the file as being a binary.
less page.html
"page.html" may be a binary file. See it anyway?
Here are the response headers:
Accept-Ranges:bytes
Cache-Control:private
Content-Encoding:gzip
Content-Length:8295
Content-Type:text/html
Cteonnt-Length:44064
Date:Sun, 25 Sep 2011 12:15:53 GMT
ETag:"c0859e4e785ecc1:6cd"
Last-Modified:Fri, 19 Aug 2011 14:00:09 GMT
Server:Microsoft-IIS/6.0
X-Powered-By:ASP.NET
Opening the file in vim works fine.
Any clues as to why less cannot handle it?
Recommended answer
It's a UTF-16 encoded file. (Check with W3C Validator). You can convert it to UTF-8 with this command:
wget http://www.aqr.com/ResearchDetails.htm -q -O - | iconv -f utf-16 -t utf-8 > page.html
less usually knows UTF-8.
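If the page has already been saved, the same check and conversion can be done on the local file. The following is a minimal sketch, not a verified reproduction of the original page: `sample.html` and `sample.utf8.html` are hypothetical file names, and the sample is a small UTF-16LE file built on the spot with a byte order mark, standing in for the downloaded `page.html`. It assumes `iconv`, `od`, and a `head` that accepts `-c` are available.

```shell
#!/bin/sh
# Build a small UTF-16LE sample with a BOM (hypothetical stand-in for page.html).
printf '\377\376' > sample.html                 # BOM bytes FF FE
printf '<html>ok</html>\n' | iconv -f UTF-8 -t UTF-16LE >> sample.html

# Look at the first two bytes: "fffe" or "feff" suggests UTF-16.
bom=$(head -c 2 sample.html | od -An -tx1 | tr -d ' \n')
echo "first bytes: $bom"

# Convert to UTF-8; with a plain "utf-16" source, iconv reads the BOM
# to decide the byte order.
iconv -f utf-16 -t utf-8 sample.html > sample.utf8.html

# The converted file is plain UTF-8 text and can be opened with: less sample.utf8.html
cat sample.utf8.html
```

For a real download, replace `sample.html` with `page.html`; if the first bytes are not a UTF-16 byte order mark, the encoding guess should be rechecked before converting.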
Edit:
As @Stephen C reported, less in Red Hat supports UTF-16. It looks to me as though Red Hat patched less for UTF-16 support. On the official less site, UTF-16 support is currently an open issue (ref number 282).