Problem description
I'm trying to automatically download a number of PDF files, given a list of URLs.
Here's the code I have:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
var encoding = new UTF8Encoding();
request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-gb,en;q=0.5");
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0";
HttpWebResponse resp = (HttpWebResponse)request.GetResponse();
BinaryReader reader = new BinaryReader(resp.GetResponseStream());
FileStream stream = new FileStream("output/" + date.ToString("yyyy-MM-dd") + ".pdf",FileMode.Create);
BinaryWriter writer = new BinaryWriter(stream);
while (reader.PeekChar() != -1)
{
writer.Write(reader.Read());
}
writer.Flush();
writer.Close();
So, I know the first part works. I was originally fetching the response and reading it with a TextReader, but that gave me corrupted PDF files (since PDFs are binary files).
Right now if I run it, reader.PeekChar() is always -1 and nothing happens - I get an empty file.
While debugging it, I noticed that reader.Read() was actually giving different numbers when I was invoking it - so maybe Peek is broken.
So I tried something very dirty
try
{
while (true)
{
writer.Write(reader.Read());
}
}
catch
{
}
writer.Flush();
writer.Close();
Now I'm getting a very tiny file with some garbage in it, but it's still not what I'm looking for.
So, can anyone point me in the right direction?
Additional Information:
The response headers don't suggest it's compressed or anything else.
HTTP/1.1 200 OK
Content-Type: application/pdf
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 10 Aug 2012 11:15:48 GMT
Content-Length: 109809
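Skip the BinaryReader and BinaryWriter and just copy the input stream to the output FileStream. BinaryReader.PeekChar() returns -1 when the underlying stream does not support seeking, which is the case for a network response stream, so your loop never runs. On top of that, reader.Read() decodes a character and returns it as an int, and writer.Write(int) writes four bytes per call, which is where the garbage comes from. Briefly: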
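var fileName = "output/" + date.ToString("yyyy-MM-dd") + ".pdf";
// Dispose both streams so the connection and the file handle are released.
using (var responseStream = resp.GetResponseStream())
using (var stream = File.Create(fileName))
    responseStream.CopyTo(stream); // raw byte copy, no character decoding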
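Since the question mentions a list of URLs, here is a rough sketch of how the same stream copy might be wrapped in a loop. The class name PdfDownloader, the methods SaveStream and DownloadAll, and the "file-N.pdf" naming scheme are all made up for illustration and are not from the original code. Setting AutomaticDecompression is an assumption on my part: it only matters if you keep sending an Accept-Encoding header and the server actually compresses the body.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class PdfDownloader
{
    // Hypothetical helper: writes every byte of 'input' to a new file at 'path'.
    // Stream.CopyTo works on raw bytes, so binary content is preserved as-is.
    public static void SaveStream(Stream input, string path)
    {
        using (var output = File.Create(path))
        {
            input.CopyTo(output);
        }
    }

    // Hypothetical helper: downloads each URL into 'outputDir' as file-0.pdf, file-1.pdf, ...
    // The naming scheme is an assumption; the original names files by date.
    public static void DownloadAll(IEnumerable<string> urls, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        int index = 0;
        foreach (var url in urls)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            // Assumption: let the framework undo gzip/deflate if the server uses it.
            request.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;

            using (var resp = (HttpWebResponse)request.GetResponse())
            using (var body = resp.GetResponseStream())
            {
                var path = Path.Combine(outputDir, "file-" + index + ".pdf");
                SaveStream(body, path);
            }
            index++;
        }
    }
}
```

Usage would be something like PdfDownloader.DownloadAll(urls, "output"). Keeping the copy in its own method means it can be checked against an in-memory stream without touching the network.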