Problem description
I have code that downloads PDF files. I have run into a problem: the next task starts executing while the download of the last file is not yet finished. After my current code runs, the last file is about 650 MB when it should be 1300 MB. It also cannot be opened, because it is only partially downloaded and therefore corrupt.
How can I make sure the file has finished downloading?
HtmlDocument htmlDoc = new HtmlWeb().Load("http://example.com/");
// Thread.Sleep(5000); // wait some time
HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
foreach (HtmlNode src in ProductListPage)
{
    htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);
    // Thread.Sleep(5000); // wait some time
    HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
    if (LinkTester != null)
    {
        foreach (var dllink in LinkTester)
        {
            string LinkURL = dllink.Attributes["href"].Value;
            Console.WriteLine(LinkURL);
            string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
            var DLClient = new WebClient();
            // Thread.Sleep(5000); // wait some time
            DLClient.DownloadFileAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
        }
    }
}
My next step is to rename the downloaded files:
var files = Directory.GetFiles(@"C:\temp\", "*.pdf");
// string prefix = "SomePrefix";
foreach (var file in files)
{
    string newFileName = Path.Combine(Path.GetDirectoryName(file), file.Replace("-", " "));
    File.Move(file, newFileName);
}
Renaming goes smoothly until it reaches the last file, which has not finished downloading; that is where I get an error.
I have added Thread.Sleep(5000); // wait some time
between these two steps, but that is probably not the best solution: the current wait is not long enough, and the time needed varies with the internet connection.
Here is the complete code:
using System;
using System.Net;
using HtmlAgilityPack;
using System.IO;
using System.Threading;
namespace Crawler
{
    class Program
    {
        static void Main(string[] args)
        {
            {
                HtmlDocument htmlDoc = new HtmlWeb().Load("http://example.com");
                // Thread.Sleep(5000); // wait some time
                HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
                foreach (HtmlNode src in ProductListPage)
                {
                    htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);
                    // Thread.Sleep(5000); // wait some time
                    HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
                    if (LinkTester != null)
                    {
                        foreach (var dllink in LinkTester)
                        {
                            string LinkURL = dllink.Attributes["href"].Value;
                            Console.WriteLine(LinkURL);
                            string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
                            var DLClient = new WebClient();
                            // Thread.Sleep(5000); // wait some time
                            DLClient.DownloadFileAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
                        }
                    }
                }
            }
            Thread.Sleep(5000); // wait some time
            var files = Directory.GetFiles(@"C:\temp\", "*.pdf");
            // string prefix = "SomePrefix";
            foreach (var file in files)
            {
                string newFileName = Path.Combine(Path.GetDirectoryName(file), file.Replace("-", " "));
                File.Move(file, newFileName);
            }
        }
    }
}
Recommended answer
You most certainly do not want to use WebClient.DownloadFileAsync, which returns immediately and reports completion only through the DownloadFileCompleted event, but its newer successor, WebClient.DownloadFileTaskAsync. It would be used like this:
await DLClient.DownloadFileTaskAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
This is an async operation, so your calling method needs to be async as well. By awaiting it, you make sure that your program continues only after the download is complete (or has failed).
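To show how the awaited call might fit into the whole program, here is a minimal sketch rather than a definitive implementation. It assumes C# 7.1 or later (for an async Main) and leaves out the HtmlAgilityPack scraping: the URL list is a placeholder. The helpers FileNameFromUrl, DownloadAllAsync and RenameDownloads are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading.Tasks;

namespace Crawler
{
    public class Program
    {
        // Hypothetical helper: the part of the URL path after the last '/'.
        public static string FileNameFromUrl(string url)
        {
            return Path.GetFileName(new Uri(url).LocalPath);
        }

        // Downloads the URLs one after another. The returned task completes
        // only after the last download has finished (or one has thrown).
        public static async Task DownloadAllAsync(IEnumerable<string> urls, string targetDir)
        {
            foreach (string url in urls)
            {
                using (var client = new WebClient())
                {
                    string target = Path.Combine(targetDir, FileNameFromUrl(url));
                    await client.DownloadFileTaskAsync(new Uri(url), target);
                }
            }
        }

        // Replaces '-' with ' ' in the file name only, not in the directory part.
        public static void RenameDownloads(string dir)
        {
            foreach (string file in Directory.GetFiles(dir, "*.pdf"))
            {
                string newName = Path.GetFileName(file).Replace("-", " ");
                string newPath = Path.Combine(Path.GetDirectoryName(file), newName);
                if (newPath != file)
                {
                    File.Move(file, newPath);
                }
            }
        }

        public static async Task Main(string[] args)
        {
            // Placeholder list; in the question these links come from the scraped pages.
            var urls = new List<string> { "http://example.com/files/some-file.pdf" };
            string targetDir = @"C:\temp";

            await DownloadAllAsync(urls, targetDir); // wait for every download
            RenameDownloads(targetDir);              // safe: all files are complete
        }
    }
}
```

Because every download is awaited before the loop moves on, no Thread.Sleep is needed before renaming. If the downloads should run concurrently instead, one option is to collect the tasks returned by DownloadFileTaskAsync (one WebClient per download) and await Task.WhenAll on them before renaming.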