Problem description
Using: Delphi 2010, latest version of Indy
I am trying to scrape data from Google's AdSense web page in order to retrieve the reports, but so far without success: my application stops after the first request and does not proceed.
Using Fiddler to inspect the traffic to the Google AdSense website, I can see that when a web browser loads the AdSense page, its request generates a number of redirects before the page is finally loaded.
However, my Delphi application generates only a couple of requests before it stops.
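Here are the steps I followed:
- Drop an IdHTTP and an IdSSLIOHandlerSocketOpenSSL1 component on the form.
- Set the IdHTTP component properties AllowCookies and HandleRedirects to True, and its IOHandler property to IdSSLIOHandlerSocketOpenSSL1.
- Set the IdSSLIOHandlerSocketOpenSSL1 component property Method := 'sslvSSLv23'
Finally I have this code: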
procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    IdHTTP1.Get(FURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
However, it does not reach the login page as expected. I would expect it to behave like a web browser and follow the redirects until it reaches the final page.
This is the output of the headers from Fiddler:
HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block
First, is there anything wrong with this output?
Is there something more I should do to make the IdHTTP component keep following the redirects until it reaches the final page?
Recommended answer
IdHTTP component property values prior to making the call:
Name := 'IdHTTP1';
IOHandler := IdSSLIOHandlerSocketOpenSSL1;
AllowCookies := True;
HandleRedirects := True;
RedirectMaximum := 35;
Request.UserAgent :=
  'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8';
HTTPOptions := [hoForceEncodeParams];
OnRedirect := IdHTTP1Redirect;
CookieManager := IdCookieManager1;
Redirect event handler:
procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  // Tell Indy to go ahead and follow this redirect.
  Handled := True;
end;
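To see which redirects Indy actually follows, the handler can log each hop before accepting it. The following console program is only a minimal sketch under stated assumptions: the class name TRedirectLogger and the program name are hypothetical, the handler signature is copied from the answer's handler (it assumes an Indy 10 build in which VMethod is a string), and the OpenSSL DLLs must be available to the executable at run time.

```pascal
program RedirectTrace;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, IdHTTP, IdSSLOpenSSL, IdCookieManager;

type
  // Hypothetical helper: Indy event handlers must be methods of an object.
  TRedirectLogger = class
  public
    procedure HandleRedirect(Sender: TObject; var dest: string;
      var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
  end;

procedure TRedirectLogger.HandleRedirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  // Print each hop, then let Indy follow it.
  WriteLn(Format('Redirect %d -> %s', [NumRedirect, dest]));
  Handled := True;
end;

var
  HTTP: TIdHTTP;
  SSL: TIdSSLIOHandlerSocketOpenSSL;
  Logger: TRedirectLogger;
  Output: TMemoryStream;
begin
  Logger := TRedirectLogger.Create;
  HTTP := TIdHTTP.Create(nil);
  Output := TMemoryStream.Create;
  try
    // Helper components are owned by HTTP, so they are freed with it.
    SSL := TIdSSLIOHandlerSocketOpenSSL.Create(HTTP);
    SSL.SSLOptions.Method := sslvSSLv23;
    HTTP.IOHandler := SSL;
    HTTP.CookieManager := TIdCookieManager.Create(HTTP);
    HTTP.AllowCookies := True;
    HTTP.HandleRedirects := True;
    HTTP.RedirectMaximum := 35;
    HTTP.OnRedirect := Logger.HandleRedirect;
    try
      HTTP.Get('https://www.google.com/adsense/', Output);
      WriteLn(Format('Final status %d, %d bytes received',
        [HTTP.ResponseCode, Output.Size]));
    except
      on E: Exception do
        WriteLn(E.ClassName + ': ' + E.Message);
    end;
  finally
    Output.Free;
    HTTP.Free;
    Logger.Free;
  end;
end.
```

If the trace ends after one or two hops, comparing the last logged URL with the browser's redirect chain in Fiddler should show where the two diverge.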
Making the call:
FURL := 'https://www.google.com';
GetUrlToFile( (FURL + '/adsense/'), 'a.html');

procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    try
      // Fetch the URL passed in; Indy follows the redirects itself.
      IdHTTP1.Get(AURL, Output);
      IdHTTP1.Disconnect;
    except
      // Errors are deliberately ignored here; whatever was received is still saved.
    end;
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
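The empty except block above hides the reason a request stops, which makes a stalled redirect chain hard to diagnose. As a hedged alternative, the sketch below reports the failure to the caller instead. The unit name HttpSave and the function TryGetUrlToFile are hypothetical, and the code assumes an Indy 10 build in which EIdHTTPProtocolException exposes ErrorCode.

```pascal
unit HttpSave;

interface

uses
  SysUtils, Classes, IdHTTP, IdException;

// Hypothetical variant of GetUrlToFile: returns False and fills AError
// instead of silently swallowing the exception.
function TryGetUrlToFile(AHTTP: TIdHTTP; const AURL, AFile: string;
  out AError: string): Boolean;

implementation

function TryGetUrlToFile(AHTTP: TIdHTTP; const AURL, AFile: string;
  out AError: string): Boolean;
var
  Output: TMemoryStream;
begin
  Result := False;
  AError := '';
  Output := TMemoryStream.Create;
  try
    try
      AHTTP.Get(AURL, Output);
      AHTTP.Disconnect;
      Output.SaveToFile(AFile);
      Result := True;
    except
      // HTTP-level failure (for example a 4xx or 5xx status).
      on E: EIdHTTPProtocolException do
        AError := Format('HTTP %d: %s', [E.ErrorCode, E.Message]);
      // Any other Indy failure (socket, SSL, and so on).
      on E: EIdException do
        AError := E.ClassName + ': ' + E.Message;
    end;
  finally
    Output.Free;
  end;
end;

end.
```

A caller could then write `if not TryGetUrlToFile(IdHTTP1, FURL + '/adsense/', 'a.html', Err) then ShowMessage(Err);`, where Err is a local string variable.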
Here's the (request and response headers) output from Fiddler: