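I am trying to use HttpClient to programmatically send an HTTP POST request to http://ojp.nationalrail.co.uk/en/s/planjourney/query, but the server rejects it. I copied the headers and body from what Chrome sends, so the request should be identical, yet the HTML that comes back reports an error: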
<div class="padding">
<h1 class="sifr"><strong>Sorry</strong>, something went wrong</h1>
<div class="error-message">
<div class="error-message-padding">
<h2>There is a problem with the page you are trying to access.</h2>
<p>It is possible that it was either moved, it doesn't exist or we are experiencing some technical difficulties.</p>
<p>We are sorry for the inconvenience.</p>
</div>
</div>
</div>
Here is my Java program that uses HttpClient:
package com.tixsnif;

import org.apache.http.*;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;
import java.io.*;
import java.util.*;
import java.util.zip.GZIPInputStream;

public class WebScrapingTesting {
    public static void main(String[] args) throws Exception {
        String target = "http://ojp.nationalrail.co.uk/en/s/planjourney/query";
        HttpClient client = new DefaultHttpClient();
        HttpPost httpPost = new HttpPost(target);
        BasicNameValuePair[] params = {
            new BasicNameValuePair("jpState", "single"),
            new BasicNameValuePair("commandName", "journeyPlannerCommand"),
            new BasicNameValuePair("from.searchTerm", "Basingstoke"),
            new BasicNameValuePair("to.searchTerm", "Reading"),
            new BasicNameValuePair("timeOfOutwardJourney.arrivalOrDeparture", "DEPART"),
            new BasicNameValuePair("timeOfOutwardJourney.monthDay", "Today"),
            new BasicNameValuePair("timeOfOutwardJourney.hour", "10"),
            new BasicNameValuePair("timeOfOutwardJourney.minute", "15"),
            new BasicNameValuePair("timeOfReturnJourney.arrivalOrDeparture", "DEPART"),
            new BasicNameValuePair("timeOfReturnJourney.monthDay", "Today"),
            new BasicNameValuePair("timeOfReturnJourney.hour", "18"),
            new BasicNameValuePair("timeOfReturnJourney.minute", "15"),
            new BasicNameValuePair("_includeOvertakenTrains", "on"),
            new BasicNameValuePair("viaMode", "VIA"),
            new BasicNameValuePair("via.searchTerm", "Station name / code"),
            new BasicNameValuePair("offSetOption", "0"),
            new BasicNameValuePair("_reduceTransfers", "on"),
            new BasicNameValuePair("operatorMode", "SHOW"),
            new BasicNameValuePair("operator.code", ""),
            new BasicNameValuePair("_lookForSleeper", "on"),
            new BasicNameValuePair("_directTrains", "on")};
        httpPost.setHeader("Host", "ojp.nationalrail.co.uk");
        httpPost.setHeader("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.231 Safari/534.10");
        httpPost.setHeader("Accept-Encoding", "gzip,deflate,sdch");
        httpPost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        httpPost.setHeader("Accept-Language", "en-us,en;q=0.8");
        httpPost.setHeader("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        httpPost.setHeader("Origin", "http://www.nationalrail.co.uk/");
        httpPost.setHeader("Referer", "http://www.nationalrail.co.uk/");
        httpPost.setHeader("Content-Type", "application/x-www-form-urlencoded");
        httpPost.setHeader("Cookie", "JSESSIONID=B2A3419B79C5D999CA4806B459675CCD.app201; Path=/");
        UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(Arrays.asList(params));
        urlEncodedFormEntity.setContentEncoding(HTTP.UTF_8);
        httpPost.setEntity(urlEncodedFormEntity);
        HttpResponse response = client.execute(httpPost);
        InputStream input = response.getEntity().getContent();
        GZIPInputStream gzip = new GZIPInputStream(input);
        InputStreamReader isr = new InputStreamReader(gzip);
        BufferedReader br = new BufferedReader(isr);
        String line = null;
        while ((line = br.readLine()) != null) {
            System.out.printf("\n%s", line);
        }
        client.getConnectionManager().shutdown();
    }
}
I keep the JSESSIONID up to date in case it expires, but there seems to be another problem that I can't see. Am I missing something obvious?
Best answer
Visiting the link above and looking at the HTML source, it looks like the target path should be /en/s/planjourney/plan.
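As a rough illustration of that suggestion, the sketch below points the request at the /plan path and builds a form-encoded body using only the JDK, so the target and payload can be inspected before anything is sent. The class name PlanJourneyRequest and the helper formBody are hypothetical, only a few of the question's fields are included, and whether the server accepts the result has not been verified.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, not part of the original program.
public class PlanJourneyRequest {
    // Path suggested by the answer; the question posted to ".../planjourney/query".
    static final String TARGET = "http://ojp.nationalrail.co.uk/en/s/planjourney/plan";

    // Encodes the fields as an application/x-www-form-urlencoded body,
    // preserving the insertion order of the map.
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // A subset of the form fields from the question, for illustration only.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("jpState", "single");
        fields.put("from.searchTerm", "Basingstoke");
        fields.put("to.searchTerm", "Reading");
        fields.put("via.searchTerm", "Station name / code");

        // Print the request line and body instead of sending them.
        System.out.println("POST " + URI.create(TARGET).getPath());
        System.out.println(formBody(fields));
    }
}
```

In the original program the equivalent change would be limited to the value of the target string; the headers and parameters could stay as they are while testing whether the new path resolves the error.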