Querying a search widget with jsoup

Problem description

I want to query the site below and save all the results to a CSV file:

I already have a program for this (written by a previous programmer; I am trying to understand the code, as I am a beginner with jsoup and web crawling), but the site has now been updated and the query no longer works. I think I need to update the URL. Below is the URL string I am currently using:

private final static String URL = "http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget?"
        + "client=default"
        + "&proxystylesheet=default"
        + "&output=xml_no_dtd"
        + "&Process=continue"
        + "&FLAT_TYPE=%s"
        + "&NME_NEWTOWN=%s"
        + "&NME_STREET="
        + "&NUM_BLK_FROM="
        + "&NUM_BLK_TO="
        + "&AMT_RESALE_PRICE_FROM="
        + "&AMT_RESALE_PRICE_TO="
        + "&DTE_APPROVAL_FROM=%s"
        + "&DTE_APPROVAL_TO=%s";

I connect like this:

Document doc = Jsoup.connect(url).get();

I want to update it to use the new URL. I checked the page source but could not find it. Can anybody please help me find the URL that I need to pass here?

Recommended answer

To figure out the way a site works you can open Firebug or Chrome Developer Tools and inspect the network traffic. There you can inspect what is sent over the wire (data, GET or POST, cookies, ...).

For this site you will need to POST the data, but you will also need to have a couple of cookies set, otherwise the site won't accept your POST request. You can do this by simply sending a GET request first and reading the cookies:

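import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Fetch the search page first so the server sets its cookies
Connection.Response res = Jsoup
    .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
    .timeout(10000) // edit: set timeout to 10 seconds
    .method(Connection.Method.GET)
    .execute();

// Keep the cookies so they can be sent along with the POST request
Map<String,String> cookies = res.cookies();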

Now you can send your POST request using the cookies:

Document doc = Jsoup
   .connect("http://services2.hdb.gov.sg/webapp/BB33RTIS/BB33SSearchWidget")
   .timeout(10000) // edit: set timeout to 10 seconds
   .data("FLAT_TYPE", "02")
   .data("NME_NEWTOWN", "BD      Bedok")
   .data("NME_STREET", "")
   .data("NUM_BLK_FROM", "")
   .data("NUM_BLK_TO", "")
   .data("dteRange", "12")
   .data("DTE_APPROVAL_FROM", "Apr 2015")
   .data("DTE_APPROVAL_TO", "Apr 2016")
   .data("AMT_RESALE_PRICE_FROM", "")
   .data("AMT_RESALE_PRICE_TO", "")
   .data("Process", "continue")
   .cookies(cookies)
   .post();
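The values in the request above are hard-coded, whereas the original URL template took the flat type, town and approval dates as parameters. Below is a minimal sketch of a helper that builds the same form fields from parameters. The class and method names (`SearchForm`, `fields`) are hypothetical, and the `dteRange` value is simply passed through because its meaning is not explained here. It uses only the standard library; the resulting map could be handed to jsoup's `data(Map<String, String>)` in place of the individual `.data(...)` calls.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SearchForm {
    // Produces month strings in the "Apr 2015" style used in the request above.
    private static final DateTimeFormatter MONTH_YEAR =
            DateTimeFormatter.ofPattern("MMM yyyy", Locale.ENGLISH);

    // Hypothetical helper: builds the form fields of the POST request.
    // Fields left empty in the request above stay empty here.
    public static Map<String, String> fields(String flatType, String town, String dteRange,
                                             YearMonth from, YearMonth to) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("FLAT_TYPE", flatType);
        f.put("NME_NEWTOWN", town);
        f.put("NME_STREET", "");
        f.put("NUM_BLK_FROM", "");
        f.put("NUM_BLK_TO", "");
        f.put("dteRange", dteRange); // passed through as-is (assumption: meaning unknown)
        f.put("DTE_APPROVAL_FROM", from.format(MONTH_YEAR));
        f.put("DTE_APPROVAL_TO", to.format(MONTH_YEAR));
        f.put("AMT_RESALE_PRICE_FROM", "");
        f.put("AMT_RESALE_PRICE_TO", "");
        f.put("Process", "continue");
        return f;
    }

    public static void main(String[] args) {
        // Same values as the hard-coded request above.
        Map<String, String> form = fields("02", "BD      Bedok", "12",
                YearMonth.of(2015, 4), YearMonth.of(2016, 4));
        form.forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

With such a helper, looping over several flat types or towns only requires calling `fields(...)` with different arguments and reusing the same cookies.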

Then use doc to scrape the search results.
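Since the goal is a CSV file, here is a minimal sketch of the writing step. It assumes the cell texts have already been pulled out of `doc` into rows of strings (for example with `doc.select(...)` and `Element.text()`; the right selector depends on the markup of the result page, which is not shown here). The class name `CsvRows`, the sample rows and the file name `results.csv` are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvRows {
    // Quote a field if it contains a comma, a double quote or a line break,
    // doubling any embedded double quotes (RFC 4180 style).
    public static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join the cells of one row into a single CSV line.
    public static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvRows::escape).collect(Collectors.joining(","));
    }

    // Write all rows to the given file, one CSV line per row.
    public static void write(Path file, List<List<String>> rows) throws IOException {
        List<String> lines = rows.stream()
                .map(CsvRows::toCsvLine)
                .collect(Collectors.toList());
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder rows; real rows would come from the scraped document.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Block", "Street", "Price"),
                Arrays.asList("1", "Example St", "100,000"));
        write(Paths.get("results.csv"), rows);
    }
}
```

Quoting matters here because values such as prices may contain commas, which would otherwise split one cell into two columns.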

Note: sending a GET request with the URL-encoded data didn't work for me.
