java - JSOUP gets a 403 error from IMDB

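I need to parse an IMDB page so that I can display the results, and I am using Jsoup for this. Below is the code I wrote. When I run it, I get a 403 error. I have double-checked the URL, and it appears to be correct.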

import java.io.IOException;
import java.net.URLEncoder;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ParseIMDB {

    public static void parse() throws IOException{
        Document doc = Jsoup.connect("http://imdb.com/search/title?count=100&genres=action&languages=en&release_date=2010,2016&title_type=feature").get();
        Elements newsHeadlines = doc.select("#main > table.results tbody");
    }

    public static void main(String[] args) {
        try {
            parse();
        } catch (Exception e){
            System.out.println("Exception found!");
            e.printStackTrace();
        }
    }
}

I also tried encoding the URL with URLEncoder.encode, but that did not help.
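Encoding the URL cannot fix a 403, and encoding the whole URL actually breaks it: URLEncoder.encode produces application/x-www-form-urlencoded text and is meant for individual query-parameter values, so it also escapes the ":", "/", "?" and "=" that make the string a URL. The sketch below contrasts the two, assuming Java 10 or later for the Charset overload; the class name QueryEncodingDemo and the helper buildSearchUrl are hypothetical, not part of the original code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryEncodingDemo {

    // Hypothetical helper: encodes only the parameter values and joins them with '&'.
    // The scheme, host and path are left untouched.
    static String buildSearchUrl(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return "http://imdb.com/search/title?" + query;
    }

    public static void main(String[] args) {
        // Encoding the whole URL escapes ':' and '/' too, so the result is no longer a usable URL.
        String whole = URLEncoder.encode("http://imdb.com/search/title?count=100", StandardCharsets.UTF_8);
        System.out.println(whole); // prints http%3A%2F%2Fimdb.com%2Fsearch%2Ftitle%3Fcount%3D100

        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("count", "100");
        params.put("genres", "action");
        params.put("languages", "en");
        params.put("release_date", "2010,2016");
        params.put("title_type", "feature");
        // prints http://imdb.com/search/title?count=100&genres=action&languages=en&release_date=2010%2C2016&title_type=feature
        System.out.println(buildSearchUrl(params));
    }
}
```

Either way the server receives the same request headers, which is why encoding has no effect on the 403.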

The stack trace for the code above is:

    Exception found!
    org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://www.imdb.com/search/title/
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:534)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
        at ParseIMDB.parse(ParseIMDB.java:13)
        at ParseIMDB.main(ParseIMDB.java:20)

Best answer

I believe it will work if you add a User-Agent header to the request. You can do it like this:

 Document doc = Jsoup.connect("http://imdb.com/search/title?count=100&genres=action&languages=en&release_date=2010,2016&title_type=feature")
                .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36")
                .get();

I have tested this solution: it works and returns the list of movies.
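If you want to see the effect of the header without Jsoup, the sketch below swaps in the JDK's own java.net.http.HttpClient (Java 11 or later), so it is not the Jsoup API used in the answer. It sends the same search URL twice, once with the client's default User-Agent and once with the browser-like string from the answer, and prints each status code; unlike Jsoup's get(), HttpClient does not throw on a 4xx status. The class name UserAgentRequest, the helper buildRequest and the HTTP/1.1 and redirect settings are assumptions for illustration, and whether IMDB still accepts this particular User-Agent string is not guaranteed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UserAgentRequest {

    // Browser-like User-Agent copied from the answer above.
    static final String BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36";

    // Hypothetical helper: builds a GET request that carries an explicit User-Agent header.
    static HttpRequest buildRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = "http://imdb.com/search/title?count=100&genres=action&languages=en"
                + "&release_date=2010,2016&title_type=feature";
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)        // plain HTTP/1.1 requests
                .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except https -> http
                .build();

        // Request 1 sets no header, so the client adds its own default User-Agent.
        HttpRequest plain = HttpRequest.newBuilder(URI.create(url)).GET().build();
        // Request 2 sends the browser-like User-Agent.
        HttpRequest withUa = buildRequest(url, BROWSER_UA);

        for (HttpRequest request : new HttpRequest[] {plain, withUa}) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            String ua = request.headers().firstValue("User-Agent").orElse("(client default)");
            System.out.println(ua + " -> HTTP " + response.statusCode());
        }
    }
}
```

Printing the status instead of catching an exception makes it easy to compare the two requests side by side; in Jsoup, a similar effect is usually achieved by not letting the 4xx status abort the request before you inspect it.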