java - Getting an error using JSoup. Why?

I am trying to log in to the Fantasy Football website and extract data from it.

I get the following error,

  Jul 24, 2015 8:01:12 PM StatsCollector main
  SEVERE: null
  org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://fantasy.premierleague.com/
      at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
      at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
      at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
      at StatsCollector.main(StatsCollector.java:26)

whenever I try this code. Where am I going wrong?

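    import java.io.IOException;
    import java.util.Map;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    import org.jsoup.Connection;
    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class StatsCollector {

        public static void main(String[] args) {
            try {
                String url = "http://fantasy.premierleague.com/";

                // Initial GET of the landing page
                Connection.Response response = Jsoup.connect(url).method(Connection.Method.GET).execute();

                // Login attempt: POST the credentials to the same URL
                Response res = Jsoup
                        .connect(url)
                        .data("ismEmail", "[email protected]", "id_password", "examplepassword")
                        .method(Method.POST)
                        .execute();

                Map<String, String> loginCookies = res.cookies();

                // Reuse the login cookies to fetch a page that requires a session
                Document doc = Jsoup.connect("http://fantasy.premierleague.com/transfers")
                        .cookies(loginCookies)
                        .get();

                String title = doc.title();
                System.out.println(title);
            } catch (IOException ex) {
                Logger.getLogger(StatsCollector.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }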

Best answer

    Response res = Jsoup
            .connect(url)
            .data("ismEmail", "[email protected]", "id_password", "examplepassword")
            .method(Method.POST)
            .execute();

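Are you actually running this exact code? It looks like sample code with placeholders rather than real login credentials. That would explain the HTTP 403 error you are getting.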

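Edit 1

My bad. I looked at the login form on that site, and it seems to me that you have confused the ids of the input elements ("ismEmail" and "id_password") with the names that are submitted with the form ("email", "password"). Does this work for you?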

    Response res = Jsoup
            .connect(url)
            .data("email", "[email protected]", "password", "examplepassword")
            .method(Method.POST)
            .execute();
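To make the id/name distinction concrete, here is a minimal sketch of what a standard form POST puts in the request body, assuming the usual `application/x-www-form-urlencoded` encoding. The keys are the inputs' `name` attributes; `id` attributes are never transmitted. The class name, method name and credentials below are made up for illustration, and this only approximates what the `.data(...)` pairs end up as on the wire.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a URL-encoded form body from name/value pairs.
final class FormBodySketch {

    // Keys must be the inputs' name attributes, not their id attributes.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // URLEncoder.encode(String, Charset) requires Java 10 or newer.
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("email", "user@example.com");    // placeholder credentials
        fields.put("password", "examplepassword");
        System.out.println(encode(fields));
    }
}
```

If the keys were "ismEmail" and "id_password" instead, the server would simply receive two parameters it does not recognise, which is consistent with the login being rejected.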

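Edit 2

OK, this kept bugging me, because logging in to a website with JSoup should not be that difficult. I created an account there and tried it myself. Code first: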

    String url = "https://users.premierleague.com/PremierUser/j_spring_security_check";

    Response res = Jsoup
            .connect(url)
            .followRedirects(false)
            .timeout(2_000)
            .data("j_username", "<USER>")
            .data("j_password", "<PASSWORD>")
            .method(Method.POST)
            .execute();

    Map<String, String> loginCookies = res.cookies();

    Document doc = Jsoup.connect("http://fantasy.premierleague.com/squad-selection/")
            .cookies(loginCookies)
            .get();

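So, what is going on here? First, I realized that the target of the login form was wrong. The page seems to be built on Spring, so the form target and field names use Spring's defaults: j_spring_security_check, j_username and j_password. Then I kept getting a read timeout until I set the followRedirects flag to false. I can only guess why that helps, but maybe it is some protection against crawlers?

Finally, I tried connecting to the squad selection page, and the parsed response contained my personal view and data. The code seems to work for me; could you give it a try?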
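For anyone wondering what passing `loginCookies` to the second request amounts to: under plain HTTP, the cookies captured from the login response are sent back in a single `Cookie` request header, and that is what ties the follow-up request to the logged-in session. The sketch below shows that header format only; the class name and the cookie names and values are invented for illustration and are not taken from the site.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: renders a cookie map as an HTTP Cookie request header value.
final class CookieHeaderSketch {

    static String toCookieHeader(Map<String, String> cookies) {
        // Pairs are written as name=value and separated by "; ".
        StringJoiner header = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            header.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("JSESSIONID", "abc123");   // made-up session cookie
        cookies.put("lang", "en");             // made-up preference cookie
        System.out.println("Cookie: " + toCookieHeader(cookies));
    }
}
```

If the map returned by `res.cookies()` is empty, the second request carries no session at all, so printing the map is a quick way to check whether the login step actually succeeded.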