Problem description
I have an odd issue here. I've been using Jsoup 1.7.2 for a while with no problems, but now, when I try to retrieve the main headlines from this website: www.jornaldamarinha.pt, using this code:
import android.util.Log;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Connecting... (a timeout of 0 means no timeout: wait indefinitely)
Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
        .timeout(0)
        .get();
// In Jsoup selector syntax, "*[class*=zincontent-wrap]" means:
// select every element whose class attribute contains "zincontent-wrap".
Elements elems = doc.select("*[class*=zincontent-wrap]"); // Retrieves 0 results!
int t = elems.size();
Log.w("INFO", "Total Headlines: " + t);
// Loop through all retrieved headlines:
for (Element e : elems) {
    String headline = e.select("a").text(); // text() already returns a String
    Log.w("HEADLINE", headline);
}
It fails!... 0 results retrieved. (It should retrieve ~8.)
Possible causes of this issue:
- Aliens... (Similar to androids, but uglier...)
- Website encoding. (I tried treating the incoming HTML as ISO-8859-15 to handle Portuguese special characters, but the issue remains.)
- Malformed incoming HTML. (I doubt this is the issue, since the selector works fine on the "Try jsoup" online page, and Jsoup usually handles broken HTML very well.)
- The minus sign ("-") in the class name is confusing Jsoup. (This seems to me to be the main cause, or at least one of them.)
- Something else... (Very probably!)
BUT... at http://try.jsoup.org, if I fetch the URL http://www.jornaldamarinha.pt using this CSS query:
*[class*=zincontent-wrap]
Everything works fine there! (It retrieves all ~8 correct results!)
SO... to sum up, all I need is to do exactly what that web page does, but in code.
THANKS, in advance, for any insight or workaround for this! :)
Recommended answer
SOLUTION!... After all, everything in the code above was working correctly, as I suspected, except that the CSS query breaks with Android's default user agent. I just figured out that setting the userAgent on Jsoup's connection is VERY important! So I edited my code as follows, and it works like a charm now! (Exactly the same results as on the http://try.jsoup.org web page.)
Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
        .userAgent("Mozilla/5.0 Gecko/20100101 Firefox/21.0")
        .timeout(0)
        .get();
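One plausible explanation for why the user agent matters is that the server sends different markup depending on the User-Agent header, so the selector has nothing to match in the page served to the default client. That is an assumption, not something confirmed for www.jornaldamarinha.pt. The sketch below reproduces the effect using only JDK classes: a local test server (hypothetical, with made-up markup such as `mobile-item`) returns the `zincontent-wrap` wrapper only to a Firefox-like user agent. The class name `UserAgentDemo` and the helpers `startServer` and `fetch` are illustrative and not part of Jsoup.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class UserAgentDemo {

    // Hypothetical markup: only the "desktop" page contains the headline wrapper.
    static final String DESKTOP_HTML =
            "<div class=\"zincontent-wrap\"><a href=\"#\">Headline</a></div>";
    static final String FALLBACK_HTML =
            "<div class=\"mobile-item\"><a href=\"#\">Headline</a></div>";

    // Starts a local server that chooses its response from the User-Agent header.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean browser = ua != null && ua.contains("Firefox");
            byte[] body = (browser ? DESKTOP_HTML : FALLBACK_HTML)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetches the page; a null userAgent keeps the client's default header.
    static String fetch(int port, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            String byDefault = fetch(port, null);
            String asFirefox = fetch(port, "Mozilla/5.0 Gecko/20100101 Firefox/21.0");
            System.out.println("default UA has wrapper: "
                    + byDefault.contains("zincontent-wrap"));
            System.out.println("Firefox UA has wrapper: "
                    + asFirefox.contains("zincontent-wrap"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against this test server, only the request with the Firefox-like header receives the wrapper markup. To check a real site the same way, compare the HTML returned with and without the custom user agent before suspecting the selector.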
Hope this helps someone else too! :)