I want to extract some information from this website:
http://www.131500.com.au/plan-your-trip/trip-planner?itd_name_origin=hurstville&itd_name_destination=town+hall
The table structure is:
<td headers="header2">
Take the Eastern Suburbs and Illawarra train (CityRail)
<br />
<b>Dep: 12:35pm Hurstville Station Platform 3</b>
<br />
<b>Arr: 1:06pm Town Hall Station Platform 5, Sydney</b>
<br />
</td>
My code:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import android.app.Activity;
import android.os.Bundle;
public class JsouptestActivity extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);
        jsouptest();
    }
    public void jsouptest() {
        Document doc = null;
        try {
            doc = Jsoup
                    .connect(
                            "http://www.131500.com.au/plan-your-trip/trip-planner?itd_name_origin=hurstville&itd_name_destination=town+hall")
                    .get();
        } catch (IOException e) {
            Elements tables = doc.select("div#boxbody");
            System.out.println(tables.get(0).text().toString());
        }
    }
}
What I expect to see:
Take the Eastern Suburbs and Illawarra train (CityRail)
Dep: 12:35pm ; Hurstville Station Platform 3
Arr: 1:06pm ; Town Hall Station Platform 5, Sydney
What I have tried:
Elements tables = doc.select("div#boxbody table#dataTbl");
Elements tables = doc.select("div#boxbody table#dataTbl+widthcol2and3");
because the data actually sits in:
<table class="dataTbl widthcol2and3" cellspacing="0" style="margin:0px ! important;border-right:0px none;" summary="Search Results Details">
So I figured I can't just use this (note the space between dataTbl and widthcol2and3):
Elements tables = doc.select("div#boxbody table#dataTbl widthcol2and3");
So I tried:
Elements tables = doc.select("div#boxbody iewfix"); // and div#boxbody+iewfix
But every time I run the test app in the emulator, I get:
The application has stopped unexpectedly. Please try again.
The log looks like this:
05-29 15:58:42.575: W/dalvikvm(755): threadid=3: thread exiting with uncaught exception (group=0x4001b188)
05-29 15:58:42.575: E/AndroidRuntime(755): Uncaught handler: thread main exiting due to uncaught exception
05-29 15:58:42.585: E/AndroidRuntime(755): java.lang.NoClassDefFoundError: org.jsoup.Jsoup
05-29 15:58:42.585: E/AndroidRuntime(755): at com.yeasiz.jsouptest.JsouptestActivity.jsouptest(JsouptestActivity.java:25)
05-29 15:58:42.585: E/AndroidRuntime(755): at com.yeasiz.jsouptest.JsouptestActivity.onCreate(JsouptestActivity.java:18)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1047)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2459)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2512)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.app.ActivityThread.access$2200(ActivityThread.java:119)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1863)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.os.Handler.dispatchMessage(Handler.java:99)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.os.Looper.loop(Looper.java:123)
05-29 15:58:42.585: E/AndroidRuntime(755): at android.app.ActivityThread.main(ActivityThread.java:4363)
05-29 15:58:42.585: E/AndroidRuntime(755): at java.lang.reflect.Method.invokeNative(Native Method)
05-29 15:58:42.585: E/AndroidRuntime(755): at java.lang.reflect.Method.invoke(Method.java:521)
05-29 15:58:42.585: E/AndroidRuntime(755): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:860)
05-29 15:58:42.585: E/AndroidRuntime(755): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:618)
05-29 15:58:42.585: E/AndroidRuntime(755): at dalvik.system.NativeStart.main(Native Method)
05-29 15:58:42.595: I/dalvikvm(755): threadid=7: reacting to signal 3
05-29 15:58:42.595: E/dalvikvm(755): Unable to open stack trace file '/data/anr/traces.txt': Permission denied
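It seems jsoup can't find the right class.
I think my selector syntax is wrong, but even after reading "Use selector-syntax to find elements" I still can't solve this.
Please help me with this problem.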
Best answer
What exception do you get, and on which line?
First, don't do this inside catch():
org.jsoup.select.Elements tables = doc.select("div#boxbody");
System.out.println(tables.get(0).text().toString());
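That block only runs when something goes wrong during the connection, and in that case doc is always null at that point.
Second, the code you posted throws a connection timeout exception when I try it.
Try this (works for me):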
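To make the control flow concrete, here is a minimal sketch in plain Java. fetchPage is a hypothetical stand-in for Jsoup.connect(url).get() and is not part of jsoup; it only simulates a successful or failed download so the example runs without a network connection.

```java
import java.io.IOException;

class CatchFlowSketch {
    // Hypothetical stand-in for Jsoup.connect(url).get(): returns the page
    // body on success, throws IOException on a simulated connection failure.
    static String fetchPage(boolean simulateFailure) throws IOException {
        if (simulateFailure) {
            throw new IOException("simulated connection timeout");
        }
        return "<div class=\"boxbody iewfix\">Dep: 12:35pm</div>";
    }

    static String describe(boolean simulateFailure) {
        String doc = null;
        try {
            doc = fetchPage(simulateFailure);
        } catch (IOException e) {
            // Reached only when the fetch failed; doc is still null here,
            // so calling a method on doc in this block would throw a
            // NullPointerException.
            return "fetch failed: " + e.getMessage();
        }
        // Reached only when the fetch succeeded: this is where the
        // selection and printing belong.
        return "fetched " + doc.length() + " characters";
    }

    public static void main(String[] args) {
        System.out.println(describe(false));
        System.out.println(describe(true));
    }
}
```

In the question's code, the same idea means moving the doc.select(...) and println calls out of the catch block so they run after a successful get().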
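// Needs java.io.InputStream and java.net.URL in addition to the jsoup imports.
Document doc = null;
String url = "http://www.131500.com.au/plan-your-trip/trip-planner?itd_name_origin=hurstville&itd_name_destination=town+hall";
// Open the stream directly instead of going through Jsoup.connect(), then let jsoup parse it.
InputStream is = new URL(url).openStream();
try {
    doc = org.jsoup.Jsoup.parse(is, "utf-8", url);
} finally {
    is.close(); // release the stream even if parsing fails
}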
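Also, you are trying to select the element by id with "div#boxbody", but "boxbody" is part of a class name, not an id. I opened the link you provided, and there are several div elements whose class attribute contains the word "boxbody" without it being the full class value. I suppose the one you're interested in is "boxbody iewfix". Passing that to a class lookup might work, but I have noticed that jsoup sometimes reacts strangely to spaces (getElementsByClass("boxbody iewfix") did not work for me); a space in a class attribute separates two class names, and getElementsByClass takes a single one.
I'm not fond of select and usually make a lot of mistakes when using it, so I would do this instead: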
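The point about spaces can be illustrated without jsoup. In HTML, a class attribute is a whitespace-separated list of class names, so an attribute such as class="boxbody iewfix" (assumed here to be what the page uses) carries two classes, boxbody and iewfix, and no single class called "boxbody iewfix". The sketch below is a rough model of that rule in plain Java, not jsoup's actual implementation, and hasClassToken is a hypothetical helper name.

```java
import java.util.Arrays;

class ClassTokenSketch {
    // Hypothetical helper: true if classAttr, read as a whitespace-separated
    // list of class names, contains exactly the token `name`.
    static boolean hasClassToken(String classAttr, String name) {
        return Arrays.asList(classAttr.trim().split("\\s+")).contains(name);
    }

    public static void main(String[] args) {
        String classAttr = "boxbody iewfix";
        System.out.println(hasClassToken(classAttr, "boxbody"));        // true
        System.out.println(hasClassToken(classAttr, "iewfix"));         // true
        System.out.println(hasClassToken(classAttr, "boxbody iewfix")); // false
    }
}
```

For anyone who does want to stay with select, the selector syntax chains several classes with dots rather than spaces or plus signs, for example div.boxbody.iewfix or table.dataTbl.widthcol2and3; the # prefix is only for ids.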
Elements tables = doc.getElementsByAttributeValueStarting("class", "boxbody"); //I checked, it works
and then
tables.get(2).text(); // because you're interested in the third element whose class name starts with "boxbody"
It returns:
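"Mode Details Take the Eastern Suburbs and Illawarra train (CityRail) Dep: 5:05pm Hurstville Station Platform 3 Arr: 5:31pm Town Hall Station Platform 5, Sydney Map Route map for this trip Alternative times"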
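That string is the whole block flattened into one line. To get the layout asked for in the question, one option is to read each <b> element separately (for instance with a selector such as td[headers=header2] b, which is an assumption about the page and not something verified here) and then reformat each line. The sketch below covers only the reformatting step in plain Java; formatLeg is a hypothetical helper, and the pattern assumes times shaped like 12:35pm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LegFormatSketch {
    // Matches lines such as "Dep: 12:35pm Hurstville Station Platform 3".
    private static final Pattern LEG =
            Pattern.compile("^(Dep|Arr):\\s*(\\d{1,2}:\\d{2}[ap]m)\\s+(.*)$");

    // Hypothetical helper: inserts " ; " between the time and the location.
    // Lines that do not look like a Dep/Arr line are returned unchanged.
    static String formatLeg(String line) {
        Matcher m = LEG.matcher(line.trim());
        if (!m.matches()) {
            return line;
        }
        return m.group(1) + ": " + m.group(2) + " ; " + m.group(3);
    }

    public static void main(String[] args) {
        String[] lines = {
            "Take the Eastern Suburbs and Illawarra train (CityRail)",
            "Dep: 12:35pm Hurstville Station Platform 3",
            "Arr: 1:06pm Town Hall Station Platform 5, Sydney"
        };
        for (String line : lines) {
            System.out.println(formatLeg(line));
        }
    }
}
```

Run on the three lines from the question's sample HTML, this prints the first line unchanged and the Dep and Arr lines with " ; " after the time.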