Android: HTML parsing with jsoup

Problem description

I'm trying to parse HTML pages like http://www.ts.kg/serials/ on Android. I tried to do it with htmlcleaner, but it didn't work, so now I'm trying jsoup. At first my code was too complex; here is the shortest version. The same code works in plain Java. Please help. My logs: http://smartpics.kz/imgs/1361209668WW5O.JPG

Here is my class:

public class MainActivity extends Activity {

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    String[] names= {};
    String url = "http://www.ts.kg/mults/";

    try {
        Document doc = Jsoup.connect(url).get();
        Element e = doc.body();
        Elements ggg = e.getElementsByAttributeValue("class", "categoryblocks");
        for (int i =0;i<ggg.size();i++) {
            Element linkk = ggg.get(i);
            if(linkk.getElementsByTag("a")!=null){
                Element atom = linkk.getElementsByTag("a").first();
                String n = atom.getElementsByTag("span").first().text();
                names[i] = n;
            }

        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    ListView lvMain = (ListView) findViewById(R.id.listViewData);

    // create the adapter
    ArrayAdapter<String> adapter = new ArrayAdapter<String>(this,
        android.R.layout.simple_list_item_1, names);

    // assign the adapter to the list
    lvMain.setAdapter(adapter);
}

@Override
public boolean onCreateOptionsMenu(Menu menu) {
    // Inflate the menu; this adds items to the action bar if it is present.
    getMenuInflater().inflate(R.menu.activity_main, menu);
    return true;
}

}
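Reading this first version, two problems stand out. The network call in onCreate() runs on the UI thread, which Android 3.0 (API 11) and later rejects with a NetworkOnMainThreadException for apps targeting that level; the recommended answer addresses that part. Separately, `String[] names = {}` creates an array of length 0, so the first `names[i] = n` would throw an ArrayIndexOutOfBoundsException even once the threading problem is fixed. The sketch below reproduces that sizing bug in plain Java (no Android or jsoup; the parsed titles are hypothetical stand-ins for the `<span>` texts) and contrasts it with collecting into a growable ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NamesArrayDemo {

    // Same pattern as the first onCreate(): String[] names = {} has length 0,
    // so the very first names[i] = ... assignment is out of bounds.
    public static boolean fixedEmptyArrayFails(List<String> parsed) {
        String[] names = {};
        try {
            for (int i = 0; i < parsed.size(); i++) {
                names[i] = parsed.get(i);
            }
            return false; // only reached when nothing was parsed
        } catch (ArrayIndexOutOfBoundsException ex) {
            return true;
        }
    }

    // Growable alternative: an ArrayList needs no size up front.
    public static List<String> collect(List<String> parsed) {
        List<String> names = new ArrayList<>();
        for (String n : parsed) {
            names.add(n);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the <span> texts jsoup would return
        List<String> parsed = Arrays.asList("Title A", "Title B");
        System.out.println(fixedEmptyArrayFails(parsed)); // prints "true"
        System.out.println(collect(parsed));              // prints "[Title A, Title B]"
    }
}
```

ArrayAdapter also has a constructor that takes a `List<T>`, so on Android the list could be handed to the adapter directly instead of being copied back into an array.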

Posted 20 Feb 2013:

I tried to do it as Shoshy proposed (thanks for your answer), but it didn't work (probably my own clumsiness). Here is my modified code:

public class MainActivity extends Activity {

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    url = "http://www.ts.kg/mults/";
    pd = ProgressDialog.show(MainActivity.this, "Working...", "request to server", true, false);
    // Start parsing
    new AsyncExecution().execute();
}
private ProgressDialog pd;
String url;
String names[];

private class AsyncExecution extends AsyncTask<Void, Void, Void>{

    @Override
    protected Void doInBackground(Void... params) {
          // Here your task runs on a separate thread from the UI thread,
          // and if you want to use the variables (that will be modified here)
          // from anywhere in MainActivity, then you should declare them as
          // fields of MainActivity. Remember you cannot update the UI from here,
          // e.g. show a Toast message. If you want to do that, you can use the
          // onPostExecute method below.
               try {
                  ArrayList<String> array = new ArrayList<String>();
                  Document doc = Jsoup.connect(url).get();
                  Element e = doc.body();
                  Elements ggg = e.getElementsByAttributeValue("class", "categoryblocks");
                  for (int i =0;i<ggg.size();i++) {
                      Element linkk = ggg.get(i);
                      if(linkk.getElementsByTag("a")!=null){
                          Element atom = linkk.getElementsByTag("a").first();
                          String n = atom.getElementsByTag("span").first().text();
                          array.add(n);
                      }

                  }
                  for (int i = 0;i<array.size();i++){
                      names[i]=array.get(i);
                  }
              } catch (IOException e) {
                  // TODO Auto-generated catch block
                  e.printStackTrace();
              }
            return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        // Dismiss the loading dialog
        pd.dismiss();
        // Find the ListView
        ListView listview = (ListView) findViewById(R.id.listViewData);
        // Load the result of doInBackground into it
        listview.setAdapter(new ArrayAdapter<String>(MainActivity.this,
            android.R.layout.simple_list_item_1, names));

       }
}

}
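One likely reason this version still fails: `String names[];` is a field that is declared but never assigned, so it stays null, and the copy loop `names[i]=array.get(i)` throws a NullPointerException as soon as at least one title has been parsed. That is not an IOException, so the catch block does not handle it. The plain-Java sketch below (no Android or jsoup; the list contents are hypothetical) reproduces the failure and shows one fix, letting `toArray()` allocate an array of exactly the right length.

```java
import java.util.ArrayList;
import java.util.List;

public class UnassignedFieldDemo {

    // Like "String names[];" in the modified activity: declared but never assigned, so it is null
    static String[] names;

    // Copies one element at a time into the field, as the modified doInBackground() does
    public static boolean copyIntoUnassignedFieldFails(List<String> array) {
        names = null; // back to the never-assigned state
        try {
            for (int i = 0; i < array.size(); i++) {
                names[i] = array.get(i);
            }
            return false; // only reached when the list is empty
        } catch (NullPointerException ex) {
            return true;
        }
    }

    // One fix: let toArray() allocate an array of exactly the right length
    public static String[] copyWithToArray(List<String> array) {
        names = array.toArray(new String[0]);
        return names;
    }

    public static void main(String[] args) {
        List<String> array = new ArrayList<>();
        array.add("Title A");
        array.add("Title B");
        System.out.println(copyIntoUnassignedFieldFails(array)); // prints "true"
        System.out.println(copyWithToArray(array).length);       // prints "2"
    }
}
```

In the activity itself, the equivalent change would be replacing the second for-loop with `names = array.toArray(new String[0]);`.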

Recommended answer

You have to make the request for the page on a thread other than the UI thread. You can use AsyncTask. Here is an example, made by editing your code. Link about AsyncTask: about AsyncTask

public class MainActivity extends Activity {

// shared between onCreate() and AsyncExecution, so declared as fields
private String url = "http://www.ts.kg/mults/";
private ArrayList<String> names = new ArrayList<String>();

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    // the class is defined below
    new AsyncExecution().execute();

    // ... your other code ...
}

// ... your other code ...

// you need to add this class
private class AsyncExecution extends AsyncTask<Void, Void, Void> {

    @Override
    protected Void doInBackground(Void... params) {
        // Here your task runs on a thread separate from the UI thread.
        // If you want to use the variables modified here from anywhere in
        // MainActivity, declare them as fields of MainActivity. Remember that
        // you cannot update the UI from here (for example, show a Toast
        // message); if you want to do that, use the onPostExecute method below.
        try {
            Document doc = Jsoup.connect(url).get();
            Element e = doc.body();
            Elements ggg = e.getElementsByAttributeValue("class", "categoryblocks");
            for (int i = 0; i < ggg.size(); i++) {
                Element linkk = ggg.get(i);
                // getElementsByTag() never returns null; first() returns null when nothing matches
                Element atom = linkk.getElementsByTag("a").first();
                if (atom != null) {
                    Element span = atom.getElementsByTag("span").first();
                    if (span != null) {
                        names.add(span.text());
                    }
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        // runs on the UI thread once doInBackground() has finished
        ListView listview = (ListView) findViewById(R.id.listViewData);
        listview.setAdapter(new ArrayAdapter<String>(MainActivity.this,
            android.R.layout.simple_list_item_1, names));
    }
}

}
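The core idea of the answer is a hand-off: the slow download and parsing run on a background thread (doInBackground()), and only the finished result is used on the UI thread (onPostExecute()). The sketch below shows the same hand-off in plain Java 8+, without Android, so it can run anywhere. Assumptions: a single-thread executor named "ui" stands in for Android's main thread, and `fetchTitles` is a hypothetical placeholder for the jsoup call and loop. It is an illustration of the pattern, not Android's AsyncTask API; note that AsyncTask itself was deprecated in API level 30.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncHandOffSketch {

    // What happened, so the thread hand-off can be checked
    public static final class Outcome {
        public final String backgroundThread;
        public final String postThread;
        public final List<String> titles;

        Outcome(String backgroundThread, String postThread, List<String> titles) {
            this.backgroundThread = backgroundThread;
            this.postThread = postThread;
            this.titles = titles;
        }
    }

    // Hypothetical stand-in for Jsoup.connect(url).get() plus the parsing loop
    static List<String> fetchTitles(String url) {
        return Arrays.asList("Title A", "Title B");
    }

    public static Outcome run(final String url) {
        // Assumption: a single-thread executor named "ui" plays the role of Android's main thread
        ExecutorService ui = Executors.newSingleThreadExecutor(r -> new Thread(r, "ui"));
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker"));
        AtomicReference<String> bgThread = new AtomicReference<>();
        try {
            return CompletableFuture
                    // doInBackground() role: the slow work runs on the worker thread
                    .supplyAsync(() -> {
                        bgThread.set(Thread.currentThread().getName());
                        return fetchTitles(url);
                    }, worker)
                    // onPostExecute() role: the result is consumed on the "ui" thread
                    .thenApplyAsync(titles -> new Outcome(bgThread.get(),
                            Thread.currentThread().getName(), titles), ui)
                    // only this demo's caller waits; a real UI thread must never block here
                    .join();
        } finally {
            worker.shutdown();
            ui.shutdown();
        }
    }

    public static void main(String[] args) {
        Outcome o = run("http://www.ts.kg/mults/");
        // prints "worker -> ui: [Title A, Title B]"
        System.out.println(o.backgroundThread + " -> " + o.postThread + ": " + o.titles);
    }
}
```

The design choice mirrors the answer's advice: nothing that touches views happens on the worker thread, and the UI side only receives a finished list.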
