JSoup: selecting a <div> with a specific id

Problem description

I'm making a small Android application for a class where I find cancer-related events from the American Cancer Society's website. I've been using JSoup to get basic information about the events, and to get specific information from the website I've tried to use the select() method. However, the current method that I'm using grabs way more HTML nodes than I would like and I couldn't figure out why. The table that I'm trying to grab looks like this:

Edit: I realized that the div with id="pnlResults" does not end at that table. It ends after about three more tables, all containing information that I would like to grab. Here is the table again:

    <div id="pnlResults">

        <h2><span id="lblEventName">American Cancer Society 44th Annual Walter Hagen Golf Tournament</span></h2>
        <!-- General Information Box -->
        <div class="text-box boxed wide">
            <h3 class="head" style="width:97%;">
                General Information
            </h3>
            <div class="content">


                <p>
                    <label>Event Times:</label><span id="lblStartDate">Monday, July 30, 2012</span><span id="lblEndDate"></span><br />
                    <label>&nbsp;</label><span id="lblStartTime">10:00 AM</span> - <span id="lblEndTime">9:00 PM</span>
                </p>
                <p>
                    <label>Time Zone:</label><span id="lblTimeZone">Eastern</span>

                </p>
                <p>
                    <label>Description:</label><span id="lblDesc" class="fieldData long">The American Cancer Society Walter Hagen Golf Tournament highlights the Society’s role in supporting research and patient care here in Rochester. Funds raised through this event help us make a difference in patents’ lives every day though programs including Road to Recovery and Patient Navigation as well as support grants to our research institutions.  144 golfers will play a round of golf and then enjoy cocktails, dinner, and silent auction following the tournament. </span>
                </p>
                <p>
                    <label>Agenda:</label><span id="lblAgenda" class="fieldData long">10:00am - Check-in, 11:00am - Lunch, 12:15pm - Shot gun start, 6:00 - Cocktails and silent auction, 7:00pm Dinner and program</span>
                </p>

            </div>
        </div>

        <div id="pnlStandardDisplay">


        <!-- Event Location Box -->
        <div class="text-box boxed wide line">
            <h3 class="head" style="width:97%;">
                Event Location
            </h3>
            <div class="content" style="display:inline-block; width:97%;">


                <div >
                    <div id="mapOutsideContainer" class="resource-map">
                       <div id="map_canvas" class="resource-map" ></div>
                    </div>
                    <script  type="text/javascript">
                        var mapDataPoints = [{ "lat":43.1075545,"lng":-77.5164518, "title":"Golf Event","content":"<b>American Cancer Society 44th Annual Walter Hagen Golf Tournament<\/b><br/><\/br>4045 East Avenue<br /><br/>Rochester, New York  14618<br /><br />Phone: <br />Fax: "} ];
                        buildMap(mapDataPoints, -5);
                    </script>
                </div>

                <h4><span id="lblLocationName">Irondequoit Country Club</span></h4>
                <p>

                    <label>Address:</label><span id="lblAddress" class="fieldData" style="width:150px;">4045 East Avenue<br />Rochester, New York 14618</span>
                </p>
                <p>
                    <label nowrap="nowrap">Handicap Accessible:</label><span id="lblHandicapAccesible">Yes</span>
                </p>
            </div>

        </div>

        <!-- Primary Contact Box -->
        <div class ="line" >
        <div id="eventPrimaryContact_divContact" class="text-box boxed wide">
                    <h3 class="head" style="width:97%;">
                        Primary Contact
                    </h3>
                    <div class="content">

                        <p>

                            <label>Contact:</label><span id="eventPrimaryContact_lblContact">Katerina Kormas (<a href="mailto:[email protected]?subject=American Cancer Society 44th Annual Walter Hagen Golf Tournament">Contact ACS for Details</a>)</span>

                        </p>
                        <p>
                            <label>Contact Type:</label><span id="eventPrimaryContact_lblContactType">ACS Staff</span>
                        </p>
                        <p>

                            <label>Phone:</label><span id="eventPrimaryContact_lblContactPhone">(585) 288-1950</span>
                        </p>
                        <p>
                            <label>Additional Information:</label><span id="eventPrimaryContact_lblContactAddlInfo" class="fieldData long">Direct line is 585-224-4919 or cell 585-645-8912</span>
                        </p>
                    </div>
                </div>

        </div>

        <!-- Registration Information Box -->

        <div class="text-box boxed wide line">
            <h3 class="head" style="width:97%;">
                Registration Information
            </h3>
            <div class="content">

                <p>
                    <label nowrap="nowrap">Registration Required?: </label><span id="lblRegRequired">Yes</span>

                </p>
            </div>
        </div>

        <!-- Event Cost Box -->
        <div class ="line" >
        <div id="eventCost_divCost" class="text-box boxed wide">
                    <h3 class="head" style="width:97%;">
                        Event Cost
                    </h3>
                    <div class="content">

                        <p>
                            <label>Cost/Registration Fee: </label><span id="eventCost_lblCostRegFee" class="fieldData long">$350 per golfer</span>
                        </p>
                        <p>
                            <label>Payment Type: </label><span id="eventCost_lblPaymentTypes" class="fieldData">Cash, Check, American Express, Mastercard, Visa, Discover</span>
                        </p>
                        <p>

                            <label>Check Payable To: </label><span id="eventCost_lblCheckPayable" class="fieldData">American Cancer Society</span>
                        </p>
                        <p>
                            <label>Memo Line: </label><span id="eventCost_lblCheckMemo" class="fieldData">American Cancer Society 44th Annual Walter Hagen Golf Tourna</span>
                        </p>
                        <p>
                            <label>Mail Check To:</label><span id="eventCost_lblCheckMailTo" class="fieldData">American Cancer Society<br />1120 South Goodman St<br />Rochester, New York 14620</span>

                        </p>
                    </div>
                </div>

        </div>

        <!-- Tax Deduction Information Box -->
        <div class="line">

                <div class="text-box boxed wide">
                    <h3 class="head" style="width:97%;">
                        Tax Deduction Information
                    </h3>

                    <div class="content">
                        <p>
                            $210  per golfer is tax deductible
                        </p>
                    </div>
                </div>

        </div>



</div> <!-- end standard display -->
         <!-- end daffodil display -->

Edit: Given these new tables, I would like to extract the General Information and Event Location sections. How would I go about doing that? Perhaps by calling select() again on the subset I just got, keeping only the boxes whose headers are the ones I want?

The code where I'm using the select() is shown below. As I said before, I tried to use

select("div[id=pnlResults]");

but the returned data is much more than just the div where the id is pnlResults.

public ArrayList<Event> results()
{
    ArrayList<Event> results = new ArrayList<Event>();
    Document doc = Jsoup.parse(page);
    Elements links = doc.select("a[href*=event-details]");

    for(Element e: links)
    {
        String title = e.text();
        String link = "http://www.cancer.org/involved/participate/app/"+e.attr("href");
        try{
            Document eventInfo = Jsoup.connect(link).get();
            Elements info = eventInfo.select("div[id*=pnlResults]");


        }
        catch(MalformedURLException exception)
        {
            exception.printStackTrace();
        }
        catch(IOException exception)
        {
            exception.printStackTrace();
        }

    }
    return results;
}

Any help would be greatly appreciated.

Recommended answer

Try:

 Elements info = eventInfo.select("div#pnlResults");
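
As a rough illustration of how the two selectors differ, here is a minimal sketch that assumes jsoup is on the classpath. The HTML string and the second id (`pnlResultsFooter`) are invented for the example and do not come from the ACS page. `div#pnlResults` matches only the div whose id is exactly `pnlResults`, while `div[id*=pnlResults]` matches every div whose id contains that substring.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class IdSelectorDemo {
    // Hypothetical markup: the second id is made up to show the substring match.
    static final String HTML =
        "<div id=\"pnlResults\"><h2>Event</h2></div>"
      + "<div id=\"pnlResultsFooter\"><p>Footer</p></div>";

    // Number of divs whose id is exactly "pnlResults".
    static int countExact() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("div#pnlResults").size();
    }

    // Number of divs whose id contains the substring "pnlResults".
    static int countContains() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("div[id*=pnlResults]").size();
    }

    public static void main(String[] args) {
        System.out.println("exact id match:  " + countExact());
        System.out.println("substring match: " + countContains());
    }
}
```

Either selector returns the matched div together with everything nested inside it, so a large result can simply mean that the div itself is large.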

Update, following your edit:

Since you now have more data, and since the HTML itself isn't that great, you'll just have to work through it to pick out your data. If all the content you need has id values, use the ids of those elements to get the text.
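
One way this could look, as a sketch rather than a definitive implementation: the class and method names (`EventFields`, `textById`, `boxText`) are hypothetical, the HTML is a trimmed copy of the markup in the question, and jsoup is assumed to be on the classpath. `textById` reads a single field by its id; `boxText` is an alternative for boxes without ids, using jsoup's `:has` and `:containsOwn` pseudo-selectors to pick the box by its heading.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EventFields {
    // Trimmed copy of the question's markup, kept short for illustration.
    static final String HTML =
        "<div id=\"pnlResults\">"
      + "<div class=\"text-box boxed wide\"><h3 class=\"head\">General Information</h3>"
      + "<div class=\"content\"><p><label>Time Zone:</label>"
      + "<span id=\"lblTimeZone\">Eastern</span></p></div></div>"
      + "<div class=\"text-box boxed wide line\"><h3 class=\"head\">Event Location</h3>"
      + "<div class=\"content\"><h4><span id=\"lblLocationName\">"
      + "Irondequoit Country Club</span></h4></div></div>"
      + "</div>";

    // Text of the element with the given id, or "" if no such element exists.
    static String textById(Document doc, String id) {
        Element el = doc.getElementById(id);
        return el == null ? "" : el.text();
    }

    // Text of the content area of the first box whose h3 heading contains `heading`.
    static String boxText(Document doc, String heading) {
        Element box = doc
            .select("div.text-box:has(h3.head:containsOwn(" + heading + "))")
            .first();
        return box == null ? "" : box.select("div.content").text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(textById(doc, "lblTimeZone"));
        System.out.println(textById(doc, "lblLocationName"));
        System.out.println(boxText(doc, "Event Location"));
    }
}
```

In the asker's loop, `doc` would be the `eventInfo` document returned by `Jsoup.connect(link).get()`, and the extracted strings could be used to populate an `Event`.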
