This article describes how to get content by class name using Beautiful Soup.

Problem description

Using the Beautiful Soup module, how can I get the data of a div tag whose class name is feeditemcontent cxfeeditemcontent? Is it:

soup.class['feeditemcontent cxfeeditemcontent']

or:

soup.find_all('class')

This is the HTML source:

<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
        <div class="feeditembody">
            <span>The actual data is some where here</span>
        </div>
    </div>
</div>

And this is the Python code:

from BeautifulSoup import BeautifulSoup
html_doc = open('home.jsp.html', 'r')

soup = BeautifulSoup(html_doc)
# "class" is a reserved word in Python, so the target is stored under another name
class_name = "feeditemcontent cxfeeditemcontent"

Solution

Try this. It may be overkill for such a simple task, but it works:

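# Written for BeautifulSoup 3 on Python 2, where tag.attrs is a list of (name, value) pairs.
def match_class(target):
    # Split the space-separated class names to look for.
    target = target.split()
    def do_match(tag):
        try:
            classes = dict(tag.attrs)["class"]
        except KeyError:
            # The tag has no class attribute at all.
            classes = ""
        classes = classes.split()
        # Match only if every requested class is present, in any order.
        return all(c in classes for c in target)
    return do_match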

html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)

matches = soup.findAll(match_class("feeditemcontent cxfeeditemcontent"))
for m in matches:
    print m
    print "-"*10

matches = soup.findAll(match_class("feeditembody"))
for m in matches:
    print m
    print "-"*10
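
For readers on Python 3, here is a rough sketch of the same idea with the newer bs4 package. It assumes beautifulsoup4 is installed and uses the built-in "html.parser"; neither is part of the original answer. In bs4 the class attribute comes back as a list of names, so the matcher no longer needs to split a string. The sketch also shows two shorter alternatives that bs4 offers, a CSS selector and the class_ keyword.

```python
from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

HTML = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""


def match_class(target):
    """Build a filter that accepts tags carrying every class named in `target`."""
    wanted = target.split()

    def do_match(tag):
        # In bs4, tag.get("class") is a list of class names, or None if absent.
        classes = tag.get("class") or []
        return all(c in classes for c in wanted)

    return do_match


soup = BeautifulSoup(HTML, "html.parser")

# 1. Function filter, mirroring the answer above.
outer = soup.find_all(match_class("feeditemcontent cxfeeditemcontent"))

# 2. CSS selector: chained class selectors require both classes, in any order.
outer_css = soup.select("div.feeditemcontent.cxfeeditemcontent")

# 3. Keyword filter on a single class name.
body = soup.find_all("div", class_="feeditembody")

for tag in body:
    # Print the text of the first <span> inside each matching div.
    print(tag.span.get_text())
```

The function filter and the CSS selector both ignore the order of the class names. Passing the full two-class string to class_ instead would compare against the attribute value exactly as written, so it is more fragile.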


09-25 17:40