Can I combine these two blocks into one:
Edit: I'm looking for a method other than combining the loops, the way Yacoby did in their answer.
for tag in soup.findAll(['script', 'form']):
    tag.extract()
for tag in soup.findAll(id="footer"):
    tag.extract()
Also, can I combine multiple blocks like these into one:
for tag in soup.findAll(id="footer"):
    tag.extract()
for tag in soup.findAll(id="content"):
    tag.extract()
for tag in soup.findAll(id="links"):
    tag.extract()
Or maybe there is some lambda expression with which I can check whether the id is in a list, or some other, simpler method.
Also, how do I find tags by their class attribute, given that class is a reserved keyword:
EDIT: this part is solved by soup.findAll(attrs={'class': 'noprint'}).
# SyntaxError: class is a reserved keyword in Python
for tag in soup.findAll(class="noprint"):
    tag.extract()
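The attrs-dict workaround from the EDIT can be checked with a small runnable sketch. It assumes BeautifulSoup 4 (the bs4 package) is installed, uses find_all (the bs4 spelling of findAll), and the sample HTML is made up. It also shows the class_ keyword (trailing underscore) that bs4 accepts in place of the reserved word.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical sample page.
html = """
<div>
  <p class="noprint">Screen-only note</p>
  <p>Body text</p>
  <span class="noprint hidden">Another note</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "class" is only a dict key here, never a keyword argument,
# so there is no clash with the reserved word (the form the EDIT uses).
via_attrs = soup.find_all(attrs={"class": "noprint"})

# bs4 also accepts class_ as a keyword argument for the same search.
via_keyword = soup.find_all(class_="noprint")

# bs4 treats class as multi-valued, so class="noprint hidden" matches too.
for tag in via_keyword:
    tag.extract()
```

In BeautifulSoup 3 the attrs dict is the portable option, since class_ is a bs4 addition.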
You can pass functions to .findAll(), like this:
# tag.get('id') returns None for tags without an id; tag['id'] would raise KeyError
soup.findAll(lambda tag: tag.name in ['script', 'form'] or tag.get('id') == "footer")
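To see the function filter at work, here is a minimal sketch assuming bs4 is installed; the HTML and the helper name unwanted are made up for illustration. The function is called once for every tag in the tree, including tags like body that have no id, which is why it reads the attribute with tag.get() rather than indexing.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical sample page.
html = """
<body>
  <script>var x = 1;</script>
  <form action="/search"><input name="q"></form>
  <div id="main">Keep me</div>
  <div id="footer">Footer</div>
  <p>No id here</p>
</body>
"""
soup = BeautifulSoup(html, "html.parser")


def unwanted(tag):
    """Hypothetical filter: True for <script>/<form> tags or the footer."""
    # tag.get() returns None when there is no id, instead of raising KeyError.
    return tag.name in ("script", "form") or tag.get("id") == "footer"


# find_all() calls the function on each tag and keeps those returning True.
matches = soup.find_all(unwanted)
for tag in matches:
    tag.extract()
```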
But you might be better off first building a list of tags and then iterating over it:
tags = soup.findAll(['script', 'form'])
tags.extend(soup.findAll(id="footer"))
for tag in tags:
    tag.extract()
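A runnable version of the list-building approach, assuming bs4 is installed (the sample HTML is made up). find_all() returns a ResultSet, which is a list subclass, so extend() works on it and one loop removes every match from both queries.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical sample page; the footer contains a <form> of its own.
html = """
<body>
  <script>track();</script>
  <div id="content">Article</div>
  <div id="footer"><form><input name="email"></form></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect the results of both queries into a single list...
tags = soup.find_all(["script", "form"])
tags.extend(soup.find_all(id="footer"))

# ...then remove everything in one loop.
for tag in tags:
    tag.extract()
```

Here the nested form is extracted first and its enclosing footer afterwards, which leaves the same tree as removing the footer alone.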
If you want to filter for several ids, you can use:
for tag in soup.findAll(lambda tag: tag.has_key('id') and
                        tag['id'] in ['footer', 'content', 'links']):
    tag.extract()
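has_key() is BeautifulSoup 3 style; bs4 deprecates it in favour of has_attr(). A bs4-friendly variant can drop the guard entirely by using tag.get('id'), which returns None for tags without an id. This is a sketch assuming bs4 is installed, with made-up HTML and a hypothetical unwanted_ids set.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical sample page.
html = """
<body>
  <div id="header">Header</div>
  <div id="content">Content</div>
  <div id="links">Links</div>
  <div id="footer">Footer</div>
  <div>No id</div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

unwanted_ids = {"footer", "content", "links"}  # a set gives fast membership tests

# tag.get("id") is None when the attribute is missing, and None is not in the
# set, so tags without an id are skipped without a separate has_key() check.
matches = soup.find_all(lambda tag: tag.get("id") in unwanted_ids)

# extract() returns the removed tag, so the list keeps a handle on each one.
removed = [tag.extract() for tag in matches]
```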
A more specific approach would be to assign a lambda to the id parameter:
for tag in soup.findAll(id=lambda value: value in ['footer', 'content', 'links']):
    tag.extract()
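With this form the function receives the id value itself (None for tags that have no id) rather than the tag. In bs4 an attribute filter can also be a plain list, which matches when the value equals any item, so the lambda can be dropped altogether. A sketch assuming bs4 is installed; the HTML is made up.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Hypothetical sample page.
html = """
<body>
  <div id="header">Header</div>
  <div id="content">Content</div>
  <div id="links">Links</div>
  <div id="footer">Footer</div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")
unwanted_ids = ["footer", "content", "links"]

# The lambda is called with each tag's id value, not with the tag itself.
by_function = soup.find_all(id=lambda value: value in unwanted_ids)

# A list filter matches any tag whose id equals one of the items.
by_list = soup.find_all(id=unwanted_ids)

# Both searches find the same tags; remove them once.
for tag in by_list:
    tag.extract()
```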