Problem description
I'm trying to get rid of <script> tags, and the content inside them, using BeautifulSoup. I went to the documentation, and decompose() seems to be a really simple function to call. More information about the function is here. Here is the content of the HTML page that I have parsed so far...
<body class="pb-theme-normal pb-full-fluid">
<div class="pub_300x250 pub_300x250m pub_728x90 text-ad textAd text_ad text_ads text-ads text-ad-links" id="wp-adb-c" style="width: 1px !important;
height: 1px !important;
position: absolute !important;
left: -10000px !important;
top: -1000px !important;
">
</div>
<div id="pb-f-a">
</div>
<div class="" id="pb-root">
<script>
(function(a){
TWP=window.TWP||{};
TWP.Features=TWP.Features||{};
TWP.Features.Page=TWP.Features.Page||{};
TWP.Features.Page.PostRecommends={};
TWP.Features.Page.PostRecommends.url="https://recommendation-hybrid.wpdigital.net/hybrid/hybrid-filter/hybrid.json?callback\x3d?";
TWP.Features.Page.PostRecommends.trackUrl="https://recommendation-hybrid.wpdigital.net/hybrid/hybrid-filter/tracker.json?callback\x3d?";
TWP.Features.Page.PostRecommends.profileUrl="https://usersegment.wpdigital.net/usersegments";
TWP.Features.Page.PostRecommends.canonicalUrl=""
})(jQuery);
</script>
</div>
</body>
Imagine you have some web content like that in a BeautifulSoup object called soup_html. If I run soup_html.script.decompose() and then inspect the object soup_html, the script tags are still there. How can I get rid of the <script> tags and the content inside them?
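from bs4 import BeautifulSoup

markup = 'The html above'  # placeholder for the HTML shown above
soup = BeautifulSoup(markup, "html.parser")
html_body = soup.body
soup.script.decompose()
html_body  # inspect the result (echoed in an interactive session)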
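A minimal reproduction of the behaviour described above, assuming a hypothetical page with two <script> elements (the real page is presumably longer than the excerpt shown). Attribute access such as soup.script is shorthand for soup.find("script"), so it only ever refers to the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <script> elements (not the asker's real page).
markup = """
<body>
  <div id="pb-root"><script>var a = 1;</script></div>
  <script>var b = 2;</script>
</body>
"""

soup = BeautifulSoup(markup, "html.parser")

# soup.script is the same Tag as soup.find("script"): the first match only.
print(soup.script is soup.find("script"))  # True

soup.script.decompose()  # removes the first <script> only

remaining = soup.find_all("script")
print(len(remaining))        # 1
print(remaining[0].string)   # var b = 2;
```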
Recommended answer
soup.script is shorthand for soup.find("script"), so soup.script.decompose() removes only the first <script> element from the soup. Instead, I think you meant to decompose all of them:
# soup("script") is shorthand for soup.find_all("script")
for script in soup("script"):
    script.decompose()
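A self-contained sketch of the fix, using hypothetical markup with two <script> elements and one paragraph that should survive. Calling a soup or tag object like a function is shorthand for find_all(), so soup("script") returns a list of every <script> element; decomposing each one removes the tag together with its contents.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two scripts to remove, one paragraph to keep.
markup = """
<body>
  <div id="pb-root"><script>var a = 1;</script></div>
  <script>var b = 2;</script>
  <p>Keep me</p>
</body>
"""

soup = BeautifulSoup(markup, "html.parser")

# find_all() returns a list up front, so decomposing while iterating is safe.
for script in soup("script"):
    script.decompose()  # remove the tag and everything inside it

print(soup.find_all("script"))         # []
print(soup.body.get_text(strip=True))  # Keep me
```

decompose() destroys the removed element; if you still need the script text afterwards, extract() removes the element from the tree and returns it instead.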