I am trying to scrape text from paragraphs that have different ID names. The HTML looks like this:

<p id="comFull1" class="comment" style="display:none"><strong>Comment:
</strong><br>I realized how much Abilify has been helping me when I recently
tried to taper off of it. I am on the bipolar spectrum, with mainly
depression and some OCD symptoms. My obsessive, intrusive thoughts came
racing back when I decreased the medication. I also got much more tired and
had insomnia with the decrease. am not happy with side effects of 15 lb
weight gain, increased cholesterol and a flat effect on my emotions. I am
actually wondering if an increase from the 7 mg would help even more...for
now I&#39;m living with the side effects.<br><a
onclick="toggle('comTrunc1'); toggle('comFull1');return false;"
href="#">Hide Full Comment</a></p>

<p id="comFull2" class="comment" style="display:none"><strong>Comment:
</strong><br>It&#39;s worked Very well for me. I&#39;m sleeping I&#39;m
eating I&#39;m going Out in the public. Overall I&#39;m very
satisfied.However I haven&#39;t heard anybody mention this but my feet are
very puffy and swollen is this a side effect does anyone know?<br><a
onclick="toggle('comTrunc2'); toggle('comFull2');return false;"
href="#">Hide Full Comment</a></p>

......


I can only scrape the text from one specific ID, not from all of the IDs at once. Can anyone help me extract the text from all of them? My code looks like this:

>>> from urllib.request import Request, urlopen
>>> from bs4 import BeautifulSoup
>>> url = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
>>> req = Request(url,headers={'User-Agent': 'Mozilla/5.0'})
>>> webpage = urlopen(req).read()
>>> soup = BeautifulSoup(webpage, "html.parser")
>>> required2 = soup.find("p", {"id": "comFull1"}).text
>>> required2
"Comment:I realized how much Abilify has been helping me when I recently
tried to taper off of it. I am on the bipolar spectrum, with mainly
depression and some OCD symptoms. My obsessive, intrusive thoughts came
racing back when I decreased the medication. I also got much more tired and
had insomnia with the decrease. am not happy with side effects of 15 lb
weight gain, increased cholesterol and a flat effect on my emotions. I am
actually wondering if an increase from the 7 mg would help even more...for
now I'm living with the side effects.Hide Full Comment"

Best answer

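Try this. If every paragraph ID is the same prefix followed by a number (1, 2, 3, etc.), as in comFull1, comFull2, comFull3, the selector below should handle it. The CSS attribute selector [id^='comFull'] matches every element whose id starts with "comFull", so one select() call returns all of the paragraphs instead of just one.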

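from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'  # same placeholder URL as in the question
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
content = urlopen(req).read()

soup = BeautifulSoup(content, "html.parser")
# Select every element whose id attribute starts with "comFull"
for item in soup.select("[id^='comFull']"):
    print(item.text)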
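A minimal sketch of an alternative, assuming the ids always follow the pattern "comFull" plus digits: BeautifulSoup's find_all() also accepts a compiled regular expression for an attribute, which is stricter than the prefix selector (an id such as "comFullX" would not match). The sketch also removes the <strong>Comment:</strong> label and the "Hide Full Comment" link, both of which otherwise show up in .text, as the question's output shows. The sample html string and the extract_comments() helper are hypothetical names used only for illustration.

```python
import re

from bs4 import BeautifulSoup

# Abridged copy of the markup from the question; the real page is assumed
# to repeat this pattern as comFull1, comFull2, comFull3, ...
html = """
<p id="comFull1" class="comment" style="display:none"><strong>Comment:
</strong><br>I realized how much Abilify has been helping me.<br><a
onclick="toggle('comTrunc1'); toggle('comFull1');return false;"
href="#">Hide Full Comment</a></p>
<p id="comFull2" class="comment" style="display:none"><strong>Comment:
</strong><br>It&#39;s worked Very well for me.<br><a
onclick="toggle('comTrunc2'); toggle('comFull2');return false;"
href="#">Hide Full Comment</a></p>
<p id="comTrunc2" class="comment">Truncated preview</p>
"""


def extract_comments(markup):
    """Return the body text of every <p id="comFullN"> element (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    comments = []
    # A compiled regex as the id filter: comFull1, comFull2, ... match,
    # while comTrunc2 does not.
    for p in soup.find_all("p", id=re.compile(r"^comFull\d+$")):
        # Remove the "Comment:" label and the "Hide Full Comment" link so
        # only the comment body remains.
        for tag in p.find_all(["strong", "a"]):
            tag.decompose()
        comments.append(p.get_text(" ", strip=True))
    return comments


for text in extract_comments(html):
    print(text)
```

To use it on the live page, pass the bytes read by urlopen() (content in the answer above) instead of the sample string.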

Regarding "python - How to scrape text from paragraphs with different ID names?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48375056/

10-14 18:59