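I am trying to scrape, with Python, all of the text on a web page that comes after the first heading. The tag for that heading is: <h1 id="firstHeading" class="firstHeading" lang="en">Albert Einstein</h1>

I don't need anything that appears before this heading; I want to scrape all of the text written after it. Can I do this with BeautifulSoup in Python?

I am running the following code: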

import requests
from bs4 import BeautifulSoup

urlpage = 'https://en.wikipedia.org/wiki/Albert_Einstein#Publications'
res = requests.get(urlpage)
# get_text() on the whole document returns every string, including <script> contents
soup1 = BeautifulSoup(res.text, 'lxml').get_text()
print(soup1)


For this page, the code prints the following:

Albert Einstein - Wikipedia
document.documentElement.className="client-js";RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Albert_Einstein","wgTitle":"Albert Einstein","wgCurRevisionId":920687884,"wgRevisionId":920687884,"wgArticleId":736,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages with missing ISBNs","Webarchive template wayback links","CS1 German-language sources (de)","CS1: Julian–Gregorian uncertainty","CS1 French-language sources (fr)","CS1 errors: missing periodical","CS1: long volume value","Wikipedia indefinitely semi-protected pages","Use American English from February 2019","All Wikipedia articles written in American English","Articles with short description","Good articles","Articles containing German-language text","Biography with signature","Articles with hCards","Articles with hAudio microformats","All articles with unsourced statements",
"Articles with unsourced statements from July 2019","Commons category link from Wikidata","Articles with Wikilivres links","Articles with Curlie links","Articles with Project Gutenberg links","Articles with Internet Archive links","Articles with LibriVox links","Use dmy dates from August 2019","Wikipedia articles with BIBSYS identifiers","Wikipedia articles with BNE identifiers","Wikipedia articles with BNF identifiers","Wikipedia articles with GND identifiers","Wikipedia articles with HDS identifiers","Wikipedia articles with ISNI identifiers","Wikipedia articles with LCCN identifiers","Wikipedia articles with LNB identifiers","Wikipedia articles with MGP identifiers","Wikipedia articles with NARA identifiers","Wikipedia articles with NCL identifiers","Wikipedia articles with NDL identifiers","Wikipedia articles with NKC identifiers","Wikipedia articles with NLA identifiers","Wikipedia articles with NLA-person identifiers","Wikipedia articles with NLI identifiers",
"Wikipedia articles with NLR identifiers","Wikipedia articles with NSK identifiers","Wikipedia articles with NTA identifiers","Wikipedia articles with SBN identifiers","Wikipedia articles with SELIBR identifiers","Wikipedia articles with SNAC-ID identifiers","Wikipedia articles with SUDOC identifiers","Wikipedia articles with ULAN identifiers","Wikipedia articles with VIAF identifiers","Wikipedia articles with WorldCat-VIAF identifiers","AC with 25 elements","Wikipedia articles with suppressed authority control identifiers","Pages using authority control with parameters","Articles containing timelines","Pantheists","Spinozists","Albert Einstein","1879 births","1955 deaths","20th-century American engineers","20th-century American writers","20th-century German writers","20th-century physicists","American agnostics","American inventors","American letter writers","American pacifists","American people of German-Jewish descent","American physicists","American science writers",
"American socialists","American Zionists","Ashkenazi Jews","Charles University in Prague faculty","Corresponding Members of the Russian Academy of Sciences (1917–25)","Cosmologists","Deaths from abdominal aortic aneurysm","Einstein family","ETH Zurich alumni","ETH Zurich faculty","German agnostics","German Jews","German emigrants to Switzerland","German Nobel laureates","German inventors","German physicists","German socialists","European democratic socialists","Institute for Advanced Study faculty","Jewish agnostics","Jewish American scientists","Jewish emigrants from Nazi Germany to the United States","Jews who emigrated to escape Nazism","Jewish engineers","Jewish inventors","Jewish philosophers","Jewish physicists","Jewish socialists","Leiden University faculty","Foreign Fellows of the Indian National Science Academy","Foreign Members of the Royal Society","Members of the American Philosophical Society","Members of the Bavarian Academy of Sciences","Members of the Lincean Academy"
,"Members of the Royal Netherlands Academy of Arts and Sciences","Members of the United States National Academy of Sciences","Honorary Members of the USSR Academy of Sciences","Naturalised citizens of Austria","Naturalised citizens of Switzerland","New Jersey socialists","Nobel laureates in Physics","Patent examiners","People from Berlin","People from Bern","People from Munich","People from Princeton, New Jersey","People from Ulm","People from Zürich","People who lost German citizenship","People with acquired American citizenship","Philosophers of science","Relativity theorists","Stateless people","Swiss agnostics","Swiss emigrants to the United States","Swiss Jews","Swiss physicists","Theoretical physicists","Winners of the Max Planck Medal","World federalists","Recipients of the Pour le Mérite (civil class)","Determinists","Activists from New Jersey","Mathematicians involved with Mathematische Annalen","Intellectual Cooperation","Disease-related deaths in New Jersey"],
"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRelevantPageName":"Albert_Einstein","wgRelevantArticleId":736,"wgRequestId":"XaChjApAICIAALSsYfgAAABV","wgCSPNonce":!1,"wgIsProbablyEditable":!1,"wgRelevantPageIsProbablyEditable":!1,"wgRestrictionEdit":["autoconfirmed"],"wgRestrictionMove":["sysop"],"wgMediaViewerOnClick":!0,"wgMediaViewerEnabledByDefault":!0,"wgPopupsReferencePreviews":!1,"wgPopupsConflictsWithNavPopupGadget":!1,"wgVisualEditor":{"pageLanguageCode":"en","pageLanguageDir":"ltr","pageVariantFallbacks":"en"},"wgMFDisplayWikibaseDescriptions":{"search":!0,"nearby":!0,"watchlist":!0,"tagline":
!1},"wgWMESchemaEditAttemptStepOversample":!1,"wgULSCurrentAutonym":"English","wgNoticeProject":"wikipedia","wgWikibaseItemId":"Q937","wgCentralAuthMobileDomain":!1,"wgEditSubmitButtonLabelPublish":!0};RLSTATE={"ext.globalCssJs.user.styles":"ready","site.styles":"ready","noscript":"ready","user.styles":"ready","ext.globalCssJs.user":"ready","user":"ready","user.options":"ready","user.tokens":"loading","ext.cite.styles":"ready","ext.math.styles":"ready","mediawiki.legacy.shared":"ready","mediawiki.legacy.commonPrint":"ready","jquery.makeCollapsible.styles":"ready","mediawiki.toc.styles":"ready","wikibase.client.init":"ready","ext.visualEditor.desktopArticleTarget.noscript":"ready","ext.uls.interlanguage":"ready","ext.wikimediaBadges":"ready","ext.3d.styles":"ready","mediawiki.skinning.interface":"ready","skins.vector.styles":"ready"};RLPAGEMODULES=["ext.cite.ux-enhancements","ext.cite.tracking","ext.math.scripts","ext.scribunto.logs","site","mediawiki.page.startup",
"mediawiki.page.ready","jquery.makeCollapsible","mediawiki.toc","mediawiki.searchSuggest","ext.gadget.teahouse","ext.gadget.ReferenceTooltips","ext.gadget.watchlist-notice","ext.gadget.DRN-wizard","ext.gadget.charinsert","ext.gadget.refToolbar","ext.gadget.extra-toolbar-buttons","ext.gadget.switcher","ext.centralauth.centralautologin","mmv.head","mmv.bootstrap.autostart","ext.popups","ext.visualEditor.desktopArticleTarget.init","ext.visualEditor.targetLoader","ext.eventLogging","ext.wikimediaEvents","ext.navigationTiming","ext.uls.compactlinks","ext.uls.interface","ext.cx.eventlogging.campaigns","ext.quicksurveys.init","ext.centralNotice.geoIP","ext.centralNotice.startUp","skins.vector.js"];
(RLQ=window.RLQ||[]).push(function(){mw.loader.implement("user.tokens@tffin",function($,jQuery,require,module){/*@nomin*/mw.user.tokens.set({"patrolToken":"+\\","watchToken":"+\\","csrfToken":"+\\"});
});});



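  Albert Einstein
  
  From Wikipedia, the free encyclopedia
  
  Jump to navigation Jump to search "Einstein" redirects here. For other
  people, see Einstein (surname). For other uses, see Albert Einstein
  (disambiguation) and Einstein (disambiguation).
  
  German-born physicist and developer of the theory of relativity
  
  Albert Einstein Einstein in 1921 Born (1879-03-14)14 March 1879
  Kingdom of Württemberg, German Empire Died 18 April 1955(1955-04-18)
  (aged 76) Princeton, New Jersey, United States Residence Germany, Italy,
  Switzerland, Austria (present-day Czech Republic), Belgium, United States
  Citizenship Subject of the Kingdom of Württemberg during the
  German Empire (1879–1896)[note 1] Stateless (1896–1901) Citizen of
  Switzerland (1901–1955) Austrian subject of the Austro-Hungarian
  Empire (1911–1912) Subject of the Kingdom of Prussia during the German
  Empire (1914–1918)[note 1] German citizen of the Free State of Prussia
  (Weimar Republic, 1918–1933) Citizen of the United States (1940–1955)
  Education Federal polytechnic school (1896–1900; B.A., 1900)
  University of Zurich (Ph.D., 1905) Known for General relativity
  Special relativity Photoelectric effect E=mc2 (Mass–energy
  equivalence) E=hf (Planck–Einstein relation) Theory of Brownian motion
  Einstein field equations Bose–Einstein statistics Bose–Einstein
  condensate Gravitational wave Cosmological constant Unified field
  theory EPR paradox Ensemble interpretation List of other concepts
  Spouse(s) Mileva Marić (m. 1903; div. 1919) Elsa Löwenthal (m. 1919;
  died[1][2] 1936) Children "Lieserl" Einstein Hans Albert Einstein Eduard
  "Tete" Einstein Awards Barnard Medal (1920) Nobel Prize in Physics
  (1921) Matteucci Medal (1921) ForMemRS (1921)[3] Copley Medal
  (1925)[3] Gold Medal of the Royal Astronomical Society (1926) Max
  Planck Medal (1929) Member of the National Academy of Sciences (1942)
  Time Person of the Century (1999) Scientific career Fields Physics,
  philosophy Institutions Swiss Patent Office (Bern) (1902–1909)
  University of Bern (1908–1909) University of Zurich (1909–1911)
  Charles University in Prague (1911–1912) ETH Zurich (1912–1914)
  Prussian Academy of Sciences (1914–1933) Humboldt University of Berlin
  (1914–1933) Kaiser Wilhelm Institute (director, 1917–1933) German
  Physical Society (president, 1916–1918) Leiden University (visits,
  1920) Institute for Advanced Study (1933–1955) Caltech (visits,
  1931–1933) University of Oxford (visits, 1931–1933) Thesis Eine neue
  Bestimmung der Moleküldimensionen (A New Determination of Molecular
  Dimensions) (1905) Doctoral advisor Alfred Kleiner Other academic
  advisors Heinrich Friedrich Weber Influences Arthur Schopenhauer Baruch
  Spinoza Bernhard Riemann David Hume Ernst Mach Hendrik Lorentz Hermann
  Minkowski Isaac Newton James Clerk Maxwell Michele Besso Moritz
  Schlick Thomas Young Influenced Virtually all modern physics
  
  Signature Albert Einstein (/ˈaɪnstaɪn/ EYEN-styne;[4] German: [ˈalbɛʁt
  ˈʔaɪnʃtaɪn] (listen); 14 March 1879 – 18 April 1955) was a German-born
  theoretical physicist[5] who developed the theory of relativity,
  one of the two pillars of modern physics (alongside quantum
  mechanics).[3][6]:274 His work is also known for its influence on the
  philosophy of science.[7][8] He is best known for
  his mass–energy equivalence formula . . . . .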


I only want the text that comes after the first heading, "Albert Einstein".

Best answer

First find the h1 tag, then call find_next_siblings('div') on it and print the text of each div it returns.

import requests
import bs4

urlpage = 'https://en.wikipedia.org/wiki/Albert_Einstein#Publications'
res = requests.get(urlpage)
soup1 = bs4.BeautifulSoup(res.text, 'lxml')
# Locate the first <h1>, then walk the <div> elements that follow it at the same level.
h1 = soup1.find('h1')
for item in h1.find_next_siblings('div'):
    print(item.text)
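
A variation that may be worth trying: find_next_siblings('div') only sees elements at the same nesting level as the heading, so it returns nothing if the h1 is wrapped in its own container. The sketch below is not part of the answer above. It uses find_all_next(string=True) to collect every string that follows the heading in document order, and removes script and style tags first so their source does not end up in the text. SAMPLE_HTML and text_after_heading are hypothetical names made up for this illustration, and the markup is a small stand-in rather than the real Wikipedia page, so the example runs without a network request.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for a downloaded page; not the real Wikipedia markup.
SAMPLE_HTML = """
<html><head><title>Albert Einstein - Wikipedia</title>
<script>RLCONF={"wgPageName":"Albert_Einstein"};</script></head>
<body>
<div id="content">
<h1 id="firstHeading" class="firstHeading" lang="en">Albert Einstein</h1>
<div id="bodyContent">
<div id="siteSub">From Wikipedia, the free encyclopedia</div>
<p>German-born physicist.</p>
<script>var x = 1;</script>
</div>
</div>
</body></html>
"""


def text_after_heading(html, heading_id="firstHeading"):
    """Return the text that follows the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id=heading_id)
    if heading is None:
        raise ValueError(f"no element with id={heading_id!r}")

    # Drop script/style tags so their source does not leak into the text.
    for tag in soup(["script", "style"]):
        tag.decompose()

    pieces = []
    for node in heading.find_all_next(string=True):
        # find_all_next starts inside the heading, so skip its own text.
        if any(parent is heading for parent in node.parents):
            continue
        # Skip HTML comments, which are also string nodes.
        if isinstance(node, Comment):
            continue
        stripped = node.strip()
        if stripped:
            pieces.append(stripped)
    return "\n".join(pieces)


result = text_after_heading(SAMPLE_HTML)
print(result)
```

To use this on the live page, pass res.text from the requests call instead of SAMPLE_HTML. The result will still include navigation and footer text that follows the heading, so further filtering may be needed.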

Regarding python - How do I scrape all the information on a web page after id="firstHeading" in Python?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58347139/

10-13 07:46