The following is my code:

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

stats_page = requests.get('https://www.sports-reference.com/cbb/schools/loyola-il/2020.html')
content = stats_page.content
soup = BeautifulSoup(content, 'html.parser')
table = soup.find(name='table', attrs={'id':'per_poss'})
html_str = str(table)
df = pd.read_html(html_str)[0]
df.head()

And I get the error:

ValueError: No tables found.

However, when I swap attrs={'id':'per_poss'} for a different table id such as attrs={'id':'per_game'}, I do get output. I am not familiar with HTML and scraping, but I noticed that in the tables that work, the HTML is:

<table class="sortable stats_table now_sortable is_sorted" id="per_game" data-cols-to-freeze="2">

And in the tables that don't work, the HTML is:

<table class="sortable stats_table now_sortable sticky_table re2 le1" id="totals" data-cols-to-freeze="2">

It seems the table classes are different, and I am not sure whether that is causing the problem or how to fix it if so. Thank you!

Solution

This is happening because the table is inside HTML comments (<!-- ... -->), so BeautifulSoup does not see it as a regular tag when the page is first parsed. You can extract the table by checking whether the tags are of type Comment:

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

URL = "https://www.sports-reference.com/cbb/schools/loyola-il/2020.html"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Collect every HTML comment on the page and re-parse its contents,
# which turns the commented-out tables into real, searchable tags.
comments = soup.find_all(text=lambda t: isinstance(t, Comment))
comment_soup = BeautifulSoup(str(comments), "html.parser")

# The per-possession table's wrapper div is now visible in the re-parsed soup.
table = comment_soup.select("#div_per_poss")[0]

# read_html returns a list of DataFrames, one per table found in the comments.
df = pd.read_html(str(comment_soup))
print(df)

Output:

[    Rk            Player   G    GS    MP   FG  ...  AST  STL  BLK  TOV  PF  PTS
0  1.0   Cameron Krutwig  32  32.0  1001  201  ...  133   39   20   81  45  482
1  2.0         Tate Hall  32  32.0  1052  141  ...   70   47    3   57  56  406
2  3.0  Marquise Kennedy  32   6.0   671  110  ...   43   38    9   37  72  294
3  4.0  Lucas Williamson  32  32.0   967   99  ...   53   49    9   57  64  287
4  5.0     Keith Clemons  24  24.0   758   78  ...   47   29    1   32  50  249
5  6.0        Aher Uguak  32  31.0   768   62  ...   61   15    3   59  56  181
6  7.0     Jalon Pipkins  30   1.0   392   34  ...   12   10    1   17  15  101
7  8.0     Paxson Wojcik  30   1.0   327   25  ...   18   14    0   14  23   61
...]
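If you want just the per-possession stats as a single DataFrame rather than a list of every commented-out table, here is a minimal sketch along the same lines; it assumes the wrapper div with id div_per_poss is still present on that page:

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

URL = "https://www.sports-reference.com/cbb/schools/loyola-il/2020.html"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Re-parse the text of every HTML comment so the hidden tables become real tags.
comments = soup.find_all(text=lambda t: isinstance(t, Comment))
comment_soup = BeautifulSoup(str(comments), "html.parser")

# Pass only the per-possession wrapper div to read_html so exactly one table is parsed.
wrapper = comment_soup.select_one("#div_per_poss")  # assumes this id still exists on the page
per_poss = pd.read_html(str(wrapper))[0]
print(per_poss.head())

pd.read_html always returns a list of DataFrames, so indexing with [0] picks out the single table found inside that wrapper.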