Problem description
Using Beautiful Soup and Python, I have scraped the website shown below to extract the rank, company name and revenue.
I would like to show the top ten companies from that table in an HTML table that I render with Flask and Jinja2. However, the code I have written displays the first record five times.
Code in file: webscraper.py
import requests
from bs4 import BeautifulSoup

url = 'https://en.m.wikipedia.org/wiki/List_of_largest_Internet_companies'
req = requests.get(url)
bsObj = BeautifulSoup(req.text, 'html.parser')
data = bsObj.find('table',{'class':'wikitable sortable mw-collapsible'})
table_data=[]
trs = bsObj.select('table tr')
for tr in trs[1:6]: #first element is empty
    row = []
    for t in tr.select('td')[:3]: #td is referring to the columns
        row.extend([t.text.strip()])
    table_data.append(row)
data=table_data
rank=data[0][0]
name=data[0][1]
revenue=data[0][2]
Relevant code in home.html
<p>{{data}}</p>
<table class="table">
  <thead>
    <tr>
      <th scope="col">#</th>
      <th scope="col">Rank</th>
      <th scope="col">Name</th>
      <th scope="col">Revenue</th>
    </tr>
  </thead>
  <tbody>
    {% for element in data %}
    <tr>
      <th scope="row"></th>
      <td>{{rank}}</td>
      <td>{{name}}</td>
      <td>{{revenue}}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>
The HTML output is shown below. Note that the variable {{data}} shows all five records correctly, but I am not isolating the individual fields correctly.
[['1', 'Amazon', '$280.5'], ['2', 'Google', '$161.8'], ['3', 'JD.com', '$82.8'], ['4', 'Facebook', '$70.69'], ['5', 'Alibaba', '$56.152']]
Rank Name Revenue
1 Amazon $280.5
1 Amazon $280.5
1 Amazon $280.5
1 Amazon $280.5
1 Amazon $280.5
As mentioned, I want ranks 1 to 10, that is, the first ten companies listed, not just Amazon.
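A side note on the row count: the slice `trs[1:6]` in webscraper.py selects indices 1 through 5, so at most five rows ever reach the template. A minimal sketch of the slice arithmetic follows, using a made-up list `rows` in place of the scraped `trs` (the strings are placeholders, not scraped data). Whether `trs[1:11]` yields exactly the top ten on the real page depends on the page's table layout, which is an assumption here.

```python
# Stand-in for the scraped rows: index 0 plays the role of the header row.
rows = ["header"] + [f"company {i}" for i in range(1, 21)]

first_five = rows[1:6]   # indices 1..5; the stop index 6 is excluded
first_ten = rows[1:11]   # indices 1..10; the stop index 11 is excluded

print(len(first_five))   # 5
print(len(first_ten))    # 10
print(first_ten[0], "...", first_ten[-1])
```
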
Any suggestions as to what I've done wrong in my code? I'd like the most elegant solution that builds on my own code, not a completely new idea or approach.
An explanation of the for loop and the theory behind it would be appreciated too.
I know this is wrong:
rank=data[0][0]
name=data[0][1]
revenue=data[0][2]
but I don't understand why, or how to construct it in the most elegant way so that the variables rank, name and revenue hold the respective data elements for each row.
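To illustrate why those three assignments keep repeating Amazon, here is a small sketch in plain Python that mimics what the template does. The list is copied from the output shown earlier; `render_rows_fixed` and `render_rows_per_element` are hypothetical helper names used only for this illustration, not part of Flask or Jinja2.

```python
data = [
    ["1", "Amazon", "$280.5"],
    ["2", "Google", "$161.8"],
    ["3", "JD.com", "$82.8"],
    ["4", "Facebook", "$70.69"],
    ["5", "Alibaba", "$56.152"],
]

# Evaluated once, before any loop runs: these always hold the first row.
rank = data[0][0]
name = data[0][1]
revenue = data[0][2]


def render_rows_fixed(rows):
    """Mimics the original template: it loops, but never reads `element`."""
    return [f"{rank} {name} {revenue}" for element in rows]


def render_rows_per_element(rows):
    """Reads the loop variable, which is rebound to the next row on each pass."""
    return [f"{element[0]} {element[1]} {element[2]}" for element in rows]


print(render_rows_fixed(data))
print(render_rows_per_element(data))
```

The loop itself runs five times in both versions. The difference is that only the second one uses `element`, the name the loop rebinds on every iteration, while `rank`, `name` and `revenue` were fixed before the loop started.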
Thank you to @mmfallacy above, who suggested the answer that I am just fleshing out here.
It works, but I will accept the answer above since he suggested it. Here it is for reference:
{% for element in data %}
<tr>
  <th scope="row"></th>
  <td>{{element[0]}}</td>
  <td>{{element[1]}}</td>
  <td>{{element[2]}}</td>
</tr>
{% endfor %}
I simply deleted every attempt to generate variables such as rank and revenue in the .py file.
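If named variables are still preferred over `element[0]`, one option is to unpack each row inside the loop. Below is a minimal sketch in plain Python, assuming every row has exactly three cells; the `to_records` helper and the dictionary keys are names invented for this example. Jinja2's `for` tag accepts the same unpacking form, so `{% for rank, name, revenue in data %}` should work in the template as well.

```python
def to_records(rows):
    """Turn [rank, name, revenue] lists into dicts with named fields."""
    records = []
    for rank, name, revenue in rows:  # unpacking rebinds all three names per row
        records.append({"rank": rank, "name": name, "revenue": revenue})
    return records


data = [
    ["1", "Amazon", "$280.5"],
    ["2", "Google", "$161.8"],
    ["3", "JD.com", "$82.8"],
]

for record in to_records(data):
    print(record["rank"], record["name"], record["revenue"])
```

Unpacking raises a `ValueError` for any row that does not have exactly three items (an empty header row, for example), so such rows would need to be filtered out first.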