Problem description
I am using Beautiful Soup to parse an HTML table.
- Python 3.2
- Beautiful Soup 4.1.3
I am running into an issue when using the findAll method to find the columns within my rows: I get an error saying that the list object has no attribute findAll. I found this method in another Stack Exchange post, where it did not cause a problem. (BeautifulSoup HTML table parsing)
I realize that findAll is a method of BeautifulSoup objects, not of Python lists. The weird part is that findAll works when I find the rows within the table (I only need the 2nd table on the page), but fails when I attempt to find the columns in the rows list.
Here is my code:
from urllib.request import URLopener
from bs4 import BeautifulSoup
opener = URLopener() #Open the URL Connection
page = opener.open("http://www.labormarketinfo.edd.ca.gov/majorer/countymajorer.asp?CountyCode=000001") #Open the page
soup = BeautifulSoup(page)
table = soup.findAll('table')[1] #Get the 2nd table (index 1)
rows = table.findAll('tr') #findAll works here
cols = rows.findAll('td') #findAll fails here
print(cols)
Recommended answer
findAll() returns a list of results. You need to either loop over that list or pick one item from it; each item is an element with its own findAll() method that you can use to reach the elements it contains:
table = soup.findAll('table')[1]
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')  # call findAll on each row, not on the rows list
    print(cols)
Or pick a single row:
table = soup.findAll('table')[1]
rows = table.findAll('tr')
cols = rows[0].findAll('td') # columns of the *first* row.
print(cols)
Note that findAll is deprecated; you should use find_all() instead.
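As a minimal sketch of the same loop written with find_all(), the snippet below parses a small inline HTML string instead of the live page. The sample markup, the employer and city values, and the choice of the built-in "html.parser" are assumptions made for illustration; they do not come from the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two tables,
# of which only the second holds the data we want.
html = """
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><td>Employer A</td><td>City X</td></tr>
  <tr><td>Employer B</td><td>City Y</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[1]  # second table (index 1)

# find_all is called on each row (a single element),
# never on the list of rows that find_all returns.
data = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in table.find_all("tr")
]
print(data)
```

Collecting the cell text into a list of lists keeps one inner list per table row, which is usually easier to work with than the raw td elements.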