python & Beautiful Soup: searching result strings

Problem description

I am using Beautiful Soup to parse an HTML table.

  • Python 3.2
  • Beautiful Soup 4.1.3

I run into a problem when I use the findAll method to find the columns within my rows: I get an error saying that the list object has no attribute findAll. I found this approach in another Stack Exchange post, where it did not cause a problem. (BeautifulSoup HTML table parsing)

I realize that findAll is a method of BeautifulSoup objects, not of Python lists. The odd part is that findAll works when I look up the rows in the table (I only need the second table on the page), but fails when I try to find the columns in the list of rows.

Here is my code:

from urllib.request import URLopener
from bs4 import BeautifulSoup

opener = URLopener() #Open the URL Connection
page = opener.open("http://www.labormarketinfo.edd.ca.gov/majorer/countymajorer.asp?CountyCode=000001") #Open the page
soup = BeautifulSoup(page)

table = soup.findAll('table')[1] #Get the 2nd table (index 1)
rows = table.findAll('tr') #findAll works here
cols = rows.findAll('td') #findAll fails here
print(cols)

Recommended answer

findAll() returns a list of results. You need to either loop over that list or pick one element from it; each element has its own findAll() method for reaching the elements it contains:

table = soup.findAll('table')[1]
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')  # call findAll on the single row, not on the list
    print(cols)

Or pick one row:

table = soup.findAll('table')[1]
rows = table.findAll('tr')
cols = rows[0].findAll('td')  # columns of the *first* row.
print(cols)

Note that findAll is deprecated; you should use find_all() instead.
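
To try the fix without fetching the original page, here is a minimal sketch that uses find_all() on an inline HTML string. The markup, the employer names and the `data` variable are made up for illustration, and the standard-library "html.parser" backend is an assumption chosen so that no extra parser is required:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two tables, the second holds the data.
HTML = """
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><th>Employer</th><th>City</th></tr>
  <tr><td>Acme Corp</td><td>Oakland</td></tr>
  <tr><td>Globex</td><td>Fremont</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

table = soup.find_all("table")[1]  # second table on the page (index 1)
rows = table.find_all("tr")        # list-like result holding one Tag per <tr>

# Call find_all() on each row (a Tag), never on the list of rows.
data = []
for row in rows:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:  # skip rows that contain no <td> cells, such as the header row
        data.append(cells)

print(data)
```

Each entry in `data` is the list of cell texts for one row, which is usually easier to work with than the raw tag objects printed in the snippets above.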

