Problem description
<table cellspacing="0" rules="all" border="1" id="MainContent_grdUsers2" style="border-style:None;width:100%;border-collapse:collapse;">
<tbody><tr class="listHeader">
<th scope="col" style="width:11%;">Name</th><th scope="col" style="width:12%;">Password</th><th scope="col" style="width:16%;">Rights</th><th scope="col" style="width:10%;">Bureaus</th><th scope="col" style="width:15%;">FullName</th><th scope="col" style="width:16%;">Email</th><th scope="col" style="width:12%;">Status</th><th scope="col" style="width:12%;">Logon Tries</th>
</tr><tr>
<td>user1</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl02$txtManageUsersPassword" type="text" maxlength="50" id="MainContent_grdUsers2_txtManageUsersPassword_0" style="width: 95%; background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAUBJREFUOBGVVE2ORUAQLvIS4gwzEysHkHgnkMiEc4zEJXCMNwtWTmDh3UGcYoaFhZUFCzFVnu4wIaiE+vvq6+6qTgthGH6O4/jA7x1OiCAIPwj7CoLgSXDxSjEVzAt9k01CBKdWfsFf/2WNuEwc2YqigKZpK9glAlVVwTTNbQJZlnlCkiTAZnF/mePB2biRdhwHdF2HJEmgaRrwPA+qqoI4jle5/8XkXzrCFoHg+/5ICdpm13UTho7Q9/0WnsfwiL/ouHwHrJgQR8WEwVG+oXpMPaDAkdzvd7AsC8qyhCiKJjiRnCKwbRsMw9hcQ5zv9maSBeu6hjRNYRgGFuKaCNwjkjzPoSiK1d1gDDecQobOBwswzabD/D3Np7AHOIrvNpHmPI+Kc2RZBm3bcp8wuwSIot7QQ0PznoR6wYSK0Xb/AGVLcWwc7Ng3AAAAAElFTkSuQmCC"); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" autocomplete="off">
</td><td align="center">
<select name="ctl00$MainContent$grdUsers2$ctl02$ddlManageUsersRights" id="MainContent_grdUsers2_ddlManageUsersRights_0" style="width:95%;">
<option value="User">User</option>
<option selected="selected" value="Supervisor">Supervisor</option>
<option value="Administrator">Administrator</option>
<option value="Child Supervisor">Child Supervisor</option>
</select>
</td><td align="center">
<select name="ctl00$MainContent$grdUsers2$ctl02$ddlManageUsersBureaus" id="MainContent_grdUsers2_ddlManageUsersBureaus_0" style="width:95%;">
<option value="255">High</option>
<option selected="selected" value="128">Medium</option>
<option value="0">Low</option>
</select>
</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl02$txtManageUsersFullName" type="text" value="First1 Last1" maxlength="50" id="MainContent_grdUsers2_txtManageUsersFullName_0" style="width: 95%; background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAUBJREFUOBGVVE2ORUAQLvIS4gwzEysHkHgnkMiEc4zEJXCMNwtWTmDh3UGcYoaFhZUFCzFVnu4wIaiE+vvq6+6qTgthGH6O4/jA7x1OiCAIPwj7CoLgSXDxSjEVzAt9k01CBKdWfsFf/2WNuEwc2YqigKZpK9glAlVVwTTNbQJZlnlCkiTAZnF/mePB2biRdhwHdF2HJEmgaRrwPA+qqoI4jle5/8XkXzrCFoHg+/5ICdpm13UTho7Q9/0WnsfwiL/ouHwHrJgQR8WEwVG+oXpMPaDAkdzvd7AsC8qyhCiKJjiRnCKwbRsMw9hcQ5zv9maSBeu6hjRNYRgGFuKaCNwjkjzPoSiK1d1gDDecQobOBwswzabD/D3Np7AHOIrvNpHmPI+Kc2RZBm3bcp8wuwSIot7QQ0PznoR6wYSK0Xb/AGVLcWwc7Ng3AAAAAElFTkSuQmCC"); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" autocomplete="off">
</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl02$txtManageUsersEmail" type="text" value="user1@company.com" maxlength="50" id="MainContent_grdUsers2_txtManageUsersEmail_0" style="width: 95%; background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAUBJREFUOBGVVE2ORUAQLvIS4gwzEysHkHgnkMiEc4zEJXCMNwtWTmDh3UGcYoaFhZUFCzFVnu4wIaiE+vvq6+6qTgthGH6O4/jA7x1OiCAIPwj7CoLgSXDxSjEVzAt9k01CBKdWfsFf/2WNuEwc2YqigKZpK9glAlVVwTTNbQJZlnlCkiTAZnF/mePB2biRdhwHdF2HJEmgaRrwPA+qqoI4jle5/8XkXzrCFoHg+/5ICdpm13UTho7Q9/0WnsfwiL/ouHwHrJgQR8WEwVG+oXpMPaDAkdzvd7AsC8qyhCiKJjiRnCKwbRsMw9hcQ5zv9maSBeu6hjRNYRgGFuKaCNwjkjzPoSiK1d1gDDecQobOBwswzabD/D3Np7AHOIrvNpHmPI+Kc2RZBm3bcp8wuwSIot7QQ0PznoR6wYSK0Xb/AGVLcWwc7Ng3AAAAAElFTkSuQmCC"); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" autocomplete="off">
</td><td align="center">
<select name="ctl00$MainContent$grdUsers2$ctl02$ddlManageUsersStatus" id="MainContent_grdUsers2_ddlManageUsersStatus_0" style="width:95%;">
<option value="Active">Active</option>
<option selected="selected" value="Inactive">Inactive</option>
<option value="Terminated">Terminated</option>
</select>
</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl02$txtManageUsersLogonTries" type="text" value="0" maxlength="1" id="MainContent_grdUsers2_txtManageUsersLogonTries_0" style="width:95%;">
</td>
</tr><tr style="background-color:#CED6E7;">
<td>user2</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl03$txtManageUsersPassword" type="text" maxlength="50" id="MainContent_grdUsers2_txtManageUsersPassword_1" style="background-color: rgb(206, 214, 231); width: 95%; background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAUBJREFUOBGVVE2ORUAQLvIS4gwzEysHkHgnkMiEc4zEJXCMNwtWTmDh3UGcYoaFhZUFCzFVnu4wIaiE+vvq6+6qTgthGH6O4/jA7x1OiCAIPwj7CoLgSXDxSjEVzAt9k01CBKdWfsFf/2WNuEwc2YqigKZpK9glAlVVwTTNbQJZlnlCkiTAZnF/mePB2biRdhwHdF2HJEmgaRrwPA+qqoI4jle5/8XkXzrCFoHg+/5ICdpm13UTho7Q9/0WnsfwiL/ouHwHrJgQR8WEwVG+oXpMPaDAkdzvd7AsC8qyhCiKJjiRnCKwbRsMw9hcQ5zv9maSBeu6hjRNYRgGFuKaCNwjkjzPoSiK1d1gDDecQobOBwswzabD/D3Np7AHOIrvNpHmPI+Kc2RZBm3bcp8wuwSIot7QQ0PznoR6wYSK0Xb/AGVLcWwc7Ng3AAAAAElFTkSuQmCC"); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%;" autocomplete="off">
</td><td align="center">
<select name="ctl00$MainContent$grdUsers2$ctl03$ddlManageUsersRights" id="MainContent_grdUsers2_ddlManageUsersRights_1" style="background-color:#CED6E7;width:95%;">
<option value="User">User</option>
<option selected="selected" value="Supervisor">Supervisor</option>
<option value="Administrator">Administrator</option>
<option value="Child Supervisor">Child Supervisor</option>
</select>
</td><td align="center">
<select name="ctl00$MainContent$grdUsers2$ctl03$ddlManageUsersBureaus" id="MainContent_grdUsers2_ddlManageUsersBureaus_1" style="background-color:#CED6E7;width:95%;">
<option value="255">High</option>
<option selected="selected" value="128">Medium</option>
<option value="0">Low</option>
</select>
</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl03$txtManageUsersFullName" type="text" value="First2 Last2" maxlength="50" id="MainContent_grdUsers2_txtManageUsersFullName_1" style="background-color: rgb(206, 214, 231); width: 95%; background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAUBJREFUOBGVVE2ORUAQLvIS4gwzEysHkHgnkMiEc4zEJXCMNwtWTmDh3UGcYoaFhZUFCzFVnu4wIaiE+vvq6+6qTgthGH6O4/jA7x1OiCAIPwj7CoLgSXDxSjEVzAt9k01CBKdWfsFf/2WNuEwc2YqigKZpK9glAlVVwTTNbQJZlnlCkiTAZnF/mePB2biRdhwHdF2HJEmgaRrwPA+qqoI4jle5/8XkXzrCFoHg+/5ICdpm13UTho7Q9/0WnsfwiL/ouHwHrJgQR8WEwVG+oXpMPaDAkdzvd7AsC8qyhCiKJjiRnCKwbRsMw9hcQ5zv9maSBeu6hjRNYRgGFuKaCNwjkjzPoSiK1d1gDDecQobOBwswzabD/D3Np7AHOIrvNpHmPI+Kc2RZBm3bcp8wuwSIot7QQ0PznoR6wYSK0Xb/AGVLcWwc7Ng3AAAAAElFTkSuQmCC"); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" autocomplete="off">
</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl03$txtManageUsersEmail" type="text" value="user2@company.com" maxlength="50" id="MainContent_grdUsers2_txtManageUsersEmail_1" style="background-color: rgb(206, 214, 231); width: 95%; background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAUBJREFUOBGVVE2ORUAQLvIS4gwzEysHkHgnkMiEc4zEJXCMNwtWTmDh3UGcYoaFhZUFCzFVnu4wIaiE+vvq6+6qTgthGH6O4/jA7x1OiCAIPwj7CoLgSXDxSjEVzAt9k01CBKdWfsFf/2WNuEwc2YqigKZpK9glAlVVwTTNbQJZlnlCkiTAZnF/mePB2biRdhwHdF2HJEmgaRrwPA+qqoI4jle5/8XkXzrCFoHg+/5ICdpm13UTho7Q9/0WnsfwiL/ouHwHrJgQR8WEwVG+oXpMPaDAkdzvd7AsC8qyhCiKJjiRnCKwbRsMw9hcQ5zv9maSBeu6hjRNYRgGFuKaCNwjkjzPoSiK1d1gDDecQobOBwswzabD/D3Np7AHOIrvNpHmPI+Kc2RZBm3bcp8wuwSIot7QQ0PznoR6wYSK0Xb/AGVLcWwc7Ng3AAAAAElFTkSuQmCC"); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" autocomplete="off">
</td><td align="center">
<select name="ctl00$MainContent$grdUsers2$ctl03$ddlManageUsersStatus" id="MainContent_grdUsers2_ddlManageUsersStatus_1" style="background-color:#CED6E7;width:95%;">
<option selected="selected" value="Active">Active</option>
<option value="Inactive">Inactive</option>
<option value="Terminated">Terminated</option>
</select>
</td><td align="center">
<input name="ctl00$MainContent$grdUsers2$ctl03$txtManageUsersLogonTries" type="text" value="0" maxlength="1" id="MainContent_grdUsers2_txtManageUsersLogonTries_1" style="background-color:#CED6E7;width:95%;">
</td>
</tr>
</tbody>
</table>
I am trying to scrape a table that contains text, dropdown options, and input values. The result should look like this:
user1 | Supervisor | Medium | First1 Last1 | user1@company.com | Inactive
user2 | Supervisor | Medium | First2 Last2 | user2@company.com | Active
The output is intended to go to a CSV file. So far I have:
headers = [c.get_text(strip=True) for c in soup.find('tr', attrs={'class':'listHeader'}).findAll('th')]
#find_all doesn't work here it just grabs one
for table in soup.find('table', attrs={'id':'MainContent_grdUsers2'}):
    try:
        column3=(table.find("option", attrs={"selected": "selected"}).get('value'))
    except:
        continue
#this only grabs a specific cell
for table in soup.find('table', attrs={'id':'MainContent_grdUsers2'}):
    try:
        column6=(table.find("input", attrs={"id": "MainContent_grdUsers2_txtManageUsersEmail_0"}).get('value'))
    except:
        continue
I can go in and grab the cells I want one at a time, but there are around 100 rows of records in this table, and I am finding it difficult to work out how to grab everything at once, since the cells hold not just text but also selected dropdown options and input values. Is there a way to do this with BeautifulSoup? I tried briefly with pandas and lxml, but I have never used those before.
Updated code:
headers = [c.get_text(strip=True) for c in soup.find('tr', attrs={'class':'listHeader'}).findAll('th')]
table = soup.find('table', attrs={'id':'MainContent_grdUsers2'})
data = []
for tr in table.find_all('tr')[1:]:
    td = tr.find_all('td')
    try:
        data += [
            [
                td[0].getText(),
                td[2].find('option', {'selected':'selected'}).getText(),
                td[3].find('option', {'selected':'selected'}).getText(),
                td[4].find('input').get('value'),
                if value is None:
                    continue
                td[5].find('input').get('value'),
                td[6].find('option', {'selected':'selected'}).getText()
            ]
        ]
    except Exception as ex:
        #print(ex) ## you can uncomment this line for debugging ##
        continue
for row in data:
    print(' '.join(row))
Given the HTML you provided, this should work:
if soup.find('tr', attrs={'class':'listHeader'}):
    headers = [
        'none' if c is None else c.get_text(strip=True)
        for c in soup.find('tr', attrs={'class':'listHeader'}).findAll('th')
    ]
else:
    headers = None
table = soup.find('table', attrs={'id':'MainContent_grdUsers2'})
data = []
for tr in table.find_all('tr')[1:]:
    td = tr.find_all('td')
    try:
        data += [
            [
                td[0].getText(),
                td[2].find('option', {'selected':'selected'}).getText(),
                td[3].find('option', {'selected':'selected'}).getText(),
                td[4].find('input').get('value'),
                td[5].find('input').get('value'),
                td[6].find('option', {'selected':'selected'}).getText()
            ]
        ]
    except Exception as ex:
        #print(ex) ## you can uncomment this line for debugging ##
        continue
for row in data:
    print(' '.join(str(r) for r in row))
Output:
user1 Supervisor Medium First1 Last1 user1@company.com Inactive
user2 Supervisor Medium First2 Last2 user2@company.com Active
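Since the question mentions CSV as the intended output, here is a minimal sketch of that last step using only the standard library `csv` module. It assumes `headers` and `data` have the shape produced by the loop above; the sample values are copied from the question's table, and the names `KEPT_COLUMNS` and `rows_to_csv` are hypothetical helpers introduced only for this illustration.

```python
import csv
import io

# Sample values copied from the question's table; in practice `headers` and
# `data` would come from the scraping loop above.
headers = ["Name", "Password", "Rights", "Bureaus",
           "FullName", "Email", "Status", "Logon Tries"]
data = [
    ["user1", "Supervisor", "Medium", "First1 Last1", "user1@company.com", "Inactive"],
    ["user2", "Supervisor", "Medium", "First2 Last2", "user2@company.com", "Active"],
]

# Column positions kept by the loop: td[0], td[2], td[3], td[4], td[5], td[6].
# (Hypothetical constant, not part of the original answer.)
KEPT_COLUMNS = [0, 2, 3, 4, 5, 6]


def rows_to_csv(headers, data, out):
    """Write a header line plus one CSV line per scraped row to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    # Keep only the header names that match the extracted columns.
    writer.writerow([headers[i] for i in KEPT_COLUMNS])
    for row in data:
        # An <input> without a value attribute yields None; write it as an empty field.
        writer.writerow(["" if cell is None else cell for cell in row])


# Write to an in-memory buffer here so the result can be inspected directly.
buffer = io.StringIO()
rows_to_csv(headers, data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function could be given a file opened with `open('users.csv', 'w', newline='')`, where `users.csv` is just a placeholder name. Using `csv.writer` rather than joining with commas by hand means fields that contain commas or quotes are escaped correctly.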