I can't get the title of subsequent pages. What is going wrong?
from bs4 import BeautifulSoup
import urllib.request
# First page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=').read()
soup = BeautifulSoup(source,'lxml')
print(soup.title) # shows title as expected
# Second page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=&page=2').read()
soup = BeautifulSoup(source,'lxml')
print(soup.title) # shows None
Best answer
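I'm not sure why only the second case fails. As noted in other SO threads, switching to a different parser sometimes helps.
I could get the second page to work with html.parser, although it emits a warning about a decoding error.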
from bs4 import BeautifulSoup
import urllib.request
# Second page
source = urllib.request.urlopen('https://yeniemlak.az/elan/axtar?emlak=1&elan_nov=1&seher=0&metro=0&qiymet=&qiymet2=&mertebe=&mertebe2=&otaq=&otaq2=&sahe_m=&sahe_m2=&sahe_s=&sahe_s2=&page=2').read()
soup = BeautifulSoup(source,'html.parser')
print(soup.title) # Now works
Output
Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
<title>YENIEMLAK.AZ Satılır Bina ev menzil </title>
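Since the fix here was simply to try a different parser, that idea can be wrapped in a small helper. This is only a rough sketch: `parse_with_fallback` is a hypothetical name, not part of Beautiful Soup, and the parser order is an assumption. It tries each parser in turn, skips any that are not installed, and returns the first result that actually contains a `<title>`. The sample bytes are a stand-in so the snippet runs offline; they are not the real page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parse_with_fallback(markup, parsers=("lxml", "html.parser")):
    """Return the first soup that has a <title>, or None if no parser finds one.

    Hypothetical helper for illustration; the parser order is an assumption.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser is not installed; move on to the next one.
            continue
        if soup.title is not None:
            return soup
    return None


# Offline check with a small in-memory document (not the real site).
sample = b"<html><head><title>Example</title></head><body></body></html>"
soup = parse_with_fallback(sample)
print(soup.title)
```

With the real page you would pass the bytes returned by `urllib.request.urlopen(...).read()` instead of `sample`. If the page's encoding is known, passing it through BeautifulSoup's `from_encoding` argument may help with the replacement-character warning, but the right value for this site is not established here.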
Source: "python - BeautifulSoup not working on subsequent pages" on Stack Overflow: https://stackoverflow.com/questions/58912364/