python - Unable to extract the expected results with requests_html

I can't extract the correct result using requests_html:

>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://www.amazon.com/dp/B07569DYGN')
>>> r.html.find("#productDetails_detailBullets_sections1")
[]

Yet the id "productDetails_detailBullets_sections1" is present in the page source:

>>> """<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">""" in r.text
True

In fact, the same problem occurs with PyQuery.
Why can't this element be found?

Best answer

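I searched for #comparison_price_row, which is still found. The next id in the source is comparison_shipping_info_row, but searching for #comparison_shipping_info_row returns an empty list. The two elements are at the same level (same parent). I checked all of the source between them and, at first, found nothing wrong.
Then I saw that there is a NUL byte between the two, which probably trips up the library.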
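A NUL byte is invisible in most viewers, so it helps to locate it programmatically. The following is a minimal sketch using only the standard library; the helper names `find_nul_offsets` and `show_context` and the `sample` markup are hypothetical, standing in for `r.text` from the question.

```python
from typing import List


def find_nul_offsets(text: str) -> List[int]:
    """Return the index of every NUL character in text."""
    return [i for i, ch in enumerate(text) if ch == "\x00"]


def show_context(text: str, offset: int, width: int = 20) -> str:
    """Return a repr of the text around offset, so the NUL shows up as \\x00."""
    start = max(0, offset - width)
    return repr(text[start:offset + width])


# Made-up markup standing in for r.text; the real page is much larger.
sample = '<tr id="a"></tr>\x00<tr id="b"></tr>'

for offset in find_nul_offsets(sample):
    print(offset, show_context(sample, offset))
```

Running the same two helpers on `r.text` should show which elements sit on either side of each NUL byte.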
After removing the NUL bytes from the input, the desired element can be found:

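import requests_html  # the question imported only HTMLSession
# Strip NUL characters from the source before parsing it again.
r2 = requests_html.HTML(html=r.text.replace('\0', ''))
r2.find('#productDetails_detailBullets_sections1')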

[<Element 'table' role='presentation' class=('a-keyvalue', 'prodDetTable') id='productDetails_detailBullets_sections1'>]
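If several pages need the same treatment, the workaround can be wrapped in a small helper. This is only a sketch: `strip_nul` and `parse_without_nul` are hypothetical names, and the sketch assumes that removing every NUL character is acceptable for the page in question. `requests_html` is imported lazily so that the stripping function works without it.

```python
def strip_nul(markup: str) -> str:
    """Return markup with every NUL character removed."""
    return markup.replace("\x00", "")


def parse_without_nul(markup: str):
    """Parse markup with requests_html after stripping NUL characters."""
    # Imported here so strip_nul stays usable without requests_html installed.
    from requests_html import HTML
    return HTML(html=strip_nul(markup))
```

With the session from the question, `parse_without_nul(r.text).find('#productDetails_detailBullets_sections1')` would then correspond to the `r2.find(...)` call above.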

A similar question on Stack Overflow: https://stackoverflow.com/questions/52699466/