I'm trying to scrape data for a particular hotel from TripAdvisor.
The hotel's URL on TripAdvisor is:
https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html
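The pagination marker comes right after "d92240-Reviews". It is a key of the form "-or5-", where the number is a multiple of 5, and each page returns 5 reviews: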

https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or5-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html
https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or10-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html
For pages whose URL ends in "&start=(number of results)", I can write a for loop that produces each page:

for i in range(0, 200, 5):
    url = "http://blahblahblah&start=" + str(i)

But I don't know how to do the same with my TripAdvisor URL.

Best answer

Here you go:

# The first page has no "-or<N>" offset in its URL.
initial = 'https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html'
# Later pages insert "-or<N>" between these two parts.
url_part1 = 'https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-or'
url_part2 = '-Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html'
print(initial)
# Offsets 5, 10, ..., 195: one page per 5 reviews.
for index in range(5, 200, 5):
    print(url_part1 + str(index) + url_part2)
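
If you would rather not split the URL by hand, the same idea can be wrapped in a small helper. This is only a sketch: the function name `review_page_urls` and its parameters are made up for illustration, the upper bound of 200 is taken from the loop in the question, and it assumes the URL contains "-Reviews-" and that pages keep stepping by 5 as described above.

```python
def review_page_urls(base_url, total_reviews, page_size=5):
    """Yield the URL of each review page, starting with the first page.

    Hypothetical helper: assumes the offset key "-or<N>" goes directly
    after "-Reviews" in the URL, as in the question.
    """
    head, sep, tail = base_url.partition("-Reviews-")
    if not sep:
        raise ValueError("URL does not contain '-Reviews-'")
    for offset in range(0, total_reviews, page_size):
        if offset == 0:
            # The first page uses the plain URL with no offset key.
            yield base_url
        else:
            yield f"{head}-Reviews-or{offset}-{tail}"


base = ("https://www.tripadvisor.com/Hotel_Review-g39143-d92240-Reviews-"
        "Hawthorn_Suites_by_Wyndham_Wichita_East-Wichita_Kansas.html")
urls = list(review_page_urls(base, 200))
for url in urls:
    print(url)
```

Splitting on "-Reviews-" means only the base URL has to change for another hotel, provided its URL follows the same pattern.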

08-16 03:49