How to use Python to scrape all Steam IDs, review content, and profile_url values from a game's Steam reviews into an Excel file?

Problem description

The script below either prints only the first 11 reviews (when the loop is written as "while n < 500") or prints nothing at all (when it is written as "while True:"). The requirement is to save every Steam ID, review text, and profile_url from the game's reviews to an Excel file.

from msedge.selenium_tools import Edge, EdgeOptions
from selenium.webdriver.common.keys import Keys
import re
from time import sleep
from datetime import datetime
from openpyxl import Workbook

game_id= 1097150
url = 'https://steamcommunity.com/app/1097150/positivereviews/?p=1&browsefilter=trendweek&filterLanguage=english'

options = EdgeOptions()
options.use_chromium = True
driver = Edge(options=options)
driver.get(url)

# The page scrolls continuously while the scraping runs

last_position = driver.execute_script("return window.pageYOffset;")
reviews = []
review_ids = set()

while True:
  cards = driver.find_elements_by_class_name('apphub_Card')
  for card in cards[-20:]:
    profile_url = card.find_element_by_xpath('.//div[@class="apphub_friend_block"]/div/a[2]').get_attribute('href')
    steam_id = profile_url.split('/')[-2]
    date_posted = card.find_element_by_xpath('.//div[@class="apphub_CardTextContent"]/div').text
    review_content = card.find_element_by_xpath('.//div[@class="apphub_CardTextContent"]').text.replace(date_posted,'').strip()

    review = (steam_id, profile_url, review_content)
    reviews.append(review)

  attempt_count = 0
  while attempt_count < 3:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    curr_position = driver.execute_script("return window.pageYOffset;")

    if curr_position == last_position:
      attempt_count += 1
      sleep(0.5)
    else:
      break
driver.close()

# Save the results

wb = Workbook()
ws = wb.worksheets[0]
ws.append(['SteamId', 'ProfileURL', 'ReviewText'])
for row in reviews:
    ws.append(row)

today = datetime.today().strftime('%Y%m%d')
wb.save(f'Steam_Reviews_{game_id}_{today}.xlsx')
wb.close()
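
One thing worth noting about the loop above: it re-reads cards[-20:] on every pass, so the same review can be appended more than once, and review_ids is declared but never used. Below is a minimal sketch of how such a set could filter duplicates. The helper name add_unique_reviews and the sample rows are hypothetical, and the sketch assumes that steam_id is enough to identify a review on this page.

```python
def add_unique_reviews(new_rows, reviews, review_ids):
    """Append rows whose steam_id has not been seen yet; return how many were added."""
    added = 0
    for steam_id, profile_url, review_content in new_rows:
        if steam_id in review_ids:
            continue  # already collected on an earlier pass
        review_ids.add(steam_id)
        reviews.append((steam_id, profile_url, review_content))
        added += 1
    return added


# Hypothetical rows standing in for what two overlapping passes might yield.
reviews = []
review_ids = set()
first_pass = [("alice", "https://steamcommunity.com/id/alice/", "Great game")]
second_pass = first_pass + [("bob", "https://steamcommunity.com/id/bob/", "Fun")]

add_unique_reviews(first_pass, reviews, review_ids)
add_unique_reviews(second_pass, reviews, review_ids)
print(len(reviews))  # prints 2
```

In the scraping loop, the tuples built from each card would be passed in place of the sample rows.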

Recommended answer

Here's how to keep scrolling down until the page stops loading new content or, in your case, until 500 elements have been loaded.

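# Keep scrolling until 500 cards are loaded or the page stops moving
while True:
    cards = driver.find_elements_by_class_name('apphub_Card')
    if len(cards) >= 500:
        break
    last_position = driver.execute_script("return window.pageYOffset;")
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(1)  # wait for more reviews to load; uses "from time import sleep" from the question
    curr_position = driver.execute_script("return window.pageYOffset;")
    if last_position == curr_position:
        break  # the position did not change, so assume the end of the list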

for card in cards[:500]:
    profile_url = card.find_element_by_xpath('.//div[@class="apphub_friend_block"]/div/a[2]').get_attribute('href')
    steam_id = profile_url.split('/')[-2]
    date_posted = card.find_element_by_xpath('.//div[@class="apphub_CardTextContent"]/div').text
    review_content = card.find_element_by_xpath('.//div[@class="apphub_CardTextContent"]').text.replace(date_posted,'').strip()
    review = (steam_id, profile_url, review_content)
    reviews.append(review)
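
The loop in this answer stops the first time the scroll position does not change, which may be too early if the page is slow to load. As a rough sketch, the same stopping logic can be combined with the retry idea from the question and separated from Selenium so it can be tried without a browser. The function name scroll_until_loaded, its parameters, and the FakePage class are all hypothetical and only serve to illustrate the idea.

```python
from time import sleep


def scroll_until_loaded(count_items, scroll_to_bottom, get_position,
                        target=500, max_attempts=3, pause=1.0):
    """Scroll until `target` items exist or the position stops changing.

    Returns the item count seen when the loop stopped.
    """
    stalled = 0
    while True:
        count = count_items()
        if count >= target:
            return count
        last_position = get_position()
        scroll_to_bottom()
        sleep(pause)  # give the page time to load more cards
        if get_position() == last_position:
            stalled += 1  # nothing moved; retry a few times before giving up
            if stalled >= max_attempts:
                return count_items()
        else:
            stalled = 0


class FakePage:
    """Stand-in for a browser page: each scroll loads 10 more cards, up to 35."""

    def __init__(self):
        self.cards = 10
        self.position = 0

    def scroll(self):
        if self.cards < 35:
            self.cards = min(self.cards + 10, 35)
            self.position += 1000


page = FakePage()
loaded = scroll_until_loaded(lambda: page.cards, page.scroll,
                             lambda: page.position, target=500, pause=0)
print(loaded)  # prints 35
```

With the driver from the question, the three callables would wrap len(driver.find_elements_by_class_name('apphub_Card')), the window.scrollTo script, and the window.pageYOffset script respectively.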

