For my final course assignment I want to scrape the Steam review section, but while writing the code I ran into this error: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000019CEA1665E0>, 'Connection to steamcommunity.com timed out. (connect timeout=5)'))
The code is as follows:
import requests
from bs4 import BeautifulSoup
url = 'https://steamcommunity.com/app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,'
              'application/signed-exchange;v=b3;q=0.9',
    # 'br' dropped: requests only decodes Brotli when a brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'close',
    'Cookie': 'timezoneOffset=28800,0; _ga=GA1.2.2068924460.1568258364; browserid=1637567403399993326; '
              'sessionid=0be65a169d686e464855602e; _gid=GA1.2.614169347.1590475272; '
              'steamMachineAuth76561199007171441=CA87107A153BDC24FE6D54504A124395F9C36243; '
              'deep_dive_carousel_focused_app=673750; deep_dive_carousel_method=gems; '
              'steamCountry=HK%7C413cfd508f5a328ba1679f9252231a8e; '
              'app_impressions=377530@1_7_7_topsellers_150_3|1245200@1_7_7_topsellers_150_3|632360'
              '@1_7_7_topsellers_150_3|1245180@1_7_7_topsellers_150_3|1298590@1_7_7_topsellers_150_3|285900'
              '@1_7_7_topsellers_150_3|632470@1_7_7_topsellers_150_2|813780@1_7_7_topsellers_150_2|1100600'
              '@1_7_7_topsellers_150_2|1293820@1_7_7_topsellers_150_2|952860@1_7_7_topsellers_150_2|1209110'
              '@1_7_7_topsellers_150_2|1015610@1_7_7_topsellers_150_3|1238440@1_7_7_topsellers_150_3|1042550'
              '@1_7_7_topsellers_150_3|466560@1_7_7_topsellers_150_3|602960@1_7_7_topsellers_150_3|457140'
              '@1_7_7_topsellers_150_3|793460@1_7_7_topsellers_150_3|512540@1_7_7_topsellers_150_3|619150'
              '@1_7_7_topsellers_150_3|1100410@1_7_7_topsellers_150_3|1088780@1_7_7_topsellers_150_3|1100420'
              '@1_7_7_topsellers_150_3|1201360@1_7_7_topsellers_150_3|642280@1_7_7_topsellers_150_3|673950'
              '@1_7_7_topsellers_150_3|613830@1_7_7_topsellers_150_3|636480@1_7_7_topsellers_150_3|387990'
              '@1_7_7_topsellers_150_3|414700@1_7_7_topsellers_150_3|382310@1_7_7_topsellers_150_3|1162520'
              '@1_7_7_topsellers_150_2|289070@1_7_7_topsellers_150_2|307690@1_7_7_topsellers_150_2|973760'
              '@1_7_7_topsellers_150_2|1273370@1_5_9__405; recentapps=%7B%22578080%22%3A1590476900%7D',
    # Host must match the URL being requested (it was 'store.steampowered.com')
    'Host': 'steamcommunity.com',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/80.0.3987.116 Safari/5 '
}
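# timeout=5 is in seconds; the ConnectTimeout above is raised on this line
response = requests.get(url, headers=headers, timeout=5)
# BeautifulSoup expects the markup text, not the Response object
soup = BeautifulSoup(response.text, 'html.parser')
reviews = soup.find_all('div', {'class': 'apphub_Card'})
for review in reviews:
    nick = review.find('div', {'class': 'apphub_CardContentAuthorName'})
    title = review.find('div', {'class': 'title'}).text
    hour = review.find('div', {'class': 'hours'}).text.split(' ')[0]
    link = nick.find('a').attrs['href']
    comment = review.find('div', {'class': 'apphub_CardTextContent'}).text
    print(nick.text, title, hour, link)
    # split('') raises ValueError (empty separator); the separator was probably
    # lost when the post was rendered, so a newline is assumed here
    print(comment.split('\n')[3].strip(' '))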
This post was last edited by Twilight6 on 2020-5-26 19:05
The connection timed out. You have probably been blocked by anti-scraping measures, or you may need a proxy with an overseas IP?
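If a proxy does turn out to be necessary, a minimal sketch of how the request could be routed through one with requests might look like the following. The helper names build_session and fetch are made up for illustration, the proxy address in the usage note is a placeholder for whatever proxy you actually have, and the retry settings are arbitrary. It only shows the mechanism and does not guarantee Steam will respond.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(proxy_url=None, retries=3, backoff=1.0):
    """Return a requests.Session that retries failed connections and
    optionally sends traffic through a proxy (hypothetical helper)."""
    session = requests.Session()
    # Retry failed connection attempts with an increasing delay between tries
    retry = Retry(total=retries, connect=retries, backoff_factor=backoff)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    if proxy_url:
        # The same proxy is used for both schemes; adjust if yours differs
        session.proxies.update({'http': proxy_url, 'https': proxy_url})
    return session


def fetch(session, url, timeout=(5, 15)):
    """Fetch url and return the page text, or None if the request fails.
    timeout is a (connect, read) pair in seconds."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as exc:
        # ConnectTimeout is a subclass of RequestException, so it lands here
        print('request failed:', exc)
        return None
```

Usage would be something like session = build_session('http://127.0.0.1:7890') followed by page = fetch(session, url), where the address is only an example of a locally running proxy. If fetch still returns None with a working proxy, the problem is more likely blocking on the server side than network reachability.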