How to fix requests.exceptions.ConnectTimeout
For my end-of-term course project I want to scrape the Steam review section, but I ran into this error while writing the code:
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000019CEA1665E0>, 'Connection to steamcommunity.com timed out. (connect timeout=5)'))
The code is as follows:
import requests
from bs4 import BeautifulSoup
url = 'https://steamcommunity.com/app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_'
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,'
'application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate',  # 'br' is only decoded by requests when the brotli package is installed
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'close',
'Cookie': 'timezoneOffset=28800,0; _ga=GA1.2.2068924460.1568258364; browserid=1637567403399993326; '
'sessionid=0be65a169d686e464855602e; _gid=GA1.2.614169347.1590475272; '
'steamMachineAuth76561199007171441=CA87107A153BDC24FE6D54504A124395F9C36243; '
'deep_dive_carousel_focused_app=673750; deep_dive_carousel_method=gems; '
'steamCountry=HK%7C413cfd508f5a328ba1679f9252231a8e; '
'app_impressions=377530@1_7_7_topsellers_150_3|1245200@1_7_7_topsellers_150_3|632360'
'@1_7_7_topsellers_150_3|1245180@1_7_7_topsellers_150_3|1298590@1_7_7_topsellers_150_3|285900'
'@1_7_7_topsellers_150_3|632470@1_7_7_topsellers_150_2|813780@1_7_7_topsellers_150_2|1100600'
'@1_7_7_topsellers_150_2|1293820@1_7_7_topsellers_150_2|952860@1_7_7_topsellers_150_2|1209110'
'@1_7_7_topsellers_150_2|1015610@1_7_7_topsellers_150_3|1238440@1_7_7_topsellers_150_3|1042550'
'@1_7_7_topsellers_150_3|466560@1_7_7_topsellers_150_3|602960@1_7_7_topsellers_150_3|457140'
'@1_7_7_topsellers_150_3|793460@1_7_7_topsellers_150_3|512540@1_7_7_topsellers_150_3|619150'
'@1_7_7_topsellers_150_3|1100410@1_7_7_topsellers_150_3|1088780@1_7_7_topsellers_150_3|1100420'
'@1_7_7_topsellers_150_3|1201360@1_7_7_topsellers_150_3|642280@1_7_7_topsellers_150_3|673950'
'@1_7_7_topsellers_150_3|613830@1_7_7_topsellers_150_3|636480@1_7_7_topsellers_150_3|387990'
'@1_7_7_topsellers_150_3|414700@1_7_7_topsellers_150_3|382310@1_7_7_topsellers_150_3|1162520'
'@1_7_7_topsellers_150_2|289070@1_7_7_topsellers_150_2|307690@1_7_7_topsellers_150_2|973760'
'@1_7_7_topsellers_150_2|1273370@1_5_9__405; recentapps=%7B%22578080%22%3A1590476900%7D',
'Host': 'steamcommunity.com',  # must match the host in url
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/80.0.3987.116 Safari/5 '
}
html = requests.get(url, headers=headers, timeout=5)
# BeautifulSoup needs the response body, not the Response object
soup = BeautifulSoup(html.text, 'html.parser')
reviews = soup.find_all('div', {'class': 'apphub_Card'})
for review in reviews:
    nick = review.find('div', {'class': 'apphub_CardContentAuthorName'})
    title = review.find('div', {'class': 'title'}).text
    hour = review.find('div', {'class': 'hours'}).text.split(' ')
    link = nick.find('a').attrs['href']
    comment = review.find('div', {'class': 'apphub_CardTextContent'}).text
    print(nick.text, title, hour, link)
    # str.split('') raises ValueError (empty separator); just trim the whitespace
    print(comment.strip())
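The `(connect timeout=5)` at the end of the traceback means the TCP connection to steamcommunity.com:443 was never established within 5 seconds, so no HTTP request was sent at all and none of the headers or cookies above could have made a difference. A minimal sketch for checking reachability outside of requests, using only the standard library (`can_connect` is a hypothetical helper name, not part of any library):

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake only;
        # no HTTP request is sent, so headers and cookies play no part here.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and DNS failures (socket.gaierror)
        # are all subclasses of OSError.
        return False


# Example: print(can_connect("steamcommunity.com"))
```

If this returns False on your network, the problem lies in routing (firewall, DNS, the need for a proxy) rather than in the scraping code.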
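This post was last edited by Twilight6 on 2020-5-26 19:05
Connection timed out. You've probably been blocked by anti-scraping measures, or you need an overseas IP / proxy?
This post was last edited by Twilight6 on 2020-5-27 07:28
After removing timeout, the error looks like this: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000128140DFD00>: Failed to establish a new connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.'))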
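The two tracebacks most likely show the same failure under different limits: with `timeout=5` requests stops waiting for the TCP handshake after 5 seconds and raises `ConnectTimeout`; without a timeout the operating system eventually gives up (the WinError 10060 text) and requests raises `ConnectionError`. In requests, `ConnectTimeout` is a subclass of both `ConnectionError` and `Timeout`. The sketch below, assuming the `requests` package, tells these cases apart and can optionally route through a proxy; `fetch`, `classify`, and the proxy address in the comment are hypothetical names for illustration only.

```python
import requests


def classify(exc):
    """Map a requests exception to a short label (hypothetical helper)."""
    # ConnectTimeout inherits from both ConnectionError and Timeout, so check it first.
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return "connect-timeout"   # TCP handshake not finished within the connect timeout
    if isinstance(exc, requests.exceptions.ReadTimeout):
        return "read-timeout"      # connected, but the server sent nothing in time
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "connection-error"  # refused, reset, DNS failure, OS-level timeout, proxy error
    if isinstance(exc, requests.exceptions.HTTPError):
        return "http-status"       # raised by raise_for_status() for 4xx/5xx responses
    return "other"


def fetch(url, headers=None, proxy=None, timeout=(5, 20)):
    """GET url and return (text, error_label); proxy is an optional proxy URL."""
    # requests picks the proxy by the scheme of the *target* URL.
    proxies = {"http": proxy, "https": proxy} if proxy else None
    try:
        # A (connect, read) tuple sets the two timeouts separately.
        resp = requests.get(url, headers=headers, proxies=proxies, timeout=timeout)
        resp.raise_for_status()
        return resp.text, None
    except requests.exceptions.RequestException as exc:
        return None, classify(exc)


# Example (the proxy address is an assumption; use whatever your proxy client listens on):
# html, err = fetch("https://steamcommunity.com/app/578080/reviews/", proxy="http://127.0.0.1:7890")
```

SOCKS proxy URLs (`socks5://...`) additionally need the `requests[socks]` extra installed.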
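Twilight6 posted on 2020-5-26 19:03:
Connection timed out. You've probably been blocked by anti-scraping measures, or you need a VPN.
If I need a VPN, how do I deal with that? Use a proxy?
诶呀二傻子是我 posted on 2020-5-26 19:05:
If I need a VPN, how do I deal with that? Use a proxy?
I'm not sure. Doesn't Steam mostly require running a game accelerator? Otherwise sometimes you can't even open Steam profile pages.
Without a proxy, even a browser can't log in, so a timeout is to be expected.
suchocolate posted on 2020-5-26 19:24:
Without a proxy, even a browser can't log in, so a timeout is to be expected.
So is there no way to scrape the Steam community?
诶呀二傻子是我 posted on 2020-5-26 19:33:
So is there no way to scrape the Steam community?
Go buy an overseas IP, haha.
https://www.ipidea.net/?utm-source=bdtg&utm-keyword=?46
诶呀二傻子是我 posted on 2020-5-26 19:33:
So is there no way to scrape the Steam community?
You need to use a proxy or a VPN.
Sure enough, it still doesn't work even with an overseas proxy.
诶呀二傻子是我 posted on 2020-5-26 21:37:
Sure enough, it still doesn't work even with an overseas proxy.
Still not working? Then you're being blocked by anti-scraping measures. Search Baidu for similar cases and see how others scraped it.
I bought a global proxy and that fixed it. Now I have a new problem: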
import requests
from bs4 import BeautifulSoup
import json
# from selenium import webdriver
#
# driver = webdriver.Chrome(r'C:\Users\m1359\AppData\Local\Google\Chrome\Application\chromedriver.exe')
def sen_from_text(text):
    SENTIMENT_URL = 'http://api.bosonnlp.com/sentiment/analysis'
    h = {'X-Token': 'balbala'}  # your token
    data = json.dumps(text)
    resp = requests.post(SENTIMENT_URL, headers=h, data=data.encode('utf-8'))
    resp = resp.json()  # print(resp)
    # float() cannot convert the parsed JSON list directly; this assumes the
    # response has the form [[positive_prob, negative_prob]]
    front = float(resp[0][0])
    return front
headers = {'Accept-Language': 'zh-CN,zh;q=0.9',
'Host': 'steamcommunity.com',
'Referer': 'https://steamcommunity.com/app/578080',
'Connection': 'keep-alive',
'Cookie': '',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/83.0.4103.61 Safari/537.36'}
file = open('steam.txt', 'w+', encoding='utf-8')
for i in range(1, 10):
    url = 'https://steamcommunity.com/app/578080/homecontent/?userreviewsoffset=' + str(10 * (i - 1)) + '&p=' + str(
        i) + '&workshopitemspage=' + str(i) + '&readytouseitemspage=' + str(i) + '&mtxitemspage=' + str(
        i) + '&itemspage=' + str(i) + '&screenshotspage=' + str(i) + '&videospage=' + str(i) + '&artpage=' + str(
        i) + '&allguidepage=' + str(i) + '&webguidepage=' + str(i) + '&integratedguidepage=' + str(
        i) + '&discussionspage=' + str(
        i) + '&numperpage=10&browsefilter=trendweek&browsefilter=trendweek&appid=578080&appHubSubSection=10&l=schinese' \
             '&filterLanguage=default&searchText=&forceanon=1'
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')  # if lxml is installed, switching the parser to 'lxml' is recommended
    # match a single class: a multi-word string is compared against the whole class attribute
    reviews = soup.find_all('div', {'class': 'apphub_Card'})
    for review in reviews:
        # 'apphub_CardContentAuthorName offline ellipsis' presumably only matches authors shown
        # as offline, so for other authors find() returned None and nick.find('a') failed
        nick = review.find('div', {'class': 'apphub_CardContentAuthorName'})
        if nick is None:
            continue
        title = review.find('div', {'class': 'title'}).text
        hour = review.find('div', {'class': 'hours'}).text.split(' ')
        link = nick.find('a').attrs['href']
        # a list has no .strip(); strip the text itself
        comment = review.find('div', {'class': 'apphub_CardTextContent'}).text.strip()
        # sen = sen_from_text(comment)
        print(nick.text, title, hour, link, comment)
        print(type(nick))
Running it raises this error:
Traceback (most recent call last):
  File "C:/Users/m1359/PycharmProjects/test/pubg.py", line 97, in <module>
    link = nick.find('a').attrs['href']
AttributeError: 'NoneType' object has no attribute 'find'
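`nick` is `None` because `{'class': 'apphub_CardContentAuthorName offline ellipsis'}` asks BeautifulSoup for that exact class string, which most likely only matches reviewers displayed as offline; other reviewers carry a different status class. Passing a single class name (e.g. `class_="apphub_CardContentAuthorName"`) matches any element whose class list contains it. A hedged sketch of a parser that does this and tolerates missing pieces, assuming BeautifulSoup 4; `parse_cards`, `_text`, and the sample markup in the test are illustrative, not Steam's exact HTML:

```python
from bs4 import BeautifulSoup


def _text(tag):
    """Collapse all whitespace in a tag's text; None if the tag is missing."""
    return " ".join(tag.get_text().split()) if tag is not None else None


def parse_cards(html):
    """Extract one dict per review card, leaving missing fields as None."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # class_="apphub_Card" matches any div whose class list contains that token,
    # whatever other classes (modalContentLink, interactable, ...) it also has.
    for card in soup.find_all("div", class_="apphub_Card"):
        author = card.find("div", class_="apphub_CardContentAuthorName")
        link = author.find("a") if author is not None else None
        results.append({
            "author": _text(author),
            "profile": link.get("href") if link is not None else None,
            "title": _text(card.find("div", class_="title")),
            "hours": _text(card.find("div", class_="hours")),
            "text": _text(card.find("div", class_="apphub_CardTextContent")),
        })
    return results
```

Checking for `None` instead of chaining `.find(...).text` means a card with an unexpected layout yields empty fields rather than stopping the whole loop.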
I also couldn't get pagination working to fetch more reviews the way this blogger did:
https://www.tinymind.net.cn/articles/6c517fc1b33931
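The code above already steps `p` and `userreviewsoffset` by hand; building the query with `urllib.parse.urlencode` is easier to read and change, and a loop can stop as soon as a page returns no review cards. This is a sketch under the assumption that the `homecontent` endpoint keeps honoring the parameters copied from the thread's URL (the duplicated `browsefilter` is kept once); whether Steam still pages this way is not verified here, and the linked blog post may use a different mechanism. `review_page_url` and `iter_reviews` are hypothetical names.

```python
from urllib.parse import urlencode

BASE_URL = "https://steamcommunity.com/app/{appid}/homecontent/"


def review_page_url(appid, page, per_page=10):
    """Build the homecontent URL for 1-based `page` with the thread's parameters."""
    params = {
        "userreviewsoffset": per_page * (page - 1),
        "p": page,
        "workshopitemspage": page,
        "readytouseitemspage": page,
        "mtxitemspage": page,
        "itemspage": page,
        "screenshotspage": page,
        "videospage": page,
        "artpage": page,
        "allguidepage": page,
        "webguidepage": page,
        "integratedguidepage": page,
        "discussionspage": page,
        "numperpage": per_page,
        "browsefilter": "trendweek",
        "appid": appid,
        "appHubSubSection": 10,
        "l": "schinese",
        "filterLanguage": "default",
        "searchText": "",
        "forceanon": 1,
    }
    return BASE_URL.format(appid=appid) + "?" + urlencode(params)


def iter_reviews(fetch_page, max_pages=50):
    """Yield items page by page, stopping at the first empty page or at max_pages."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break
        yield from items


# Possible wiring with requests and BeautifulSoup as in the thread's code (not run here):
# def fetch_page(page):
#     html = requests.get(review_page_url(578080, page), headers=headers, timeout=(5, 20)).text
#     soup = BeautifulSoup(html, "html.parser")
#     return soup.find_all("div", class_="apphub_Card")
# for card in iter_reviews(fetch_page, max_pages=9):
#     ...
```

Passing the page fetcher in as a function keeps the stop-on-empty logic testable without touching the network.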