[Solved] How to fix requests.exceptions.ConnectTimeout

诶呀二傻子是我 · posted on 2020-5-26 19:00:02

For my end-of-course assignment I want to scrape Steam's review section, but while writing the code I ran into this error: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000019CEA1665E0>, 'Connection to steamcommunity.com timed out. (connect timeout=5)'))
The code is as follows:

import requests
from bs4 import BeautifulSoup
url = 'https://steamcommunity.com/app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_'
headers = {
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,'
'application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'zh-CN,zh;q=0.9',
'Cache-Control': 'max-age=0',
'Connection': 'close',
'Cookie': 'timezoneOffset=28800,0; _ga=GA1.2.2068924460.1568258364; browserid=1637567403399993326; '
'sessionid=0be65a169d686e464855602e; _gid=GA1.2.614169347.1590475272; '
'steamMachineAuth76561199007171441=CA87107A153BDC24FE6D54504A124395F9C36243; '
'deep_dive_carousel_focused_app=673750; deep_dive_carousel_method=gems; '
'steamCountry=HK%7C413cfd508f5a328ba1679f9252231a8e; '
'app_impressions=377530@1_7_7_topsellers_150_3|1245200@1_7_7_topsellers_150_3|632360'
'@1_7_7_topsellers_150_3|1245180@1_7_7_topsellers_150_3|1298590@1_7_7_topsellers_150_3|285900'
'@1_7_7_topsellers_150_3|632470@1_7_7_topsellers_150_2|813780@1_7_7_topsellers_150_2|1100600'
'@1_7_7_topsellers_150_2|1293820@1_7_7_topsellers_150_2|952860@1_7_7_topsellers_150_2|1209110'
'@1_7_7_topsellers_150_2|1015610@1_7_7_topsellers_150_3|1238440@1_7_7_topsellers_150_3|1042550'
'@1_7_7_topsellers_150_3|466560@1_7_7_topsellers_150_3|602960@1_7_7_topsellers_150_3|457140'
'@1_7_7_topsellers_150_3|793460@1_7_7_topsellers_150_3|512540@1_7_7_topsellers_150_3|619150'
'@1_7_7_topsellers_150_3|1100410@1_7_7_topsellers_150_3|1088780@1_7_7_topsellers_150_3|1100420'
'@1_7_7_topsellers_150_3|1201360@1_7_7_topsellers_150_3|642280@1_7_7_topsellers_150_3|673950'
'@1_7_7_topsellers_150_3|613830@1_7_7_topsellers_150_3|636480@1_7_7_topsellers_150_3|387990'
'@1_7_7_topsellers_150_3|414700@1_7_7_topsellers_150_3|382310@1_7_7_topsellers_150_3|1162520'
'@1_7_7_topsellers_150_2|289070@1_7_7_topsellers_150_2|307690@1_7_7_topsellers_150_2|973760'
'@1_7_7_topsellers_150_2|1273370@1_5_9__405; recentapps=%7B%22578080%22%3A1590476900%7D',
'Host': 'steamcommunity.com',  # must match the host in url (was store.steampowered.com)
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-User': '?1',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/80.0.3987.116 Safari/5 '
}
html = requests.get(url, headers=headers, timeout=5)
# BeautifulSoup needs the response body, not the Response object
soup = BeautifulSoup(html.text, 'html.parser')
reviews = soup.find_all('div', {'class': 'apphub_Card'})
for review in reviews:
    nick = review.find('div', {'class': 'apphub_CardContentAuthorName'})
    title = review.find('div', {'class': 'title'}).text
    hour = review.find('div', {'class': 'hours'}).text.split(' ')[0]
    link = nick.find('a').attrs['href']
    comment = review.find('div', {'class': 'apphub_CardTextContent'}).text
    print(nick.text, title, hour, link)
    # split('') raises ValueError (empty separator); splitting on newlines is assumed here
    print(comment.split('\n')[3].strip(' '))


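Twilight6 · posted on 2020-5-26 19:00:03

This best answer was given by Twilight6. Thanks to Twilight6 for the answer.

Last edited by Twilight6 on 2020-5-26 19:05

The connection timed out. You are probably being blocked by anti-scraping measures, or you need an overseas IP proxy?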

Twilight6 · posted on 2020-5-26 19:02:49

Last edited by Twilight6 on 2020-5-27 07:28

诶呀二傻子是我 · posted on 2020-5-26 19:03:57

After removing timeout, the error looks like this:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000128140DFD00>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has not responded'))
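
Both tracebacks describe the same failure: the TCP connection to steamcommunity.com never completed. Removing timeout only replaces the 5-second limit with the operating system's own, longer one. Below is a minimal sketch of telling a connect failure apart from a slow response, using requests' exception classes; describe_failure and fetch are hypothetical helper names, and the timeout values are arbitrary.

```python
import requests


def describe_failure(exc):
    """Map a requests exception to a short hint (hypothetical helper)."""
    # ConnectTimeout subclasses both ConnectionError and Timeout, so test it first.
    if isinstance(exc, requests.exceptions.ConnectTimeout):
        return 'connect timeout: the host could not be reached in time'
    if isinstance(exc, requests.exceptions.ReadTimeout):
        return 'read timeout: connected, but the server was slow to answer'
    if isinstance(exc, requests.exceptions.ConnectionError):
        return 'connection error: no connection could be established'
    return 'other request error'


def fetch(url, headers=None):
    """Return the Response, or None after printing a hint on failure."""
    try:
        # timeout=(connect, read): 5 s to connect, 15 s to wait for data.
        return requests.get(url, headers=headers, timeout=(5, 15))
    except requests.exceptions.RequestException as exc:
        print(describe_failure(exc))
        return None
```

If fetch keeps reporting a connect timeout or connection error, changing headers or cookies will not help, because the request never reaches the server.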


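诶呀二傻子是我 · posted on 2020-5-26 19:05:57

Twilight6 posted on 2020-5-26 19:03
The connection timed out. You are probably being blocked by anti-scraping measures, or you need a VPN.

If I need a VPN, how do I go about it? Do I set up a proxy?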

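Twilight6 · posted on 2020-5-26 19:08:11

诶呀二傻子是我 posted on 2020-5-26 19:05
If I need a VPN, how do I go about it? Do I set up a proxy?

I'm not sure. Doesn't Steam mostly need an accelerator running? Otherwise sometimes you can't even open a Steam profile page.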

suchocolate · posted on 2020-5-26 19:24:41

Without a proxy the site doesn't load in a browser either, so the timeout is expected.
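
A quick way to confirm this from Python, without requests or any headers involved, is to try opening a plain TCP connection. This is a minimal sketch using only the standard library; can_connect is a hypothetical helper name and the 5-second default is arbitrary.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection raises OSError (timeouts and DNS errors included) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If can_connect('steamcommunity.com') returns False on the machine running the scraper, the problem is network reachability rather than the scraping code.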

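诶呀二傻子是我 · posted on 2020-5-26 19:33:28

suchocolate posted on 2020-5-26 19:24
Without a proxy the site doesn't load in a browser either, so the timeout is expected.

So does that mean there's no way to scrape the Steam community?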

Twilight6 · posted on 2020-5-26 19:46:21

诶呀二傻子是我 posted on 2020-5-26 19:33
So does that mean there's no way to scrape the Steam community?

Go buy an overseas IP, haha
https://www.ipidea.net/?utm-source=bdtg&utm-keyword=?46

suchocolate · posted on 2020-5-26 21:29:51

诶呀二傻子是我 posted on 2020-5-26 19:33
So does that mean there's no way to scrape the Steam community?

You have to use a proxy or a VPN.
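
For reference, requests can be pointed at a proxy explicitly through its proxies setting, and by default it also reads the HTTP_PROXY and HTTPS_PROXY environment variables. A proxy configured only inside a browser is not seen by a Python script. The sketch below assumes a local proxy client listening on 127.0.0.1:7890, which is a placeholder address; make_session and fetch_reviews_page are hypothetical helper names.

```python
import requests

# Placeholder address (assumption): replace with the proxy you actually run.
PROXY_URL = 'http://127.0.0.1:7890'


def make_session(proxy_url=PROXY_URL):
    """Build a Session that routes both HTTP and HTTPS traffic through proxy_url."""
    session = requests.Session()
    session.proxies.update({'http': proxy_url, 'https': proxy_url})
    return session


def fetch_reviews_page(session, url, headers=None):
    """Fetch url through the session and return the response body as text."""
    # Short connect timeout, longer read timeout; raise on HTTP error codes.
    resp = session.get(url, headers=headers, timeout=(5, 15))
    resp.raise_for_status()
    return resp.text
```

A call would look like fetch_reviews_page(make_session(), url, headers=headers), with url and headers as in the first post.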

诶呀二傻子是我 · posted on 2020-5-26 21:37:27

Sure enough, it still doesn't work even with an overseas proxy.

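Twilight6 · posted on 2020-5-27 07:27:53

诶呀二傻子是我 posted on 2020-5-26 21:37
Sure enough, it still doesn't work even with an overseas proxy.

Still not working? Then you're being blocked by anti-scraping measures. Search Baidu for similar cases and see how other people scrape it.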

诶呀二傻子是我 · posted on 2020-5-27 15:15:59

I bought a global (system-wide) proxy and that solved it.

Now I have a new problem:

import requests
from bs4 import BeautifulSoup
import json
# from selenium import webdriver
#
# driver = webdriver.Chrome(r'C:\Users\m1359\AppData\Local\Google\Chrome\Application\chromedriver.exe')
def sen_from_text(text):
    SENTIMENT_URL = 'http://api.bosonnlp.com/sentiment/analysis'
    h = {'X-Token': 'balbala'}  # your token
    data = json.dumps(text)
    resp = requests.post(SENTIMENT_URL, headers=h, data=data.encode('utf-8'))
    resp = json.loads(resp.text)  # print(resp)
    front = float(resp[0][0])
    return front
headers = {'Accept-Language': 'zh-CN,zh;q=0.9',
'Host': 'steamcommunity.com',
'Referer': 'https://steamcommunity.com/app/578080',
'Connection': 'keep-alive',
'Cookie': '',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/83.0.4103.61 Safari/537.36'}
file = open('steam.txt', 'w+', encoding='utf-8')
for i in range(1, 10):
    url = 'https://steamcommunity.com/app/578080/homecontent/?userreviewsoffset=' + str(10 * (i - 1)) + '&p=' + str(
        i) + '&workshopitemspage=' + str(i) + '&readytouseitemspage=' + str(i) + '&mtxitemspage=' + str(
        i) + '&itemspage=' + str(i) + '&screenshotspage=' + str(i) + '&videospage=' + str(i) + '&artpage=' + str(
        i) + '&allguidepage=' + str(i) + '&webguidepage=' + str(i) + '&integratedguidepage=' + str(
        i) + '&discussionspage=' + str(
        i) + '&numperpage=10&browsefilter=trendweek&browsefilter=trendweek&appid=578080&appHubSubSection=10&l=schinese' \
             '&filterLanguage=default&searchText=&forceanon=1'
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')  # if lxml is installed, switching the parser to lxml is recommended
    reviews = soup.find_all('div', {'class': 'apphub_Card modalContentLink interactable'})
    for review in reviews:
        nick = review.find('div', {'class': 'apphub_CardContentAuthorName offline ellipsis'})
        title = review.find('div', {'class': 'title'}).text
        hour = review.find('div', {'class': 'hours'}).text.split(' ')[1]
        link = nick.find('a').attrs['href']
        comment = review.find('div', {'class': 'apphub_CardTextContent'}).text.split('\n')[2].strip('\t')
        # sen = sen_from_text(comment)
        print(nick.text, title, hour, link, comment)
        print(type(nick))


Running it raises an error:

Traceback (most recent call last):
  File "C:/Users/m1359/PycharmProjects/test/pubg.py", line 97, in <module>
    link = nick.find('a').attrs['href']
AttributeError: 'NoneType' object has no attribute 'find'
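
The error means review.find(...) returned None for at least one card, so nick has nothing to call find on. One likely cause, which is an assumption because the page HTML is not shown in the thread: a multi-word string such as 'apphub_CardContentAuthorName offline ellipsis' only matches a class attribute that is exactly that string, so an author whose status is not offline would not be found. The sketch below uses made-up HTML to show matching on a single class name and skipping cards that lack the element; extract_authors is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; the real Steam markup may differ.
SAMPLE_HTML = """
<div class="apphub_Card modalContentLink interactable">
  <div class="apphub_CardContentAuthorName offline ellipsis"><a href="https://example.com/id/a">Alice</a></div>
</div>
<div class="apphub_Card modalContentLink interactable">
  <div class="apphub_CardContentAuthorName online ellipsis"><a href="https://example.com/id/b">Bob</a></div>
</div>
<div class="apphub_Card modalContentLink interactable"></div>
"""


def extract_authors(html):
    """Return (name, profile link) pairs, skipping cards with no author block."""
    soup = BeautifulSoup(html, 'html.parser')
    authors = []
    # class_ with a single class name matches any tag carrying that class.
    for card in soup.find_all('div', class_='apphub_Card'):
        nick = card.find('div', class_='apphub_CardContentAuthorName')
        anchor = nick.find('a') if nick is not None else None
        if anchor is None:
            continue  # find() returned None: nothing to read from this card
        authors.append((anchor.get_text(strip=True), anchor.get('href')))
    return authors
```

The same None check applies to the title, hours and comment lookups, since any of them can be missing on a given card.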


I also couldn't get pagination working to fetch more reviews, the way this blogger did:
https://www.tinymind.net.cn/articles/6c517fc1b33931
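
When debugging pagination, the long concatenated URL is easier to check if the query string is built from a dict. The sketch below only restates the parameters from the code in this post (the duplicated browsefilter is written once); whether the endpoint needs anything further to return later pages is not confirmed here. page_params and page_url are hypothetical helper names.

```python
from urllib.parse import urlencode

BASE_URL = 'https://steamcommunity.com/app/578080/homecontent/'

# Parameters that carry the page number in the URL from the post.
PAGED_KEYS = ['p', 'workshopitemspage', 'readytouseitemspage', 'mtxitemspage',
              'itemspage', 'screenshotspage', 'videospage', 'artpage',
              'allguidepage', 'webguidepage', 'integratedguidepage',
              'discussionspage']


def page_params(page, per_page=10):
    """Query parameters for a 1-based page number."""
    params = {'userreviewsoffset': per_page * (page - 1)}
    params.update({key: page for key in PAGED_KEYS})
    params.update({
        'numperpage': per_page,
        'browsefilter': 'trendweek',
        'appid': 578080,
        'appHubSubSection': 10,
        'l': 'schinese',
        'filterLanguage': 'default',
        'searchText': '',
        'forceanon': 1,
    })
    return params


def page_url(page):
    """Full URL for one page, with the query string encoded by urlencode."""
    return BASE_URL + '?' + urlencode(page_params(page))
```

The same dict can be passed straight to requests, as in requests.get(BASE_URL, params=page_params(i), headers=headers). Printing page_url(i) for a few values of i makes it easy to compare against the URLs the browser requests while scrolling.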
