诶呀二傻子是我 posted on 2020-5-26 19:00:02

How do I fix requests.exceptions.ConnectTimeout?

For my end-of-term assignment I want to scrape Steam's review section, but while writing the code I ran into this error: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000019CEA1665E0>, 'Connection to steamcommunity.com timed out. (connect timeout=5)'))
The code is as follows:
import requests
from bs4 import BeautifulSoup

url = 'https://steamcommunity.com/app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,'
            'application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'close',
    'Cookie': 'timezoneOffset=28800,0; _ga=GA1.2.2068924460.1568258364; browserid=1637567403399993326; '
            'sessionid=0be65a169d686e464855602e; _gid=GA1.2.614169347.1590475272; '
            'steamMachineAuth76561199007171441=CA87107A153BDC24FE6D54504A124395F9C36243; '
            'deep_dive_carousel_focused_app=673750; deep_dive_carousel_method=gems; '
            'steamCountry=HK%7C413cfd508f5a328ba1679f9252231a8e; '
            'app_impressions=377530@1_7_7_topsellers_150_3|1245200@1_7_7_topsellers_150_3|632360'
            '@1_7_7_topsellers_150_3|1245180@1_7_7_topsellers_150_3|1298590@1_7_7_topsellers_150_3|285900'
            '@1_7_7_topsellers_150_3|632470@1_7_7_topsellers_150_2|813780@1_7_7_topsellers_150_2|1100600'
            '@1_7_7_topsellers_150_2|1293820@1_7_7_topsellers_150_2|952860@1_7_7_topsellers_150_2|1209110'
            '@1_7_7_topsellers_150_2|1015610@1_7_7_topsellers_150_3|1238440@1_7_7_topsellers_150_3|1042550'
            '@1_7_7_topsellers_150_3|466560@1_7_7_topsellers_150_3|602960@1_7_7_topsellers_150_3|457140'
            '@1_7_7_topsellers_150_3|793460@1_7_7_topsellers_150_3|512540@1_7_7_topsellers_150_3|619150'
            '@1_7_7_topsellers_150_3|1100410@1_7_7_topsellers_150_3|1088780@1_7_7_topsellers_150_3|1100420'
            '@1_7_7_topsellers_150_3|1201360@1_7_7_topsellers_150_3|642280@1_7_7_topsellers_150_3|673950'
            '@1_7_7_topsellers_150_3|613830@1_7_7_topsellers_150_3|636480@1_7_7_topsellers_150_3|387990'
            '@1_7_7_topsellers_150_3|414700@1_7_7_topsellers_150_3|382310@1_7_7_topsellers_150_3|1162520'
            '@1_7_7_topsellers_150_2|289070@1_7_7_topsellers_150_2|307690@1_7_7_topsellers_150_2|973760'
            '@1_7_7_topsellers_150_2|1273370@1_5_9__405; recentapps=%7B%22578080%22%3A1590476900%7D',
    'Host': 'steamcommunity.com',  # Host should match the host in the request URL
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/80.0.3987.116 Safari/537.36'
}
html = requests.get(url, headers=headers, timeout=5)

soup = BeautifulSoup(html.text, 'html.parser')  # parse the response body text, not the Response object itself
reviews = soup.find_all('div', {'class': 'apphub_Card'})
for review in reviews:
    nick = review.find('div', {'class': 'apphub_CardContentAuthorName'})
    title = review.find('div', {'class': 'title'}).text
    hour = review.find('div', {'class': 'hours'}).text.split(' ')
    link = nick.find('a').attrs['href']
    comment = review.find('div', {'class': 'apphub_CardTextContent'}).text
    print(nick.text, title, hour, link, )
    print(comment.strip())  # strip surrounding whitespace/newlines from the review text

Twilight6 posted on 2020-5-26 19:00:03

Connection timed out. It's probably anti-scraping, or you may need an overseas IP proxy?

诶呀二傻子是我 posted on 2020-5-26 19:03:57

After removing the timeout, the error becomes: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='steamcommunity.com', port=443): Max retries exceeded with url: /app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000128140DFD00>: Failed to establish a new connection: A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.'))
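(Side note: removing the timeout just means waiting for the operating system's own connect timeout instead; the connection still never gets established. If the host is reachable at all, the usual pattern is to keep an explicit timeout and let requests retry on its own. A minimal sketch, with placeholder retry counts and timeout values:)

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# retry failed requests a few times with increasing back-off (values here are placeholders)
retries = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

url = 'https://steamcommunity.com/app/578080/reviews/?browsefilter=toprated&snr=1_5_100010_'
headers = {'User-Agent': 'Mozilla/5.0'}  # minimal headers just for the sketch
resp = session.get(url, headers=headers, timeout=(5, 15))  # (connect timeout, read timeout) in seconds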

诶呀二傻子是我 posted on 2020-5-26 19:05:57

Twilight6 posted on 2020-5-26 19:03
Connection timed out, probably anti-scraping, or you may need a VPN

If a VPN is needed, how do I handle it? By using a proxy?

Twilight6 posted on 2020-5-26 19:08:11

诶呀二傻子是我 posted on 2020-5-26 19:05
If a VPN is needed, how do I handle it? By using a proxy?

I'm not sure. Doesn't Steam mostly require running an accelerator anyway? Otherwise sometimes you can't even open a Steam profile page.

suchocolate posted on 2020-5-26 19:24:41

Without a proxy you can't even log in from a browser; timing out is normal.

诶呀二傻子是我 posted on 2020-5-26 19:33:28

suchocolate posted on 2020-5-26 19:24
Without a proxy you can't even log in from a browser; timing out is normal.

So does that mean there's no way to scrape the Steam community?

Twilight6 posted on 2020-5-26 19:46:21

诶呀二傻子是我 posted on 2020-5-26 19:33
So does that mean there's no way to scrape the Steam community?

Go buy an overseas IP, haha
https://www.ipidea.net/?utm-source=bdtg&utm-keyword=?46

suchocolate posted on 2020-5-26 21:29:51

诶呀二傻子是我 posted on 2020-5-26 19:33
So does that mean there's no way to scrape the Steam community?

You need to use a proxy or a VPN.
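(With requests you can route a request through a proxy via the proxies argument. A minimal sketch; the address and port below are placeholders for whatever proxy service is actually used:)

import requests

proxies = {
    'http': 'http://127.0.0.1:7890',   # placeholder: local HTTP proxy endpoint
    'https': 'http://127.0.0.1:7890',
}
resp = requests.get('https://steamcommunity.com/app/578080/reviews/',
                    proxies=proxies, timeout=10)
print(resp.status_code)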

诶呀二傻子是我 posted on 2020-5-26 21:37:27

Sure enough, it still doesn't work even with an overseas proxy.
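(One thing worth checking, though it's only a guess about the setup: many proxy/VPN clients expose a SOCKS5 port rather than an HTTP one, and requests only talks to a SOCKS proxy if PySocks is installed and the proxy URL uses the socks5h:// scheme, roughly:)

# pip install "requests[socks]"   (pulls in PySocks)
import requests

proxies = {'https': 'socks5h://127.0.0.1:1080'}  # placeholder: whichever local port your client listens on
resp = requests.get('https://steamcommunity.com/app/578080/reviews/',
                    proxies=proxies, timeout=10)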

Twilight6 posted on 2020-5-27 07:27:53

诶呀二傻子是我 posted on 2020-5-26 21:37
Sure enough, it still doesn't work even with an overseas proxy.

Still not working? Then it's probably anti-scraping. Search Baidu for similar write-ups and see how other people scraped it.

诶呀二傻子是我 posted on 2020-5-27 15:15:59

I bought a global proxy and that solved it. Now there's a new problem:
import requests
from bs4 import BeautifulSoup
import json
# from selenium import webdriver
#
# driver = webdriver.Chrome(r'C:\Users\m1359\AppData\Local\Google\Chrome\Application\chromedriver.exe')

def sen_from_text(text):
    SENTIMENT_URL = 'http://api.bosonnlp.com/sentiment/analysis'
    h = {'X-Token': 'balbala'}# your token
    data = json.dumps(text)
    resp = requests.post(SENTIMENT_URL, headers=h, data=data.encode('utf-8'))
    resp = json.loads(resp.text)# print(resp)
    front = float(resp)
    return front



headers = {'Accept-Language': 'zh-CN,zh;q=0.9',
         'Host': 'steamcommunity.com',
         'Referer': 'https://steamcommunity.com/app/578080',
         'Connection': 'keep-alive',
         'Cookie': '',
         'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/83.0.4103.61 Safari/537.36'}
file = open('steam.txt', 'w+', encoding='utf-8')
for i in range(1, 10):
    url = 'https://steamcommunity.com/app/578080/homecontent/?userreviewsoffset=' + str(10 * (i - 1)) + '&p=' + str(
      i) + '&workshopitemspage=' + str(i) + '&readytouseitemspage=' + str(i) + '&mtxitemspage=' + str(
      i) + '&itemspage=' + str(i) + '&screenshotspage=' + str(i) + '&videospage=' + str(i) + '&artpage=' + str(
      i) + '&allguidepage=' + str(i) + '&webguidepage=' + str(i) + '&integratedguidepage=' + str(
      i) + '&discussionspage=' + str(
      i) + '&numperpage=10&browsefilter=trendweek&browsefilter=trendweek&appid=578080&appHubSubSection=10&l=schinese' \
             '&filterLanguage=default&searchText=&forceanon=1'
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')  # if lxml is installed, switching the parser to 'lxml' is recommended
    reviews = soup.find_all('div', {'class': 'apphub_Card modalContentLink interactable'})

    for review in reviews:
      nick = review.find('div', {'class': 'apphub_CardContentAuthorName offline ellipsis'})
      title = review.find('div', {'class': 'title'}).text
      hour = review.find('div', {'class': 'hours'}).text.split(' ')
      link = nick.find('a').attrs['href']
      comment = review.find('div', {'class': 'apphub_CardTextContent'}).text.strip()  # the list returned by split() has no .strip(); just strip the text
      # sen = sen_from_text(comment)
      print(nick.text, title, hour, link, comment)
      print(type(nick))
When I run it, it raises this error:
Traceback (most recent call last):
File "C:/Users/m1359/PycharmProjects/test/pubg.py", line 97, in <module>
    link = nick.find('a').attrs['href']
AttributeError: 'NoneType' object has no attribute 'find'
I also wasn't able to paginate and pull more reviews the way this blogger did:
https://www.tinymind.net.cn/articles/6c517fc1b33931
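(A guess about the AttributeError: the author-name div's class attribute isn't always exactly 'apphub_CardContentAuthorName offline ellipsis' — for users who are online the string differs — and BeautifulSoup treats a multi-class string as an exact match on the whole attribute, so find() returns None for those cards. Matching on the single class and guarding against None avoids the crash; a sketch, assuming the rest of the page structure is as in the code above:)

for review in reviews:
    # a single class name matches regardless of the online/offline part of the attribute
    nick = review.find('div', class_='apphub_CardContentAuthorName')
    if nick is None or nick.find('a') is None:
        continue  # skip cards without an author link instead of crashing
    link = nick.find('a').attrs['href']
    title = review.find('div', {'class': 'title'}).text
    comment = review.find('div', {'class': 'apphub_CardTextContent'}).text.strip()
    print(nick.text.strip(), title, link)
    print(comment)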