Here's the code first:
- import requests
- import bs4
- url = 'https://www.investing.com/stock-screener/?sp=country::5|sector::a|industry::a|equityType::a|exchange::2|last::2,100|avg_volume::0,1000000<turnover_volume;1'
- headers = {'User_Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
- #proxies = {"http": "127.0.0.1:1080", "https": "127.0.0.1:1080"}
- res = requests.get(url,headers=headers)
- print(res)
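I've only written the very start of this scraper, but I've already hit a problem. At first I didn't add any headers, and running it returned <Response [403]> right away. I looked up the cause online, and the explanation was that the server was blocking the scraper and that I should add headers to imitate a browser. So I added headers, but it still returns <Response [403]>. Following 小甲鱼's tutorial I also added proxies, but with proxies set I get an error saying a connection to the target server could not be established, so for now the proxies line is commented out in the code.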
So I used Chrome to look at the request headers the browser sends to this site. They are as follows:
- Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
- Accept-Encoding: gzip, deflate, br
- Accept-Language: zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7
- Cache-Control: max-age=0
- Connection: keep-alive
- Cookie: adBlockerNewUserDomains=1551401834; _ga=GA1.2.561833304.1551401853; __qca=P0-1525036165-1551401854143; r_p_s_n=1; __gads=ID=c4c1bc4a976e22b6:T=1555029818:S=ALNI_MYAN3BaOifhCcYWliRU-6ITfhCwBA; editionPostpone=1555029870585; _gid=GA1.2.1715261453.1555398609; G_ENABLED_IDPS=google; _fbp=fb.1.1555460700436.7968688; PHPSESSID=3r05a5j82m28c3cnmcr104d5bb; geoC=CN; gtmFired=OK; StickySession=id.18830491796.047www.investing.com; billboardCounter_1=1; nyxDorf=NjEwYW47NHY1azswZCljYzBiN28yKzUwYWM%3D
- Host: www.investing.com
- Upgrade-Insecure-Requests: 1
- User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
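There's a lot more here than just User-Agent. Should I try to add all of these to the code?
I couldn't find any other solution online, so I'm asking for help here. Could someone take a look at how to solve this? Thanks.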
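One way to experiment with this, as a rough sketch rather than a confirmed fix: paste the header block copied from Chrome into a string and turn it into a dict for `requests`. The helper name `headers_from_devtools` is made up for illustration, and skipping `Cookie` and `Accept-Encoding` by default is my own assumption (the cookie belongs to one browser session, and a `br` response may not be decoded unless a Brotli package is installed).

```python
def headers_from_devtools(raw, skip=("cookie", "accept-encoding")):
    """Turn 'Name: value' lines copied from the browser into a headers dict.

    Hypothetical helper, not part of requests. Names listed in `skip` are
    compared case-insensitively and left out of the result.
    """
    headers = {}
    for line in raw.strip().splitlines():
        # Split on the first ':' only, so values that contain ':' stay intact.
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name:
            continue  # not a 'Name: value' line
        if name.lower() in skip:
            continue
        headers[name] = value
    return headers


RAW = """
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7
Host: www.investing.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
"""

headers = headers_from_devtools(RAW)
print(headers)
```

The resulting dict can be passed as `requests.get(url, headers=headers)`. Starting with a few headers and adding more only if the 403 persists makes it easier to see which one the server actually cares about.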
- import requests
- headers = {
- 'Host': 'www.investing.com',
- 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
- }
- url = 'https://www.investing.com/stock-screener/?sp=country::5|sector::a|industry::a|equityType::a|exchange::2|last::2,100|avg_volume::0,1000000%3Cturnover_volume;1'
- res = requests.get(url, headers=headers)
- print(res)
Give this a try.
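For anyone comparing the two snippets: as far as I can tell they differ in three places. The header key is spelled `User-Agent` with a hyphen (with `User_Agent`, requests sends that as a separate, unrelated header and keeps its default `python-requests/...` User-Agent), a `Host` header is added, and the `<` in the URL is written as `%3C`. The thread doesn't confirm which of these matters for the 403, so treat the sketch below as an illustration only. `screener_url` and `fetch` are hypothetical helper names.

```python
from urllib.parse import quote

BASE_URL = 'https://www.investing.com/stock-screener/'
SP = ('country::5|sector::a|industry::a|equityType::a|exchange::2'
      '|last::2,100|avg_volume::0,1000000<turnover_volume;1')

HEADERS = {
    'Host': 'www.investing.com',
    # Note the hyphen: 'User_Agent' would not replace the default User-Agent.
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
}


def screener_url(sp):
    # Keep ':', '|', ',' and ';' as they are, so that for this query
    # only '<' is percent-encoded (it becomes '%3C').
    return BASE_URL + '?sp=' + quote(sp, safe=':|,;')


def fetch(sp):
    # requests is third-party; importing it here keeps screener_url usable without it.
    import requests
    res = requests.get(screener_url(sp), headers=HEADERS, timeout=10)
    return res.status_code


print(screener_url(SP))
```

Calling `fetch(SP)` returns the HTTP status code, which is a bit easier to compare between attempts than the printed `<Response [...]>` object.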