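While learning to write web crawlers, I wanted to see whether I could scrape Baidu search results. I searched for the keyword 'Python' and got url = 'https://www.baidu.com/s?wd=pytho ... =4036&rsv_sug=9'.
Inspecting elements on that page in the browser shows the markup normally, but fetching the same URL with the 'urllib.request' module fails (code below):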
    import urllib.request

    def url_open(url):
        req = urllib.request.Request(url)
        req.add_header('user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36')
        response = urllib.request.urlopen(url)
        html = response.read().decode('utf-8')
        print(html)

    if __name__ == '__main__':
        url = 'https://www.baidu.com/s?wd=python&rsv_spt=1&rsv_iqid=0x8801453e00019784&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=56060048_3_pg&rsv_enter=1&rsv_dl=tb&rsv_sug3=3&rsv_sug1=1&rsv_sug7=001&rsv_sug2=0&rsv_btype=i&inputT=1593&rsv_sug4=4036&rsv_sug=9'
        url_open(url)
Instead of the search results, it prints the following:
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<title>百度安全验证</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
<meta name="format-detection" content="telephone=no, email=no">
<link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
<link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
<link rel="stylesheet" href="https://wappass.bdimg.com/static/touch/css/api/mkdjump_8befa48.css" />
</head>
<body>
<div class="timeout hide">
<div class="timeout-img"></div>
<div class="timeout-title">网络不给力,请稍后重试</div>
<button type="button" class="timeout-button">返回首页</button>
</div>
<div class="timeout-feedback hide">
<div class="timeout-feedback-icon"></div>
<p class="timeout-feedback-title">问题反馈</p>
</div>
<script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://wappass.bdimg.com/static/touch/js/mkdjump_6003cf3.js"></script>
</body>
</html><!--05549909460286507274042909-->
<script> var _trace_page_logid = 0554990946; </script>
Could someone explain what is going on here and how to fix it?
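A note on what that output is: its <title> is 百度安全验证 ("Baidu security verification"), so it looks like an anti-bot check page rather than a search results page. A minimal sketch for detecting it in a script, assuming the title text stays as shown above; is_verification_page is a hypothetical helper name, not part of any library.

```python
import re

# Title copied from the response shown above; assumed to stay unchanged.
VERIFICATION_TITLE = "百度安全验证"

def is_verification_page(html: str) -> bool:
    """Return True if the page's <title> matches the verification page title."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return bool(match) and match.group(1).strip() == VERIFICATION_TITLE
```

Checking this before parsing makes it obvious when the request was blocked, rather than silently getting an empty result list.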
Try writing it like this:
    import requests
    import bs4

    url = "https://www.baidu.com/s?wd=python&rsv_spt=1&rsv_iqid=0x8801453e00019784&issp=1&f=8&rsv_bp=1&rsv_idx=2&ie=utf-8&tn=56060048_3_pg&rsv_enter=1&rsv_dl=tb&rsv_sug3=3&rsv_sug1=1&rsv_sug7=001&rsv_sug2=0&rsv_btype=i&inputT=1593&rsv_sug4=4036&rsv_sug=9"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'
    }

    # Send the GET request with a browser-like User-Agent header.
    response = requests.get(url, headers=headers)
    soup = bs4.BeautifulSoup(response.text, "html.parser")

    # Select every div that has both the "result" and "c-container" classes.
    targets = soup.select("div.result.c-container")
    for each in targets:
        # The title is the link directly inside the h3 with class "t".
        targets2 = each.select("h3.t > a")
        if targets2:  # skip containers without a title link
            print(targets2[0].text)
Output:
python官方网站 - Welcome to Python.org
Python 基础教程 | 菜鸟教程
Python基础教程,Python入门教程(非常详细)
Python 简介 | 菜鸟教程
python吧-百度贴吧--python学习交流基地。--这里有一群python爱好...
Python教程 - 廖雪峰的官方网站
你都用 Python 来做什么? - 知乎
Python下载-Python中文版官方下载-华军软件园
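As for why the urllib version in the question got the verification page: one likely cause is that the header is added to req, but urlopen(url) is then called with the plain URL string, so the custom User-Agent is never sent and urllib's default one (Python-urllib/x.y) goes out instead. Below is a sketch of the corrected call. It assumes Baidu still serves results to a browser-like User-Agent, which may not hold, so the verification page can still come back. build_request is a hypothetical helper, and keeping only the wd parameter instead of the full rsv_* query string is also an assumption.

```python
import urllib.parse
import urllib.request

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36')

def build_request(keyword):
    """Build a Request for a Baidu search that carries a browser-like User-Agent."""
    # Keep only the wd (keyword) parameter; dropping the rsv_* ones is an assumption.
    query = urllib.parse.urlencode({'wd': keyword})
    req = urllib.request.Request('https://www.baidu.com/s?' + query)
    req.add_header('User-Agent', USER_AGENT)
    return req

def url_open(keyword):
    # Pass the Request object (not the URL string) so the header is actually sent.
    with urllib.request.urlopen(build_request(keyword)) as response:
        return response.read().decode('utf-8')

# Usage (performs a network request):
# print(url_open('python'))
```

The only change that matters is handing the Request object to urlopen; everything else is the same as the code in the question.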