import requests
import bs4
res = requests.get('https://movie.douban.com/top250')
soup = bs4.BeautifulSoup(res.text,'html.parser')
targets = soup.find_all('div',class_='hd')
for each in targets:
    print(each.a.span.text)
Nothing gets printed at the end. What went wrong?
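You've been blocked by the site's anti-scraping check (this site only sends data to browsers), so the response does not contain the movie list and find_all returns nothing.
You need to pass a headers argument with a browser User-Agent so the request looks like it comes from a browser: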
import requests
import bs4
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
res = requests.get('https://movie.douban.com/top250', headers=headers)
soup = bs4.BeautifulSoup(res.text,'html.parser')
targets = soup.find_all('div',class_='hd')
print(targets)
for each in targets:
    print(each.a.span.text)
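If you want to confirm this kind of problem yourself next time, here is a rough sketch that makes a blocked request visible instead of silently printing nothing. The names extract_titles and fetch_titles and the 10-second timeout are my own choices for illustration, not anything from the original code, and the exact status code the site returns when it blocks you is not something I have verified.

```python
import requests
import bs4

# Assumed browser-like User-Agent, copied from the answer above.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

def extract_titles(html):
    """Return the text of the first <span> inside each <div class="hd">."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    titles = []
    for each in soup.find_all('div', class_='hd'):
        # Skip blocks that lack the expected <a><span> structure.
        if each.a is not None and each.a.span is not None:
            titles.append(each.a.span.text)
    return titles

def fetch_titles(url, headers=None):
    """Download url and return its titles (hypothetical helper)."""
    res = requests.get(url, headers=headers, timeout=10)
    # Show the status code and body size so a blocked request is obvious.
    print(res.status_code, len(res.text))
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    res.raise_for_status()
    return extract_titles(res.text)

# Usage (makes a real network request):
#     for title in fetch_titles('https://movie.douban.com/top250', headers=HEADERS):
#         print(title)
```

Calling fetch_titles once without headers and once with headers=HEADERS should show the difference: in the blocked case you would expect either an error status or a body with no div.hd blocks in it.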