fytfytf posted on 2020-7-23 16:48:00

A question about requests

import requests
import bs4

r=requests.get('https://movie.douban.com/top250')

soup=bs4.BeautifulSoup(r.text,'lxml')

title=soup.find_all('div',class_='hd')
for i in title:
    print(i.a.span.text)

The code should be correct, so why doesn't it print anything? Any help is appreciated.

xiaosi4081 posted on 2020-7-23 16:52:06

Last edited by xiaosi4081 on 2020-7-23 16:59

python -m pip install -U urllib3 -i https://pypi.tuna.tsinghua.edu.cn/simple
Also, change the code to this:

import requests
import bs4
headers = {"user-agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"}
r=requests.get('https://movie.douban.com/top250',headers=headers)

soup=bs4.BeautifulSoup(r.text,'lxml')

title=soup.find_all('div',class_='hd')
for i in title:
    print(i.a.span.text)
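For context on why the header matters: when no headers= argument is given, requests identifies itself with its own User-Agent. The snippet below is a quick way to see exactly what it sends, using only requests.utils helpers, so no request is made.

```python
import requests

# The User-Agent requests sends when no headers= argument is given.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# The full set of default headers; building this needs no network access.
defaults = requests.utils.default_headers()
print(defaults["User-Agent"])
```

A string of the form python-requests/<version> marks the client as a script, so a site that filters on User-Agent (as Douban appears to, judging by this thread) can reject it. Replacing it with a browser string is what the headers= fix above does.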

headers holds the request headers. Without them, you may not be able to scrape the page at all. If you don't believe me, try this:
import requests
import bs4

r=requests.get('https://movie.douban.com/top250')
print(r.status_code)
soup=bs4.BeautifulSoup(r.text,'html.parser')

title=soup.find_all('div',class_='hd')
for i in title:
    print(i.a.span.text)

Result:
418

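In other words, the site's anti-scraping measures blocked the request (418 is an error status, not 200 OK).
Please mark this as the best answer!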
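This also explains why the original code printed nothing rather than crashing: the blocked page has no div.hd elements, so find_all returns an empty list and the loop never runs. A minimal sketch, assuming you would rather fail loudly, is to call raise_for_status() before parsing. The helper names parse_titles and titles_or_error are hypothetical, and to keep the demo runnable offline the 418 response is built by hand instead of fetched.

```python
from __future__ import annotations

import requests
from bs4 import BeautifulSoup


def parse_titles(html: str) -> list[str]:
    """Hypothetical helper: return the title text of every div.hd on the page."""
    soup = BeautifulSoup(html, "html.parser")
    # On a blocked page there is no div.hd, so this is simply an empty list,
    # which is why the original loop printed nothing.
    return [div.a.span.text for div in soup.find_all("div", class_="hd")]


def titles_or_error(r: requests.Response) -> list[str]:
    """Hypothetical helper: raise on 4xx/5xx instead of silently parsing an error page."""
    r.raise_for_status()  # raises requests.HTTPError for codes such as 418
    return parse_titles(r.text)


# Offline demo: a hand-built Response that mimics the blocked request.
blocked = requests.Response()
blocked.status_code = 418
blocked.reason = "I'm a teapot"
blocked.url = "https://movie.douban.com/top250"

try:
    titles_or_error(blocked)
    error_message = ""
except requests.HTTPError as exc:
    error_message = str(exc)

print(error_message)
print(parse_titles("<html><body>blocked</body></html>"))
```

With a real request you would call titles_or_error(requests.get(url, headers=headers)). The div.hd > a > span structure is assumed from the selector in the original code, not checked against the live page.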

永恒的蓝色梦想 posted on 2020-7-23 16:52:41

python -m pip install -U chardet -i https://pypi.tuna.tsinghua.edu.cn/simple

qiuyouzhi posted on 2020-7-23 16:55:39

You're not doing anything at all to get around the site's anti-scraping measures...
import requests
import bs4

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'}

r=requests.get('https://movie.douban.com/top250', headers = headers)

soup=bs4.BeautifulSoup(r.text,'lxml')

title=soup.find_all('div',class_='hd')
for i in title:
    print(i.a.span.text)
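If you make several requests to the same site, a requests.Session can carry the User-Agent for you, so you don't have to pass headers= on every call. A minimal sketch, reusing the Firefox User-Agent string from the post above; prepare_request is used only so the demo can inspect the outgoing headers without touching the network.

```python
import requests

UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"

session = requests.Session()
# Headers set on the session are merged into every request made through it,
# replacing the default python-requests User-Agent.
session.headers.update({"User-Agent": UA})

# Build (but do not send) a request to see which headers would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://movie.douban.com/top250")
)
print(prepared.headers["User-Agent"])

# To actually fetch the page you would call:
# r = session.get("https://movie.douban.com/top250")
```

A Session also reuses the underlying connection and keeps cookies between calls, which is usually what you want when paging through a list like this one.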