柿子饼同学 posted on 2021-2-25 14:21:17

A problem while web scraping

I'm scraping the Douban Top 250 page, but it isn't giving me anything back. What's going on?
import requests as r
import bs4 as b

res = r.get(r'https://movie.douban.com/top250')
soup = b.BeautifulSoup(res.text, 'html.parser')
target = soup.find_all('div', class_='hd')
for i in target:
    print(each.a.span.text)

This is what I wrote. When I run it, nothing prints at all. What's going on?

suchocolate posted on 2021-2-25 14:26:53

Last edited by suchocolate on 2021-2-25 14:28

Add headers:
headers = {'user-agent': 'Mozilla'}
res = r.get('https://movie.douban.com/top250', headers=headers)
soup = b.BeautifulSoup(res.text, 'html.parser')
target = soup.find_all('div', class_='hd')
for each in target:    # changed to 'each', matching the print call below
    print(each.a.span.text)
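
If it still prints nothing after that, it can help to confirm the response actually came back before parsing. A minimal sanity check (nothing beyond standard requests/bs4 usage, reusing the minimal headers value from above):

import requests as r
import bs4 as b

headers = {'user-agent': 'Mozilla'}
res = r.get('https://movie.douban.com/top250', headers=headers)
res.raise_for_status()                    # raises an error if Douban rejected the request (4xx/5xx)
print(res.status_code, len(res.text))     # quick check: status code and size of the returned page

soup = b.BeautifulSoup(res.text, 'html.parser')
for each in soup.find_all('div', class_='hd'):
    print(each.a.span.text)               # movie title inside each 'hd' block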

青出于蓝 posted on 2021-2-25 14:53:48

Douban has an anti-crawler system; adding headers fixes it.
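
A quick way to see that behaviour for yourself (a small sketch; the exact status code Douban returns without a User-Agent may vary, so don't rely on the printed numbers):

import requests as r

url = 'https://movie.douban.com/top250'

# Without a User-Agent, Douban usually rejects the request.
plain = r.get(url)
print('no header:  ', plain.status_code)

# With a browser-like User-Agent, the page comes back normally.
with_ua = r.get(url, headers={'User-Agent': 'Mozilla/5.0'})
print('with header:', with_ua.status_code)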

青出于蓝 posted on 2021-2-25 14:59:17

Here is the complete code:
import requests as r
import bs4 as b
headers={"User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}
res = r.get('https://movie.douban.com/top250', headers=headers)
soup = b.BeautifulSoup(res.text, 'html.parser',headers=headers)
target = soup.find_all('div', class_='hd')
for each in target:    # changed to 'each'
    print(each.a.span.text)

柿子饼同学 posted on 2021-2-25 15:02:59

青出于蓝 posted on 2021-2-25 14:59
Here is the complete code


It threw an error:
============== RESTART: C:\Users\86177\Desktop\python学习\豆瓣250爬虫.py =============
Traceback (most recent call last):
File "C:\Users\86177\Desktop\python学习\豆瓣250爬虫.py", line 6, in <module>
    soup = b.BeautifulSoup(res.text, 'html.parser',headers=headers)
File "C:\Users\86177\AppData\Local\Programs\Python\Python39-32\lib\site-packages\bs4\__init__.py", line 252, in __init__
    builder = builder_class(**kwargs)
File "C:\Users\86177\AppData\Local\Programs\Python\Python39-32\lib\site-packages\bs4\builder\_htmlparser.py", line 325, in __init__
    super(HTMLParserTreeBuilder, self).__init__(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'headers'
>>>

青出于蓝 posted on 2021-2-25 15:06:16

My bad, I didn't actually run it before posting. The headers argument belongs on requests.get(), not on the BeautifulSoup constructor. Corrected code:
import requests as r
import bs4 as b
headers={"User-Agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}
res = r.get('https://movie.douban.com/top250', headers=headers)
soup = b.BeautifulSoup(res.text, 'html.parser')
target = soup.find_all('div', class_='hd')
for each in target:    # changed to 'each'
    print(each.a.span.text)

Careless of me, sorry.
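
If the goal is all 250 titles rather than just the first 25, the list seems to be paged with a start query parameter (25 entries per page); a sketch under that assumption, with a short pause between pages to go easy on the server:

import time
import requests as r
import bs4 as b

headers = {"User-Agent":
           "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}

titles = []
for start in range(0, 250, 25):               # 10 pages, 25 movies per page (assumed layout)
    res = r.get('https://movie.douban.com/top250',
                params={'start': start}, headers=headers)
    soup = b.BeautifulSoup(res.text, 'html.parser')
    for each in soup.find_all('div', class_='hd'):
        titles.append(each.a.span.text)
    time.sleep(1)                             # small delay between requests

for i, title in enumerate(titles, 1):
    print(i, title)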

青出于蓝 posted on 2021-2-25 15:07:10

柿子饼同学 posted on 2021-2-25 15:02
It threw an error

Does it work now? My environment isn't handy at the moment, but I can't see anything else wrong with it.

柿子饼同学 posted on 2021-2-25 15:51:51

Thank you so much, it works now!