Problem while web scraping
I'm scraping the Douban Top 250 page, but it doesn't respond. What's going on?
import requests as r
import bs4 as b
res = r.get(r'https://movie.douban.com/top250')
soup = b.BeautifulSoup(res.text, 'html.parser')
target = soup.find_all('div', class_='hd')
for i in target:
    print(each.a.span.text)
This is what I wrote. When I run it, nothing is printed at all. What's happening?
Last edited by suchocolate on 2021-2-25 14:28
Add a header:
import requests as r
import bs4 as b
headers = {'user-agent': 'Mozilla'}
res = r.get('https://movie.douban.com/top250', headers=headers)
soup = b.BeautifulSoup(res.text, 'html.parser')
target = soup.find_all('div', class_='hd')
for each in target:  # changed to each
    print(each.a.span.text)
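Why the header helps, as a sketch: unless told otherwise, requests identifies itself with a User-Agent of the form python-requests/<version>, which is the kind of script client the anti-scraping check discussed in this thread appears to reject (an assumption based on the replies here, not a documented Douban rule). The snippet only builds the requests through a requests.Session, which is what requests.get uses internally, and prints the User-Agent a server would receive; nothing is sent over the network. The variable names are illustrative, and the browser string is the Firefox one from a later reply.

```python
import requests

url = 'https://movie.douban.com/top250'
# Browser-like User-Agent, copied from the later reply in this thread.
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) '
                  'Gecko/20100101 Firefox/85.0'
}

session = requests.Session()

# prepare_request merges the session's default headers with the request's own,
# the same way requests.get does, but does not send anything.
plain = session.prepare_request(requests.Request('GET', url))
with_ua = session.prepare_request(requests.Request('GET', url, headers=browser_headers))

print(plain.headers['User-Agent'])    # python-requests/<installed version>
print(with_ua.headers['User-Agent'])  # the Firefox string above
```

Passing headers= overrides only the keys you supply; the other default headers (Accept, Accept-Encoding, and so on) are still sent.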
Douban has an anti-scraping system; adding headers fixes it.
Here is the complete code:
import requests as r
import bs4 as b
headers={"User-Agent":
         "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}
res = r.get('https://movie.douban.com/top250', headers=headers)
soup = b.BeautifulSoup(res.text, 'html.parser',headers=headers)
target = soup.find_all('div', class_='hd')
for each in target:  # changed to each
    print(each.a.span.text)
青出于蓝 posted on 2021-2-25 14:59
Here is the complete code
It throws an error:
============== RESTART: C:\Users\86177\Desktop\python学习\豆瓣250爬虫.py =============
Traceback (most recent call last):
  File "C:\Users\86177\Desktop\python学习\豆瓣250爬虫.py", line 6, in <module>
    soup = b.BeautifulSoup(res.text, 'html.parser',headers=headers)
  File "C:\Users\86177\AppData\Local\Programs\Python\Python39-32\lib\site-packages\bs4\__init__.py", line 252, in __init__
    builder = builder_class(**kwargs)
  File "C:\Users\86177\AppData\Local\Programs\Python\Python39-32\lib\site-packages\bs4\builder\_htmlparser.py", line 325, in __init__
    super(HTMLParserTreeBuilder, self).__init__(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'headers'
>>>
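My bad, I was sloppy and posted it without actually running it. BeautifulSoup doesn't take headers; they only belong on r.get():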
import requests as r
import bs4 as b
headers={"User-Agent":
         "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:85.0) Gecko/20100101 Firefox/85.0"}
res = r.get('https://movie.douban.com/top250', headers=headers)
soup = b.BeautifulSoup(res.text, 'html.parser')
target = soup.find_all('div', class_='hd')
for each in target:  # changed to each
    print(each.a.span.text)
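Careless of me, sorry.
柿子饼同学 posted on 2021-2-25 15:02
It throws an error
Does it work now? My environment isn't handy for testing, but I can't spot any other problem.
Thanks a lot, it works now!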
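To check the parsing step of the final code without touching the network, here is a minimal sketch. The sample markup is hypothetical and heavily simplified (only the div.hd > a > span nesting that the selector relies on), not a copy of Douban's real page, and extract_titles is a helper name made up for illustration. It also shows two points from the thread: BeautifulSoup only needs the markup and a parser name (HTTP headers go on requests.get), and when a page contains no div with class hd, as a blocked response likely would, find_all returns an empty list. That would plausibly explain why the original question printed nothing and raised no error: the loop body, including the undefined name each, never ran.

```python
import bs4 as b

# Hypothetical, simplified markup mimicking the div.hd > a > span structure.
# Not Douban's actual HTML.
sample_html = """
<div class="item"><div class="hd"><a href="#"><span class="title">Title A</span></a></div></div>
<div class="item"><div class="hd"><a href="#"><span class="title">Title B</span></a></div></div>
"""


def extract_titles(html):
    """Return the text of the first <span> inside the link of each <div class="hd">."""
    # The parser takes markup plus a parser name; no HTTP options here.
    soup = b.BeautifulSoup(html, 'html.parser')
    return [each.a.span.text for each in soup.find_all('div', class_='hd')]


print(extract_titles(sample_html))  # ['Title A', 'Title B']

# A page without any <div class="hd"> (e.g. an error page) yields an empty list,
# so a for loop over the result prints nothing and raises nothing.
print(extract_titles('<html><body><p>Access denied</p></body></html>'))  # []
```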