[Solved] Beginner web-scraping question

1119625819 · posted on 2023-1-31 21:48:59


Last edited by 1119625819 on 2023-1-31 21:50

import requests
import bs4

res = requests.get("https:top250")
soup = bs4.BeautifulSoup(res.text, "html.parser")
targets = soup.find_all("div", class_="hd")
print(res.text)
print(res.status_code)
for each in targets:
    print(each.a.span.text)

D:\Anaconda\envs\python\python.exe D:\python\kk.py

418

Process finished with exit code 0

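The code above is supposed to scrape Douban's top 250 movies, but when I run it in PyCharm I only get the output shown above, which is not what should happen (the movie titles should be printed). Also, nothing from print(res.text) appears, even though print(res.status_code) does. What is going wrong? (The URL is not written out in full because of a URL permission restriction on the forum.)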

Best answer


isdkz

2023-1-31 22:13:15

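This is Douban's anti-scraping mechanism: it answers requests that don't look like they come from a browser with status 418 and, as your output shows, no page content, so print(res.text) prints nothing and find_all finds no titles. By default requests identifies itself with a User-Agent of python-requests/<version>, so add a request header that mimics a browser: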
import requests
import bs4

headers = {'user-agent': 'Mozilla/5.0'}  # added this line

res = requests.get("https://movie.douban.com/top250", headers=headers)  # changed this line
soup = bs4.BeautifulSoup(res.text, "html.parser")
targets = soup.find_all("div", class_="hd")  # one <div class="hd"> per movie
print(res.text)
print(res.status_code)
for each in targets:
    print(each.a.span.text)  # first <span> inside the link holds the title
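For comparison, here is a minimal sketch of the same scraper using only the standard library, assuming you want to avoid the third-party requests and bs4 packages. It swaps in urllib.request for requests and html.parser.HTMLParser for BeautifulSoup; the names fetch_html, HdTitleParser and extract_titles are hypothetical. It sends the same Mozilla/5.0 header, and because urlopen raises urllib.error.HTTPError on a 4xx status, a blocked 418 request fails loudly instead of silently printing nothing. The parser roughly imitates each.a.span.text (first <span> inside the first <a> of every <div class="hd">), assuming Douban's markup still has that shape.

```python
import urllib.request
from html.parser import HTMLParser

# Same minimal header as in the answer; urllib stores it as "User-agent".
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url: str) -> str:
    """Download a page and return it as text (assumes UTF-8).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses such as
    Douban's 418, so a blocked request fails loudly.
    """
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


class HdTitleParser(HTMLParser):
    """Collect the text of the first <span> inside the first <a> of each
    <div class="hd">, roughly like each.a.span.text in the bs4 version."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: list[str] = []
        self._hd_depth = 0        # > 0 while inside a <div class="hd">
        self._in_a = False        # inside the first <a> of the current hd div
        self._seen_a = False      # first <a> of this hd div already entered
        self._in_span = False     # inside the first <span> of that <a>
        self._seen_span = False   # first <span> of this hd div already taken
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._hd_depth:
                self._hd_depth += 1  # nested div inside the hd div
            elif "hd" in (dict(attrs).get("class") or "").split():
                self._hd_depth = 1
                self._seen_a = self._seen_span = False
        elif tag == "a" and self._hd_depth and not self._seen_a:
            self._in_a = self._seen_a = True
        elif tag == "span" and self._in_a and not self._seen_span:
            self._in_span = self._seen_span = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "span" and self._in_span:
            self._in_span = False
            self.titles.append("".join(self._buf).strip())
        elif tag == "a" and self._in_a:
            self._in_a = False
        elif tag == "div" and self._hd_depth:
            self._hd_depth -= 1

    def handle_data(self, data):
        if self._in_span:
            self._buf.append(data)


def extract_titles(html: str) -> list[str]:
    """Return the movie titles found in a Top 250 page (empty if none)."""
    parser = HdTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Usage (needs network access):
# for title in extract_titles(fetch_html("https://movie.douban.com/top250")):
#     print(title)
```

With requests, by contrast, the 418 response comes back normally and it is up to you to check res.status_code or call res.raise_for_status() before parsing; otherwise the loop simply runs over an empty result.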


1119625819 · posted on 2023-2-1 22:00:33

import requests
import bs4
headers = {'user-agent': 'Mozilla/5.0'}  # added this line
res = requests.get("httpstop250", headers=headers)  # changed this line
soup = bs4.BeautifulSoup(res.text, "html.parser")
targets = soup.find_all("div", class_="hd")
print(res.text)
print(res.status_code)
for each in targets:
    print(each.a.span.text)


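(Regarding the line this fellow forum member added, headers = {'user-agent': 'Mozilla/5.0'} # added this line: does the quoted part 'Mozilla/5.0' mean searching with the Firefox browser, since Mozilla is the abbreviated name of Firefox? The link in my code is abbreviated because of permission restrictions.)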

isdkz · posted on 2023-2-2 08:14:32

Last edited by isdkz on 2023-2-2 08:16

1119625819 posted on 2023-2-1 22:00
(headers = {'user-agent': 'Mozilla/5.0'} # added this line
Regarding the line this fellow forum member ...

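Mozilla is not an abbreviation of Firefox; have you ever seen an abbreviation that is longer than the name it stands for? Mozilla is an organization, and Firefox is just one of its products.

If you look at the User-Agent (UA) strings of the major browsers, you'll see that they all start with Mozilla/5.0.

I typed the value by hand only because I didn't want to copy a browser's long UA string (nobody can remember the whole thing), so I entered just Mozilla/5.0, which is enough to get past Douban's anti-scraping check.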
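To see why a bare Mozilla/5.0 works as a header value, it helps to look at the shape of a UA string: product/version tokens separated by spaces, with parenthesized comments in between. The sketch below splits a UA string into those pieces. The two sample strings follow the usual Chrome and Firefox on Windows format, but their version numbers are illustrative only, and split_ua is a hypothetical helper that ignores escaped parentheses.

```python
# Illustrative UA strings in the usual Chrome / Firefox-on-Windows format;
# the version numbers are placeholders and change with every release.
SAMPLE_UAS = {
    "chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "firefox": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
               "Gecko/20100101 Firefox/121.0",
}


def split_ua(ua: str) -> list[str]:
    """Split a User-Agent string into product tokens and (comments).

    Spaces inside parentheses stay part of the comment; escaped
    parentheses are not handled.
    """
    tokens: list[str] = []
    buf = ""
    depth = 0  # how many "(" are currently open
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == " " and depth == 0:
            if buf:
                tokens.append(buf)
                buf = ""
        else:
            buf += ch
    if buf:
        tokens.append(buf)
    return tokens


for name, ua in SAMPLE_UAS.items():
    print(name, split_ua(ua))
```

Both samples begin with the Mozilla/5.0 token, so the header in the answer sends just that leading token and drops the platform comment and engine/browser tokens that follow it.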
