Web crawler, Newbie Discussion Area, Newbie Training Camp, FishC Forum (鱼C论坛)

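君子好逑 posted on 2021-1-22 17:18:44

Web crawler

Does anyone have a crawler that can scrape Baidu Images? None of the programs from the videos I found on Bilibili work anymore. Could someone please share a working program?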

Daniel_Zhang posted on 2021-1-22 17:24:08

Sure, please refer to this post:

https://fishc.com.cn/forum.php?mod=redirect&goto=findpost&ptid=189113&pid=5204619

君子好逑 posted on 2021-1-22 17:30:40

import re
import requests

headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'
}

url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=10175432787291229734&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E4%BA%8C%E6%AC%A1%E5%85%83%E5%9B%BE%E7%89%87&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&hd=&latest=&copyright=&word=%E4%BA%8C%E6%AC%A1%E5%85%83%E5%9B%BE%E7%89%87&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=&fr=&expermode=&force=&pn=30'

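# Request the JSON endpoint with a browser-like User-Agent.
response = requests.get(url=url, headers=headers)

# Let requests guess the encoding from the body so the text decodes correctly.
response.encoding = response.apparent_encoding

html = response.text

# Collect every "middleURL" value (the image URLs) from the response text.
regular = re.compile('"middleURL":"(.*?)"')
m = regular.findall(html)
for each in m:
    print(each)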

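This program should fetch the URLs of the first 30 images. Anyone who wants to take it further can look at the approach and explanation in the reply that was marked as the best answer.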
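For anyone who wants to go one step further and save the images, here is a rough sketch rather than a tested solution. The helper names `extract_middle_urls` and `download_images` are made up for illustration, the `.jpg` extension is an assumption (the real image type may differ), and it uses the standard library's `urllib.request` instead of `requests` only so that it runs without extra packages.

```python
import re
import urllib.request
from pathlib import Path

# Same pattern as in the program above: capture each "middleURL" value.
MIDDLE_URL_RE = re.compile(r'"middleURL":"(.*?)"')


def extract_middle_urls(text):
    """Return every middleURL value found in the response text, in order."""
    return MIDDLE_URL_RE.findall(text)


def download_images(urls, out_dir="images", headers=None):
    """Save each URL as out_dir/0000.jpg, 0001.jpg, ... and return the paths.

    Hypothetical helper: the file naming and the .jpg extension are assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        req = urllib.request.Request(url, headers=headers or {})
        with urllib.request.urlopen(req, timeout=10) as resp:
            data = resp.read()
        path = out / f"{i:04d}.jpg"
        path.write_bytes(data)
        saved.append(path)
    return saved
```

With the program above, something like `download_images(m, headers=headers)` should write the images into an `images` folder next to the script, provided the server accepts the request.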
