Scraping Meizitu (妹子图) with a web crawler

音频线 · posted on 2018-8-14 14:42:23

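Here's a little treat for everyone: a web crawler written in Python that downloads the pictures on Meizitu.

Many websites have anti-scraping mechanisms, so beginners have a hard time finding a suitable site to practice crawling on. I recommend Meizitu for this: all of its images can be crawled.
Just don't let your mom or your girlfriend find out what you've been crawling!!!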

import urllib.request
import re
import time

def getHtml(url, header):
    # Fetch the page with a browser-like header and decode it as GBK.
    request = urllib.request.Request(url, headers=header)
    response = urllib.request.urlopen(request)
    html = response.read().decode("gbk")
    return html

def getaddressofpic(html):
    # Each match is a tuple of (alt text, image URL).
    r_key = "<img alt=\"(.*?)\" src=\"(.*?)\" />"
    key = re.compile(r_key)
    piclist = re.findall(key, html)
    return piclist

def saving(piclist):
    for each in piclist:
        address = each[1]
        name = each[0]
        print(name)
        print(address)
        # The alt text becomes the file name; the target folder must already exist.
        urllib.request.urlretrieve(address, "e://pachong/%s.jpg" % name)

def paqu():
    header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134"}

    for num in range(5550, 5580):
        # Page numbers ending in 3 are requested with a double slash, as in the original post.
        if num % 10 != 3:
            url = "http://www.meizitu.com/a/" + str(num) + ".html"
        else:
            url = "http://www.meizitu.com/a//" + str(num) + ".html"

        html = getHtml(url, header)
        piclist = getaddressofpic(html)
        saving(piclist)

        # Wait a minute between pages.
        time.sleep(60)

if __name__ == "__main__":
    paqu()
    print("Crawl finished!")
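To see what getaddressofpic() hands over to saving(), the same regular expression can be run on a small hand-written HTML fragment. Only the pattern comes from the post; the sample markup, its example.com URLs, and the name extract_images are made up for illustration.

```python
import re

# Same pattern as in the post: capture the alt text and the src URL of each <img>.
IMG_PATTERN = re.compile(r'<img alt="(.*?)" src="(.*?)" />')

def extract_images(html):
    """Return a list of (alt_text, image_url) tuples found in html."""
    return IMG_PATTERN.findall(html)

# Hypothetical sample markup, invented for this example (not taken from the site).
SAMPLE_HTML = (
    '<p><img alt="sample one" src="http://example.com/1.jpg" /></p>\n'
    '<p><img alt="sample two" src="http://example.com/2.jpg" /></p>'
)

if __name__ == "__main__":
    for alt_text, image_url in extract_images(SAMPLE_HTML):
        print(alt_text, image_url)
```

Because the pattern has two capture groups, re.findall returns one tuple per image, which is why saving() reads each[0] as the name and each[1] as the address.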

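The save location here is drive E on my machine; change it to suit your own computer.
Likewise, adjust the User-Agent setting for your own machine if you need to.
The num variable in paqu() determines which page is crawled; keep an eye on the site's URL while browsing to see which page numbers exist.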
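The save folder, the User-Agent, and the page number can be turned into parameters instead of being edited inside the functions. The following is a minimal sketch, not the original author's code: build_page_url, make_request, safe_filename, and target_path are hypothetical helper names, the special case for page numbers ending in 3 is left out, and replacing characters that Windows forbids in file names is an added assumption about what the alt texts might contain.

```python
import os
import re
import urllib.request

BASE_URL = "http://www.meizitu.com/a/"  # URL prefix used in the post

def build_page_url(num):
    """Return the page URL for page number num, following the post's pattern."""
    return BASE_URL + str(num) + ".html"

def make_request(url, user_agent):
    """Build (but do not send) a request that carries the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def safe_filename(name):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip()

def target_path(save_dir, name):
    """Join the chosen folder and a cleaned-up name into a .jpg path."""
    return os.path.join(save_dir, safe_filename(name) + ".jpg")

if __name__ == "__main__":
    save_dir = "pachong"          # assumption: any folder you prefer
    user_agent = "Mozilla/5.0"    # assumption: replace with your browser's string
    for num in range(5550, 5553):
        request = make_request(build_page_url(num), user_agent)
        print(request.full_url)
    print(target_path(save_dir, 'a/b: "c"'))
```

Before downloading, the folder can be created with os.makedirs(save_dir, exist_ok=True), since urllib.request.urlretrieve does not create missing directories.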

梦星 · posted on 2018-8-14 15:37:42

A real veteran...

kana173 · posted on 2018-8-14 15:46:40

Nothing wrong with that, bro

dddsswu · posted on 2018-8-14 16:07:21

I want to give it a try too

小小小小的鱼丶 · posted on 2018-8-14 16:25:46

王小xiao · posted on 2018-8-14 16:42:16

Just taking a look

秋木叶 · posted on 2018-8-14 16:45:46

sxl730 · posted on 2018-8-14 16:47:14

西南孤狼 · posted on 2018-8-14 17:41:34

I want to see

幻影也疯狂 · posted on 2018-8-14 17:50:32

Just browsing

只想敲代码 · posted on 2018-8-14 18:05:17

Checking out the code

eyes888 · posted on 2018-8-14 19:28:42

Impressive

清风与酒ing · posted on 2018-8-14 20:00:53

Impressive

中年神仙 · posted on 2018-8-14 20:11:42

Devin锋 · posted on 2018-8-14 20:49:59

一个小号 · posted on 2018-8-14 22:39:15

chobits024 · posted on 2018-8-14 23:20:05

Let's see what the good stuff is

Vibrant · posted on 2018-8-15 00:17:57

hellohero · posted on 2018-8-15 06:40:13

Choo choo, the naughty little train is leaving the station

杀不死的比尔 · posted on 2018-8-15 07:50:35
