FishC Forum


[Solved] Scraping the main product images from a shopping site

Posted 2016-9-6 19:27:38

Windows 7 64-bit, Python 3.5
I am trying to scrape the images from a Korean shopping site. The code is as follows:
  import re
  from selenium import webdriver

  # Grab the page HTML after Chrome has rendered it
  a = webdriver.Chrome()
  a.get('http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category=')
  c = a.page_source
  a.quit()  # close the browser once the source has been captured

  # Capture the src of every <img> tag whose src is double-quoted
  re1 = '<img[^>]*src="([^"]*)"'
  imglist = re.findall(re1, c)

  print(imglist)


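Because the page contains JS-rendered content, I used selenium to drive Chrome and load it, but the loaded content is still incomplete and contains many relative addresses. Is there a way to get the five 600*600 main images from the URL above?
Running the code above gives: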
  ['/shop/data/skin/no88/img/main/close.gif', '/shop/data/skin/no88/img/main/btn_mypage_go.gif', '/shop/data/wizdesign/up.png', '/shop/data//wizdesign/btn_down.png', '/shop/data/wizdesign/list_icon.png', '/shop/data/wizdesign/list_icon2.png', '/shop/data/wizdesign/add_icon1.png', '/shop/data/wizdesign/joinpoint.png', '/shop/data/wizdesign/add_icon1.png', '/shop/data/wizdesign/logo.png', '/shop/data/skin/no88/img/banner/sns_f.png', '/shop/data/skin/no88/img/banner/sns_k.png', '/shop/data/skin/no88/img/banner/sns_i.png', '../data/goods/1465437265560m0.jpg', '/shop/data/skin/no88/img/common/btn_zoom.gif', '../data/goods/t/1465437265560m0.jpg', '../data/goods/t/1465437265299m1.jpg', '../data/goods/t/1465437265838m2.jpg', '../data/goods/t/1465437265522m3.jpg', '../data/goods/t/1465437265910m4.jpg', '/shop/data/skin/no88/img/common/btn_plus.gif', '/shop/data/skin/no88/img/common/btn_minus.gif', '../data/skin/no88/img/icon/good_icon_new.gif', '/shop/data/skin/no88/img/common/btn_multioption_br.gif', '/shop/admin/img/natescrab_btn.gif', '../data/skin/no88/img/sns/icon_twitter.png', '../data/skin/no88/img/sns/icon_facebook.png', '../data/skin/no88/img/sns/icon_url.png', 'https://paycoscdn.toastoven.net/payco/bill/checkout/img/v2/ico_arr_bx.gif', 'http://funnymade.godohosting.com/image/pouch/clearpouch/clearpouch-m.jpg', '../data/goods/1467268630178s0.jpg', '../data/skin/no88/img/icon/good_icon_new.gif', '../data/goods/1333007167_s_0.jpg', '../data/goods/1457505743522s0.jpg', '../data/skin/no88/img/icon/good_icon_new.gif', '../data/skin/no88/img/icon/good_icon_best.gif', '../data/goods/1333007143_s_0.jpg', '../data/goods/139143631622s0.jpg', '/shop/data/skin/no88/img/common/bar_detail_07.gif', 'http://funnymade.godohosting.com/toktok.png', '/shop/data/wizdesign/tit_today_view_h21.gif', '/shop/data/skin/no88/img/common/sky_btn_up.gif', '/shop/data/skin/no88/img/common/sky_btn_down.gif', '/shop/data/wizdesign/go_top_h18.gif']
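On the relative addresses mentioned in the question: one way to turn them into absolute URLs is to resolve each src against the page URL with the standard library's urllib.parse.urljoin. The sketch below is only an illustration, not part of the original post; the helper name absolutize and the three sample values (copied from the output above) are chosen here for the example.

```python
from urllib.parse import urljoin

# Page URL from the question; relative src values are resolved against it.
PAGE_URL = 'http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category='


def absolutize(srcs, base=PAGE_URL):
    """Return an absolute URL for each src value (hypothetical helper)."""
    return [urljoin(base, src) for src in srcs]


# Three values copied from the output above: root-relative,
# parent-relative, and already absolute.
sample = [
    '/shop/data/wizdesign/logo.png',
    '../data/goods/t/1465437265560m0.jpg',
    'https://paycoscdn.toastoven.net/payco/bill/checkout/img/v2/ico_arr_bx.gif',
]

for absolute_url in absolutize(sample):
    print(absolute_url)
```

urljoin handles all three forms in the list: paths starting with '/', paths starting with '../', and addresses that are already absolute (which it returns unchanged).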
Best answer (SixPy)
2016-9-7 09:36:27
The HTML on this Korean site is not very standard~
  import re
  import requests as req

  url = 'http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category='
  url_shop = 'http://www.funnymade.com/shop'
  html = req.get(url).text

  # Match <img> tags with a single-quoted src followed by an unquoted class=hand
  ptn = re.compile(r"<img[^>]*src='([^']*)'[^>]*?class=hand")
  # '..' -> shop root makes the URL absolute; dropping '/t/' points at the
  # image outside the thumbnail folder, which appears to be the full-size one
  imglist = [src.replace('..', url_shop).replace('/t/', '/')
             for src in ptn.findall(html)]

  print(imglist)

Result:
  ['http://www.funnymade.com/shop/data/goods/1465437265560m0.jpg',
   'http://www.funnymade.com/shop/data/goods/1465437265299m1.jpg',
   'http://www.funnymade.com/shop/data/goods/1465437265838m2.jpg',
   'http://www.funnymade.com/shop/data/goods/1465437265522m3.jpg',
   'http://www.funnymade.com/shop/data/goods/1465437265910m4.jpg']
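Since the answer's regex depends on the exact quoting style (single-quoted src, unquoted class=hand), a parser-based variant may be less brittle. The following is a minimal sketch, not SixPy's method: it swaps the regex for the standard library's html.parser.HTMLParser, which accepts single-quoted, double-quoted and unquoted attribute values. The class HandImageParser, the function full_size_urls and the sample_html fragment are hypothetical; the fragment is only modeled on what the regex implies about the page and has not been checked against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = 'http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category='


class HandImageParser(HTMLParser):
    """Collect the src of every <img> tag that carries the class 'hand'."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attr = dict(attrs)
        # Valueless attributes arrive as None, so fall back to ''.
        classes = (attr.get('class') or '').split()
        if 'hand' in classes and attr.get('src'):
            self.srcs.append(attr['src'])


def full_size_urls(html, page_url=PAGE_URL):
    """Resolve thumbnail srcs to absolute URLs and drop the '/t/' folder,
    mirroring the replace('/t/', '/') step in the answer."""
    parser = HandImageParser()
    parser.feed(html)
    return [urljoin(page_url, src).replace('/t/', '/') for src in parser.srcs]


# Hypothetical fragment: one thumbnail in the style the regex expects,
# plus an unrelated image that should be ignored.
sample_html = (
    "<img src='../data/goods/t/1465437265560m0.jpg' width=50 class=hand>"
    '<img src="/shop/data/wizdesign/logo.png">'
)

print(full_size_urls(sample_html))
```

To run it against the real page, pass req.get(url).text from the answer in place of sample_html; whether the live markup still matches is an assumption.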
Original poster | Posted 2016-9-9 21:51:29
In reply to SixPy (2016-9-7 09:36):

I have one more question: how are you this good? How are you this good? How?