Windows 7 64-bit, Python 3.5
I am scraping product images from a Korean shopping site. The code is as follows:
- import re
- from selenium import webdriver
- # Scrape the image URLs from the page after Chrome has rendered it
- a=webdriver.Chrome()
- a.get('http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category=')
- c=a.page_source
- re1='<img[^>]*src="([^"]*)"'
- imglist=re.findall(re1,c)
- print(imglist)
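The page contains JS-rendered content, so I used selenium to drive Chrome and load it. Even so, the loaded content is still incomplete, and many of the addresses are relative. Is there a way to get the five 600*600 images from the URL above?
Running the code above gives the following result: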
- ['/shop/data/skin/no88/img/main/close.gif', '/shop/data/skin/no88/img/main/btn_mypage_go.gif', '/shop/data/wizdesign/up.png', '/shop/data//wizdesign/btn_down.png', '/shop/data/wizdesign/list_icon.png', '/shop/data/wizdesign/list_icon2.png', '/shop/data/wizdesign/add_icon1.png', '/shop/data/wizdesign/joinpoint.png', '/shop/data/wizdesign/add_icon1.png', '/shop/data/wizdesign/logo.png', '/shop/data/skin/no88/img/banner/sns_f.png', '/shop/data/skin/no88/img/banner/sns_k.png', '/shop/data/skin/no88/img/banner/sns_i.png', '../data/goods/1465437265560m0.jpg', '/shop/data/skin/no88/img/common/btn_zoom.gif', '../data/goods/t/1465437265560m0.jpg', '../data/goods/t/1465437265299m1.jpg', '../data/goods/t/1465437265838m2.jpg', '../data/goods/t/1465437265522m3.jpg', '../data/goods/t/1465437265910m4.jpg', '/shop/data/skin/no88/img/common/btn_plus.gif', '/shop/data/skin/no88/img/common/btn_minus.gif', '../data/skin/no88/img/icon/good_icon_new.gif', '/shop/data/skin/no88/img/common/btn_multioption_br.gif', '/shop/admin/img/natescrab_btn.gif', '../data/skin/no88/img/sns/icon_twitter.png', '../data/skin/no88/img/sns/icon_facebook.png', '../data/skin/no88/img/sns/icon_url.png', 'https://paycoscdn.toastoven.net/payco/bill/checkout/img/v2/ico_arr_bx.gif', 'http://funnymade.godohosting.com/image/pouch/clearpouch/clearpouch-m.jpg', '../data/goods/1467268630178s0.jpg', '../data/skin/no88/img/icon/good_icon_new.gif', '../data/goods/1333007167_s_0.jpg', '../data/goods/1457505743522s0.jpg', '../data/skin/no88/img/icon/good_icon_new.gif', '../data/skin/no88/img/icon/good_icon_best.gif', '../data/goods/1333007143_s_0.jpg', '../data/goods/139143631622s0.jpg', '/shop/data/skin/no88/img/common/bar_detail_07.gif', 'http://funnymade.godohosting.com/toktok.png', '/shop/data/wizdesign/tit_today_view_h21.gif', '/shop/data/skin/no88/img/common/sky_btn_up.gif', '/shop/data/skin/no88/img/common/sky_btn_down.gif', '/shop/data/wizdesign/go_top_h18.gif']
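The HTML on this Korean site is not very standard~ Judging by the pattern below, in the raw HTML the thumbnail tags use a single-quoted src and an unquoted class=hand.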
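- import re
- import requests as req
-
- url = 'http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category='
- url_shop = 'http://www.funnymade.com/shop'
- html = req.get(url).text
- # Match only the thumbnails: single-quoted src, unquoted class=hand
- ptn = re.compile(r"<img[^>]*src='([^']*)'[^>]*?class=hand")
- # '..' -> shop root makes the URL absolute; dropping '/t/' (presumably the
- # thumbnail directory) points at the full-size image instead
- imglist = [src.replace('..', url_shop).replace('/t/', '/')
-            for src in ptn.findall(html)]
- print(imglist)
Result: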
- ['http://www.funnymade.com/shop/data/goods/1465437265560m0.jpg',
- 'http://www.funnymade.com/shop/data/goods/1465437265299m1.jpg',
- 'http://www.funnymade.com/shop/data/goods/1465437265838m2.jpg',
- 'http://www.funnymade.com/shop/data/goods/1465437265522m3.jpg',
- 'http://www.funnymade.com/shop/data/goods/1465437265910m4.jpg']
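The string replacement above works for paths that start with '..', but the output in the question also contains root-relative paths such as '/shop/data/...'. A minimal sketch of a more general approach, assuming the same page URL and the same regex, is to resolve every src against the page URL with the standard library's urljoin. The function name full_size_urls and the sample_html fragment are made up for illustration, and treating '/t/' as the thumbnail folder is an assumption taken from the answer above.

```python
import re
from urllib.parse import urljoin

# URL of the product page from the question; relative paths are resolved against it.
PAGE_URL = 'http://www.funnymade.com/shop/goods/goods_view.php?goodsno=143&category='

# Same pattern as in the answer above: single-quoted src, unquoted class=hand.
THUMB_PATTERN = re.compile(r"<img[^>]*src='([^']*)'[^>]*?class=hand")


def full_size_urls(html, page_url=PAGE_URL):
    """Return absolute full-size image URLs for the thumbnails found in html."""
    urls = []
    for src in THUMB_PATTERN.findall(html):
        # urljoin handles '../...' and '/shop/...' style paths alike.
        absolute = urljoin(page_url, src)
        # Assumption: '/t/' is the thumbnail folder, so dropping it gives the large image.
        urls.append(absolute.replace('/t/', '/'))
    return urls


# Hypothetical fragment imitating the page's markup, so the sketch runs offline.
sample_html = (
    "<img src='../data/goods/t/1465437265560m0.jpg' width=50 class=hand>"
    "<img src='../data/goods/t/1465437265299m1.jpg' width=50 class=hand>"
    "<img src='/shop/data/wizdesign/logo.png'>"
)
print(full_size_urls(sample_html))
```

To use it on the live page, pass req.get(url).text from the answer above in place of sample_html. The logo tag in the sample is skipped because it has no class=hand.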