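The program is meant to search Taobao for any keyword, match the images shown in the search results, and download them to disk.
It fails at the matching step, though. Could someone explain why? Many thanks!
The XPath does not match anything. The expression is ".//*[@class='J_ItemPic img']/@src"
https://s.taobao.com/search?q=pen&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20180504&ie=utf8
That URL has expired; you can search Taobao for anything else to get a working one.
The code is below. So far the program is only written up to the XPath matching part.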
**********************************************************************************************************************
#!/usr/bin/python
# coding=utf-8
from urllib import request
from urllib.request import urlopen
from urllib.parse import urlencode, urljoin
from lxml import etree

"""
Always follow the site's crawling rules.
0. Ask the user which product's images to crawl: input()
1. Encode the keyword and build the URL to crawl: urlencode()
2. deal_jpg(): after downloading the page (read()), use XPath to match the image links
   shown on the first results page: request.Request()/urlopen()/xpath()
3. Read each image link and download the image: download_jpg()
"""


class Spider(object):
    def __init__(self):
        self.switch = True
        self.page = 44
        self.count = 0  # number of images saved so far, used in file names

    def deal_jpg(self, html):
        """
        Parse the search-result page for the user's keyword, match the image
        URLs with XPath, and pass each URL to download_jpg().
        """
        print("----------------2---------------------")
        print(html)
        content = etree.HTML(html)
        print(content)
        link_list = content.xpath(".//*[@class='J_ItemPic img']/@src")
        # link_list = content.xpath('.//*[@class="J_ItemPic img"]/@src')
        print(link_list)
        # link_list = content.xpath('.//*[@id="mainsrp-itemlist"]/div/div/img[@class="J_ItemPic img"]')
        for link in link_list:
            print(link)
            print("----------------3--------------")
            self.download_jpg(link)
            print("----------------4--------------")

    def download_jpg(self, link):
        """
        Download the target image.
        """
        # `link` is a URL string, not image data, so fetch the bytes first.
        # urljoin() also resolves protocol-relative links such as "//host/a.jpg".
        image_url = urljoin("https://s.taobao.com/", link)
        data = urlopen(image_url).read()
        # Write each image to its own file instead of appending to one file.
        self.count += 1
        with open("taobao_%d.jpg" % self.count, "wb") as f:
            f.write(data)
        print("Download finished----")

    def switch_jpg(self):
        """
        Ask the user which product to crawl, turn the keyword into a usable
        URL, and pass the downloaded page to deal_jpg().
        """
        # URL-encode the keyword (including Chinese characters) and build the full URL.
        url_01 = "https://s.taobao.com/search?"
        url_02 = "&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20180504&ie=utf8"
        page = 0
        url_page = "&bcoffset=3&ntoffset=3&p4ppushleft=1%2C48&s="
        key = input("Enter the keyword of the product to crawl: ")
        key = {"q": key}
        key = urlencode(key)
        full_url = url_01 + key + url_02
        print(full_url)
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393"}
        request2 = request.Request(full_url, headers=headers)
        response = urlopen(request2)
        html = response.read()
        print(html)
        print("-------------1----------------")
        self.deal_jpg(html)
        print("-------------5---------------")
        # Pagination is unfinished: self.switch starts as True, so this loop never runs.
        while not self.switch:
            print("-----------------6---------------")
            if page == 0:
                full_url = url_01 + key + url_02
            else:
                full_url = url_01 + key + url_02 + url_page + str(self.page)


if __name__ == "__main__":
    # Runs only when this file is executed directly as a script.
    taobao = Spider()
    taobao.switch_jpg()
*******************************************************************************************************
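Step 3 of the plan in the listing (read each image link, then download the image) could be sketched roughly as follows. This is only an illustration under assumptions, not the poster's code: `BASE_URL`, `normalize_link`, `filename_for` and `download_image` are hypothetical names, and it assumes the matched `src` values may be protocol-relative (starting with `//`), which `urljoin()` resolves against the page URL.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Assumption: the page the links were scraped from; used to resolve relative links.
BASE_URL = "https://s.taobao.com/search"


def normalize_link(link, base_url=BASE_URL):
    """Turn a scraped src value into an absolute URL."""
    return urljoin(base_url, link.strip())


def filename_for(url, index):
    """Build a local file name from a running index and the URL's extension (default .jpg)."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    return "taobao_{:03d}{}".format(index, ext)


def download_image(link, index, headers=None):
    """Fetch one image and write its bytes to a file of its own."""
    url = normalize_link(link)
    req = Request(url, headers=headers or {"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        data = resp.read()
    name = filename_for(url, index)
    with open(name, "wb") as f:
        f.write(data)
    return name
```

Writing the response bytes (rather than the link string) and using one file per image avoids ending up with a single file that holds several concatenated images.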
- In the browser, the XPath matches successfully
- In the program, the match comes back empty
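One common cause of this symptom, which is not confirmed for this particular page, is that the browser runs JavaScript that builds the item list, while `urlopen()` only receives the initial HTML. A quick way to check is to look for the class name in the bytes the program actually downloaded. The sketch below uses only the standard library; `diagnose` is a hypothetical helper name.

```python
import re


def diagnose(raw_html, class_name="J_ItemPic"):
    """Report where (if anywhere) a class name occurs in the downloaded page."""
    if isinstance(raw_html, bytes):
        text = raw_html.decode("utf-8", errors="replace")
    else:
        text = raw_html
    # Remove <script>...</script> blocks so that only static markup remains.
    markup_only = re.sub(r"(?is)<script\b.*?</script>", "", text)
    if class_name in markup_only:
        return "in markup"       # an XPath on the parsed page has something to match
    if class_name in text:
        return "only in script"  # the name appears only inside script code
    return "absent"              # the name is not in the response at all
```

Calling `print(diagnose(html))` in `deal_jpg()` before the `xpath()` call would show which case applies. If the result is "only in script" or "absent", no XPath over the raw response can find those `img` elements, and the image URLs would have to come from wherever the page's scripts get them.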