import urllib.request
import re
import sys
headers = {"User-Agent":
           "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"}
url='https://www.nuomi.com/?cid=002540'
req=urllib.request.Request(url,headers=headers)
response=urllib.request.urlopen(req)
html=response.read().decode('utf-8')
#print(html)
listurl=re.findall(r'http:.+\.jpg',html,re.S|re.M)
#print(listurl)
i=0
for url in listurl:
    f=open(str(i)+'.jpg','wb')
    req=urllib.request.urlopen(url)
    response=req.head().decode('utf-8')
    f.write(response)
    i+=1
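This program fails with: UnicodeEncodeError: 'ascii' codec can't encode characters in position 88-92: ordinal not in range(128)
I tried the fixes I found online, but none of them worked. It is an encoding problem, but I don't know how to fix it.
Could someone who knows please point me in the right direction? Thanks. @凌九霄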
This post was last edited by 凌九霄 on 2018-8-27 23:12
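I changed it a little and got the images. Originally I wanted to use listurl = re.sub(r'src="([^"]+jpg)"','http:\1', html) to rewrite each match straight into the final image URL. That works when tested in RegexBuddy, but it did not work in the code, and I'm a bit puzzled by that myself.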
import urllib.request
import re

headers = {"User-Agent":
           "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"}
url = 'https://www.nuomi.com/?cid=002540'
req = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(req)
html = response.read().decode('utf-8')
# print(html)
# Capture only the value of each src attribute that ends in jpg
listurl = re.findall(r'src="([^"]+jpg)"', html)
i = 0
for url in listurl:
    with open(str(i) + '.jpg', 'wb') as f:
        # The captured values have no scheme, so prepend 'http:'
        req = urllib.request.urlopen('http:' + url)
        # Image data is binary: write the bytes as-is, without decoding
        response = req.read()
        f.write(response)
    i += 1
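On the re.sub attempt mentioned above, two things may explain why it worked in RegexBuddy but not in the script (this is a guess, since the failing version of the script isn't shown). First, in an ordinary Python string literal '\1' is the octal escape for the character chr(1), not a backreference, so the replacement has to be a raw string. Second, re.sub returns the whole page as one string, not a list of URLs. A minimal sketch, where sample_html is a made-up snippet and not the real page:

```python
import re

# Hypothetical HTML fragment standing in for the real page.
sample_html = '<img src="//img.example.com/a.jpg"><img src="//img.example.com/b.jpg">'
pattern = r'src="([^"]+jpg)"'

# Without the r prefix, '\1' is the single character chr(1), not a backreference.
broken = re.sub(pattern, 'http:\1', sample_html)

# With a raw string the group is substituted, but the result is still one string.
fixed = re.sub(pattern, r'http:\1', sample_html)

# To get a list of full URLs, build them from the findall captures instead.
urls = ['http:' + u for u in re.findall(pattern, sample_html)]
print(urls)
```

Here broken contains a control character where the URL should be, fixed is the rewritten page text, and urls is the list the download loop actually needs.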
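As for the original UnicodeEncodeError, a likely cause (an assumption; the thread does not confirm it) is the pattern r'http:.+\.jpg'. With re.S the greedy .+ runs from the first http: to the last .jpg on the page, so the single "URL" passed to urlopen contains Chinese text that cannot be encoded as ASCII. A small offline sketch with a hypothetical snippet:

```python
import re

# Hypothetical page fragment: two image URLs with Chinese text between them.
sample_html = 'a http://img.example.com/1.jpg 美食 http://img.example.com/2.jpg b'

# Greedy .+ with re.S: one match spanning both URLs and the text in between.
greedy = re.findall(r'http:.+\.jpg', sample_html, re.S)

# Non-greedy .+? stops at the first .jpg after each http:.
lazy = re.findall(r'http:.+?\.jpg', sample_html, re.S)

# The greedy "URL" contains non-ASCII characters, so it cannot be ASCII-encoded.
try:
    greedy[0].encode('ascii')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(len(greedy), lazy, encodable)
```

Matching only the quoted src value, as the reply's [^"]+ pattern does, avoids the problem because the match can never cross an attribute boundary.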