import re
import urllib.request as ur

url = 'http://jandan.net/ooxx/MjAyMDA0MjQtMjAx#comments'
req = ur.Request(url)
req.add_header('user-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36')
response = ur.urlopen(req)
html = response.read().decode('utf-8')
p = r'<img src="//wx.+\.jpg'
imglist = re.findall(p, html)
for i in imglist:
    a = i.split('//', 1)[1]
    b = 'http://' + a
    c = ur.urlopen(b).read()
    for m in range(len(imglist)):
        file = str(m + 1) + '.jpg'
        with open(file, 'wb') as f:
            f.write(c)
I wrote this code to practice web scraping. Starting from the line `for m in range(len(imglist)):` the logic goes wrong: the part above fetches the scraped content, and the part below creates files and saves the content into them, but the files created later keep overwriting the earlier ones. How should I change the logic of these two for loops so that each piece of downloaded content is written to its own file? Thanks.
The logic problem is in your second for loop: every time you download one image, the inner loop saves it N times, once into each of the N files. So when the run finishes, all N files are bound to contain the last image downloaded.
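To see why, here is a minimal sketch of the nested-loop bug with made-up byte strings instead of real downloads, and a dict instead of files on disk (names and data are illustrative only):

```python
# Stand-ins for the image bytes the crawler would download.
downloads = [b'img-1', b'img-2', b'img-3']

files = {}  # filename -> contents, instead of writing to disk
for c in downloads:                  # outer loop: one pass per download
    for m in range(len(downloads)):  # inner loop: rewrites ALL files each pass
        files[str(m + 1) + '.jpg'] = c

# Every pass overwrote every file, so all of them end up holding the last item:
print(files)  # {'1.jpg': b'img-3', '2.jpg': b'img-3', '3.jpg': b'img-3'}
```

The fix below collapses the two loops into one, so each download is written exactly once, to its own numbered file.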
import re
import urllib.request as ur

url = 'http://jandan.net/ooxx/MjAyMDA0MjQtMjAx#comments'
req = ur.Request(url)
req.add_header('user-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36')
response = ur.urlopen(req)
html = response.read().decode('utf-8')
p = r'<img src="//wx.+\.jpg'
imglist = re.findall(p, html)
for i in range(len(imglist)):
    a = imglist[i].split('//', 1)[1]
    b = 'http://' + a
    c = ur.urlopen(b).read()
    file = str(i + 1) + '.jpg'
    with open(file, 'wb') as f:
        f.write(c)
Typed this entirely on my phone and haven't tested it — if there's a mistake, please point it out.
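As a side note, the same single-loop fix reads a bit more idiomatically with `enumerate()`, which hands you the index and the matched tag together. The sample strings below stand in for `re.findall`'s output (the hosts are illustrative, and the real download call is shown only as a comment so the sketch needs no network):

```python
# Sample matches, shaped like what the thread's regex would return.
imglist = ['<img src="//wx1.sinaimg.cn/a.jpg', '<img src="//wx2.sinaimg.cn/b.jpg']

for n, tag in enumerate(imglist, start=1):    # n counts from 1
    link = 'http://' + tag.split('//', 1)[1]  # rebuild a fetchable URL
    filename = str(n) + '.jpg'                # one distinct name per image
    print(filename, link)
    # In the real script:
    # with open(filename, 'wb') as f:
    #     f.write(ur.urlopen(link).read())
```

This avoids indexing `imglist[i]` by hand, but the logic is identical to the corrected loop above.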