>>> import urllib.request
>>> import re
>>> response = urllib.request.urlopen('https://tieba.baidu.com/p/1397681324')
>>> html = response.read().decode('utf-8')
>>> p = r'<img class="BDE_Image".*?src="([^"]*\.jpg)[^"]*".*?>'
>>> imglist = re.findall(p,html)
>>> for each in imglist:
...     print(each)
...
https://imgsa.baidu.com/forum/w%3D580/sign=3c06220887d6277fe912323018391f63/d4ecea1f3a292df55bf91aa6bc315c6034a87349.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=c35a268c4bfbfbeddc59367748f1f78e/4cf62badcbef7609d2c32e2c2edda3cc7dd99ebf.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=e92e32b633fa828bd1239debcd1e41cd/d18dbaec08fa513d7387da6d3d6d55fbb2fbd952.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=e70575b5afc379317d688621dbc5b784/6edfb9cc7cd98d109e1cbb82213fb80e7aec90bf.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=d3525bb0c895d143da76e42b43f18296/4cd58b2397dda1443cc1db01b2b7d0a20df486bf.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=0e36fbb20a55b3199cf9827d73a88286/3cdb971001e9390180ca3a8d7bec54e737d196bf.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=3208d0e5d0160924dc25a213e406359b/7af12087e950352a42a992bb5343fbf2b3118bbf.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=1b2ee48ac93d70cf4cfaaa05c8ddd1ba/d0508222720e0cf3b3daf8480a46f21fbe09aa52.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=a60fc2b7970a304e5222a0f2e1c9a7c3/7d6f4ffbb2fb43163632ccc920a4462309f7d352.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=a90b8d771bd5ad6eaaf964e2b1ca39a3/7499193b5bb5c9ea6edeb550d539b6003af3b352.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=d31a9a5bbc3eb13544c7b7b3961fa8cb/22d2ed03918fa0ec5bead2be269759ee3d6ddb52.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=fc80da6d3d6d55fbc5c6762e5d234f40/bd1d5b34970a304eff081649d1c8a786c9175c59.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=b8d1da01b2b7d0a27bc90495fbee760d/33f298025aafa40f5e1e786eab64034f79f019ae.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=ff80693c6c81800a6ee58906813433d6/e9ee9013632762d0d9db340ca0ec08fa503dc69a.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=9f063901d158ccbf1bbcb53229d9bcd4/172e0bdfa9ec8a134fffc753f703918fa1ecc09a.jpg
https://imgsa.baidu.com/forum/w%3D580/sign=b15a74dbccbf6c81f7372ce08c3fb1d7/51ddb3ec8a1363274f085280918fa0ec09fac79a.jpg
【My questions】
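1. The output above is what I got. I counted 16 images in total, but the thread at https://tieba.baidu.com/p/1397681324 actually spans 2 pages, and page 1 alone has more than 16 images. I want to scrape all the images from both pages.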
2. I don't quite understand r'<img class="BDE_Image".*?src="([^"]*\.jpg)[^"]*".*?>'. Could one of the experts explain it?
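For question 1, here is a minimal sketch that loops over every page of the thread. It rests on two assumptions that are not in the original post: that Tieba serves page N of a thread at the same URL with a `?pn=N` query string, and that the page count (2 here) is known in advance. `page_urls`, `extract_jpgs`, `fetch_html` and `scrape_thread` are hypothetical helper names. The sketch reuses the post's regex unchanged, so it only collects what that pattern can match; if page 1 alone has more than 16 images, some `<img>` tags probably do not fit the pattern (see question 2), and the pattern itself may need loosening.

```python
import re
import urllib.request

# The regex from the post, unchanged.
IMG_PATTERN = re.compile(r'<img class="BDE_Image".*?src="([^"]*\.jpg)[^"]*".*?>')


def page_urls(thread_url, page_count):
    """Build one URL per page, ASSUMING Tieba paginates a thread with ?pn=N."""
    return [f"{thread_url}?pn={n}" for n in range(1, page_count + 1)]


def extract_jpgs(html):
    """Return every .jpg URL that IMG_PATTERN captures, in page order."""
    return IMG_PATTERN.findall(html)


def fetch_html(url):
    """Download a page and decode it as UTF-8, as in the original session."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def scrape_thread(thread_url, page_count, fetch=fetch_html):
    """Collect image URLs from every page of a thread.

    `fetch` is injectable so the logic can be tried on canned HTML.
    Duplicates are dropped while keeping first-seen order.
    """
    found = []
    for url in page_urls(thread_url, page_count):
        found.extend(extract_jpgs(fetch(url)))
    return list(dict.fromkeys(found))
```

With network access, and if the `?pn=` assumption holds, `scrape_thread("https://tieba.baidu.com/p/1397681324", 2)` should return the combined list for both pages. Taking `fetch` as a parameter keeps the paging logic separate from the download, so it can be checked offline against a dict of sample HTML.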
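For question 2, the same pattern can be rewritten with `re.VERBOSE`, which ignores unescaped whitespace and allows `#` comments, so each piece can be annotated. The sample tags below are made-up HTML for illustration, not copied from the thread. The skipped ones show one possible (unverified) reason the post found only 16 images: the pattern needs `class="BDE_Image"` directly after `<img `, needs a `.jpg` URL, and without `re.DOTALL` the `.` cannot cross a line break inside a tag.

```python
import re

# The pattern exactly as written in the post.
ORIGINAL = r'<img class="BDE_Image".*?src="([^"]*\.jpg)[^"]*".*?>'

# The same pattern in verbose mode. Unescaped whitespace is ignored here,
# so the literal space after "<img" must be written as "\ ".
EXPLAINED = re.compile(r'''
    <img\ class="BDE_Image"  # literal text: an <img tag whose first attribute is class="BDE_Image"
    .*?                      # lazily skip as few characters as possible (not newlines) ...
    src="                    # ... until the first src=" after it
    (                        # capture group 1: the only part findall returns
      [^"]*                  #   any run of characters that are not a double quote
      \.jpg                  #   ending in ".jpg" (the dot is escaped, so it is a real dot)
    )                        # end of capture group 1
    [^"]*                    # skip anything after .jpg inside the quotes, e.g. a ?query string
    "                        # the closing quote of the src value
    .*?>                     # lazily skip the rest of the tag up to the first ">"
''', re.VERBOSE)

# Hypothetical sample tag: extra attributes and a query string after .jpg.
matched = '<img class="BDE_Image" pic_type="0" src="https://example.com/pic/abc.jpg?size=580" width="560">'

# Hypothetical tags the pattern skips.
skipped = [
    '<img pic_type="0" class="BDE_Image" src="https://example.com/a.jpg">',  # class is not the first attribute
    '<img class="BDE_Image" src="https://example.com/b.png">',               # not a .jpg
    '<img class="BDE_Image"\n src="https://example.com/c.jpg">',             # line break before src
]

print(EXPLAINED.findall(matched))  # ['https://example.com/pic/abc.jpg']
for tag in skipped:
    print(re.findall(ORIGINAL, tag))  # [] for each skipped tag
```

`.*?` is the lazy (non-greedy) form of `.*`: it stops at the first point where the rest of the pattern can match. Note that if a `BDE_Image` tag has no `.jpg` src, the lazy `.*?` can still run on into a later tag on the same line and capture that tag's URL instead.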