Bounty: 10 fish coins (鱼币)
- import re
- s = '<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg" size="57323" width="450" height="600"><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg" size="57323" width="450" height="600">'
- result = re.findall(r'<img class="BDE_Image" src="[^"]+\.jpg', s)
- print(result)
- Result:
- ['<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg', '<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg']
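Question: [^"] means any character other than a double quote,
yet in the result everything after jpg is gone: " size="57323" width="450" height="600"
Why isn't the result this instead:
['<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg size=57323 width=450 height=600 (rest omitted)]
Alternatively, how can I improve the version I wrote myself: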
- import re
- s = '<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg" size="57323" width="450" height="600"><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg" size="57323" width="450" height="600">'
- result = re.findall(r'<img class="BDE_Image" src=".+\.jpg', s)
- print(result)
- Result:
- ['<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg" size="57323" width="450" height="600"><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=0c4bc7595afbb2fb342b581a7f4b2043/3992a116fdfaaf51a67b4ebb835494eef11f7aa0.jpg']
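Here the last image path is not followed by size, width and height, but the first one is. How can I improve this?
In short, I don't understand the line that the teacher, 小甲鱼, wrote. Could anyone explain it?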
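Best answer:
1. It extracts the text that matches; it does not strip out the text that fails to match.
2. The regular expression also requires the match to end in .jpg, which here sits directly before the closing double quote.
3. In other words, the match starts with <img class="BDE_Image" src=" and then consumes any character that is not a double quote. It cannot continue past a double quote, and the text right before that double quote must be .jpg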
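The three points above can be checked with a small sketch. The string below is a shortened, hypothetical stand-in for the HTML in the question (the example.com URLs and the .png tag are assumptions added for illustration); the pattern is the one from the question.

```python
import re

# Shortened, hypothetical stand-in for the HTML in the question.
s = ('<img class="BDE_Image" src="https://example.com/a.jpg" size="1" width="450">'
     '<img class="BDE_Image" src="https://example.com/b.jpg" size="2" width="450">')

pattern = r'<img class="BDE_Image" src="[^"]+\.jpg'

# findall returns only the text the pattern consumed. The pattern ends at
# \.jpg, so the attributes after it were never part of the match.
matches = re.findall(pattern, s)
print(matches)

# [^"]+ cannot cross the closing quote of src, so each match stays inside
# one tag and contains exactly three quotes (two from class, one from src).
print([m.count('"') for m in matches])

# A src that does not end in .jpg before its closing quote gives no match.
png = '<img class="BDE_Image" src="https://example.com/c.png" size="3">'
no_match = re.findall(pattern, png)
print(no_match)
```

The size, width and height attributes are missing from the output because nothing in the pattern asks for them, not because [^"] deleted them.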
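On the second part of the question (the .+ version that swallows the first tag's attributes), one possible fix is sketched below. This is not from the original answer: it assumes the goal is one result per image, and it reuses a shortened, hypothetical HTML string with example.com URLs. The greedy .+ runs to the last .jpg in the string, while the non-greedy .+? stops at the first one; a capture group with [^"]+ returns only the URL.

```python
import re

# Shortened, hypothetical stand-in for the HTML in the question.
s = ('<img class="BDE_Image" src="https://example.com/a.jpg" size="1" width="450">'
     '<img class="BDE_Image" src="https://example.com/b.jpg" size="2" width="450">')

# Greedy: .+ extends to the LAST ".jpg", so both tags merge into one match.
greedy = re.findall(r'<img class="BDE_Image" src=".+\.jpg', s)

# Non-greedy: .+? stops at the FIRST ".jpg" after each src=", one match per tag.
lazy = re.findall(r'<img class="BDE_Image" src=".+?\.jpg', s)

# With a capture group, findall returns only the group: the bare image URL.
urls = re.findall(r'<img class="BDE_Image" src="([^"]+\.jpg)"', s)

print(len(greedy), len(lazy))
print(urls)
```

For anything beyond a quick scrape, an HTML parser is usually more robust than a regular expression, since attribute order and quoting can vary.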