FishC Forum

Views: 2749 | Replies: 6

[Solved] Hi everyone, how can I extract the URL on the fourth-to-last line?

Posted on 2021-5-1 10:46:43
Bounty: 10 FishC coins
Hi everyone, I want to extract the URL on the fourth-to-last line of the HTML below, i.e. //cnxinre.en.made-in-china.com/360-Virtual-Tour.html
<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<div class="auth">
<span class="auth-gold-span">
<img alt="China Supplier - Diamond Member" class="auth-icon" src="//www.micstatic.com/gb/img/icon/diamond_member_16.png?_v=1619604439818"/>
<div class="tip arrow-bottom tip-gold">
<div class="tip-con">
<p class="tip-para">Suppliers with verified business licenses</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</span>
</div>
<div class="auth auth-as">
<span class="as-logo" reportusable="reportUsable">
<input type="hidden" value="KokmJZBlCzWY"/>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
<img alt="Audited Suppliers" class="auth-icon" src="//www.micstatic.com/gb/img/icon/as-audited.png?_v=1619604439818"/>
</a>
</span>
</div>
<div class="auth auth-360 J-panorama" style="display: inline-block;">
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:12" href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow" target="_blank">
<img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png?_v=1516122340225"/>
</a>
<div class="tip arrow-bottom tip-360">
<div class="tip-con">
<p class="tip-para">360° Virtual Tour</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</div>
</div>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
<span title="Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.">Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>
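The target href is protocol-relative: it starts with // and carries no scheme, so an HTTP client cannot fetch it as-is. A minimal sketch using the standard library's urllib.parse.urljoin, assuming the listing page was served over https. The base URL https://www.made-in-china.com/ and the helper name to_absolute are illustrative assumptions, not taken from the thread:

```python
from urllib.parse import urljoin


def to_absolute(href: str, page_url: str) -> str:
    """Resolve an href (relative, protocol-relative, or absolute) against the page URL.

    A protocol-relative href such as "//host/path" inherits the page's scheme.
    """
    return urljoin(page_url, href)


href = "//cnxinre.en.made-in-china.com/360-Virtual-Tour.html"
# Assumption: the page was fetched over https; this base URL is hypothetical.
page_url = "https://www.made-in-china.com/"
print(to_absolute(href, page_url))  # https://cnxinre.en.made-in-china.com/360-Virtual-Tour.html
```

If the page had been fetched over plain http, the same call would yield an http:// URL, which is why the real page URL is the safer base than a hard-coded scheme.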
Best answer
Posted on 2021-5-1 10:46:44
Last edited by suchocolate on 2021-5-2 09:10
from lxml import etree


s = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<div class="auth">
<span class="auth-gold-span">
<img alt="China Supplier - Diamond Member" class="auth-icon" src="//www.micstatic.com/gb/img/icon/diamond_member_16.png?_v=1619604439818"/>
<div class="tip arrow-bottom tip-gold">
<div class="tip-con">
<p class="tip-para">Suppliers with verified business licenses</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</span>
</div>
<div class="auth auth-as">
<span class="as-logo" reportusable="reportUsable">
<input type="hidden" value="KokmJZBlCzWY"/>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
<img alt="Audited Suppliers" class="auth-icon" src="//www.micstatic.com/gb/img/icon/as-audited.png?_v=1619604439818"/>
</a>
</span>
</div>
<div class="auth auth-360 J-panorama" style="display: inline-block;">
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:12" href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow" target="_blank">
<img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png?_v=1516122340225"/>
</a>
<div class="tip arrow-bottom tip-360">
<div class="tip-con">
<p class="tip-para">360° Virtual Tour</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</div>
</div>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
<span title="Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.">Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>'''
html = etree.HTML(s)   # parse the HTML string into an lxml element tree
result = html.xpath('//a/@href')[-1]   # XPath query on the tree: collect the href attribute of every <a> element, then take the last one
print(result)
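Taking the last href works for this snippet, but it silently returns the wrong link if the page ever adds another <a> after the company name. A sketch that selects the link by its class instead, using only the standard library's html.parser (so no lxml install is needed). The class name J-compnay-name, typo included, is the page's own spelling and must be matched verbatim; the parser class and function names are hypothetical helpers:

```python
from html.parser import HTMLParser
from typing import List


class CompanyLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose class list contains a target class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attr_map.get("class") or "").split()
        if self.target_class in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def company_links(html: str, target_class: str = "J-compnay-name") -> List[str]:
    parser = CompanyLinkParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Trimmed from the question: a decoy <a> first, then the company-name link.
sample = '''
<div class="auth auth-360 J-panorama">
<a href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow">
<img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png"/>
</a>
</div>
<a class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
<span>Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
'''
print(company_links(sample))  # ['//cnxinre.en.made-in-china.com/360-Virtual-Tour.html']
```

Splitting the class attribute on whitespace means the match still holds if the site reorders or adds classes on that <a>.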
Want to know what 小甲鱼 (Xiaojiayu) has been up to lately? Visit -> ilovefishc.com
Posted on 2021-5-2 19:43:46
Last edited by Py与C。。。 on 2021-5-2 20:33
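import re
# html is assumed to be a string holding the page source (e.g. the HTML snippet from the question)
url = re.findall(r'<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="(.*?)" rel="nofollow" target="_blank">', html)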

Posted on 2021-5-2 19:44:53
Last edited by Py与C。。。 on 2021-5-2 20:33
>>> print(url)
['//cnxinre.en.made-in-china.com/360-Virtual-Tour.html']
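The regex above copies the whole ads-data value, which looks like per-listing tracking data (pdid, ps and so on), so it will likely stop matching on the next supplier's page. A looser sketch keys on the class attribute only. It assumes class appears before href inside the tag, as in the question's markup; regex-based HTML extraction stays fragile, and a real parser is the safer choice. The ads-data values in the sample below are made up to show the pattern ignoring them:

```python
import re
from typing import List

# Match an <a ...> start tag carrying the company-name class, then capture its href.
# Assumption: class comes before href within the tag, as in the question's markup.
COMPANY_LINK = re.compile(
    r'<a\b[^>]*?\bclass="compnay-name J-compnay-name"[^>]*?\bhref="([^"]*)"'
)


def extract_company_hrefs(html: str) -> List[str]:
    """Return the href of every <a> whose class attribute equals the company-name class."""
    return COMPANY_LINK.findall(html)


# Hypothetical listing: same structure as the question, different ads-data values.
sample = '''<a ads-data="t:6,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow">
</a>
<a ads-data="t:6,pa:3,c:9" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
<span>Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>'''
print(extract_company_hrefs(sample))  # ['//cnxinre.en.made-in-china.com/360-Virtual-Tour.html']
```

[^>]*? keeps each match inside a single start tag, so the report link's href cannot be picked up by accident.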

Posted on 2021-5-3 21:32:39
Just use XPath, it's simple and direct.

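Posted on 2021-5-3 21:33:31
Just right-click the element you need in the browser's developer tools; the Copy submenu has an XPath option, so you can copy the XPath directly.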
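A path from the browser's Copy XPath typically leans on element positions (and sometimes a nearby id), so it can break when the layout shifts. The sketch below contrasts the accepted answer's "last <a>" idea with a predicate on the class attribute. To stay in the standard library it uses xml.etree.ElementTree, which supports only a subset of XPath and parses XML rather than HTML; it works here only because this trimmed copy of the snippet is well-formed. On real pages, lxml as in the accepted answer is the usual choice:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (well-formed, so an XML parser accepts it).
snippet = """<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<div class="auth auth-as">
<a class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html">report</a>
</div>
</div>
<a class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html">
<span>Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>"""

root = ET.fromstring(snippet)

# Position-based: every <a> with an href, in document order, then the last one
# (the same idea as the accepted answer's '//a/@href' followed by [-1]).
all_links = [a.get("href") for a in root.findall(".//a[@href]")]
last_link = all_links[-1]

# Predicate-based: select the <a> by its class value, wherever it sits in the tree.
company = root.find(".//a[@class='compnay-name J-compnay-name']").get("href")

print(last_link)
print(company)
```

Both print the same URL for this snippet; only the predicate version keeps working if another link is appended after the company name.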

Original poster | Posted on 2021-5-9 14:18:53

The reply above came in earlier than yours, so I gave it the best answer. Thank you anyway, yours is good too.
