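lizhiyong_11 posted on 2021-5-1 10:46:43

Hello everyone, I want to extract the URL on the fourth-to-last line. How do I do that?

Hello everyone, I want to extract the URL on the fourth-to-last line of the HTML below: //cnxinre.en.made-in-china.com/360-Virtual-Tour.html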

<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<div class="auth">
<span class="auth-gold-span">
<img alt="China Supplier - Diamond Member" class="auth-icon" src="//www.micstatic.com/gb/img/icon/diamond_member_16.png?_v=1619604439818"/>
<div class="tip arrow-bottom tip-gold">
<div class="tip-con">
<p class="tip-para">Suppliers with verified business licenses</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</span>
</div>
<div class="auth auth-as">
<span class="as-logo" reportusable="reportUsable">
<input type="hidden" value="KokmJZBlCzWY"/>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
<img alt="Audited Suppliers" class="auth-icon" src="//www.micstatic.com/gb/img/icon/as-audited.png?_v=1619604439818"/>
</a>
</span>
</div>
<div class="auth auth-360 J-panorama" style="display: inline-block;">
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:12" href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow" target="_blank">
<img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png?_v=1516122340225"/>
</a>
<div class="tip arrow-bottom tip-360">
<div class="tip-con">
<p class="tip-para">360° Virtual Tour</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</div>
</div>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
<span title="Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.">Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>

suchocolate posted on 2021-5-1 10:46:44

Last edited by suchocolate on 2021-5-2 09:10

from lxml import etree


s = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<div class="auth">
<span class="auth-gold-span">
<img alt="China Supplier - Diamond Member" class="auth-icon" src="//www.micstatic.com/gb/img/icon/diamond_member_16.png?_v=1619604439818"/>
<div class="tip arrow-bottom tip-gold">
<div class="tip-con">
<p class="tip-para">Suppliers with verified business licenses</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</span>
</div>
<div class="auth auth-as">
<span class="as-logo" reportusable="reportUsable">
<input type="hidden" value="KokmJZBlCzWY"/>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
<img alt="Audited Suppliers" class="auth-icon" src="//www.micstatic.com/gb/img/icon/as-audited.png?_v=1619604439818"/>
</a>
</span>
</div>
<div class="auth auth-360 J-panorama" style="display: inline-block;">
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:12" href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow" target="_blank">
<img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png?_v=1516122340225"/>
</a>
<div class="tip arrow-bottom tip-360">
<div class="tip-con">
<p class="tip-para">360° Virtual Tour</p>
</div>
<span class="arrow arrow-out">
<span class="arrow arrow-in"></span>
</span>
</div>
</div>
</div>
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
<span title="Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.">Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>'''
html = etree.HTML(s)   # parse the string into an lxml element tree
result = html.xpath('//a/@href')[-1]   # collect the href of every <a> via XPath, then take the last one (the target is the last link in this snippet)
print(result)
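
Taking the last link works for this snippet, but it depends on the target staying the last `<a>`. A possible alternative is to select the anchor by its class instead. The following is only a sketch: it assumes the target `<a>` always carries the `J-compnay-name` class seen in the question, and the shortened `SAMPLE` string and the `extract_company_url` name are made up for illustration.

```python
from lxml import etree

# Shortened stand-in for the HTML in the question (illustrative, not the full page).
SAMPLE = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<a class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html"></a>
<a href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY"></a>
</div>
<a class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html">
<span>Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>'''


def extract_company_url(page_source):
    """Return the href of the first <a> whose class list has the J-compnay-name token, or None."""
    tree = etree.HTML(page_source)
    # Pad the class attribute with spaces so that only the whole class token matches
    # (this avoids matching longer names such as J-compnay-name-li).
    hrefs = tree.xpath(
        '//a[contains(concat(" ", normalize-space(@class), " "), " J-compnay-name ")]/@href'
    )
    return hrefs[0] if hrefs else None


print(extract_company_url(SAMPLE))
```

Returning `None` when nothing matches avoids the `IndexError` that `[-1]` would raise on a page without links.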

Py与C。。。 posted on 2021-5-2 19:43:46

Last edited by Py与C。。。 on 2021-5-2 20:33

import re
# html must be the page source as a str (for example the string s in the reply above), not a parsed lxml tree
url = re.findall(r'<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="(.*?)" rel="nofollow" target="_blank">',html)

Py与C。。。 posted on 2021-5-2 19:44:53

Last edited by Py与C。。。 on 2021-5-2 20:33

print(url)
>>> url
['//cnxinre.en.made-in-china.com/360-Virtual-Tour.html']
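
The pattern above hard-codes the whole `ads-data` value, so it would likely stop matching as soon as any of those tracking fields change. A looser pattern keyed only on the class might hold up better. This is a sketch under assumptions: it expects `class` to appear before `href` inside the tag, as it does in the question, and `SAMPLE`, `PATTERN` and `found` are illustrative names.

```python
import re

# Shortened stand-in for the HTML in the question (illustrative, not the full page).
SAMPLE = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
<a ads-data="t:6,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow">
</a>
<a ads-data="t:6,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
</a>
</div>'''

# Match an <a ...> tag whose class attribute contains the J-compnay-name token,
# then capture the href that follows it inside the same tag.
# (?!-) rejects longer class names such as J-compnay-name-li.
PATTERN = re.compile(
    r'<a\b[^>]*\bclass="[^"]*\bJ-compnay-name\b(?!-)[^"]*"[^>]*\bhref="([^"]*)"'
)

found = PATTERN.findall(SAMPLE)
print(found)
```

Regular expressions remain sensitive to attribute order and quoting, so an HTML parser is usually the safer choice when the markup is not under your control.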

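1589895304 posted on 2021-5-3 21:32:39

Just use XPath. It is simple and direct.

1589895304 posted on 2021-5-3 21:33:31

Right-click the element you want in the browser's developer tools. The Copy menu has a Copy XPath option, so just copy it from there.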
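
A copied XPath can be passed straight to lxml. The sketch below assumes a path of the form the browser might produce for the fragment in the question; `COPIED_XPATH` is hypothetical, and on the real page the copied path would be longer. Note that browsers sometimes repair the markup they display, so a copied path may not match the raw page source exactly.

```python
from lxml import etree

# Shortened stand-in for the fragment in the question (illustrative only).
FRAGMENT = '''<div class="compnay-name-li">
<div class="auth-list"><a href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY"></a></div>
<a class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html"></a>
</div>'''

# Hypothetical path of the kind "Copy XPath" gives; it selects the <a> that is
# a direct child of the outer <div>.
COPIED_XPATH = '/html/body/div/a'

tree = etree.HTML(FRAGMENT)          # lxml wraps the fragment in <html><body>
matches = tree.xpath(COPIED_XPATH)   # list of matching <a> elements
copied_href = matches[0].get('href') if matches else None
print(copied_href)
```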

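lizhiyong_11 posted on 2021-5-9 14:18:53

Py与C。。。 posted on 2021-5-2 19:43


The reply above yours came in earlier, so I gave the best answer to that one. Thank you all the same, your answer is good too.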