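FishC Forum

Views: 3276 | Replies: 6

[Solved] Hi everyone, how do I extract the URL on the fourth-to-last line?

Posted on 2021-5-1 10:46:43
Bounty: 10 Fish Coins
Hi everyone, I want to extract the URL on the fourth-to-last line of the HTML below: //cnxinre.en.made-in-china.com/360-Virtual-Tour.html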

    <div class="compnay-name-li J-compnay-name-li ellipsis">
    <div class="auth-list J-auth-list">
    <div class="auth">
    <span class="auth-gold-span">
    <img alt="China Supplier - Diamond Member" class="auth-icon" src="//www.micstatic.com/gb/img/icon/diamond_member_16.png?_v=1619604439818"/>
    <div class="tip arrow-bottom tip-gold">
    <div class="tip-con">
    <p class="tip-para">Suppliers with verified business licenses</p>
    </div>
    <span class="arrow arrow-out">
    <span class="arrow arrow-in"></span>
    </span>
    </div>
    </span>
    </div>
    <div class="auth auth-as">
    <span class="as-logo" reportusable="reportUsable">
    <input type="hidden" value="KokmJZBlCzWY"/>
    <a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
    <img alt="Audited Suppliers" class="auth-icon" src="//www.micstatic.com/gb/img/icon/as-audited.png?_v=1619604439818"/>
    </a>
    </span>
    </div>
    <div class="auth auth-360 J-panorama" style="display: inline-block;">
    <a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:12" href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow" target="_blank">
    <img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png?_v=1516122340225"/>
    </a>
    <div class="tip arrow-bottom tip-360">
    <div class="tip-con">
    <p class="tip-para">360° Virtual Tour</p>
    </div>
    <span class="arrow arrow-out">
    <span class="arrow arrow-in"></span>
    </span>
    </div>
    </div>
    </div>
    <a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
    <span title="Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.">Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
    </a>
    </div>
Best answer
Posted on 2021-5-1 10:46:44
Last edited by suchocolate on 2021-5-2 09:10
    from lxml import etree

    s = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
    <div class="auth-list J-auth-list">
    <div class="auth">
    <span class="auth-gold-span">
    <img alt="China Supplier - Diamond Member" class="auth-icon" src="//www.micstatic.com/gb/img/icon/diamond_member_16.png?_v=1619604439818"/>
    <div class="tip arrow-bottom tip-gold">
    <div class="tip-con">
    <p class="tip-para">Suppliers with verified business licenses</p>
    </div>
    <span class="arrow arrow-out">
    <span class="arrow arrow-in"></span>
    </span>
    </div>
    </span>
    </div>
    <div class="auth auth-as">
    <span class="as-logo" reportusable="reportUsable">
    <input type="hidden" value="KokmJZBlCzWY"/>
    <a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
    <img alt="Audited Suppliers" class="auth-icon" src="//www.micstatic.com/gb/img/icon/as-audited.png?_v=1619604439818"/>
    </a>
    </span>
    </div>
    <div class="auth auth-360 J-panorama" style="display: inline-block;">
    <a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:12" href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY" rel="nofollow" target="_blank">
    <img class="auth-icon" src="//www.micstatic.com/gb/img/icon/360.png?_v=1516122340225"/>
    </a>
    <div class="tip arrow-bottom tip-360">
    <div class="tip-con">
    <p class="tip-para">360° Virtual Tour</p>
    </div>
    <span class="arrow arrow-out">
    <span class="arrow arrow-in"></span>
    </span>
    </div>
    </div>
    </div>
    <a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">
    <span title="Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.">Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
    </a>
    </div>'''
    # Parse the string into an lxml element tree.
    html = etree.HTML(s)
    # xpath() selects content with an XPath expression: list the href
    # attribute of every <a> tag, then take the last one.
    result = html.xpath('//a/@href')[-1]
    print(result)
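
Taking the last `href` works for this snippet, but it depends on the company link staying the last `<a>` in the block. A minimal sketch of a selector keyed on the link's class instead, assuming the `J-compnay-name` class (spelled that way in the question's HTML) stays stable. The `sample` string is an abbreviated stand-in for the full snippet, and `tree`, `hrefs` and `company_url` are illustrative names:

```python
from lxml import etree

# Abbreviated stand-in for the question's HTML: the same three <a> tags,
# with most attributes and the tooltip markup left out.
sample = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<a class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html"></a>
<a href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY"></a>
</div>
<a class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html">
<span>Ningbo Beilun Xinre Machinery Manufacturing Co., Ltd.</span>
</a>
</div>'''

tree = etree.HTML(sample)

# contains() is a substring test, so it would also match the outer <div>'s
# "J-compnay-name-li" class; restricting the step to <a> elements avoids that.
hrefs = tree.xpath('//a[contains(@class, "J-compnay-name")]/@href')

# xpath() returns a list; fall back to None instead of raising IndexError
# when the link is missing.
company_url = hrefs[0] if hrefs else None
print(company_url)
```

If the page later gains another link after the company name, this selector should still pick the same element, whereas `[-1]` would silently return the new one.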
Xiaojiayu's latest courses -> https://ilovefishc.com

Posted on 2021-5-2 19:43:46
Last edited by Py与C。。。 on 2021-5-2 20:33
    import re

    # `html` must hold the HTML source as a string (the `s` variable in the best answer), not a parsed etree object.
    url = re.findall(r'<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="(.*?)" rel="nofollow" target="_blank">', html)
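
A pattern that spells out every attribute of the tag stops matching as soon as the `ads-data` value or the attribute order changes. A looser sketch, assuming only that the `class` attribute appears before `href` inside the same tag, as it does in the question's HTML. The `sample` string holds two of the question's `<a>` tags, and `pattern`, `match` and `url` are illustrative names:

```python
import re

# Two opening <a> tags copied from the question's HTML.
sample = '''<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:4" class="J-prod-reportUrl" href="//cnxinre.en.made-in-china.com/audited-supplier-reports/report.html" rel="nofollow" target="_blank" title="">
<a ads-data="t:6,aid:,si:1,md:1,pdid:JXknSWNCXVpE,ps:998016.4,a:1,mds:30,c:3,pa:3" class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html" rel="nofollow" target="_blank">'''

# [^>]* keeps the match inside a single tag. The class value must contain
# J-compnay-name as a whole space-separated token, and the href value is
# captured up to its closing quote.
pattern = re.compile(
    r'<a\b[^>]*\bclass="(?:[^"]* )?J-compnay-name(?: [^"]*)?"[^>]*\bhref="([^"]*)"'
)

match = pattern.search(sample)
url = match.group(1) if match else None
print(url)
```

Regular expressions remain fragile against real-world HTML (single-quoted attributes, `href` placed before `class`), so an HTML parser such as the lxml approach in the best answer is usually the safer choice.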

Posted on 2021-5-2 19:44:53
Last edited by Py与C。。。 on 2021-5-2 20:33
    print(url)

    >>> url
    ['//cnxinre.en.made-in-china.com/360-Virtual-Tour.html']
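
The extracted value starts with `//`, which makes it a protocol-relative URL: it names a host but no scheme, so most HTTP clients will not accept it as is. A small sketch of resolving it with `urllib.parse.urljoin`. The `page_url` base is an assumption, since the thread does not say which page the HTML came from; only its scheme ends up in the result:

```python
from urllib.parse import urljoin

# Assumed address of the page the snippet was scraped from (hypothetical).
page_url = "https://www.made-in-china.com/"

href = "//cnxinre.en.made-in-china.com/360-Virtual-Tour.html"

# A reference that starts with "//" keeps its own host and path and
# inherits only the scheme from the base URL.
absolute = urljoin(page_url, href)
print(absolute)  # https://cnxinre.en.made-in-china.com/360-Virtual-Tour.html
```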

Posted on 2021-5-3 21:32:39
Just use XPath. It's simple and direct.

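Posted on 2021-5-3 21:33:31
Right-click the element you need in the browser's developer tools; the Copy submenu has a Copy XPath option, so you can just copy it from there.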
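
An XPath copied from the developer tools is typically an absolute path from the document root, and it describes the DOM after the browser has run the page's scripts, so it may not match the HTML a script downloads. A sketch of how such a path would be used with lxml. The path below is written by hand against an abbreviated stand-in for the question's snippet, not copied from the real page, and `sample`, `absolute_path` and `links` are illustrative names:

```python
from lxml import etree

# Abbreviated stand-in for the question's HTML.
sample = '''<div class="compnay-name-li J-compnay-name-li ellipsis">
<div class="auth-list J-auth-list">
<a href="//world-port.made-in-china.com/viewVR?comId=KokmJZBlCzWY"></a>
</div>
<a class="compnay-name J-compnay-name" href="//cnxinre.en.made-in-china.com/360-Virtual-Tour.html"></a>
</div>'''

# etree.HTML() wraps the fragment in <html><body>, so an absolute path
# starts from /html/body just like one copied from a browser.
tree = etree.HTML(sample)

# Hand-written absolute path: the <a> that is a direct child of the outer <div>.
absolute_path = '/html/body/div/a'
links = tree.xpath(absolute_path)
print(links[0].get('href'))
```

Absolute paths break whenever a wrapper element is added or removed, so it is worth checking a copied path against the downloaded HTML before relying on it.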

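Original poster | Posted on 2021-5-9 14:18:53

The poster above replied earlier than you did, so I gave the best answer to them. Thank you all the same; your answer is good too.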