[Solved] Help with a web scraper

1062195630 · Posted 2021-7-12 09:38:22

from lxml import etree
import requests

if __name__ == '__main__':
    url = 'https://www.aqistudy.cn/historydata/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Mobile Safari/537.36'
    }
    all_hotcity = []
    all_allcity = []
    page_data = requests.get(url=url, headers=headers).text
    tree = etree.HTML(page_data)
    # Hot cities: <li> elements directly under the <ul> of a div with class "bottom"
    hot_li_list = tree.xpath('//div[@class="bottom"]/ul/li')
    for hot_li in hot_li_list:
        hot_city = hot_li.xpath('./a/text()')[0]
        all_hotcity.append(hot_city)
    # All cities: <li> elements under the second <div> inside each matching <ul>
    all_ul_list = tree.xpath('//div[@class="bottom"]/ul')
    for all_ul in all_ul_list:
        all_li_list = all_ul.xpath('./div[2]/li')
        for li in all_li_list:
            all_city = li.xpath('./a/text()')[0]
            all_allcity.append(all_city)
    print(all_hotcity)
    print(all_allcity)

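Why can both "all cities" and "hot cities" be located through the div's class attribute? The div for all cities comes later in the page, so when I use this locator, won't it also match the earlier div with the same class?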

Best answer

Twilight6

2021-7-12 09:47:21

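Because here the hot cities are in the ul tag under the tag whose class is bottom.

xpath collects every tag that matches the expression into a single sequence object.

Whether a tag appears earlier or later in the page, as long as it matches it ends up in that sequence.

A for loop can then read the matching tags from the sequence one by one, ready for the data extraction that follows.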
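
To make the point above concrete, here is a minimal sketch. The HTML is a made-up, simplified stand-in for the real page (the outer class names "hot" and "all" and the city names are assumptions for illustration only). It only shows that `xpath()` returns every match as a list, in document order.

```python
from lxml import etree

# Hypothetical markup, NOT the real page: two divs share class="bottom".
SAMPLE_HTML = """
<html><body>
  <div class="hot"><div class="bottom"><ul>
    <li><a href="#">Beijing</a></li>
    <li><a href="#">Shanghai</a></li>
  </ul></div></div>
  <div class="all"><div class="bottom"><ul>
    <li><a href="#">Anshan</a></li>
    <li><a href="#">Anyang</a></li>
  </ul></div></div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# xpath() returns a list holding every matching node, earlier and later alike.
ul_list = tree.xpath('//div[@class="bottom"]/ul')
print(len(ul_list))

# Looping over the list visits each matching <ul> in turn.
cities = []
for ul in ul_list:
    cities.extend(ul.xpath('./li/a/text()'))
print(cities)
```

In the original script the two loops differ only in the step taken after the shared `//div[@class="bottom"]/ul` prefix (`/li` versus `./div[2]/li`), which is presumably what separates the hot cities from the full list on the real page.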


Jin_Yu · Posted 2021-7-12 09:58:44

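If you don't want to match all of them, don't write tree.xpath('//div[@class="bottom"]/ul').
Write an absolute path instead, or add one more level: tree.xpath('//div[@class="hot"]/div[@class="bottom"]/ul')
An XPath expression matches every node that fits and returns them as a list.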
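
A rough sketch of the difference, again on made-up HTML rather than the real page. The class "hot" comes from the expression above; the class "all" and the city names are assumptions for illustration.

```python
from lxml import etree

# Hypothetical markup, NOT the real page: each div.bottom sits inside a
# differently named parent div.
SAMPLE_HTML = """
<html><body>
  <div class="hot"><div class="bottom"><ul>
    <li><a href="#">Beijing</a></li>
    <li><a href="#">Shanghai</a></li>
  </ul></div></div>
  <div class="all"><div class="bottom"><ul>
    <li><a href="#">Anshan</a></li>
    <li><a href="#">Anyang</a></li>
  </ul></div></div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Broad expression: matches under every div with class "bottom".
broad = tree.xpath('//div[@class="bottom"]/ul/li/a/text()')

# Narrowed expression: the extra parent step keeps only the "hot" block.
narrow = tree.xpath('//div[@class="hot"]/div[@class="bottom"]/ul/li/a/text()')

print(broad)
print(narrow)
```

Note that `@class="hot"` is an exact string comparison, so it would not match an element whose class attribute holds several names, such as `class="hot box"`.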
