- import requests
- from lxml import etree
- # Send a browser-like User-Agent header with the request
- headers = {
-     'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
- }
- url = 'https://s.weibo.com/top/summary'
- response = requests.get(url, headers=headers)
- html = etree.HTML(response.text)
- # Body of the hot search table
- infos = html.xpath('//*[@id="pl_top_realtimehot"]/table/tbody')[0]
- # Each query returns one flat list covering all rows ('.' keeps the path relative to infos)
- ranks = infos.xpath('.//tr/td[1]/text()')
- titles = infos.xpath('.//tr/td[2]/a/text()')
- comments = infos.xpath('.//tr/td[2]/span/text()')
- buzzs = infos.xpath('.//tr/td[3]/i/text()')
- for rank, title, comment, buzz in zip(ranks, titles, comments, buzzs):
-     print('(' + rank + ')', '(' + title + ')', '(' + comment + ')', '(' + buzz + ')')
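Take the Weibo hot search list as an example: some entries end with a '热' (hot) or '沸' (boiling) tag and some do not. Because of this, the scraped values shift forward to fill the gaps, so each tag gets paired with the wrong entry. How can I fix this?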
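One possible approach is to walk the table row by row and look up each field relative to its own row, substituting an empty string when a field is missing. That way a row without a tag cannot borrow the tag of a later row. The sketch below is only an illustration under stated assumptions: `SAMPLE` is made-up markup that imitates the column layout used in the post (it is not the real page), `parse_rows` is a hypothetical helper name, and the standard-library `xml.etree.ElementTree` parser is used so the example runs without network access. The `iter` and `findtext` methods it relies on are also available on lxml elements, so the same function should accept the `infos` element from the post.

```python
import xml.etree.ElementTree as ET

# Made-up sample that mimics the column layout from the post (not the real page).
# Row 2 has no tag in the third column; row 3 has no count in the second column.
SAMPLE = """
<table>
  <tbody>
    <tr><td>1</td><td><a>Topic A</a><span>12345</span></td><td><i>热</i></td></tr>
    <tr><td>2</td><td><a>Topic B</a><span>6789</span></td><td></td></tr>
    <tr><td>3</td><td><a>Topic C</a></td><td><i>沸</i></td></tr>
  </tbody>
</table>
"""


def parse_rows(table):
    """Return one (rank, title, comment, buzz) tuple per <tr> under table.

    Every lookup is relative to the current row, and a missing element
    yields '' instead of being skipped, so the columns stay aligned.
    """
    records = []
    for row in table.iter('tr'):
        rank = row.findtext('td[1]', default='').strip()
        title = row.findtext('td[2]/a', default='').strip()
        comment = row.findtext('td[2]/span', default='').strip()
        buzz = row.findtext('td[3]/i', default='').strip()
        records.append((rank, title, comment, buzz))
    return records


records = parse_rows(ET.fromstring(SAMPLE))
for rank, title, comment, buzz in records:
    print('(' + rank + ')', '(' + title + ')', '(' + comment + ')', '(' + buzz + ')')
```

With the lxml code from the post, the equivalent idea would be to loop over `infos.xpath('./tr')` and run relative queries such as `row.xpath('./td[3]/i/text()')` on each row, using a default value whenever the returned list is empty.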