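I want to scrape the Weibo hot search list. The content all sits inside <li class="clearfix"></li> elements, and the page source I print out also contains <li> tags, but neither bs4 nor a regex can find them. Did I write something wrong? Any guidance would be appreciated.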
import re

import bs4
import requests


def open_url(url):
    # Send browser-like headers (including a logged-in cookie) with the request
    head = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
        'cookie': 'SUHB=0P1cj8UzrPZieU; ALF=1612759724; UOR=login.sina.com.cn,widget.weibo.com,hs.blizzard.cn; SINAGLOBAL=368489229629.54724.1584496033619; SUB=_2AkMpoQ7pf8NxqwJRmfgXym_lbI13yw_EieKf_f8yJRMxHRl-yT9jqkIrtRB6AiEgBqMePr49zcjIOmMqSFuMxuXBHrjG; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9WhTNUK1iYkr2EuR8lSXbYID; login_sid_t=804e3dc45618b7bcbf86163d280ec29f; cross_origin_proto=SSL; Ugrow-G0=1ac418838b431e81ff2d99457147068c; YF-V5-G0=b1b8bc404aec69668ba2d36ae39dd980; WBStorage=42212210b087ca50|undefined; _s_tentry=-; wb_view_log=1536*8641.25; Apache=5618895476598.254.1594094991152; ULV=1594094991156:4:2:1:5618895476598.254.1594094991152:1593672161009',
    }
    res = requests.get(url, headers=head)
    return res


def get_some(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Dump the raw response so it can be compared with what the browser shows
    print(res.text)
    # Literal search for the opening tag; this is the call that finds nothing
    hot_list = re.search(r'<li class="clearfix">', res.text)
    #hot_list = soup.find('li',class_='clearfix')
    print(hot_list)


def main():
    url = 'https://weibo.com/'
    res = open_url(url)
    get_some(res)


if __name__ == "__main__":
    main()
Take a look at the two articles below; both are specifically about scraping Weibo hot topics:
https://blog.csdn.net/lwgkzl/article/details/89237060
https://blog.csdn.net/qq_38316655/article/details/80671358
Best to take it slowly, though. There is a lot to learn when it comes to web scraping.
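One possible explanation, which this thread does not confirm: the <li> tags visible in the printed source may sit inside a <script> block as an escaped string (for example <li class=\"clearfix\">), so a literal regex and soup.find would both miss them. The sketch below is only an illustration under that assumption. SAMPLE_PAGE, the FM.view wrapper, and the helper names extract_embedded_html and find_clearfix_items are hypothetical stand-ins, and the real weibo.com response may be laid out differently.

```python
import json
import re

# Hypothetical sample: markup shipped as a JSON string inside a <script> tag.
# The real page may differ; this only imitates the assumed structure.
SAMPLE_PAGE = (
    '<html><body>'
    '<script>FM.view({"domid":"demo","html":"<ul>'
    '<li class=\\"clearfix\\">topic A<\\/li>'
    '<li class=\\"clearfix\\">topic B<\\/li><\\/ul>"})</script>'
    '</body></html>'
)


def extract_embedded_html(page_text):
    """Return the decoded "html" field of each FM.view({...}) call found."""
    fragments = []
    for match in re.finditer(r'FM\.view\((\{.*?\})\)</script>', page_text, re.S):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        if "html" in payload:
            fragments.append(payload["html"])
    return fragments


def find_clearfix_items(html_fragment):
    """Return the inner text of every <li class="clearfix"> in decoded HTML."""
    return re.findall(r'<li class="clearfix">(.*?)</li>', html_fragment, re.S)


if __name__ == "__main__":
    # The literal pattern misses because the quotes are backslash-escaped.
    print(re.search(r'<li class="clearfix">', SAMPLE_PAGE))
    # After decoding the JSON string, the same pattern matches.
    for fragment in extract_embedded_html(SAMPLE_PAGE):
        print(find_clearfix_items(fragment))
```

If the printed source really does show backslashes before the quotes, the decoded fragments could also be handed to bs4.BeautifulSoup instead of the second regex.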