This post was last edited by 景暄 on 2020-10-11 23:03
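I used a crawler to pull the text containing the accounts I follow from the Weibo follow page, but I can't extract the names precisely. If anyone knows how, could you show me a way to extract them with xpath, bs4, or something similar?
The text I want to extract is underlined in red in the image below.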
I wrote the following, but it returns an empty list:
    names = []
    targets = soup.find_all('div', class_="title W_fb W_autocut ")
    for each in targets:
        names.append(each.a.text)
    print(names)
这是用爬虫提取出来的内容<script>FM.view({"ns":"pl.relation.myFollow.index","domid":"Pl_Official_RelationMyfollow__93","css":["style/css/module/pagecard/PCD_connectlist.css?version=825e5991ea0d00a7"],"js":"page/js/pl/relation/myFollow/index.js?version=84f2f62c1a6e1201","html":"<div class="WB_cardwrap S_bg2">\r\n <div class="PCD_connectlist PCD_connectlist_spe">\r\n <div class="WB_innerwrap">\r\n <div class="WB_tab_b" node-type="relationnav">\r\n <div class="opt_choose">\r\n <div class="inner S_line2 clearfix">\r\n <ul class="tab_ul tab_ul_s W_fl">\r\n <li class="tab_li"><span class="tab_item tab_cur S_line1 textcut"><span class="W_f14 S_txt1">全部关注<\/span><em class="attach S_txt1">2<\/em><em class="attach S_txt2" title=""><\/em><\/span><\/li>\r\n <\/ul>\r\n <\/div>\r\n <\/div>\r\n <div fixed-item="true">\r\n <div class="opt_bar clearfix S_bg2" node-type="navTools">\r\n <div class="W_fl">\r\n <a href="javascript:void(0);" class="btn_link S_txt1" action-type="batselect">批量管理<\/a>\r\n <a href="javascript:void(0);" class="btn_link S_txt1" node-type="sort_target">排序<em class="W_ficon ficon_arrow_down_lite S_ficon">g<\/em><\/a>\r\n <\/div>\r\n <div class="W_fr">\r\n <div class="search_box">\r\n <span class="WB_search_s"><input node-type="searchInput" type="text" value="输入昵称或备注" notice="输入昵称或备注" class="W_input"><span class="pos"><a href="javascript:void(0);" node-type="searchBtn" title="搜索" class="W_ficon ficon_search S_ficon">f<\/a><\/span><\/span>\r\n <\/div>\r\n <\/div>\r\n <\/div>\r\n <div class="opt_bar clearfix S_bg2" node-type="batnavTools" style="display:none">\r\n <div class="W_fl">\r\n <a href="javascript:void(0);" class="W_btn_b W_btn_b_disable" node-type="addToOtherGroupBtn" action-type="add_to_other_group">添加到<em class="W_ficon ficon_arrow_down_lite S_ficon">g<\/em><\/a>\r\n <a href="javascript:void(0);" class="W_btn_b W_btn_b_disable" node-type="cancelFollowBtn" action-type="cancel_follow_all">取消关注<\/a>\r\n <a href="javascript:void(0);" class="W_btn_b W_btn_b_disable" 
node-type="addSpecialBtn" action-type="add_special_all" suda-uatrack="key=weibo_pc_PostFollow_FollowList&value=FollowListHost_UserCard_SpeFol">添加特别关注<\/a>\r\n <a href="javascript:void(0);" class="W_btn_b" action-type="unbatselect">退出批量管理<\/a>\r\n <span style="display: none" node-type="select_text">\r\n <span class="text">已选择<em class="num" node-type="count_Num">0<\/em>人<\/span>\r\n <a href="javascript:void(0);" action-type="cancel_select">取消选择<\/a>\r\n <\/span>\r\n <\/div>\r\n <\/div>\r\n <\/div>\r\n <\/div>\r\n <div class="layer_menu_list" style="display:none;" node-type="sort_layer">\r\n <ul>\r\n <li class="cur"><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?t=1#_0">全部关注<\/a><\/li>\r\n <li ><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?ftype=1&t=1#_0">互相关注<\/a><\/li>\r\n <li class="line"><\/li>\r\n <\/ul>\r\n <ul >\r\n <li class="cur"><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?t=1#_0">按关注时间排序<\/a><\/li>\r\n <li ><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?t=2#_0">按昵称首字母排序<\/a><\/li>\r\n <li ><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?t=3#_0">按最近更新排序<\/a><\/li>\r\n <li ><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?t=4#_0">按最近联系排序<\/a><\/li>\r\n <li ><a bpfilter="page" href="\/p\/1005057317518635\/myfollow?t=5#_0">按粉丝数排序<\/a><\/li>\r\n <\/ul>\r\n <\/div>\r\n <div class="member_box" node-type="groupContainer">\r\n <ul class="member_ul clearfix" node-type="relation_user_list">\r\n <li class="member_li S_bg1" node-type="user_item" action-type="user_item" action-data="uid=5715470927&profile_image_url=https:\/\/tvax3.sinaimg.cn\/crop.0.0.828.828.50\/006eNx4Xly8gcdc4ek0t9j30n00n0gn3.jpg?KID=imgbed,tva&Expires=1602436541&ssig=i%2Fpst5IybB&gid=0&gname=未分组&screen_name=一大罐柚子酱&sex=f">\r\n <div class="member_wrap clearfix">\r\n <div class="mod_pic S_line1">\r\n <p class="pic_box"><a action-type="ignore_list" target="_blank" href="\/u\/5715470927?from=myfollow_all" class=""><img 
src="https:\/\/tvax3.sinaimg.cn\/crop.0.0.828.828.50\/006eNx4Xly8gcdc4ek0t9j30n00n0gn3.jpg?KID=imgbed,tva&Expires=1602436541&ssig=i%2Fpst5IybB" title="一大罐柚子酱" usercard="id=5715470927" width="50" height="50" alt="一大罐柚子酱" class="W_face_radius"><\/a><\/p>\r\n <\/div>\r\n <div class="mod_info">\r\n <div class="title W_fb W_autocut ">\r\n <a target="_blank" action-type="ignore_list" node-type="screen_name" href="\/u\/5715470927?from=myfollow_all" class="S_txt1" title="一大罐柚子酱" usercard="id=5715470927" >一大罐柚子酱<\/a>\r\n \t\t\t\t \t\t\t\t\t \t\t\t\t\t \t\t\t\t \t\t\t\t\r\n <\/div>\r\n <div class="statu">\r\n <em class="W_ficon ficon_addtwo S_ficon">Z<\/em><span class="S_txt1">互相关注<\/span>\r\n <\/div>\r\n <div class="text W_autocut S_txt2">\r\n 简介:事如春梦了无痕。 <\/div>\r\n <div class="info_from S_txt2">\r\n \t\t\t\t\t\t通过<a href="http:\/\/app.weibo.com\/t\/feed\/6vtZb0" class="S_link2" >微博 weibo.com<\/a>关注\t\t\t\t\t<\/div>\r\n <div class="opt">\r\n <p class="btn_bed">\r\n <a class="W_btn_b" action-data="gid=0&nick=一大罐柚子酱&uid=5715470927&sex=f" diss-data="refer_sort=relationManage&location=myfollow&refer_flag=add" action-type="relation_setGroup" node-type="setGroupBtn" href="javascript:void(0);" title="未分组">\r\n <span node-type="groupName" class="txt W_autocut">未分组<\/span>\r\n <em class="W_ficon ficon_arrow_down_lite S_ficon">g<\/em>\r\n <\/a>\r\n <a class="W_btn_b btn_spe" action-type="special_follow" href="javascript:void(0);" action-data="uids=5715470927">\r\n <em class="W_ficon S_ficon ficon_add">+<\/em>特别关注\r\n <\/a>\r\n <a class="W_btn_b btn_set" action-type="relation_hover"><em node-type="setGroupIcon" class="W_ficon ficon_setup S_ficon">J<\/em><\/a>\r\n <\/p>\r\n <div class="layer_menu_list layer_spe" style="display:none;position:absolute;z-index:99;" node-type="special_unFollow_list" action-type="special_unFollow_hover">\r\n <ul>\r\n <li><a href="javascript:void(0);" action-type="special_unFollow" action-data="remove=0">移出特别关注<\/a><\/li>\r\n <\/ul>\r\n <\/div>\r\n <div 
class="layer_menu_list" style="display:none;" node-type="layer_hover_list" action-type="relation_hover_more">\r\n <ul>\r\n \t <li><a href="javascript:void(0);" action-type="webim.conversation" action-data="uid=5715470927&nick=一大罐柚子酱">私信<\/a><\/li>\r\n <li><a href="javascript:void(0);" action-type="relation_setRemark" action-data="uid=5715470927">设置备注<\/a><\/li>\r\n <li><a href="javascript:void(0);" action-type="cancel_follow_single">取消关注<\/a><\/li>\r\n <\/ul>\r\n <\/div>\r\n <\/div>\r\n \r\n <\/div>\r\n <\/div>\r\n <div class="markup_choose"><\/div>\r\n <\/li>\r\n <li class="member_li S_bg1" node-type="user_item" action-type="user_item" action-data="uid=3069466401&profile_image_url=https:\/\/tvax3.sinaimg.cn\/crop.0.0.1080.1080.50\/b6f45721ly8ggzux431r6j20u00u00v4.jpg?KID=imgbed,tva&Expires=1602436541&ssig=znzklLNX0M&gid=0&gname=未分组&screen_name=DavidDWayne&sex=m">\r\n <div class="member_wrap clearfix">\r\n <div class="mod_pic S_line1">\r\n <p class="pic_box"><a action-type="ignore_list" target="_blank" href="\/u\/3069466401?from=myfollow_all" class=""><img src="https:\/\/tvax3.sinaimg.cn\/crop.0.0.1080.1080.50\/b6f45721ly8ggzux431r6j20u00u00v4.jpg?KID=imgbed,tva&Expires=1602436541&ssig=znzklLNX0M" title="DavidDWayne" usercard="id=3069466401" width="50" height="50" alt="DavidDWayne" class="W_face_radius"><\/a><\/p>\r\n <\/div>\r\n <div class="mod_info">\r\n <div class="title W_fb W_autocut ">\r\n <a target="_blank" action-type="ignore_list" node-type="screen_name" href="\/u\/3069466401?from=myfollow_all" class="S_txt1" title="DavidDWayne" usercard="id=3069466401" >DavidDWayne<\/a>\r\n \t\t\t\t <a target="_blank" href="\/\/verified.weibo.com\/verify"><i title= "微博个人认证 " class="W_icon icon_approve"><\/i><\/a> \t\t\t\t\t<a title="微博会员" target="_blank" href="https:\/\/vip.weibo.com\/personal?from=main" action-type="ignore_list"suda-uatrack="key=profile_head&value=member_guest"><em class="W_icon icon_member6"><\/em><\/a> \t\t\t\t\t \t\t\t\t \t\t\t\t\r\n <\/div>\r\n <div 
class="statu">\r\n <em class="W_ficon ficon_right S_ficon">Y<\/em><span class="S_txt1">已关注<\/span>\r\n <\/div>\r\n <div class="text W_autocut S_txt2">\r\n 设计美学博主 <\/div>\r\n <div class="info_from S_txt2">\r\n \t\t\t\t\t\t通过<a href="http:\/\/app.weibo.com\/t\/feed\/6c3EMN" class="S_link2" >头条文章<\/a>关注\t\t\t\t\t<\/div>\r\n <div class="opt">\r\n <p class="btn_bed">\r\n <a class="W_btn_b" action-data="gid=0&nick=DavidDWayne&uid=3069466401&sex=m" diss-data="refer_sort=relationManage&location=myfollow&refer_flag=add" action-type="relation_setGroup" node-type="setGroupBtn" href="javascript:void(0);" title="未分组">\r\n <span node-type="groupName" class="txt W_autocut">未分组<\/span>\r\n <em class="W_ficon ficon_arrow_down_lite S_ficon">g<\/em>\r\n <\/a>\r\n <a class="W_btn_b btn_spe" action-type="special_follow" href="javascript:void(0);" action-data="uids=3069466401">\r\n <em class="W_ficon S_ficon ficon_add">+<\/em>特别关注\r\n <\/a>\r\n <a class="W_btn_b btn_set" action-type="relation_hover"><em node-type="setGroupIcon" class="W_ficon ficon_setup S_ficon">J<\/em><\/a>\r\n <\/p>\r\n <div class="layer_menu_list layer_spe" style="display:none;position:absolute;z-index:99;" node-type="special_unFollow_list" action-type="special_unFollow_hover">\r\n <ul>\r\n <li><a href="javascript:void(0);" action-type="special_unFollow" action-data="remove=0">移出特别关注<\/a><\/li>\r\n <\/ul>\r\n <\/div>\r\n <div class="layer_menu_list" style="display:none;" node-type="layer_hover_list" action-type="relation_hover_more">\r\n <ul>\r\n \t <li><a href="javascript:void(0);" action-type="webim.conversation" action-data="uid=3069466401&nick=DavidDWayne">私信<\/a><\/li>\r\n <li><a href="javascript:void(0);" action-type="relation_setRemark" action-data="uid=3069466401">设置备注<\/a><\/li>\r\n <li><a href="javascript:void(0);" action-type="cancel_follow_single">取消关注<\/a><\/li>\r\n <\/ul>\r\n <\/div>\r\n <\/div>\r\n \r\n <\/div>\r\n <\/div>\r\n <div class="markup_choose"><\/div>\r\n <\/li>\r\n <\/ul>\r\n <\/div>\r\n 
<\/div>\r\n <\/div>\r\n <input type="hidden" node-type="hidden" action-data="is_special=0" value="allFollow" gname="0"\/>\r\n<\/div>\r\n"})</script>
To get the list of accounts you follow, try this URL; it returns JSON data.
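If you would rather work with the page source you already have, here is a rough sketch. The likely reason `find_all` returns nothing is that the markup is not part of the DOM: it is a string stored under the `"html"` key of the object passed to `FM.view(...)` inside a `<script>` tag, so the parser only sees script text. The sketch pulls out that object, decodes it with `json`, and then reads the `<a node-type="screen_name">` links from the decoded fragment. It assumes the raw response is valid JSON (inner quotes escaped, unlike the way the forum displays it). `FM_VIEW`, `ScreenNameParser` and `extract_names` are names I made up for illustration, and it uses only the standard library so it runs without extra installs.

```python
import json
import re
from html.parser import HTMLParser

# Assumption: each block looks like <script>FM.view({...})</script>
# and the {...} part is valid JSON in the raw response.
FM_VIEW = re.compile(r"FM\.view\((\{.*?\})\)\s*</script>", re.S)


class ScreenNameParser(HTMLParser):
    """Collects the text of every <a node-type="screen_name"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("node-type") == "screen_name":
            self._in_name = True
            self.names.append("")

    def handle_data(self, data):
        if self._in_name:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_name = False


def extract_names(page_source):
    """Return the screen names found in all FM.view payloads of the page."""
    names = []
    for payload in FM_VIEW.findall(page_source):
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a JSON payload, skip it
        parser = ScreenNameParser()  # fresh parser per fragment
        parser.feed(data.get("html") or "")
        names.extend(n.strip() for n in parser.names if n.strip())
    return names
```

Once the fragment is decoded you can also hand it to bs4 as usual, e.g. `BeautifulSoup(data["html"], "html.parser").select('a[node-type="screen_name"]')`. Selecting on `node-type` also sidesteps the trailing space in `class="title W_fb W_autocut "`, which I suspect would trip up an exact class-string match anyway.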