Posted on 2021-8-23 21:10:35
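This post is the best answer
Last edited by 白two on 2021-8-23 23:32
The page can be parsed, but I don't know why XPath doesn't work.
I saved the page source to a local file and wrote the XPath against that file; it differs from the one I had written based on the browser's Elements panel.
Even so, for some reason the result was still an empty list.
I then switched to bs4, and that parsed the page correctly: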
    import requests
    from bs4 import BeautifulSoup

    # One-time fetch, left commented out: download the page and save it locally.
    # headers = {
    #     'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    #     'cookie': 'hb_MA-BFF5-63705950A31C_source=www.baidu.com; _ga=GA1.2.779947586.1629010530; NTES_P_UTID=BB2HIPNo4KdwUXKZ9kIYM1Omkebmu53Y|1629010584; NTES_SESS=JB_vEDJu463hs8jXyUYRk_J6bFWPAuexS4Q9Z69PxTT_xaw1xYbLveyv8HOW7utrefHaM_ZZ64hZz5bP7yj5gUg7xThkW4M1oO.NvmQf6VgWXrtGKaQQ2VkXTf18_K8Ef9skq9A0Zebr3io3Yb.bo6mxvcdt0cfMnGCnTFnNl6l_35h7Jr69dEVthWDZu3DpUeWY6ZP81ulGe; S_INFO=1629010584|0|3&80##|taoist_two; P_INFO=taoist_two@163.com|1629010584|0|epay|00&99|sic&1627108648&epay#sic&510100#10#0#0|&0|youdaodict_client|taoist_two@163.com; _ntes_nuid=f5692e5033bacce44b5c418c6f98c274; _ntes_nnid=d8212b1a00fd27dd2b3836d19a9a1ce9,1629728232401; s_n_f_l_n3=10ca5d99d778f4f11629728232406; BAIDU_SSP_lcr=https://fishc.com.cn/; cm_newmsg=user%3Dtaoist_two%40163.com%26new%3D2%26total%3D9; NTES_CMT_USER_INFO=468769398%7C%E6%9C%89%E6%80%81%E5%BA%A6%E7%BD%91%E5%8F%8B0rYdFS%7Chttp%3A%2F%2Fcms-bucket.nosdn.127.net%2F2018%2F08%2F13%2F078ea9f65d954410b62a52ac773875a1.jpeg%7Cfalse%7CdGFvaXN0X3R3b0AxNjMuY29t; ne_analysis_trace_id=1629729616665; vinfo_n_f_l_n3=10ca5d99d778f4f1.1.0.1629728232406.0.1629729616997'
    # }
    # a = requests.get('https://war.163.com/', headers=headers)
    # m = a.text
    # with open('html1.txt', 'w', encoding='gbk') as f:
    #     f.write(m)
    # print(m)

    # Parse the locally saved copy of the page.
    with open('html1.txt', 'r', encoding='gbk') as f:
        m = f.read()
    html_tree = BeautifulSoup(m, 'lxml')
    # Find the first <a> tag whose class list includes 'photo' and print its title.
    tgt = html_tree.find('a', class_='photo')
    print(tgt['title'])
Here is the result:
So the page can be parsed, and bs4 handles it without trouble. As for why XPath fails, I'll keep looking into it.
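For anyone comparing the two approaches, here is a minimal sketch of one possible reason an XPath can come back empty where bs4 succeeds. It is only a guess and has not been checked against the real page. The `SAMPLE` markup and the `photo_titles` helper are made up for illustration. The point is that bs4's `class_='photo'` matches a tag whose class list merely contains `photo`, while the XPath test `@class="photo"` requires the whole attribute to equal `photo`.

```python
from lxml import html

# Hypothetical markup standing in for the saved page; the real page may differ.
SAMPLE = """
<div class="news">
  <a class="photo big" title="Sample headline" href="#">link</a>
</div>
"""


def photo_titles(markup):
    """Return the title of every <a> whose class list contains 'photo'."""
    tree = html.fromstring(markup)
    # Pad the class attribute with spaces so 'photo' is matched as a whole token.
    return tree.xpath(
        '//a[contains(concat(" ", normalize-space(@class), " "), " photo ")]/@title'
    )


# Exact comparison: finds nothing here because the attribute is "photo big".
exact = html.fromstring(SAMPLE).xpath('//a[@class="photo"]/@title')
print(exact)

# Token-style match: behaves like bs4's class_='photo'.
print(photo_titles(SAMPLE))
```

If the real `<a>` tag carries only the single class `photo`, this is not the cause, and the difference between the raw source and the browser-rendered DOM would be the next thing to check.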