[Solved] Confused by the result of an XPath query

杨清玄 · Posted 2021-11-22 20:55:52


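import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
    # 'Host': 'www.xbiquge.la'
}
url = 'https://www.xbiquge.la'
# Send the request (pass the headers dict so the User-Agent is actually sent)
html_url = requests.get(url, headers=headers)
# Set the encoding used to decode the response body into text
# html_url.encoding = html_url.apparent_encoding
html_url.encoding = 'utf-8'
html_code = html_url.text
html_content = etree.HTML(html_code)
# xpath() returns a list of matching elements; [0] takes the first <a>
html_word = html_content.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a')[0]
print(html_word)  # prints an element object such as <Element a at 0x...>, not the HTML source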
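A minimal offline sketch of what the code above prints, assuming a hypothetical stand-in snippet (SAMPLE_HTML) in place of the real page, since the live markup at https://www.xbiquge.la may have changed or be unreachable:

```python
from lxml import etree

# Hypothetical stand-in HTML that matches the XPath from the question;
# the real page's markup may differ.
SAMPLE_HTML = (
    '<div id="main"><div><div><div><dl><dt>'
    '<a href="/book/1/">Example title</a>'
    '</dt></dl></div></div></div></div>'
)

root = etree.HTML(SAMPLE_HTML)
nodes = root.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a')
first = nodes[0]

# xpath() returns a list of lxml elements; printing one shows its repr
# (e.g. <Element a at 0x7f...>), not the element's HTML source.
print(type(first).__name__)  # _Element
print(first)
```

So the query itself works; what looks confusing is only how an lxml element prints.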

Best answer


suchocolate · Posted 2021-11-22 21:08:45

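Unlike BeautifulSoup (bs4), whose Tag objects print as their HTML source, the nodes returned by lxml's xpath() do not print their source directly; printing one only shows an object such as <Element a at 0x...>. To print the source, serialize the node like this: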

html_word = html_content.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a')[0]
pt = etree.tostring(html_word, encoding='unicode')
print(pt)


If all you want is the link text, though, this is enough:

html_word = html_content.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a/text()')[0]
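Note that text() yields a list of the <a> element's direct text nodes, which is why the answer indexes it with [0] (an empty result would raise IndexError). The sketch below, on a hypothetical snippet where the link text is split by a child <b> element, compares text() with the element's .text attribute and XPath's string() function:

```python
from lxml import etree

# Hypothetical sample: the <a> text is split around a child <b> element.
root = etree.HTML('<dl><dt><a href="/book/1/">Vol. <b>1</b> start</a></dt></dl>')

# text() returns a list of the element's direct text nodes
# (the text inside <b> is not a direct child, so it is excluded).
text_nodes = root.xpath('//dl/dt/a/text()')
# .text is only the text before the first child element.
first_text = root.xpath('//dl/dt/a')[0].text
# string() concatenates all descendant text into a single string.
full_text = root.xpath('string(//dl/dt/a)')

print(text_nodes)        # ['Vol. ', ' start']
print(repr(first_text))  # 'Vol. '
print(full_text)         # Vol. 1 start
```

For a plain link like the one in the question, text()[0] and .text give the same result; string() is the safer choice when the link contains nested tags.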



杨清玄 · Posted 2021-11-22 23:27:20

Many thanks for the pointers, 兔子大佬 ("boss Rabbit")!
