import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
    # 'Host': 'www.xbiquge.la'
}
url = 'https://www.xbiquge.la'
# Send the request (headers was defined above but never passed in)
html_url = requests.get(url, headers=headers)
# Set the encoding used to decode the response body
# html_url.encoding = html_url.apparent_encoding
html_url.encoding = 'utf-8'
html_code = html_url.text
# Parse the HTML text into an element tree
html_content = etree.HTML(html_code)
html_word = html_content.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a')[0]
# Prints an Element object, not the node's source markup
print(html_word)
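Unlike bs (BeautifulSoup), a node returned by xpath does not print as its source markup; printing it only shows the Element object. To print the node's source, do this: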
html_word = html_content.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a')[0]
pt = etree.tostring(html_word, encoding='unicode')
print(pt)
But if you only want the text, this is enough:
html_word = html_content.xpath('//div[@id="main"]/div[1]/div[1]/div[1]/dl/dt/a/text()')[0]