import requests
from lxml import etree

if __name__ == "__main__":
    # UA spoofing
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
    }
    # Build the list of page URLs
    list_url = []
    url_1 = "https://sc.chinaz.com/jianli/free.html"
    list_url.append(url_1)
    # for i in range(2, 4):
    #     url_n = "https://sc.chinaz.com/jianli/free_{}.html".format(i)
    #     list_url.append(url_n)
    page_text = requests.get(url=url_1, headers=headers).text
    tree = etree.HTML(page_text)
    jianli_list = []
    jianli_div = tree.xpath("//div[@id='main']/div/a")
    for a in jianli_div:
        jianli_web = a.xpath("./@href/text()")[0]
        print(jianli_web)
The script just finishes with "Process finished with exit code 0" and prints nothing. All I want is to collect the URLs on the page into a list, but I can't get it to work no matter what I try; I've been stuck on this until 4 a.m. Would an experienced coder please help? I'm not sure whether my XPath is wrong or something else is going on.
Your XPath syntax is wrong; spend more time on the basics.
Change this part:
page_text = requests.get(url=url_1, headers=headers).text
tree = etree.HTML(page_text)
jianli_list = []
jianli_div = tree.xpath("//div[@id='main']/div/div")
for a in jianli_div:
    jianli_web = 'https:' + a.xpath("a/@href")[0]
    print(jianli_web)
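The fix can be checked offline without hitting the site. The snippet below is a made-up stand-in for the page's markup (assuming the structure `div#main > div > div > a`, which matches the reply's XPath); it shows why the original expressions match nothing and why the corrected one works:

```python
from lxml import etree

# Hypothetical HTML mimicking the structure implied by the thread,
# not fetched from the live site
html = """
<div id="main">
  <div>
    <div><a href="//sc.chinaz.com/jianli/demo1.html">resume 1</a></div>
    <div><a href="//sc.chinaz.com/jianli/demo2.html">resume 2</a></div>
  </div>
</div>
"""
tree = etree.HTML(html)

# Original expression: the <a> tags are one level deeper than /div/a reaches,
# so this matches nothing, the loop body never runs, and the script exits 0
print(tree.xpath("//div[@id='main']/div/a"))  # []

# Also, text() selects child text nodes, and attribute nodes have no
# children, so "@href/text()" always evaluates to an empty list
print(tree.xpath("//div[@id='main']//a/@href/text()"))  # []

# Corrected approach: select the <a> elements, then read the href attribute
links = []
for a in tree.xpath("//div[@id='main']/div/div/a"):
    links.append("https:" + a.xpath("./@href")[0])
print(links)
```

The key point is that `@href` already yields the attribute's string value; appending `/text()` steps into the (nonexistent) children of the attribute node and silently returns nothing.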