FishC Forum (鱼C论坛)

Views: 1545 | Replies: 5

[Solved] XPath error

Posted on 2020-8-5 14:55:19

Last edited by xiaosi4081 on 2020-8-5 14:58

Partial code:
try:
    res = get(url,headers=headers).text
    soup = BeautifulSoup(res,'lxml')
    vulstring = ""
    for target in soup.find_all("table",class_="plhin"):
        content = etree.HTML(target.text).xpath('//*[@class="t_f"]/test()')
        string = "%s:%s"% (target.find("div",class_="pls favatar").div.div.a.text,content)
        vulstring += string
        vulstring += "\n"

    print(vulstring)
except exceptions.MissingSchema:
    print('url有误')


url is the address of a thread on the forum, for example: https://fishc.com.cn/thread-176366-1-1.html
The problem is mainly in the XPath:

content = etree.HTML(target.text).xpath('//*[@class="t_f"]/test()')

Error:

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\x4\AppData\Local\Programs\Python\Python38\lib\tkinter\__init__.py", line 1883, in __call__
    return self.func(*args)
  File "d:\requests\getwangye.py", line 47, in <lambda>
    startButton = Button(frame1,text="start",command=lambda : self.getting(self.url.get()))
  File "d:\requests\getwangye.py", line 21, in getting
    content = etree.HTML(target.text).xpath('//*[@class="t_f"]/test()')
  File "src\lxml\etree.pyx", line 1582, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
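For anyone hitting the same traceback: in XPath 1.0 every step of a location path is a node test such as an element name, text() or node() (optionally with an axis and predicates). test() is not a node test, so lxml rejects the whole expression before evaluating anything. A minimal check, assuming lxml is installed and using a made-up one-element document (only the class name t_f comes from the thread):

```python
from lxml import etree

# Made-up stand-in for one post body; only the class name "t_f" is from the thread.
doc = etree.HTML('<div class="t_f">hello</div>')


def run_xpath(expr):
    """Evaluate expr on doc, returning the result list or the error message."""
    try:
        return doc.xpath(expr)
    except etree.XPathError as exc:  # base class of XPathEvalError and XPathSyntaxError
        return "error: %s" % exc


print(run_xpath('//*[@class="t_f"]/test()'))  # the typo: rejected as invalid
print(run_xpath('//*[@class="t_f"]/text()'))  # ['hello']
```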
Best answer (qiuyouzhi, posted on 2020-8-5 15:01:15):
Heh, just a careless typo: you wrote test where it should be text.
try:
    res = get(url,headers=headers).text
    soup = BeautifulSoup(res,'lxml')
    vulstring = ""
    for target in soup.find_all("table",class_="plhin"):
        content = etree.HTML(target.text).xpath('//*[@class="t_f"]/text()')
        string = "%s:%s"% (target.find("div",class_="pls favatar").div.div.a.text,content)
        vulstring += string
        vulstring += "\n"

    print(vulstring)
except exceptions.MissingSchema:
    print('url有误')
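A side note on what text() returns: it selects only the text nodes that are direct children of the matched element, and xpath() hands them back as a list of separate strings. If a post body contains line breaks or nested formatting tags, the text comes back in pieces and anything inside child tags is skipped; //text() or the XPath string() function collects all of it. A small comparison, assuming lxml and using made-up markup:

```python
from lxml import etree

# Made-up post body containing a line break and a nested <b> tag.
doc = etree.HTML('<div class="t_f">Hello<br/>see <b>this</b> code</div>')

direct = doc.xpath('//*[@class="t_f"]/text()')       # direct text children only
everything = doc.xpath('//*[@class="t_f"]//text()')  # text nodes at any depth
joined = doc.xpath('string(//*[@class="t_f"])')      # one string of all descendant text

print(direct)      # ['Hello', 'see ', ' code']
print(everything)  # ['Hello', 'see ', 'this', ' code']
print(joined)      # Hellosee this code
```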
Xiaojiayu's latest courses -> https://ilovefishc.com

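Original poster | Posted on 2020-8-5 15:05:49
Last edited by xiaosi4081 on 2020-8-5 15:07
Quoting qiuyouzhi (posted on 2020-8-5 15:01):
Heh, just a careless typo: you wrote test where it should be text.


But my code still can't get the content of the posts: that XPath returns an empty list. Is there another way to do it?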

[Screenshot: QQ截图20200805150654.png]
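The likely cause of the empty list: in BeautifulSoup, target.text is only the text inside the table, with every tag and attribute stripped, so etree.HTML(target.text) parses plain text in which no element carries class="t_f". Passing str(target), the tag's markup, keeps the structure. A sketch assuming bs4 and lxml are installed, on invented markup that only loosely resembles a Discuz post table (the real page is far more complex):

```python
from bs4 import BeautifulSoup
from lxml import etree

# Invented, heavily simplified markup loosely shaped like one Discuz post table.
PAGE = """
<table class="plhin"><tr>
  <td><div class="pls favatar"><div><div><a>alice</a></div></div></div></td>
  <td class="t_f">First post</td>
</tr></table>
"""

soup = BeautifulSoup(PAGE, "lxml")
target = soup.find("table", class_="plhin")

# target.text is just the text inside the tag: all tags and attributes are gone.
plain = target.text
from_text = etree.HTML(plain).xpath('//*[@class="t_f"]/text()')

# str(target) is the tag's markup, so class="t_f" survives the round trip.
from_markup = etree.HTML(str(target)).xpath('//*[@class="t_f"]/text()')

print(from_text)    # []
print(from_markup)  # ['First post']
```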

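Posted on 2020-8-5 15:07:07
Quoting xiaosi4081 (posted on 2020-8-5 15:05):
But my code still can't get the content of the posts: that XPath returns an empty list. Is there another way to do it?

Could you post the complete code?
Otherwise I can't run it.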

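Original poster | Posted on 2020-8-5 15:08:40
Quoting qiuyouzhi (posted on 2020-8-5 15:07):
Could you post the complete code?
Otherwise I can't run it.

I only held it back because I was worried someone would copy it.

Code:
getwangye.py: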
from tkinter import *
from requests import get,exceptions
import tkinter.messagebox
import clipboard
import tkinter.filedialog
from bs4 import BeautifulSoup
from lxml import etree
class getmain:
    def __init__(self,fm):
        self.fm = fm
        self.maincode()

    def getting(self,url):
        try:
            headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36","Cookie":"oMVX_2132_saltkey=R74UkqJK; oMVX_2132_lastvisit=1595396069; oMVX_2132_auth=1693Do2EjOr2r8ngcWOluYDuwkzl3LKM8fJ4GL4MhznBYgtr4f80N8ED9JPlvRmsK4KaBbiuX%2FP92S7fwrGzxPc%2Fjnk; oMVX_2132_lastcheckfeed=881467%7C1595399689; oMVX_2132_atarget=1; oMVX_2132_onlineindex=1; oMVX_2132_lastviewtime=881467%7C1595500559; PHPSESSID=avg92ob7gcv06c47fam0im67m0; oMVX_2132_space_top_credit_881467_all=207; oMVX_2132_home_diymode=1; oMVX_2132_ignore_notice=1; oMVX_2132_smile=10D1; oMVX_2132_nofocus_forum=1; oMVX_2132_atlist=566159%2C854664%2C378930%2C702609%2C849582; oMVX_2132_ulastactivity=fc13JyOysM5ExnW8UyySLlDsqBBuncLY%2BI8IMERIoUu78vHR42tc; oMVX_2132_sid=SFa8Bf; oMVX_2132_lip=119.130.231.105%2C1596543963; oMVX_2132_st_t=881467%7C1596546913%7C2f0047256f62124008c3d602f9b61fd7; oMVX_2132_forum_lastvisit=D_39_1595489878D_171_1595680528D_360_1595923195D_243_1596006547D_33_1596011098D_188_1596099175D_38_1596534860D_173_1596546913; oMVX_2132_home_readfeed=1596546916; oMVX_2132_noticeTitle=1; acw_tc=781bad0915965482246555278e7efe279f8a3234011af062f779af11e2dc63; oMVX_2132_visitedfid=173D38D33D188D337D354D241D39D242D335; oMVX_2132_viewid=tid_176798; oMVX_2132_sendmail=1; oMVX_2132_checkpm=1; oMVX_2132_st_p=881467%7C1596548308%7Cc54b90b96d8abef819e979b0397a55cb; _fmdata=OQmTawF8D5QYYw5z1d7VRJZWZr08pj0Nh2V4cP0xTcWdnjXY%2BdGfHTtlF8ZCyqxRG6Ng5pc0cl9klF2pXVNj0STkj9ckn7q%2Fabe950w6FN4%3D; oMVX_2132_lastact=1596548308%09misc.php%09patch"}

            res = get(url,headers=headers).text
            soup = BeautifulSoup(res,'lxml')
            vulstring = ""
            for target in soup.find_all("table",class_="plhin"):
                content = str(etree.HTML(target.text).xpath('//*[@class="t_f"]/text()'))
                string = "%s:%s"% (target.find("div",class_="pls favatar").div.div.a.text,content)
                vulstring += string
                vulstring += "\n"

            self.result.delete(0.0,END)
            self.result.insert(0.0,vulstring)
        except exceptions.MissingSchema:
            tkinter.messagebox.showerror('错误','url有误')
    def copy(self):
        clipboard.copy(self.result.get(0.0,END))
    def savefile(self):
        path = tkinter.filedialog.asksaveasfile()
        path.write(str(self.result.get(0.0,END)))
        path.close()
    def closewindow(self):
        self.fm.destroy()
        exit()
    def maincode(self):
        frame1 = LabelFrame(self.fm,text="input")

        urllabel = Label(frame1,text="url is:   ")
        urllabel.pack()
        urllabel.grid(row=1,column=1)
        self.url = Entry(frame1)
        self.url.grid(row=1,column=2)
        startButton = Button(frame1,text="start",command=lambda : self.getting(self.url.get()))
        startButton.grid(row=1,column=3)
        frame1.pack()
        resultFrame = LabelFrame(self.fm,text="result")
        self.result = Text(resultFrame,width=35,height=15)
        resultcopy = Button(resultFrame,text="复制到剪贴板",command=self.copy)
        self.result.pack()
        resultcopy.pack()
        resultFrame.pack()


main.py:
# -*- coding: utf-8 -*-
from requests import get
from re import search
import tkinter as tk
import tkinter.messagebox
from threading import Thread
import time as ti
from getwangye import getmain

# Note: \1 is used to refer back to the earlier sub-group numbered 1
class fishc_get:
    def __init__(self):
        self.a = []
        self.root = tk.Tk()
        self.root.title("求助帖提醒")  # window title: "help-request thread reminder"
        self.fm1 = tk.LabelFrame(self.root,text="get")
        self.fm1.grid(row=1,column=1)
        self.fm2 = tk.LabelFrame(self.root,text="get_tiezi")
        self.fm2.grid(row=1,column=2)
        self.t = tk.Text(self.fm1)
        self.t.pack()
        getmain(self.fm2)

    def load(self):
        while True:
            res = get(f"https://fishc.com.cn/bestanswer.php?mod=huzhu&type=undo").text
            # Get the title of the question thread
            name = search(r'<a href="https://fishc.com.cn/thread-\d+?-1-1.html" target="_blank">(.+?)</a>', res).group(1)
            # Get the URL of the question thread
            url = "https://fishc.com.cn/thread-" + search(
                r'<a href="https://fishc.com.cn/thread-(.+?)-1-1.html" target="_blank"', res).group(1) + "-1-1.html"
            # Get the number of answers
            ans = search(r'<font color="#999999">(\d+?)</font>', res).group(1)
            # Get the time the question was asked
            time = search(r'<font color="#999999">(\d+?-\d+?-\d+? \d+?:\d+?)</font>', res).group(1)
            if name not in self.a:
                # Fields: title, number of answers, time asked, URL
                b = f" 标题:{name}\n 回答数:{ans}\n 提问时间:{time}\n 地址:{url}\n\n"
                self.t.insert(tk.END, b)  # show the corresponding content
                tkinter.messagebox.showwarning("提示", b)
                self.a.append(name)
            ti.sleep(10)
    def duoxian(self):
        try:
            self.func = Thread(target=self.load)
            self.func.setDaemon(True)
            self.func.start()
            self.root.mainloop()
        except:
            ti.sleep(30)
            self.duoxian()

if __name__ == "__main__":
    cl = fishc_get()
    cl.duoxian()
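About the note at the top of main.py: inside a pattern, \1 is a backreference that must match the same text group 1 already captured, whereas .group(1), which load() uses throughout, simply returns what group 1 captured; none of the patterns in load() actually use \1. Also, the four search() calls in load() are independent, so name, url, ans and time only describe the same thread if each pattern's first match happens to come from the same entry. A standard-library sketch of both points, with made-up strings and hypothetical thread ids 111 and 222:

```python
import re

# Inside a pattern, \1 must match the same text that group 1 already captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo")
print(doubled.group(0))  # is is
# .group(1), as used in load(), just returns what group 1 captured.
print(doubled.group(1))  # is

# Made-up listing snippet in the same shape as the links load() searches for.
LISTING = (
    '<a href="https://fishc.com.cn/thread-111-1-1.html" target="_blank">First question</a>'
    '<a href="https://fishc.com.cn/thread-222-1-1.html" target="_blank">Second question</a>'
)

# One pattern with two groups keeps each thread id paired with its own title.
LINK = re.compile(
    r'<a href="https://fishc\.com\.cn/thread-(\d+)-1-1\.html" target="_blank">(.+?)</a>'
)
pairs = [(m.group(1), m.group(2)) for m in LINK.finditer(LISTING)]
for thread_id, title in pairs:
    print(title, "https://fishc.com.cn/thread-%s-1-1.html" % thread_id)
```

The same idea would extend to the answer count and time by adding more groups to one pattern, provided the real page keeps those fields in a predictable order, which is an assumption worth checking against the actual HTML.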

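Posted on 2020-8-5 15:21:27
Quoting xiaosi4081 (posted on 2020-8-5 15:05):
But my code still can't get the content of the posts: that XPath returns an empty list. Is there another way to do it?

Just change getwangye.py to this: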

from tkinter import *
from requests import get,exceptions
import tkinter.messagebox
import clipboard
import tkinter.filedialog
from bs4 import BeautifulSoup
from lxml import etree
class getmain:
    def __init__(self,fm):
        self.fm = fm
        self.maincode()

    def getting(self,url):
        try:
            headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36","Cookie":"oMVX_2132_saltkey=R74UkqJK; oMVX_2132_lastvisit=1595396069; oMVX_2132_auth=1693Do2EjOr2r8ngcWOluYDuwkzl3LKM8fJ4GL4MhznBYgtr4f80N8ED9JPlvRmsK4KaBbiuX%2FP92S7fwrGzxPc%2Fjnk; oMVX_2132_lastcheckfeed=881467%7C1595399689; oMVX_2132_atarget=1; oMVX_2132_onlineindex=1; oMVX_2132_lastviewtime=881467%7C1595500559; PHPSESSID=avg92ob7gcv06c47fam0im67m0; oMVX_2132_space_top_credit_881467_all=207; oMVX_2132_home_diymode=1; oMVX_2132_ignore_notice=1; oMVX_2132_smile=10D1; oMVX_2132_nofocus_forum=1; oMVX_2132_atlist=566159%2C854664%2C378930%2C702609%2C849582; oMVX_2132_ulastactivity=fc13JyOysM5ExnW8UyySLlDsqBBuncLY%2BI8IMERIoUu78vHR42tc; oMVX_2132_sid=SFa8Bf; oMVX_2132_lip=119.130.231.105%2C1596543963; oMVX_2132_st_t=881467%7C1596546913%7C2f0047256f62124008c3d602f9b61fd7; oMVX_2132_forum_lastvisit=D_39_1595489878D_171_1595680528D_360_1595923195D_243_1596006547D_33_1596011098D_188_1596099175D_38_1596534860D_173_1596546913; oMVX_2132_home_readfeed=1596546916; oMVX_2132_noticeTitle=1; acw_tc=781bad0915965482246555278e7efe279f8a3234011af062f779af11e2dc63; oMVX_2132_visitedfid=173D38D33D188D337D354D241D39D242D335; oMVX_2132_viewid=tid_176798; oMVX_2132_sendmail=1; oMVX_2132_checkpm=1; oMVX_2132_st_p=881467%7C1596548308%7Cc54b90b96d8abef819e979b0397a55cb; _fmdata=OQmTawF8D5QYYw5z1d7VRJZWZr08pj0Nh2V4cP0xTcWdnjXY%2BdGfHTtlF8ZCyqxRG6Ng5pc0cl9klF2pXVNj0STkj9ckn7q%2Fabe950w6FN4%3D; oMVX_2132_lastact=1596548308%09misc.php%09patch"}

            res = get(url,headers=headers).text
            soup = BeautifulSoup(res,'lxml')
            vulstring = ""
            content = str(etree.HTML(res).xpath('//*[@class="t_f"]/text()'))
            for target in soup.find_all("table",class_="plhin"):

                string = "%s:%s"% (target.find("div",class_="pls favatar").div.div.a.text,content)
                vulstring += string
                vulstring += "\n"

            self.result.delete(0.0,END)
            self.result.insert(0.0,vulstring)
        except exceptions.MissingSchema:
            tkinter.messagebox.showerror('错误','url有误')
    def copy(self):
        clipboard.copy(self.result.get(0.0,END))
    def savefile(self):
        path = tkinter.filedialog.asksaveasfile()
        path.write(str(self.result.get(0.0,END)))
        path.close()
    def closewindow(self):
        self.fm.destroy()
        exit()
    def maincode(self):
        frame1 = LabelFrame(self.fm,text="input")

        urllabel = Label(frame1,text="url is:   ")
        urllabel.pack()
        urllabel.grid(row=1,column=1)
        self.url = Entry(frame1)
        self.url.grid(row=1,column=2)
        startButton = Button(frame1,text="start",command=lambda : self.getting(self.url.get()))
        startButton.grid(row=1,column=3)
        frame1.pack()
        resultFrame = LabelFrame(self.fm,text="result")
        self.result = Text(resultFrame,width=35,height=15)
        resultcopy = Button(resultFrame,text="复制到剪贴板",command=self.copy)
        self.result.pack()
        resultcopy.pack()
        resultFrame.pack()
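One thing to watch in the reply above: content is computed once from the whole page, before the loop, so every output line pairs one author with the list of all post bodies on that page. The sketch below keeps each author with their own post, using lxml alone. It assumes the class names from the thread (plhin, pls favatar, t_f) appear as exact class values, and approximates the thread's .div.div.a chain with a direct div/div/a path; the two-post page is invented. Note the leading dot in the per-post expressions: in lxml an expression starting with // searches the whole document even when called on a sub-element, while .// stays inside the current post.

```python
from lxml import etree

# Invented two-post page; only the class names are taken from the thread.
PAGE = """
<div id="postlist">
  <table class="plhin"><tr>
    <td><div class="pls favatar"><div><div><a>alice</a></div></div></div></td>
    <td class="t_f">First post<br/>second line</td>
  </tr></table>
  <table class="plhin"><tr>
    <td><div class="pls favatar"><div><div><a>bob</a></div></div></div></td>
    <td class="t_f">Reply from bob</td>
  </tr></table>
</div>
"""


def extract_posts(html):
    """Return (author, body) pairs, one per post table."""
    root = etree.HTML(html)
    posts = []
    for table in root.xpath('//table[@class="plhin"]'):
        # Relative paths (leading ".") keep each search inside this one table.
        author = table.xpath('string(.//div[@class="pls favatar"]/div/div/a)')
        # Collect every text node of the body, including text after <br/>.
        texts = table.xpath('.//*[@class="t_f"]//text()')
        body = "\n".join(t.strip() for t in texts if t.strip())
        posts.append((author.strip(), body))
    return posts


for author, body in extract_posts(PAGE):
    print("%s:%s" % (author, body))
```

On the real page the class attributes may carry extra values, in which case a predicate such as contains(@class, "t_f") would be needed; that is untested here.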
FishC Studio (鱼C工作室) ( 粤ICP备18085999号-1 | 粤公网安备 44051102000585号)

GMT+8, 2025-6-24 20:33

Powered by Discuz! X3.4

© 2001-2023 Discuz! Team.
