[Solved] Selenium beginner asking for help

March2615 · Posted 2020-4-29 20:40:54


I want to download papers with Selenium.
Example URL: https://sci-hub.tw/10.1021/cm0608903

[Screenshot: the inspected element of the download button]

Here is the error message:

d = c.find_element_by_xpath('/html/body/viewer-pdf-toolbar//div[1]/div[1]/div[2]/cr-icon-button[2]//div/iron-icon')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/viewer-pdf-toolbar//div[1]/div[1]/div[2]/cr-icon-button[2]//div/iron-icon"}
  (Session info: chrome=81.0.4044.122)


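Did I do something wrong?
Could one of the experts point me in the right direction, or post some code that can save the file? Thanks!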

Best answer


颜栩栩 · Posted 2020-4-30 11:24:06

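Hi, I tried implementing the PDF download you described with Selenium. The page may take a while to open.
To download the PDF I used the ↓save button on the left side of the page. If the ↓save button has already appeared but the page is still loading, you can stop the loading first: as long as the button is there, the download will work!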

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://sci-hub.tw/10.1021/cm0608903")
# Click the "↓save" link in the left-hand panel (<div id="buttons"><ul><li><a>)
browser.find_element(By.XPATH, "//div[@id='buttons']/ul/li/a").click()



xiangjianshinan · Posted 2020-4-30 08:53:06

I'd suggest going back and reading the docs for c.find_element_by_xpath!!!

You don't need to start from the top level.

Try: c.find_element_by_xpath("//cr-icon-button[@id='download']")
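To see why a relative XPath does not need to start from `/html/body`, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. The markup is a hypothetical, heavily simplified stand-in, not the real page. As for why neither path matched in this thread, one plausible reason (an assumption, never confirmed here) is that `viewer-pdf-toolbar` and `cr-icon-button` belong to Chrome's built-in PDF viewer, whose elements are not part of the page DOM that `find_element` searches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real toolbar is nested much deeper.
MARKUP = """
<html><body>
  <viewer-pdf-toolbar>
    <div><div><div>
      <cr-icon-button id="rotate"/>
      <cr-icon-button id="download"/>
    </div></div></div>
  </viewer-pdf-toolbar>
</body></html>
"""

root = ET.fromstring(MARKUP.strip())  # root is the <html> element

# Relative search: ".//" matches at any depth, so the full path is unnecessary.
relative = root.findall(".//cr-icon-button[@id='download']")

# Full-path search: every step must match exactly...
absolute = root.findall(
    "./body/viewer-pdf-toolbar/div/div/div/cr-icon-button[@id='download']")
# ...so it silently finds nothing as soon as one nesting level is wrong.
wrong_depth = root.findall(
    "./body/viewer-pdf-toolbar/div/div/cr-icon-button[@id='download']")

print(len(relative), len(absolute), len(wrong_depth))  # 1 1 0
```

The relative form keeps working when the page adds or removes wrapper `div`s, which is why it is usually preferred.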

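March2615 · Posted 2020-4-30 08:58:35

xiangjianshinan posted on 2020-4-30 08:53
I'd suggest going back and reading the docs for c.find_element_by_xpath!!!

You don't need to start from the top level.

I tried that and got the same result; that's why I tried the full path.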

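xiangjianshinan · Posted 2020-4-30 08:59:55

Could you share your code and the URL?

Just last weekend I wrote a small program that automatically opens a web page and looks up related content.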

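March2615 · Posted 2020-4-30 09:05:56

xiangjianshinan posted on 2020-4-30 08:59
Could you share your code and the URL?

Just last weekend I wrote a small program that automatically opens a web page and looks up related content.

The URL is the example URL above.
I typed the code straight into IDLE; since I'm just a beginner, I wanted to see whether I could download the file directly.
If that worked, I was going to write a complete scraper (but it didn't).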

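March2615 · Posted 2020-4-30 09:12:40

xiangjianshinan posted on 2020-4-30 08:59
Could you share your code and the URL?

Just last weekend I wrote a small program that automatically opens a web page and looks up related content.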

from selenium import webdriver
client = webdriver.Chrome()
client.get('https://sci-hub.tw/10.1021/cm0608903')
download_button = client.find_element_by_xpath("//cr-icon-button[@id='download']")


That's all there is, and it already throws an error.
The error message is as follows:

Traceback (most recent call last):
  File "E:/Data_storage/PyCharm/learn/LeetCode/Explore/test2.py", line 5, in <module>
    download_button = client.find_element_by_xpath("//cr-icon-button[@id='download']")
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "D:\Install\Python-3.5.2\install\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//cr-icon-button[@id='download']"}
  (Session info: chrome=81.0.4044.129)


Also, once the element is found, is calling .click() on it enough to trigger the download?

xiangjianshinan · Posted 2020-4-30 11:20:35

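import time
from selenium import webdriver
from selenium.webdriver.common.by import By

# Location of chromedriver.exe
chrome_driver = r'C:\Users\Administrator\AppData\Local\Google\Chrome\Application\chromedriver.exe'
client = webdriver.Chrome(executable_path=chrome_driver)
client.get('https://sci-hub.tw/10.1021/cm0608903')
print("Opening the page... waiting 30 seconds because my connection is very slow; adjust the waits below to suit your connection speed.")
time.sleep(30)
# download_button = client.find_element(By.XPATH, "//cr-icon-button[@id='download']")
# In the page source the save link is: <a href = #
download_button = client.find_element(By.XPATH, "//a[@href='#']")
download_button.click()
print('Clicked. Did it work? The wait below gives the download time to finish.')
time.sleep(30)
client.quit()  # quit() also shuts down chromedriver; close() only closes the window
print('Exited!!')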


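I heard from the experts on this forum that VS Code is good, so I've been setting it up these past few days.

You sent me down the wrong path at first: what you need to look at is the page source, not anything else.

Give it a try!!!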
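The fixed `time.sleep(30)` calls above always wait the full 30 seconds, even when the page is ready sooner, and still fail when it is slower. A common alternative is to poll until the condition holds, up to a timeout. The stdlib-only sketch below shows the idea; `wait_until` is a hypothetical helper name, not part of Selenium.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises TimeoutError once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)  # short pause between checks
```

Selenium already ships this pattern: after `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`, the wait-then-click above becomes roughly `WebDriverWait(client, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@href='#']"))).click()`.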


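March2615 · Posted 2020-4-30 14:17:23

颜栩栩 posted on 2020-4-30 11:24
Hi, I tried implementing the PDF download you described with Selenium. The page may take a while to open.
To download the PDF I used ...

That save button doesn't let me set the file name or the save location,
which is why I wanted to use the PDF download icon on the right:
batch downloading requires setting file names.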

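March2615 · Posted 2020-4-30 14:18:40

xiangjianshinan posted on 2020-4-30 11:20
I heard from the experts on this forum that VS Code is good, so I've been setting it up these past few days.

You sent me down the wrong path at first: what you need to look at is the page source, not ...

That approach works, but clicking it saves the file immediately.
I need to rename the files and put them in a specific folder while batch downloading.
The download button on the right lets me choose; with the save button on the left I don't know how to choose.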
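If you do stay with Selenium, one workaround (an assumption here, not something tested in this thread) is to point Chrome's downloads at a folder you control and rename each file once it lands. The folder can be set with `options = webdriver.ChromeOptions()` and `options.add_experimental_option("prefs", {"download.default_directory": folder})`; the preference `"plugins.always_open_pdf_externally": True` is commonly added so PDFs download instead of opening in the viewer. The stdlib part below waits for a new, finished `.pdf` in that folder and moves it to the name you want. `wait_for_new_pdf` and `move_download` are hypothetical helper names; they rely on Chrome naming unfinished downloads `*.crdownload`.

```python
import os
import shutil
import time


def wait_for_new_pdf(folder, before, timeout=60.0, interval=0.5):
    """Return the path of a finished .pdf in `folder` that was not in `before`.

    `before` is the set of file names present before the download started.
    A download counts as finished only once a new ".pdf" exists and no
    ".crdownload" (in-progress) file remains in the folder.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = set(os.listdir(folder))
        partial = [n for n in names if n.endswith(".crdownload")]
        new_pdfs = sorted(n for n in names - before if n.lower().endswith(".pdf"))
        if new_pdfs and not partial:
            return os.path.join(folder, new_pdfs[0])
        time.sleep(interval)
    raise TimeoutError("no finished PDF appeared in %s" % folder)


def move_download(src, target_dir, new_name):
    """Move a downloaded file into `target_dir` under `new_name`."""
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, new_name)
    shutil.move(src, dest)
    return dest
```

Typical use: record `before = set(os.listdir(download_dir))`, click the save link, then call `move_download(wait_for_new_pdf(download_dir, before), "papers", "cm0608903.pdf")`.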

颜栩栩 · Posted 2020-4-30 14:20:22

[PDF download] A way to download PDF files, found on CSDN (will be removed on request)

import requests

# PDF version of the Python requests documentation
requests_pdf_url = "https://buildmedia.readthedocs.org/media/pdf/requests/master/requests.pdf"
r = requests.get(requests_pdf_url)
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
filename = "requests.pdf"
with open(filename, 'wb') as f:
    f.write(r.content)


You can try with open: it lets you specify both the file name and the path of the saved file.
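Building on the `with open` suggestion, here is a small sketch of saving each paper under a name and folder you choose, which is what batch downloading needs. `doi_to_filename` and `save_pdf` are hypothetical helper names; replacing the characters Windows forbids in file names (such as the `/` in a DOI) with `_` is an illustrative choice, not something from the thread.

```python
import os
import re


def doi_to_filename(doi):
    """Turn a DOI such as "10.1021/cm0608903" into a safe .pdf file name.

    Characters not allowed in Windows file names are replaced with "_".
    """
    return re.sub(r'[\\/:*?"<>|]', "_", doi) + ".pdf"


def save_pdf(content, folder, filename):
    """Write raw PDF bytes to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "wb") as f:
        f.write(content)
    return path
```

For example, after `r = requests.get(pdf_url)` and `r.raise_for_status()`, call `save_pdf(r.content, "papers", doi_to_filename("10.1021/cm0608903"))`, where `pdf_url` stands for the PDF address taken from the page source.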

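颜栩栩 · Posted 2020-4-30 14:21:53

Then you can get the PDF's URL from this spot in the page source:
[Screenshot: the part of the page source containing the PDF URL]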
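A sketch of pulling that PDF address out of the page source with only the standard library's `html.parser`, under one loud assumption: that the article page embeds the PDF in an `<iframe>` (or `<embed>`) element with `id="pdf"`. Check this against the actual source, since the markup may differ. `PdfSrcFinder` and `find_pdf_url` are hypothetical names. `urljoin` turns a protocol-relative `//host/...` src into a full URL, and `urldefrag` drops viewer fragments such as `#view=FitH`.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class PdfSrcFinder(HTMLParser):
    """Record the src of the first <iframe> or <embed> whose id is "pdf".

    ASSUMPTION: the page embeds the PDF this way; verify in the real source.
    """

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.src is None and tag in ("iframe", "embed") and attrs.get("id") == "pdf":
            self.src = attrs.get("src")


def find_pdf_url(page_html, page_url):
    """Return the absolute PDF URL without its #fragment, or None if not found."""
    finder = PdfSrcFinder()
    finder.feed(page_html)
    if not finder.src:
        return None
    absolute = urljoin(page_url, finder.src.strip())  # resolves "//host/..." too
    return urldefrag(absolute)[0]
```

It accepts any HTML string, so `find_pdf_url(browser.page_source, browser.current_url)` works on a Selenium page, and `requests.get(page_url).text` works without a browser at all.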

March2615 · Posted 2020-4-30 14:40:27

颜栩栩 posted on 2020-4-30 14:21
Then you can get the PDF's URL from this spot in the page source

OK, thanks.
So I don't need Selenium after all.
