Installing Selenium and Basic Usage

H原子 · Posted 2021-8-5 10:51:01


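Last edited by H原子 on 2021-8-5 10:56

Edited: 2021.8.5
Author: H原子
>>Introduction
Selenium is a browser automation and testing tool. With it we can drive a browser to perform specific actions such as clicking and scrolling. For pages rendered by JavaScript, this approach to scraping is very effective.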

>>Installing Selenium

1. Related links
Official site: http://www.seleniumhq.org
GitHub: http://github.com/SeleniumHQ/selenium/tree/master/py
PyPI: http://pypi.python.org/pypi/selenium
Official documentation: http://selenium-python.readthedocs.io
Chinese documentation: http://selenium-python-zh.readthedocs.io

2. Install with pip
pip install selenium
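
After installing, it can help to confirm which version pip actually installed, because several calls differ between Selenium 3 and Selenium 4. Here is a minimal sketch using only the standard library; `installed_version` and `major_version` are hypothetical helper names, not part of Selenium:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string such as '4.21.0'."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    found = installed_version("selenium")
    print("selenium is not installed" if found is None else f"selenium {found}")
```

`major_version(installed_version("selenium"))` then tells you whether the Selenium 3 or Selenium 4 style of calls applies.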

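>>Installing ChromeDriver

1. Related links
Official site: https://sites.google.com/a/chromium.org/chromedriver
Downloads: https://chromedriver.storage.googleapis.com/index.html

2. Check your Chrome version
In the Chrome menu, choose "Help" > "About Google Chrome" to see the Chrome version number.

[Figure: Chrome version information]

**Note**: Write down the Chrome version number; you will need it when choosing the ChromeDriver version.

3. Download ChromeDriver
Open the download link (see the related links above) and pick the build that matches your Chrome version. That index only lists builds up to Chrome 114; drivers for newer Chrome versions are published on the Chrome for Testing page.

[Figure: ChromeDriver download page]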
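
The matching step can be sketched in code. This is only an illustration, and it assumes that agreeing on the major version (the first number) is enough; `major`, `driver_matches_browser` and `pick_driver` are hypothetical helper names, and the version strings in the usage note are made-up examples:

```python
def major(version):
    """Return the major version number from a string such as '92.0.4515.131'."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(chrome_version, driver_version):
    """True when both version strings share the same major version."""
    return major(chrome_version) == major(driver_version)


def pick_driver(chrome_version, available):
    """Return the newest listed driver with Chrome's major version, or None."""
    candidates = [v for v in available if driver_matches_browser(chrome_version, v)]
    # Compare numerically, component by component, so that
    # '92.0.4515.107' ranks above '92.0.4515.43'.
    return max(
        candidates,
        key=lambda v: tuple(int(part) for part in v.split(".")),
        default=None,
    )
```

For example, `pick_driver("92.0.4515.131", ["91.0.4472.101", "92.0.4515.43", "92.0.4515.107"])` returns `"92.0.4515.107"`, and it returns `None` when no listed driver shares Chrome's major version.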

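**Note**: On Windows, it is recommended to put chromedriver.exe directly in Python's Scripts directory.
Alternatively, add the directory that contains it to the PATH environment variable.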

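>>Basic usage of Selenium

Below are some of my notes (a later post will introduce a Selenium-based ticket-grabbing script for 12306).
For more methods, see: http://selenium-python.readthedocs.io/api.html
(1) Get the page source
driver.page_source  # page_source is a property holding the current page's source

(2) Close a page or the browser
driver.close()  # close the current page (window)
driver.quit()  # close the whole browser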
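
If a script crashes before it reaches driver.quit(), the browser and driver processes are left running. One way to guard against that is a small context manager. This is only a sketch: `managed_driver` is a hypothetical helper, and `factory` is assumed to be any callable that returns a driver, such as `webdriver.Chrome`:

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with `factory()` and always call quit() afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # quit() closes every window and ends the driver process,
        # even if the code inside the `with` block raised an exception.
        driver.quit()

# Typical use (needs Selenium and a browser, so it is left as a comment):
#   from selenium import webdriver
#   with managed_driver(webdriver.Chrome) as driver:
#       driver.get("https://www.douban.com/")
#       html = driver.page_source
```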

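(3) Finding elements
find_element_by_id: find an element by its id.
find_element_by_class_name: find an element by class name.
find_element_by_name: find an element by the value of its name attribute.
find_element_by_tag_name: find an element by tag name.
find_element_by_xpath: find an element with an XPath expression.
find_element_by_css_selector: find an element with a CSS selector.

Alternative form: from selenium.webdriver.common.by import By
find_element(By.ID,'id')
The find_element_by_* methods were deprecated in Selenium 4 and removed in 4.3.0, so on current versions only the By form works.

**Note**: find_element returns the first element that matches,
find_elements returns all elements that match.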
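
The difference between the two calls matters when an element may be absent: find_element raises NoSuchElementException, while find_elements simply returns an empty list. A small sketch that relies on this; `find_first_or_none` is a hypothetical helper, not a Selenium function:

```python
def find_first_or_none(driver, by, value):
    """Return the first element matching the locator, or None if nothing matches."""
    # find_elements returns an empty list when nothing matches,
    # whereas find_element would raise NoSuchElementException.
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None
```

With a real driver this would be called as `find_first_or_none(driver, By.ID, "kw")`.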

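(4) Working with select
# import the classes
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
# locate the <select> tag, then wrap it in a Select object
selectTag = Select(driver.find_element(By.NAME, "jumpMenu"))
# select by index
selectTag.select_by_index(1)
# select by value
selectTag.select_by_value("http://www....")
# select by visible text
selectTag.select_by_visible_text("text content")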

(5) Action chains
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
Example:
inputTag = driver.find_element(By.ID, 'kw')
searchTag = driver.find_element(By.ID, 'su')
actions = ActionChains(driver)
actions.move_to_element(inputTag)
actions.send_keys_to_element(inputTag,'python之父')
actions.move_to_element(searchTag)
actions.click(searchTag)
actions.perform()  # the queued actions only run when perform() is called

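More mouse-related operations:
elem.send_keys(): fill in a form field
elem.click(): simulate a mouse click
ActionChains(driver).click_and_hold(elem): press the mouse button without releasing it
ActionChains(driver).context_click(elem): right-click
ActionChains(driver).double_click(elem): double-click

**Note**: You must locate the target tag (elem) before you can simulate events on it. send_keys() and click() are methods of the element itself; the other three belong to ActionChains and only run once .perform() is called.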

Keyboard operations:
from selenium.webdriver.common.keys import Keys
elem.send_keys(Keys.RETURN)  # Enter key
Example: press Ctrl+C
ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
key_down()  # hold a key down
key_up()  # release a key
Additional:
elem.clear()  # clear the default text in an element
elem.text  # get the element's text content

(6) Working with cookies
driver.get_cookies(): get all cookies
driver.get_cookie(key): get a cookie by its name
driver.delete_all_cookies(): delete all cookies
driver.delete_cookie(key): delete a cookie by its name
driver.add_cookie({"name":"username","value":"abc"}): add a cookie
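
get_cookies() returns a list of dictionaries, one per cookie, which is awkward when you only need name/value pairs. A minimal sketch of converting in both directions; `cookies_to_dict` and `load_cookies` are hypothetical helpers, and the sketch assumes only the 'name' and 'value' fields matter:

```python
def cookies_to_dict(driver):
    """Flatten driver.get_cookies() into a {name: value} dictionary."""
    # Each cookie is a dict with at least 'name' and 'value' keys;
    # other keys (domain, path, expiry, ...) are dropped here.
    return {cookie["name"]: cookie["value"] for cookie in driver.get_cookies()}


def load_cookies(driver, cookies):
    """Add every {name: value} pair to the current session via add_cookie()."""
    for name, value in cookies.items():
        driver.add_cookie({"name": name, "value": value})
```

With a real browser, open a page on the target site before calling `load_cookies`, since add_cookie only accepts cookies for the domain that is currently loaded.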

(7) Implicit wait (rarely used; it is enough to know it exists)
driver.implicitly_wait(10)  # seconds

(8) Explicit wait
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Example (WaitElement, WaitText and WaitURL are placeholders):
# wait for the target element (WaitElement) to appear
WebDriverWait(driver,10).until(
    EC.presence_of_element_located((By.ID,"WaitElement"))
)
# wait until the target element's value contains WaitText (the specific text)
WebDriverWait(driver,10).until(
    EC.text_to_be_present_in_element_value((By.ID,WaitElement),WaitText)
)
# wait until the page has navigated to the given URL
WebDriverWait(driver,10).until(
    EC.url_to_be(WaitURL)  # or EC.url_contains(WaitURL) for a partial match
)
# wait until the target button is clickable
WebDriverWait(driver,10).until(
    EC.element_to_be_clickable((By.ID,WaitElement))
)
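
until() accepts any callable that takes the driver and returns a truthy value once the wait should end, so you can write your own condition when none of the built-in ones fits. A minimal sketch; `url_contains_any` is a hypothetical condition, not part of expected_conditions:

```python
def url_contains_any(*fragments):
    """Build a wait condition that is true once the URL contains any fragment."""
    def condition(driver):
        current = driver.current_url
        return any(fragment in current for fragment in fragments)
    return condition
```

It would be used as `WebDriverWait(driver, 10).until(url_contains_any("login", "home"))`. WebDriverWait keeps calling the condition until it returns something truthy, and raises TimeoutException if that never happens within the 10 seconds.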

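(9) Exceptions
from selenium.common.exceptions import NoSuchElementException  # import whichever exception classes you need
Encountered so far:
NoSuchElementException: raised when an element cannot be found.
ElementNotVisibleException: raised when an element is present in the DOM but is not visible, so it cannot be interacted with. Most often seen when trying to click, or read the text of, a hidden element.
ElementNotInteractableException: raised when an element is present in the DOM but interacting with it would hit another element.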
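
These exceptions are often transient: the element shows up or becomes clickable a moment later. One possible way to handle that is a small retry wrapper. This is a generic sketch, not a Selenium feature; `retry` and its parameters are hypothetical, and an explicit wait is usually the better tool when a suitable condition exists:

```python
import time


def retry(action, exceptions, attempts=3, delay=0.5):
    """Call action() until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of tries: let the last exception propagate
            time.sleep(delay)  # give the page a moment before trying again
```

For example: `retry(lambda: driver.find_element(By.ID, "su").click(), (NoSuchElementException, ElementNotInteractableException))`.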

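(10) Opening multiple windows
At the time of writing, Selenium had no dedicated method for opening a new window; instead, you run a JavaScript snippet through driver.execute_script() (Selenium 4 adds driver.switch_to.new_window()):
driver.execute_script("window.open('https://www.douban.com/')")  # run JS to open a new window

(11) Switching pages
driver.switch_to.window(driver.window_handles[1])  # driver.window_handles returns all pages as a list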
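
Hard-coding window_handles[1] only works while exactly two windows are open. A sketch that combines the two steps and finds the new window by comparing the handles before and after; `open_in_new_window` is a hypothetical helper:

```python
def open_in_new_window(driver, url):
    """Open `url` in a new window, switch to it, and return its handle."""
    before = set(driver.window_handles)
    # Pass the URL as a script argument instead of pasting it into the JS string.
    driver.execute_script("window.open(arguments[0])", url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new window appeared after window.open()")
    driver.switch_to.window(new_handles[0])
    return new_handles[0]
```

Keeping the returned handle, together with the original one from driver.current_window_handle, makes it easy to switch back later.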

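(12) Setting a proxy IP
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://110.52.235.53:9999")
# Selenium 3 style; on Selenium 4 pass service=Service(driver_path) instead of executable_path
driver = webdriver.Chrome(executable_path=driver_path, options=options)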
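
For reference, here is how the same setup might look with Selenium 4 style arguments. This is a sketch under assumptions: `proxy_argument` and `chrome_with_proxy` are hypothetical helpers, and the proxy host and port are whatever you supply:

```python
def proxy_argument(host, port, scheme="http"):
    """Build the Chrome command-line switch that routes traffic through a proxy."""
    return f"--proxy-server={scheme}://{host}:{port}"


def chrome_with_proxy(driver_path, host, port):
    """Start Chrome through the given proxy (Selenium 4 style arguments)."""
    # Imported here so proxy_argument() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port))
    return webdriver.Chrome(
        service=Service(executable_path=driver_path),
        options=options,
    )
```

`proxy_argument("110.52.235.53", 9999)` produces the same switch as the hard-coded string above.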

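(13) Miscellaneous
webelement.get_property: get the value of a standard HTML property
webelement.get_attribute: get the value of one of the tag's attributes (including custom attributes)
driver.save_screenshot("1.png"): save a screenshot of the current page (save_screenshot is available only on the driver)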

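>>Closing

I hope everyone will help extend and improve this post, especially the parts on exception handling and explicit waits, so that it becomes more complete.

Sending a heart ^_^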
