Problem scraping an AJAX page with Selenium, Python Discussion, Programming Languages, FishC Forum

zouyin posted on 2021-6-11 23:54:52

Problem scraping an AJAX page with Selenium

from selenium import webdriver
import time
from selenium.webdriver import ActionChains  # chained actions such as scrolling
from selenium.webdriver.common.by import By  # locator strategies: By.ID, By.CSS_SELECTOR, ...
from selenium.webdriver.common.keys import Keys  # keyboard key presses
from selenium.webdriver.support import expected_conditions as EC  # conditions for waiting until elements have loaded
from selenium.webdriver.support.wait import WebDriverWait  # wait for the page to load, then look for elements
from lxml import etree

# url = input('Enter the URL: ')
# Create the WebDriver object and point it at the Chrome driver
wd = webdriver.Chrome(r'g:\chromedriver.exe')
wd.implicitly_wait(5)
# print(url)
# Calling get() on the WebDriver object opens the given URL in the browser
wd.get("https://author.baidu.com/home?from=favor&hdegrade=1&context=%7B%22uk%22%3A%22Shg5be_YsjogfCgY4omgOw%22%7D")
# Select elements; each result is the WebElement object for that element
# elements = wd.find_elements_by_css_selector("#app > div > div.app-module_contentWrapper_12u0y > div > div.app-module_leftSection_2GBVu > div.index-module_articleContainer_32gOp > div.index-module_contentContainer_3mQeg > div > span > span > span > span > span")
# for element in elements:
#     print(element.get_attribute("outerHTML"))
while True:
    element = wd.find_elements_by_class_name('more')
    for page in element:
        # print(page.text)
        wd.implicitly_wait(5)
        page.click()
        handles = wd.window_handles  # all tabs currently open in the browser
        a = len(handles)
        for handle in range(a):
            wd.switch_to.window(handles[handle])  # switch to each tab in turn, ending on the newest
            # time.sleep(0.5)
        titles = wd.find_elements_by_css_selector("#app > div > div.app-module_contentWrapper_12u0y > div > div.app-module_leftSection_2GBVu > div.index-module_articleContainer_32gOp > div.index-module_contentContainer_3mQeg > div > span > span > span > span > span")
        images = wd.find_elements_by_css_selector("#app > div > div.app-module_contentWrapper_12u0y > div > div.app-module_leftSection_2GBVu > div.index-module_articleContainer_32gOp > div.index-module_contentContainer_3mQeg > div > div > div > div > div > div > div > img")
        for title in titles:
            print(title.get_attribute("textContent"))
        for image in images:
            print(image.get_attribute("src"))
        wd.close()
        wd.switch_to.window(handles[0])  # back to the original tab
    js1 = "window.scrollTo(0, document.body.scrollHeight)"  # scroll to the bottom of the page
    wd.execute_script(js1)
    time.sleep(2)

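I have only just started learning Selenium and am using it to scrape this site's content. On a site like this, once the last step scrolls to the bottom of the page, the newly loaded elements with class "more" cannot be clicked and the script fails with: selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <span class="more">...</span> is not clickable at point (537, 10). Other element would receive the click: <div class="pc-topbar">...</div>
How should the code be written so that it keeps paging and picks up the newly loaded content? I would appreciate any pointers from someone more experienced.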
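
The error message says that the fixed top bar (`pc-topbar`) covers the `more` element at the moment of the click, so the click lands on the bar. One common workaround, sketched below, is to scroll each element to the middle of the viewport and trigger the click from JavaScript, while remembering which elements were already handled so that each scroll only processes the newly loaded ones. This is a minimal sketch under assumptions, not a tested fix for this site: `js_click`, `unseen`, `crawl`, and the `handle_tab` callback are hypothetical helper names, and it assumes each click opens a new tab and that elements in the main tab stay valid between rounds.

```python
import time


def js_click(driver, element):
    """Scroll the element to the middle of the viewport, then click it via JavaScript.

    A JavaScript click is not blocked by an overlay such as a fixed top bar.
    """
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    driver.execute_script("arguments[0].click();", element)


def unseen(elements, seen_ids):
    """Return the elements whose WebElement.id was not handled yet, and record them."""
    fresh = [el for el in elements if el.id not in seen_ids]
    seen_ids.update(el.id for el in fresh)
    return fresh


def crawl(driver, handle_tab, pause=2.0, max_rounds=50):
    """Click every 'more' element, scroll for more, and stop when nothing new loads."""
    main_tab = driver.current_window_handle
    seen_ids = set()
    for _ in range(max_rounds):
        # "class name" is the string value of selenium's By.CLASS_NAME
        fresh = unseen(driver.find_elements("class name", "more"), seen_ids)
        if not fresh:
            break  # the last scroll loaded nothing new
        for link in fresh:
            js_click(driver, link)
            for handle in driver.window_handles:
                if handle != main_tab:
                    driver.switch_to.window(handle)
                    handle_tab(driver)  # e.g. print titles and image URLs
                    driver.close()
            driver.switch_to.window(main_tab)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the AJAX request time to finish
```

With the script above, this would be called as `crawl(wd, handle_tab)`, where `handle_tab` runs the two selector queries and prints the results. If the new tab opens slowly, a short wait before reading `window_handles` may also be needed.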

wp231957 posted on 2021-6-12 09:47:52

Selenium doesn't seem like the right tool for AJAX; the requests library is a better fit for that.
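
To make this suggestion concrete: an AJAX page usually loads its data from a JSON endpoint that can be found in the browser's developer tools (Network tab), and that endpoint can then be requested page by page without a browser. The sketch below is illustrative only. The `page` parameter and the `items` and `has_more` fields are hypothetical placeholders, not the real Baidu interface, and the endpoint URL has to be looked up by hand. The HTTP call is passed in as a function so the paging loop can be tried without network access.

```python
def fetch_all(get_json, url, base_params, max_pages=100):
    """Collect items from a paginated JSON endpoint until it reports no more data."""
    items = []
    for page in range(1, max_pages + 1):
        params = dict(base_params, page=page)  # 'page' is a hypothetical parameter name
        data = get_json(url, params)
        items.extend(data.get("items", []))    # 'items' is a hypothetical field name
        if not data.get("has_more"):           # 'has_more' is a hypothetical field name
            break
    return items


def get_json_with_requests(url, params):
    """Perform the real HTTP call; needs the third-party requests package."""
    import requests  # imported lazily so fetch_all works without it installed

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()
```

Once the real endpoint and its parameters are known, the call would be `fetch_all(get_json_with_requests, endpoint_url, {...})`.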

zouyin posted on 2021-6-12 09:55:49

wp231957 posted on 2021-6-12 09:47
Selenium doesn't seem like the right tool for AJAX; the requests library is a better fit for that.

With requests, I don't know how to handle complicated URLs such as https://author.baidu.com/home?from=favor&hdegrade=1&context=%7B%22uk%22%3A%22BYrOFjZ0lTgLtXnRAUNRjA%22%7D
https://author.baidu.com/home?type=profile&action=profile&context=%7B%22uk%22%3A%22Shg5be_YsjogfCgY4omgOw%22%2C%22from%22%3A%22dynamic%22%2C%22tab%22%3A%22dynamic%22%7D

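Could someone analyze the URLs of these two pages? I simply don't know how to construct the URL. I can manage when the pattern is regular, but I get stuck on ones like these.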
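
Both URLs follow the same pattern: an ordinary query string whose `context` parameter is a percent-encoded JSON object (`%7B` is `{`, `%22` is `"`, `%3A` is `:`, `%2C` is `,`, `%7D` is `}`). The sketch below uses only the standard library to take such a URL apart and put it back together. `split_url` and `build_url` are hypothetical helper names, and whether the site accepts any given `uk` value is an assumption that has to be checked against the site itself.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

URL_1 = ("https://author.baidu.com/home?from=favor&hdegrade=1"
         "&context=%7B%22uk%22%3A%22BYrOFjZ0lTgLtXnRAUNRjA%22%7D")
URL_2 = ("https://author.baidu.com/home?type=profile&action=profile"
         "&context=%7B%22uk%22%3A%22Shg5be_YsjogfCgY4omgOw%22%2C%22from%22"
         "%3A%22dynamic%22%2C%22tab%22%3A%22dynamic%22%7D")


def split_url(url):
    """Return (base, query), with the JSON in 'context' decoded to a dict."""
    parts = urlsplit(url)
    # parse_qs percent-decodes the values and returns a list per key
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    if "context" in query:
        query["context"] = json.loads(query["context"])
    return f"{parts.scheme}://{parts.netloc}{parts.path}", query


def build_url(base, query):
    """Inverse of split_url: serialize 'context' as compact JSON and re-encode."""
    flat = dict(query)
    if isinstance(flat.get("context"), dict):
        flat["context"] = json.dumps(flat["context"], separators=(",", ":"))
    return base + "?" + urlencode(flat)


base, query = split_url(URL_1)
print(base)              # https://author.baidu.com/home
print(query["context"])  # {'uk': 'BYrOFjZ0lTgLtXnRAUNRjA'}

# Swap in the uk from the second URL to build a link for another author page
query["context"]["uk"] = split_url(URL_2)[1]["context"]["uk"]
print(build_url(base, query))
```

With requests, the same dictionary can be passed as `params=` after turning `context` into a JSON string, so the percent-encoding never has to be written by hand.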
