Selenium crawler: for loop has no effect, Python Discussion, Programming Languages, FishC Forum (鱼C论坛)

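白本羽 posted on 2021-5-30 16:51:02

Selenium crawler: for loop has no effect

Last edited by 白本羽 on 2021-5-30 16:54

Goal: automatically go through the questions on the page, submit them, and then scrape the correct answers. The earlier part of the script works fine.

My code: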
from selenium import webdriver
from lxml import etree
import requests
import time
import random
import json

from selenium.webdriver.remote.webelement import WebElement

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/90.0.4430.93 Safari/537.36 Edg/90.0.818.56 "
}
browser = webdriver.Edge(executable_path = "MicrosoftWebDriver.exe")
# Have the browser request the given URL
browser.get("https://www.yooc.me/login")
# Locate the input fields
account_input = browser.find_element_by_xpath('/html/body/div/div/div/div/div/div/form/div/input')
account_input.send_keys('account')
password_input = browser.find_element_by_xpath('/html/body/div/div/div/div/div/div/form/div/input')
password_input.send_keys('password')
# Use page_source to get the HTML source of the current page
response = browser.page_source
tree = etree.HTML(response)
# xpath() returns a list of matches, so take the first one as the image URL
code_url = tree.xpath('/html/body/div/div/div/div/div/div/form/div/img/@src')[0]
text_response = requests.get(url=code_url, headers=headers).content
with open("./code_text.jpg", "wb") as fp:
    fp.write(text_response)
code_text = input("Please look at the CAPTCHA and enter it within 30 seconds: ")

code_text_input = browser.find_element_by_xpath('/html/body/div/div/div/div/div/div/form/div/input')
# Interact with the element: type the text into it
code_text_input.send_keys(code_text)

login = browser.find_element_by_id('submit')
login.click()
time.sleep(2)

topic_url = browser.find_element_by_xpath('/html/body/div/div/table/tbody/tr/td/div/div/div/a')
topic_url.click()
time.sleep(2)

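handles = browser.window_handles
# window_handles is a list, but switch_to.window() takes a single handle;
# switch to the most recently opened window
browser.switch_to.window(handles[-1])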
exam_url = browser.find_element_by_xpath('/html/body/section/section/div/div/a')
exam_url.click()
time.sleep(2)

exam_detail = browser.find_element_by_xpath('/html/body/section/section/div/div/ul/li/div/a')
exam_detail.click()
time.sleep(2)

confirm_btn = browser.find_element_by_xpath('/html/body/div/div/div/div')
confirm_btn.click()
time.sleep(5)

bodylist = browser.find_elements_by_xpath('/html/body/section/section/div/div[@class=question-board]')

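for each in bodylist:  # Locate each question's option labels and click one at random. This whole loop does not run and raises no error; it is simply skipped.
    print(each.text)
    templist = each.find_elements_by_tag_name('label')
    islist = random.choice(templist)
    islist.click()
    print("Option selected!")
Below is the HTML for the answer options on the page: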
<div class="question-board" id="question-31960407">

         <p class="q-cnt crt">2.<span></span> </p><p class="q-cnt crt">As opening-up to the outside world expanded further, the strategic move made by the CPC Central Committee and the State Council in 1990 was ( ).</p><p></p><ol><li>
               <input id="31960407_1_0" type="radio" name="31960407_1" value="0"><label for="31960407_1_0">A. Establishing the Xiamen Special Economic Zone</label>
            </li>

            <li>
               <input id="31960407_1_1" type="radio" name="31960407_1" value="1"><label for="31960407_1_1">B. Establishing the Zhuhai Special Economic Zone</label>
            </li>

            <li>
               <input id="31960407_1_3" type="radio" name="31960407_1" value="3"><label for="31960407_1_3">C. Developing and opening up Pudong, Shanghai</label>
            </li>

            <li>
               <input id="31960407_1_2" type="radio" name="31960407_1" value="2"><label for="31960407_1_2">D. Developing and opening up the Hainan Special Economic Zone</label>
            </li></ol>
      </div>

suchocolate posted on 2021-5-30 20:52:51

bodylist is probably empty.
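
To illustrate the point, here is a minimal sketch of why an empty list produces exactly this symptom: a for loop over an empty list runs zero times and raises nothing. The helper `require_nonempty` is a hypothetical name introduced only for this example, and the empty `bodylist` stands in for what `find_elements_by_xpath` returned in the script above.

```python
def require_nonempty(items, description):
    """Return items as a list, or raise if the locator matched nothing."""
    items = list(items)
    if not items:
        raise ValueError(f"no elements matched: {description}")
    return items


# Stand-in for the result of find_elements_by_xpath when nothing matches.
bodylist = []

executed = 0
for each in bodylist:
    executed += 1  # never reached: there is nothing to iterate over
print("loop iterations:", executed)

# Failing loudly makes the real problem (a bad locator) visible right away.
try:
    require_nonempty(bodylist, "question-board divs")
except ValueError as exc:
    print("caught:", exc)
```

Printing `len(bodylist)` before the loop would reveal the same thing; the guard just turns the silent skip into an explicit error.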

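白本羽 posted on 2021-6-1 15:07:15

suchocolate posted on 2021-5-30 20:52
bodylist is probably empty.

It was indeed an empty list. The XPath above was missing the quotation marks ("") around the class value, so it matched nothing. It works now.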
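
The difference can be reproduced without a browser. The sketch below uses lxml (already imported in the script above) on a trimmed, hypothetical copy of the markup from the first post, and a shortened `//div[...]` path instead of the full absolute one; both are assumptions made for illustration. In XPath 1.0 an unquoted `question-board` is read as a child element name rather than a string, so the comparison is false for every div and the query returns an empty list instead of raising an error.

```python
from lxml import etree

# Trimmed stand-in for the markup quoted in the first post (assumption:
# only the structure matters here, so the option text is shortened).
HTML = """
<html><body>
<div class="question-board" id="question-31960407">
  <ol>
    <li><input id="31960407_1_0" type="radio" name="31960407_1" value="0">
        <label for="31960407_1_0">A</label></li>
    <li><input id="31960407_1_1" type="radio" name="31960407_1" value="1">
        <label for="31960407_1_1">B</label></li>
  </ol>
</div>
</body></html>
"""

tree = etree.HTML(HTML)

# Unquoted: compares @class with child elements named <question-board>,
# which do not exist, so nothing matches and no error is raised.
unquoted = tree.xpath('//div[@class=question-board]')

# Quoted: compares @class with the string "question-board".
quoted = tree.xpath('//div[@class="question-board"]')

# The option labels are found relative to the matched question block.
labels = quoted[0].xpath('.//label')

print(len(unquoted), len(quoted), len(labels))
```

Applied to the script, the same quoting fix goes inside the string passed to `find_elements_by_xpath`, using single quotes for the Python string and double quotes for the class value (or the other way round).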
