Why is the output of this Python crawler program empty? | Python Discussion | Programming Languages | 鱼C论坛

smxcfslh posted on 2021-4-30 15:24:06

Why is the output of this Python crawler program empty?

# coding:utf-8

from selenium import webdriver
from bs4 import BeautifulSoup

driver=webdriver.Chrome(executable_path=r'E:\python\chromedriver.exe')
driver.get('http://www.toutiao.com')
wbdata=driver.page_source
soup=BeautifulSoup(wbdata,'lxml')
news_list=soup.find_all('a',attrs={'target':'_blank','rel':'noopener'})
for new in news_list:
    title=new.get('title')
    link=new.get('href')
    data={'标题':title,  # key means "title"
          '链接':link    # key means "link"
          }
    print(data)

Here is the result of running it:
E:\python\python.exe G:/Python/spider/sele.py

Process finished with exit code 0
Does anyone know what is wrong with the code above? Why does it produce no output when it runs?

wp231957 posted on 2021-4-30 17:14:31

Here is a debugging method for you: check layer by layer, working either from the top down or from the bottom up.
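A minimal sketch of what checking layer by layer could look like for this script: measure the page source, then all anchors, then the anchors that pass the filter, and see which stage is the first to come back empty. To stay runnable without a browser it uses the standard library's html.parser as a stand-in for BeautifulSoup, and the two HTML strings, AnchorCollector and inspect_layers are made up for illustration; they are not from the original code.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> tag (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.anchors.append(dict(attrs))


def inspect_layers(html):
    """Return the size of each stage so the first empty one stands out."""
    parser = AnchorCollector()
    parser.feed(html)
    matching = [
        a for a in parser.anchors
        if a.get('target') == '_blank'
        and 'noopener' in (a.get('rel') or '').split()
    ]
    return {
        'html_length': len(html),
        'all_anchors': len(parser.anchors),
        'matching_anchors': len(matching),
    }


# Hypothetical page source captured before the scripts filled in the feed
before = '<html><body><div id="root"></div></body></html>'
# Hypothetical page source after the feed was rendered
after = ('<html><body><div id="root">'
         '<a target="_blank" rel="noopener" title="Example" href="/a/1">Example</a>'
         '</div></body></html>')

print(inspect_layers(before))
print(inspect_layers(after))
```

If all_anchors is already 0, the filter is not the problem: the page source itself does not contain the links yet. If all_anchors is large but matching_anchors is 0, the attribute filter is what needs changing.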

狗宁 posted on 2021-4-30 17:30:50

Debugging is the GOAT.

Stubborn posted on 2021-4-30 17:48:07

Nothing was printed, so there was probably nothing to print: news_list is most likely an empty list. Rework the matching rule.

suchocolate posted on 2021-4-30 18:07:49

You have to wait until the resources have finished loading; selenium does not get a fully rendered page that quickly:

from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as expected

from bs4 import BeautifulSoup

url = 'http://www.toutiao.com'
browser = webdriver.Firefox()
browser.get(url)
wait = WebDriverWait(browser, 10)  # create the wait object (up to 10 seconds)
wait.until(expected.visibility_of_element_located((By.CLASS_NAME, 'feed-card-wrapper')))  # continue only once the news feed is visible
wbdata = browser.page_source
soup = BeautifulSoup(wbdata, 'lxml')
news_list = soup.find_all('a', attrs={'target': '_blank', 'rel': 'noopener'})
for new in news_list:
    title = new.get('title')
    link = new.get('href')
    data = {'标题': title,  # key means "title"
            '链接': link    # key means "link"
            }
    print(data)
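For anyone wondering what the wait is doing conceptually: an explicit wait boils down to polling a condition until it holds or a timeout expires. The sketch below is a simplified standard-library illustration of that idea, not Selenium's actual implementation; wait_until and feed_is_ready are hypothetical names, and the counter only simulates a page whose feed shows up on the third check.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose feed appears on the third check
checks = {'count': 0}


def feed_is_ready():
    checks['count'] += 1
    return checks['count'] >= 3


wait_until(feed_is_ready, timeout=2.0, poll_interval=0.01)
print(checks['count'])  # prints 3
```

Reading page_source right after get() amounts to checking the condition once and never retrying, which would explain why the original script saw no matching links.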
