Scrapy framework question, Python Discussion, Programming Languages section, FishC Forum

城中城 posted on 2020-10-13 23:52:05

Scrapy framework question

As the title says.
I searched Baidu, but I still only half understand it.
In the Scrapy framework, given

from scrapy.http.response.html import HtmlResponse

what does HtmlResponse actually do?

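For example, in the following line of code:
# driver.current_url: the current page URL, obtained through the selenium module
# driver.page_source: the page data, obtained through the selenium module
# request: the Request object being processed, e.g. <GET https://www.网址>
response = HtmlResponse(url=self.driver.current_url, body=source, request=request, encoding='utf-8')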

I understand all of the arguments; it is only HtmlResponse itself whose purpose I don't get. Any help would be appreciated.

弱弱的佳佳 posted on 2020-10-14 17:02:55

It represents the response to a web page request and holds the page's source code.

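城中城 posted on 2020-10-14 18:32:00

弱弱的佳佳 wrote on 2020-10-14 17:02
It represents the response to a web page request and holds the page's source code.

But then, what about "# driver.page_source: the page data, obtained through the selenium module"?
Hasn't that already fetched the page source?

body=source
This argument is the page source itself.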

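弱弱的佳佳 posted on 2020-10-15 08:58:53

城中城 wrote on 2020-10-14 18:32
But then, what about "# driver.page_source: the page data, obtained through the selenium module"?
Hasn't that already fetched the page source?

driver.page_source does give you the page source, but only as a plain string, which offers no convenient way to extract data. That is why it gets converted into an HtmlResponse object. No second request is sent; the source is simply wrapped so that it can be processed like the response to an ordinary Scrapy request, for example to extract data with xpath!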

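城中城 posted on 2020-10-15 10:10:00

弱弱的佳佳 wrote on 2020-10-15 08:58
driver.page_source does give you the page source, but you cannot extract data from it directly, so it has to be converted into an HtmlResponse object ...

Got it, thank you very much.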
