[Solved] Problem scraping a page that uses frames

heywilliam · Posted 2018-3-29 14:43:27

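I've just started learning web scraping, and I want to scrape an internal company web page and archive its content. The goal is to fetch the content that opens when each item in the page's left-hand navigation bar is clicked; the navigation is nested several levels deep.

I'm still at the very beginning of learning to write scrapers, so I wrote the small snippet below just to see whether I can open the page at all.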

import requests
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}
# The page address (url), username, and password are hidden here because I can't share them
response = requests.get(url, headers=headers, auth=HTTPBasicAuth("username", "password"))
soup = BeautifulSoup(response.text, 'lxml')
print(response.text)


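But when I run the code I only get the output below. It doesn't seem to expand what's inside the frameset, so I can't get at the links inside it either. Which topics do I need to learn to solve this? Just a pointer in the right direction would be enough.

Thanks!!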

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<meta name="Microsoft Border" content="none">
<title>FPPS Home</title>
</head>
<frameset framespacing="0" border="false" frameborder="0" rows="128,*">
<frame name="banner" scrolling="auto" noresize target="contents"
src="home/&frames_home/home_top.htm" style="border-bottom: 2px none rgb(0,0,255)"
marginwidth="0" marginheight="0">
<frameset cols="18*,85%">
<frame name="contents" target="main" src="home/&frames_home/home_left.htm"
scrolling="auto" marginwidth="0" marginheight="0" style="border: 2px none rgb(0,0,255)">
<frame name="contents1" src="home/home_home.htm" scrolling="auto" marginwidth="1"
marginheight="1" style="border: 0px none; padding-left: 5; padding-top: 0">
</frameset>
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>


Best answer

大头目

2018-3-29 16:21:35

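Learn Scrapy.
Or use BeautifulSoup to pull the second-level page URLs (the src of each frame) out of the frameset, then open those with requests.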
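
A minimal sketch of the second suggestion, assuming the frameset page and its frames all accept the same credentials. requests does not load frames the way a browser does, so each frame's src has to be read out of the frameset and requested separately. The helper names frame_urls and fetch_frames and the session parameter are made up for illustration; html.parser is used here so that lxml is not required.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def frame_urls(html, base_url):
    """Return the absolute URL of every <frame>/<iframe> src found in html."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all(["frame", "iframe"]):
        src = tag.get("src")
        if src:
            # src values are relative to the frameset page, so resolve them.
            urls.append(urljoin(base_url, src))
    return urls


def fetch_frames(session, base_url):
    """Fetch the frameset page, then every frame it references.

    `session` is any object with a requests-style .get(url), for example a
    requests.Session whose .auth and .headers are already set.
    Returns a dict mapping each frame URL to its HTML text.
    """
    top = session.get(base_url)
    top.raise_for_status()
    pages = {}
    for url in frame_urls(top.text, base_url):
        resp = session.get(url)
        resp.raise_for_status()
        pages[url] = resp.text
    return pages
```

With the code from the question, this could probably be called as fetch_frames(session, url), where session is a requests.Session() with session.auth = HTTPBasicAuth("username", "password") and session.headers.update(headers) set beforehand. The HTML of home_left.htm in the returned dict is where the navigation links should then show up.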


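heywilliam · Posted 2018-3-29 16:28:22

大头目 posted on 2018-3-29 16:21
Learn Scrapy.
Or use BeautifulSoup to pull the second-level page URLs (the src of each frame) out of the frameset, then open those with requests.

OK, thanks!!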
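
Since the left-hand navigation in the question is nested several levels deep, the same idea can be repeated level by level. The following is only a rough sketch under assumptions: it follows frame src and a href references breadth-first, stays under the directory of the start URL, and treats every fetched response as HTML. The names page_links and crawl, the get_html callback, and the max_pages cap are hypothetical and not taken from the thread.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin

from bs4 import BeautifulSoup


def page_links(html, page_url):
    """Absolute URLs referenced by <a href>, <frame src> and <iframe src>."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for tag in soup.find_all(["a", "frame", "iframe"]):
        ref = tag.get("href") if tag.name == "a" else tag.get("src")
        if ref:
            # Resolve relative references and drop any #fragment so the
            # same page is not queued twice.
            found.append(urldefrag(urljoin(page_url, ref)).url)
    return found


def crawl(get_html, start_url, max_pages=200):
    """Breadth-first crawl from start_url.

    get_html(url) must return the page's HTML as a string. Only URLs under
    the directory of start_url are followed (an assumption about the site
    layout), and at most max_pages pages are fetched.
    """
    prefix = start_url.rsplit("/", 1)[0] + "/"
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = get_html(url)
        pages[url] = html
        for link in page_links(html, url):
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

With requests, get_html could be something like lambda u: session.get(u).text on an authenticated requests.Session. A real run would likely also need to skip non-HTML files such as images or PDFs and handle failed requests, which this sketch leaves out.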
