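This post was last edited by whl19910402 on 2018-9-24 20:49
Has anyone here scraped bilibili recently? I can't get the page content: the status code is 200 (line 14), but the body contains "抱歉,您正在使用不支持的浏览器访问个人空间" ("Sorry, you are visiting this personal space with an unsupported browser") (line 19).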
- headers = {
- 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
- 'X-Requested-With': 'XMLHttpRequest',
- 'Referer': 'http://space.bilibili.com/6758258/',
- 'Origin': 'http://space.bilibili.com',
- 'Host': 'space.bilibili.com',
- 'AlexaToolbar-ALX_NS_PH': 'AlexaToolbar/alx-4.0',
- 'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,ja;q=0.4',
- 'Accept': 'application/json, text/javascript, */*; q=0.01'}
- import requests
- url = 'https://space.bilibili.com/304176852/#/video?tid=0&page=1'
- re=requests.get(url, headers=headers)
- re.status_code
- Out[28]: 200
- re.text
- Out[29]: '<!DOCTYPE html><html><head><meta name=spm_prefix content=333.999><meta charset=UTF-8><meta http-equiv=X-UA-Compatible content="IE=edge,chrome=1"><meta name=renderer content=webkit|ie-comp|ie-stand><link rel=stylesheet type=text/css href=//at.alicdn.com/t/font_438759_ivzgkauwm755qaor.css><script type=text/javascript>var ua = window.navigator.userAgent\n var agents = ["Android","iPhone","SymbianOS","Windows Phone","iPod"]\n var pathname = /\\d+/.exec(window.location.pathname)\n var getCookie = function(sKey) {\n return decodeURIComponent(\n document.cookie.replace(\n new RegExp(\'(?:(?:^|.*;)\\\\s*\' +\n encodeURIComponent(sKey).replace(/[\\-\\.\\+\\*]/g, \'\\\\\') +\n \'\\\\s*\\\\=\\\\s*([^;]*).*$)|^.*$\'),\n \'$1\'\n )\n ) || null\n }\n\n var DedeUserID = getCookie(\'DedeUserID\')\n var mid = pathname ? +pathname[0] : DedeUserID === null ? 0 : +DedeUserID\n if (mid < 1) {\n window.location.href = \'https://passport.bilibili.com/login?gourl=https://space.bilibili.com\'\n } else {\n window._bili_space_mid = mid\n window._bili_space_mymid = DedeUserID === null ? 0 : +DedeUserID\n var prefix = /^\\/v/.test(pathname) ? \'/v\' : \'\'\n window.history.replaceState({}, \'\', prefix + \'/\' + mid + \'/\' + (pathname ? 
window.location.hash : \'#/\'))\n\n for (var i = 0; i < agents.length; i++) {\n if (ua.indexOf(agents[i]) > -1) {\n window.location.href = \'https://m.bilibili.com/space/\' + mid\n break\n }\n }\n }</script><link rel=prefetch as=script href=//static.hdslb.com/player/js/video.js><link href=//s1.hdslb.com/bfs/static/jinkela/space/css/space.25.6fb5759c1fe2317fd22bb489b6196925ff16b8af.css rel=stylesheet><link href=//s1.hdslb.com/bfs/static/jinkela/space/css/space.26.6fb5759c1fe2317fd22bb489b6196925ff16b8af.css rel=stylesheet>\n <title>kidkidkid123的个人空间 - 哔哩哔哩 ( ゜- ゜)つロ 乾杯~ Bilibili</title><meta name="keywords" content="kidkidkid123,B站,弹幕,字幕,AMV,MAD,MTV,ANIME,动漫,动漫音乐,游戏,游戏解说,ACG,galgame,动画,番组,新番,初音,洛天依,vocaloid"/>\n <meta name="description" content="kidkidkid123,just do it,bilibili是国内知名的视频弹幕网站,这里有最及时的动漫新番,最棒的ACG氛围,最有创意的Up主。大家可以在这里找到许多欢乐。"/>\n </head><body><div class="z-top-container has-top-search"></div><!--[if lt IE 9]><div id="browser-version-tip">\n <div class="wrapper">\n
- 抱歉,您正在使用不支持的浏览器访问个人空间。推荐您 (line break added here manually by the original poster for emphasis)
- <a href="//www.google.cn/chrome/browser/desktop/index.html">安装 Chrome 浏览器</a>以获得更好的体验 ヾ(o◕∀◕)ノ\n </div>\n </div><![endif]--><div id=space-app></div><script type=text/javascript>//日志上报\n window.spaceReport = {}\n window.reportConfig = {\n sample: 1,\n scrollTracker: true,\n msgObjects: \'spaceReport\'\n }\n var reportScript = document.createElement(\'script\')\n reportScript.src = \'//s1.hdslb.com/bfs/seed/log/report/log-reporter.js\'\n document.getElementsByTagName(\'body\')[0].appendChild(reportScript)\n reportScript.onerror = function() {\n console.warn(\'log-reporter.js加载失败,放弃上报\')\n var noop = function() {}\n window.reportObserver = {\n sendPV: noop,\n forceCommit: noop\n }\n }</script><script src=//s1.hdslb.com/bfs/static/jinkela/long/js/jquery/jquery1.7.2.min.js></script><script src=//s1.hdslb.com/bfs/seed/jinkela/header/header.js></script><script type=text/javascript src=//s1.hdslb.com/bfs/static/jinkela/space/manifest.6fb5759c1fe2317fd22bb489b6196925ff16b8af.js></script><script type=text/javascript src=//s1.hdslb.com/bfs/static/jinkela/space/vendor.6fb5759c1fe2317fd22bb489b6196925ff16b8af.js></script><script type=text/javascript src=//s1.hdslb.com/bfs/static/jinkela/space/space.6fb5759c1fe2317fd22bb489b6196925ff16b8af.js></script></body></html>'
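......The content is loaded asynchronously, so the HTML you got back is only the page shell. The "unsupported browser" notice in your output sits inside an `<!--[if lt IE 9]>` conditional comment, so it is part of the static markup rather than a sign that the request was blocked. I don't know exactly what you want to scrape, but if it is things like video titles, durations, and play counts, I have done a quick analysis.
As shown in the screenshot above, the data comes from https://space.bilibili.com/ajax/member/getSubmitVideos?mid=304176852&pagesize=30&tid=0&page=2&keyword=&order=pubdate
Requesting that URL returns JSON, though you will need to handle the encoding yourself.
From there you can extract the data.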
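A minimal sketch of the approach described above, assuming the endpoint still answers the way it did when this reply was written (it may have changed since). The helper names `build_url`, `decode_payload`, and `fetch_page` are made up for illustration, and the short `User-Agent` value is a placeholder. The thread does not show the layout of the returned JSON, so the sketch stops at decoding and does not assume any field names.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the reply above; whether it is still served is an assumption.
API = "https://space.bilibili.com/ajax/member/getSubmitVideos"


def build_url(mid, page=1, pagesize=30):
    """Build the request URL with the same query parameters as in the reply."""
    params = {
        "mid": mid,
        "pagesize": pagesize,
        "tid": 0,
        "page": page,
        "keyword": "",
        "order": "pubdate",
    }
    return API + "?" + urlencode(params)


def decode_payload(raw):
    """Decode the raw response bytes as UTF-8 and parse them as JSON.

    json.loads also turns \\uXXXX escapes into real characters, which covers
    the "handle the encoding yourself" step when the body is escaped that way.
    """
    return json.loads(raw.decode("utf-8"))


def fetch_page(mid, page=1):
    """Fetch one page of results (hypothetical helper, needs network access)."""
    import requests  # third-party; imported here so the helpers above work without it

    resp = requests.get(
        build_url(mid, page),
        headers={"User-Agent": "Mozilla/5.0"},  # placeholder value
        timeout=10,
    )
    resp.raise_for_status()
    # Use the raw bytes so the decoding step stays explicit.
    return decode_payload(resp.content)
```

Calling `fetch_page(304176852, page=2)` would request the same URL quoted in the reply. Print the returned dictionary once to see which keys hold the titles, durations, and play counts before writing any extraction code.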