Python BeautifulSoup help request
Last edited by dreamyeyu on 2020-9-28 21:27
Could anyone tell me how to solve this?
The code is as follows:
from bs4 import BeautifulSoup
html = open("job.html","r")
bs = BeautifulSoup(html,"html.parser")
eldiv = bs.select()
print(eldiv)
The error:
Traceback (most recent call last):
  File "D:/shixun/51job/testkeyword.py", line 21, in <module>
    eldiv = bs.select()
TypeError: select() missing 1 required positional argument: 'selector'
If I write it as below instead, it prints an empty list:
from bs4 import BeautifulSoup
html = open("job.html","r")
bs = BeautifulSoup(html,"html.parser")
eldiv = bs.select(".el > .t1 > span > a")
print(eldiv)
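This is the same code as in the tutorial video, but in the video it prints results normally. If I drop the select call, print(bs) also works fine.

疾风怪盗 wrote on 2020-9-28 21:50:
Calling select with no argument is bound to raise an error. It works like a picker: you have to tell it where to take the data from.
I don't know what job.html contains. Could you upload it as an attachment so we can take a look?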
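To illustrate the point about select needing a selector, here is a minimal sketch. The markup in SAMPLE_HTML and the helper select_links are made up for illustration (they are not the real 51job page or part of the original code); only the selector string comes from the question. It shows the selector matching when the markup has those classes, and returning an empty list when it does not, which may be what happens with the saved job.html.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (NOT the real 51job page), shaped to match
# the selector used in the question.
SAMPLE_HTML = """
<div class="el">
  <p class="t1"><span><a href="https://example.com/job/1">Python Developer</a></span></p>
</div>
<div class="el">
  <p class="t1"><span><a href="https://example.com/job/2">Data Engineer</a></span></p>
</div>
"""

SELECTOR = ".el > .t1 > span > a"


def select_links(html, selector):
    """Return (text, href) pairs for every <a> matched by the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.select(selector)]


# select() requires a CSS selector string; calling it with no argument
# raises the TypeError shown in the traceback.
links = select_links(SAMPLE_HTML, SELECTOR)
print(links)

# The same selector on markup without those classes gives an empty list.
no_match = select_links("<div class='other'><a href='#'>x</a></div>", SELECTOR)
print(no_match)
```

If the second case matches what you see, a quick check is whether the class names from the selector appear anywhere in the saved file at all.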
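dreamyeyu wrote on 2020-9-28 21:58:
Hi, I couldn't find where to upload attachments. Would it be OK to add you on WeChat and send it directly? job.html is just one page scraped from 51job and saved to disk. print(bs) works fine, but as soon as I use select the result is empty.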
疾风怪盗 wrote on 2020-9-28 22:00:
In the reply box, under advanced mode, there is a paperclip icon. Isn't that the attachment button? It is usually that symbol.
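dreamyeyu replied:
I still don't see it. Can I just paste the page source here instead? It is too long, so I removed part of it before posting.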
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<link rel="icon" href="/favicon.ico" type="image/x-icon"/>
<meta http-equiv="Content-Type" content="text/html; charset=gbk">
<title>【Python开发工程师招聘,求职】-前程无忧</title>
<meta name="description" content="前程无忧为您提供最新最全的Python开发工程师招聘,求职信息。网聚全国各城市的人才信息,找好工作,找好员工,上前程无忧。">
<meta name="keywords" content="找工作,求职,人才,招聘">
<meta name="mobile-agent" content="format=html5; url=https://m.51job.com/search/joblist.php?jobarea=000000&keyword=Python%E5%BC%80%E5%8F%91%E5%B7%A5%E7%A8%8B%E5%B8%88&partner=webmeta">
<meta name="mobile-agent" content="format=xhtml; url=https://m.51job.com/search/joblist.php?jobarea=000000&keyword=Python%E5%BC%80%E5%8F%91%E5%B7%A5%E7%A8%8B%E5%B8%88&partner=webmeta">
<meta name="robots" content="all">
<meta http-equiv="Expires" content="0">
<meta http-equiv="Cache-Control" content="no-cache">
<meta http-equiv="Pragma" content="no-cache">
<link rel="dns-prefetch" href="//js.51jobcdn.com">
<link rel="dns-prefetch" href="//img01.51jobcdn.com">
<link rel="dns-prefetch" href="//img02.51jobcdn.com">
<link rel="dns-prefetch" href="//img03.51jobcdn.com">
<link rel="dns-prefetch" href="//img04.51jobcdn.com">
<link rel="dns-prefetch" href="//img05.51jobcdn.com">
<link rel="dns-prefetch" href="//img06.51jobcdn.com">
<script language="javascript" src="//js.51jobcdn.com/in/js/2016/jquery.js?20180319"></script>
<script language="javascript">
var _tkd = _tkd || []; //点击量统计用
var lang = [];
var supporthttps = 1; //浏览器是否支持https
var currenthttps = (window.location.protocol === 'https:') ? '1' : '0'; //当前是否为https
var systemtime = 1601282264107;
var d_system_client_time = systemtime - new Date().getTime();
var trackConfig = {
'guid': '',
'ip': '161.81.27.31',
'accountid': '',
'refpage': '',
'refdomain': '',
'domain': 'search.51job.com',
'pageName': 'index.php',
'partner': '',
'islanding': '0',
};
window.cfg = {
lang:'c',
domain : {
my : 'http://my.51job.com',
login : 'https://login.51job.com',
search : 'https://search.51job.com',
www : '//www.51job.com',
jobs : 'https://jobs.51job.com',
jianli : 'https://jianli.51job.com',
company : '//company.51job.com',
i : '//i.51job.com',
jc : '//jc.51job.com',
map : 'https://map.51job.com',
m : 'https://m.51job.com',
cdn : '//js.51jobcdn.com',
help : 'https://help.51job.com',
img : '//img02.51jobcdn.com',
dj : '//dj.51job.com',
mdj : '//mdj.51job.com',
mq : '//mq.51job.com',
mmq : '//mmq.51job.com',
kbc : 'https://kbc.51job.com',
mtr : 'https://medu.51job.com',
tr : 'https://edu.51job.com',
}
};
window.cfg.lang = 'c';
window.cfg.fullLang = 'Chinese';
window.cfg.url = {
root : 'https://search.51job.com',
image : '//img04.51jobcdn.com/im/2009',
image_search : '//img01.51jobcdn.com/im/2009/search',
i : '//i.51job.com'
}
window.cfg.fileName = 'index.php';
window.cfg.root = 'https://search.51job.com';
window.cfg.root_userset_ajax = '//i.51job.com/userset/ajax';
window.cfg.isSearchDomain = '1';
window.cfg.langs = {
sqzwml : 'applyjob',
qzzwqdg : '请在要选择的职位前打勾!',
myml : 'my',
ts_qxjzw : '请选择职位',
queren : '确认',
guanbi : '关闭',
nzdnxj : '您最多能选择',
xiang : '项',
xzdq : '选择地区',
xj_xg : '选择/修改',
zycs : '主要城市',
sysf : '所有省份',
tspd : '特殊频道',
buxian : '不限',
qingxj : '请选择',
yixuan : '已选',
znlb : '职能类别',
hylb : '行业类别',
gzdd : '工作地点',
quanbu : '全部',
zhineng : '职能',
hangye : '行业',
didian : '地点',
qsrgjz : '请输入关键字',
srpcgjz : '输入排除关键字'
}
window.cfg.stype = '1';
window.cfg.isJobview = '1';
</script>
<script type="text/javascript" src="//js.51jobcdn.com/in/js/2016/pointtrack.js?20180605"></script>
<script language="javascript" src="//js.51jobcdn.com/in/js/2016/login/jquery.placeholder.min.js"></script>
<link href="//js.51jobcdn.com/in/resource/css/2020/search/common.34599d2d.css" rel="stylesheet">
<link href="//js.51jobcdn.com/in/resource/css/2020/search/index.b02876f6.css" rel="stylesheet">
<link href="//js.51jobcdn.com/in/resource/css/2020/search/utils.header.91d43fda.css" rel="stylesheet">
</head>
<body>
<script type="text/javascript" src="//js.51jobcdn.com/in/resource/js/2020/search/utils.header.2bf76207.js"></script>
<div class="header">
<!-- bar start -->
<div class="bar">
<div class="in">
<div class="language">
<ul id="languagelist">
<li class="tle"><span class="list">简</span></li><li><a href="http://big5.51job.com/gate/big5/www.51job.com/" rel="external nofollow">繁</a></li><li class="last"><a href="//www.51job.com/default-e.php" rel="external nofollow">EN</a></li> <script language="javascript">
if(location.hostname == "big5.51job.com")
{
$('#languagelist li span').html("繁");
$('#languagelist li:nth-child(2) a').html("简");
$('#languagelist li:nth-child(2) a').attr('href','javascript:void(0)');
$('#languagelist li:nth-child(2) a').click(function(){location.href=window.cfg.domain.www});
$('#languagelist li:nth-child(3) a').attr('href','javascript:void(0)');
$('#languagelist li:nth-child(3) a').click(function(){location.href=window.cfg.domain.www+"/default-e.php"});
}
</script>
</ul>
</div>
<span class="l"> </span>
<div class="app">
<ul>
<li><em class="e_icon"></em><a href="http://app.51job.com/index.html">APP下载</a></li>
<li>
<img width="80" height="80" src="//img06.51jobcdn.com/im/2016/code/new_app.png" alt="app download">
<p><a href="http://app.51job.com/index.html">APP下载</a></a></p>
</li>
</ul>
</div>
<div class="uer">
<p class="op">
<a href="https://login.51job.com/login.php?lang=c&url=http%3A%2F%2Fsearch.51job.com%2Flist%2F000000%2C000000%2C0000%2C00%2C9%2C99%2CPython%2525E5%2525BC%252580%2525E5%25258F%252591%2525E5%2525B7%2525A5%2525E7%2525A8%25258B%2525E5%2525B8%252588%2C2%2C1.html" rel="external nofollow">登录</a> / <a href="https://login.51job.com/register.php?lang=c&url=http%3A%2F%2Fsearch.51job.com%2Flist%2F000000%2C000000%2C0000%2C00%2C9%2C99%2CPython%2525E5%2525BC%252580%2525E5%25258F%252591%2525E5%2525B7%2525A5%2525E7%2525A8%25258B%2525E5%2525B8%252588%2C2%2C1.html" rel="external nofollow">注册</a> </p>
</div>
<p class="rlk">
<a href="//baike.51job.com" target="_blank">职场百科</a>
<span class="lb"> </span>
<a href="//wenku.51job.com" target="_blank">职场文库</a>
<span class="lb"> </span>
<a href="https://jobs.51job.com" target="_blank">招聘信息</a> <span class="lb"> </span>
<a href="https://ehire.51job.com" target="_blank">企业服务</a> </p>
</div>
</div>
<!-- top end -->
<!-- 英文版为body添加class -->
<script>
</script>
<!-- nag start -->
<div class="pop-city" style="display:none;position: absolute; z-index: 1000;" id="area_channel_layer">
<div class="tle">
地区选择 <em class="close" onclick="jvascript:$('#area_channel_layer,#area_channel_layer_backdrop').hide();"></em>
</div>
<div class="pcon">
<div class="ht">
<label>热门城市</label><a href="//www.51job.com/beijing/">北京</a><a href="//www.51job.com/shanghai/">上海</a><a href="//www.51job.com/guangzhou/">广州</a><a href="//www.51job.com/shenzhen/">深圳</a><a href="//www.51job.com/wuhan/">武汉</a><a href="//www.51job.com/xian/">西安</a><a href="//www.51job.com/hangzhou/">杭州</a><a href="//www.51job.com/nanjing/">南京</a><a href="//www.51job.com/chengdu/">成都</a><a href="//www.51job.com/chongqing/">重庆</a> </div>
<div class="cbox">
<ul id="area_channel_layer_list">
<li class="on" onclick="areaChannelChangeTab('abc', this)">A B C</li>
<li onclick="areaChannelChangeTab('def', this)">D E F</li>
<li onclick="areaChannelChangeTab('gh', this)">G H</li>
<li onclick="areaChannelChangeTab('jkl', this)">J K L</li>
<li onclick="areaChannelChangeTab('mnp', this)">M N P</li>
<li onclick="areaChannelChangeTab('qrs', this)">Q R S</li>
<li onclick="areaChannelChangeTab('twx', this)">T W X</li>
<li onclick="areaChannelChangeTab('yz', this)">Y Z</li>
</ul>
<div class="clst" id="area_channel_layer_all">
<div class="e" name="area_channel_div_abc">
<span><a href="//www.51job.com/anshan/">鞍山</a></span>
<span><a href="//www.51job.com/anqing/">安庆</a></span>
<span><a href="//www.51job.com/anyang/">安阳</a></span>
<span><a href="//www.51job.com/beijing/">北京</a></span>
<span><a href="//www.51job.com/baotou/">包头</a></span>
<span><a href="//www.51job.com/baoding/">保定</a></span>
<span><a href="//www.51job.com/bengbu/">蚌埠</a></span>
<span><a href="//www.51job.com/baoji/">宝鸡</a></span>
<span><a href="//www.51job.com/binzhou/">滨州</a></span>
<span><a href="//www.51job.com/changchun/">长春</a></span>
<span><a href="//www.51job.com/changsha/">长沙</a></span>
<span><a href="//www.51job.com/chengdu/">成都</a></span>
<span><a href="//www.51job.com/chongqing/">重庆</a></span>
<span><a href="//www.51job.com/changzhou/">常州</a></span>
<span><a href="//www.51job.com/changde/">常德</a></span>
<span><a href="//www.51job.com/changshu/">常熟</a></span>
<span><a href="//www.51job.com/cangzhou/">沧州</a></span>
<span><a href="//www.51job.com/chaozhou/">潮州</a></span>
<span><a href="//www.51job.com/chenzhou/">郴州</a></span>
<span><a href="//www.51job.com/chifeng/">赤峰</a></span>
<span><a href="//www.51job.com/chuzhou/">滁州</a></span>
<span><a href="//www.51job.com/changzhi/">长治</a></span>
</div>
<div class="e" name="area_channel_div_def" style="display:none">
<span><a href="//www.51job.com/dalian/">大连</a></span>
<span><a href="//www.51job.com/dongguan/">东莞</a></span>
<span><a href="//www.51job.com/dandong/">丹东</a></span>
<span><a href="//www.51job.com/daqing/">大庆</a></span>
<span><a href="//www.51job.com/dazhou/">达州</a></span>
<span><a href="//www.51job.com/datong/">大同</a></span>
<span><a href="//www.51job.com/deyang/">德阳</a></span>
<span><a href="//www.51job.com/dezhou/">德州</a></span>
<span><a href="//www.51job.com/dongying/">东营</a></span>
<span><a href="//www.51job.com/errduosi/">鄂尔多斯</a></span>
<span><a href="//www.51job.com/ezhou/">鄂州</a></span>
<span><a href="//www.51job.com/fuzhou/">福州</a></span>
<span><a href="//www.51job.com/foshan/">佛山</a></span>
<span><a href="//www.51job.com/fushun/">抚顺</a></span>
<span><a href="//www.51job.com/fuzhoue/">抚州</a></span>
<span><a href="//www.51job.com/fuyang/">阜阳</a></span>
</div>
<div class="e" name="area_channel_div_gh" style="display:none">
<span><a href="//www.51job.com/guangzhou/">广州</a></span>
<span><a href="//www.51job.com/guiyang/">贵阳</a></span>
<span><a href="//www.51job.com/ganzhou/">赣州</a></span>
<span><a href="//www.51job.com/guangan/">广安</a></span>
<span><a href="//www.51job.com/guangyuan/">广元</a></span>
<span><a href="//www.51job.com/guigang/">贵港</a></span>
<span><a href="//www.51job.com/guilin/">桂林</a></span>
<span><a href="//www.51job.com/harbin/">哈尔滨</a></span>
<span><a href="//www.51job.com/hangzhou/">杭州</a></span>
<span><a href="//www.51job.com/hefei/">合肥</a></span>
<span><a href="//www.51job.com/haikou/">海口</a></span>
<span><a href="//www.51job.com/huhhot/">呼和浩特</a></span>
<span><a href="//www.51job.com/huizhou/">惠州</a></span>
<span><a href="//www.51job.com/hengyang/">衡阳</a></span>
<span><a href="//www.51job.com/huaian/">淮安</a></span>
<span><a href="//www.51job.com/huzhou/">湖州</a></span>
<span><a href="//www.51job.com/handan/">邯郸</a></span>
<span><a href="//www.51job.com/hanzhong/">汉中</a></span>
<span><a href="//www.51job.com/heyuan/">河源</a></span>
<span><a href="//www.51job.com/heze/">菏泽</a></span>
<span><a href="//www.51job.com/hengshui/">衡水</a></span>
<span><a href="//www.51job.com/huaihua/">怀化</a></span>
<span><a href="//www.51job.com/huaibei/">淮北</a></span>
<span><a href="//www.51job.com/huainan/">淮南</a></span>
<span><a href="//www.51job.com/huanggang/">黄冈</a></span>
<span><a href="//www.51job.com/huangshi/">黄石</a></span>
</div>
</script>
dreamyeyu replied to 疾风怪盗's post of 2020-9-28 22:00:
I really can't find that option.

dreamyeyu wrote on 2020-9-28 22:11:
I can't even post an excerpt of the code. This is the original link:
https://search.51job.com/list/000000,000000,0000,00,9,99,python,2,1.html
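疾风怪盗 wrote on 2020-9-28 23:51:
From the page source you posted alone, I can't tell what data you want to extract.
I tried fetching the 51job page you mentioned. The source returned by a GET request is different from what the browser shows, and it does not contain the actual data.
There is not much I can do.

疾风怪盗 wrote (last edited 2020-9-29 00:22):
Don't use BeautifulSoup for this; use XPath or regular expressions.
The fetched page source differs from what you see in the browser, so the browser's copy-path feature doesn't work, and writing the path by hand is tedious. Regex matching is quicker.
Below I tried fetching the first page of data (50 items); have a look. I haven't found a way to get the later pages.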
import re
import requests

url = 'https://search.51job.com/list/020000,000000,0000,00,9,99,Python开发工程师,2,2.html'
# Mobile browser User-Agent; the value must not repeat the "User-Agent:" header name
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile Safari/537.36 Edg/85.0.564.63',
}
response = requests.get(url, headers=headers)
html = response.content.decode()  # decode() defaults to UTF-8
print(html)
# Job names: the text inside each <label class='c_red'> element
job_name = re.findall(r"<label class='c_red'>(.*?)</label>", html)
print(job_name)
# Company names: the text inside each <aside> element
company_name = re.findall(r"<aside>(.*?)</aside>", html)
print(company_name)
print(len(company_name))

dreamyeyu wrote on 2020-9-29 07:40, replying to 疾风怪盗's post of 2020-9-28 23:51:
Thank you! What I want is the link behind each of the 50 entries on a page. Also, what should I do if the site detects that my crawler is disguising itself?
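疾风怪盗 wrote on 2020-9-29 10:14:
That gets complicated; it depends on the anti-scraping measures in place. Sometimes adding parameters to the headers is enough, sometimes you need a browser automation module such as selenium, and sometimes fetching the JSON data directly is easier. It depends on the case, and some sites simply cannot be scraped.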
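As a sketch of the "JSON data is easier" case mentioned above: some pages embed their data as a JSON object assigned to a JavaScript variable inside a script tag. Whether 51job does this is not confirmed in this thread. The variable name window.__DATA__, the field names, and the helper extract_embedded_json are all hypothetical, and the sample page is made up.

```python
import json
import re

# Hypothetical page source: the variable name `window.__DATA__` and the field
# names are invented for illustration and are not taken from 51job.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
window.__DATA__ = {"results": [{"title": "Python Developer", "url": "https://example.com/job/1"},
                               {"title": "Data Engineer", "url": "https://example.com/job/2"}]};
</script>
</body></html>
"""


def extract_embedded_json(html, var_name):
    """Find `var_name = {...};` just before </script> and parse the object as JSON."""
    pattern = re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;\s*</script>"
    match = re.search(pattern, html, flags=re.S)
    if match is None:
        return None  # the page does not embed data under this name
    return json.loads(match.group(1))


data = extract_embedded_json(SAMPLE_HTML, "window.__DATA__")
titles = [item["title"] for item in data["results"]]
urls = [item["url"] for item in data["results"]]
print(titles)
print(urls)
```

This only works when the embedded object is valid JSON and ends right before the closing script tag; anything else needs a different approach.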
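dreamyeyu wrote on 2020-9-29 10:44:
OK, thank you. How should I write the regex to get each link? I have tried several times and can't get it right.

疾风怪盗 replied to dreamyeyu's post of 2020-9-29 07:40:
If you replace the URL in my code with yours, you can also get the first page of data, but only that one page. I'm not sure whether you need to log in to get the rest.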
疾风怪盗 wrote on 2020-9-29 10:52:
link_address = re.findall(r'<a href="(.*?)" class', html)
print(link_address)

dreamyeyu wrote on 2020-9-29 11:22:
OK, thank you.
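疾风怪盗 replied:
If no entry has missing fields, regex is very simple: extract each field for the whole page in one go, then pair the results up by position.
As for writing the pattern, the simple way is to replace the data you want with (.*?) and include a little more of the surrounding markup on both sides; then it matches cleanly.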
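A minimal sketch of that advice, using the three patterns from the replies above on a made-up snippet (SAMPLE_HTML is hypothetical and is not the real 51job markup). It extracts each field separately and pairs them by position, then shows an alternative single pattern with several groups, which keeps the fields of one entry together.

```python
import re

# Hypothetical listing markup; the tag and class names mirror the patterns used
# in the replies, but the content is made up.
SAMPLE_HTML = """
<a href="https://example.com/job/1" class="e"><label class='c_red'>Python Developer</label><aside>Acme Ltd</aside></a>
<a href="https://example.com/job/2" class="e"><label class='c_red'>Data Engineer</label><aside>Globex Inc</aside></a>
"""

# One non-greedy capture group per field, with some surrounding markup as anchors.
links = re.findall(r'<a href="(.*?)" class', SAMPLE_HTML)
titles = re.findall(r"<label class='c_red'>(.*?)</label>", SAMPLE_HTML)
companies = re.findall(r"<aside>(.*?)</aside>", SAMPLE_HTML)

# Pairing by position is only safe when every entry has every field.
if not (len(links) == len(titles) == len(companies)):
    raise ValueError("field counts differ; positional pairing would misalign")
jobs = list(zip(titles, companies, links))
print(jobs)

# Alternative: one pattern with three groups, so each match is one entry.
entry_pattern = (
    r'<a href="(.*?)" class.*?'
    r"<label class='c_red'>(.*?)</label>.*?"
    r"<aside>(.*?)</aside>"
)
combined = re.findall(entry_pattern, SAMPLE_HTML, flags=re.S)
print(combined)
```

The combined pattern is more fragile if the markup order varies, so the separate patterns with a length check may be the easier starting point.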