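A Python crawler for 51job job listings
When I was job hunting a while back, I scraped the job postings on 51job as a practice project. I'm writing it up now so I don't forget how it worked. Crawl target: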
https://search.51job.com/list/000000,000000,0000,00,9,99,python爬虫,2,1.html
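If you want other keywords or pages, the target URL can be parameterized. The sketch below rests on two assumptions the post does not confirm: that the last number before `.html` is the page number, and that the first comma-separated field is a region code (with 000000 apparently meaning all regions). `build_search_url` and `URL_TEMPLATE` are hypothetical names introduced only for illustration.

```python
from urllib.parse import quote

# Template copied from the crawl target above; only the three named slots vary.
# Assumption: {area} is a region code, {page} is the page number.
URL_TEMPLATE = (
    "https://search.51job.com/list/"
    "{area},000000,0000,00,9,99,{keyword},2,{page}.html"
)


def build_search_url(keyword, page=1, area="000000"):
    """Build a search URL for a keyword and page (hypothetical helper)."""
    # Percent-encode the keyword so non-ASCII text is safe in a URL.
    return URL_TEMPLATE.format(area=area, keyword=quote(keyword), page=page)


if __name__ == "__main__":
    for page in range(1, 4):
        print(build_search_url("python爬虫", page))
```

Looping over `page` gives one URL per result page. Any real region code would have to be read off the site's own search URLs.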
Fields to scrape:
https://upload-images.jianshu.io/upload_images/10004381-6cb4102d2b31c265.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/683/format/webp
First, a screenshot of the final scraped result:
https://upload-images.jianshu.io/upload_images/10004381-9d11a7149ee49c42.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/973/format/webp
Packages used:
from lxml import etree
import requests
import time
import pymysql
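XPath expressions for locating the relevant elements:
# `html` is the search page after parsing, e.g. html = etree.HTML(page_source)
node_list = html.xpath("//div[@class='dw_table']")
for node in node_list:
    '''
    Position  job title
    Company   company name
    Place     work location
    Wages     salary
    Time      posting date
    Link      detail-page link
    '''
    Position = node.xpath("./div/p/span/a/@title")
    Company = node.xpath("./div/span[@class='t2']/a/text()")
    # The original used one identical span path for the next three fields, so all
    # three returned the same list. The t3/t4/t5 classes are an assumption that
    # follows the t2 pattern above; check them against the live page.
    Place = node.xpath("./div[@class='el']/span[@class='t3']/text()")
    Wages = node.xpath("./div[@class='el']/span[@class='t4']/text()")
    Time = node.xpath("./div[@class='el']/span[@class='t5']/text()")
    Link = node.xpath("./div/p/span/a/@href")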
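To make the XPath step easier to try offline, here is a minimal sketch that runs comparable expressions against a small made-up HTML fragment. The markup, the `t3`/`t4`/`t5` class names, and the helpers `first` and `parse_jobs` are all assumptions for illustration; the real page may differ. It extracts one row at a time rather than one column at a time, so a missing field cannot shift the other columns out of line.

```python
from lxml import etree

# Made-up markup imitating the structure implied by the XPath expressions.
SAMPLE_HTML = """
<div class="dw_table">
  <div class="el">
    <p class="t1"><span><a title="Python Crawler Engineer" href="https://example.com/job/1.html">Python Crawler Engineer</a></span></p>
    <span class="t2"><a>Example Co.</a></span>
    <span class="t3">Shanghai</span>
    <span class="t4">10-15k/month</span>
    <span class="t5">05-20</span>
  </div>
</div>
"""


def first(values):
    """Return the first XPath result stripped of whitespace, or '' if empty."""
    return values[0].strip() if values else ""


def parse_jobs(page_source):
    """Parse one search-result page into a list of dicts, one per job row."""
    html = etree.HTML(page_source)
    jobs = []
    for row in html.xpath("//div[@class='dw_table']/div[@class='el']"):
        jobs.append({
            "Position": first(row.xpath("./p/span/a/@title")),
            "Company": first(row.xpath("./span[@class='t2']/a/text()")),
            "Place": first(row.xpath("./span[@class='t3']/text()")),
            "Wages": first(row.xpath("./span[@class='t4']/text()")),
            "Time": first(row.xpath("./span[@class='t5']/text()")),
            "Link": first(row.xpath("./p/span/a/@href")),
        })
    return jobs


if __name__ == "__main__":
    for job in parse_jobs(SAMPLE_HTML):
        print(job)
```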
This post uses a MySQL database. If you want to try running the code, first create a matching table:
CREATE TABLE `51job` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `职位名称` text,
  `公司名称` text,
  `工作地区` text,
  `薪资` text,
  `发布时间` text,
  `详情页` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
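The full script is not shown in the post, so the following is only a sketch of how scraped rows might be written into this table. `INSERT_SQL`, `FIELDS`, and `save_rows` are hypothetical names, and the dict keys are assumed to match the field names used in the XPath step. The function accepts any DB-API connection whose cursor works as a context manager and uses `%s` placeholders, which is how pymysql behaves.

```python
# Keys expected in each scraped row, in the same order as the table columns.
FIELDS = ("Position", "Company", "Place", "Wages", "Time", "Link")

# Parameterized statement; the driver fills in the %s placeholders itself,
# so values are never pasted into the SQL string by hand.
INSERT_SQL = (
    "INSERT INTO `51job` "
    "(`职位名称`, `公司名称`, `工作地区`, `薪资`, `发布时间`, `详情页`) "
    "VALUES (%s, %s, %s, %s, %s, %s)"
)


def save_rows(connection, rows):
    """Insert a list of row dicts and commit; return how many were sent."""
    params = [tuple(row[key] for key in FIELDS) for row in rows]
    with connection.cursor() as cursor:
        cursor.executemany(INSERT_SQL, params)
    connection.commit()
    return len(params)
```

With pymysql, the connection would come from something like `pymysql.connect(host="localhost", user="root", password="your_password", database="your_db", charset="utf8")`, where every value shown is a placeholder to replace with your own settings.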
Full code:
[hidden content in the original forum post]
Learning.
Thanks to the OP for sharing so generously, I'll study this!
Thanks for sharing.
Thanks for sharing.
123
I'll study this!
Learned something.
51job Python crawler
cc
Studying~
66666666
How do I create the matching table?
  File "C:/Users/wf/Desktop/job.py", line 8, in <module>
    class My51job():
  File "C:/Users/wf/Desktop/job.py", line 25, in My51job
    charset='utf8')
  File "D:\Python37\lib\site-packages\pymysql\__init__.py", line 94, in Connect
    return Connection(*args, **kwargs)
  File "D:\Python37\lib\site-packages\pymysql\connections.py", line 261, in __init__
    self.password = self.password.encode('latin1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
Can I create the database in phpstudy?
How do I use this 51job crawler?
The problem above is solved. If I want to restrict the region and the job type, how do I set that?
Learning!
Bump.