Bounty: 50 FishC coins
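My goal is to get the data assigned in window.__SEARCH_RESULT__ = ...
How can I pull out job_name and the other fields one by one, store them in a dictionary, and save them?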
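Once the text after window.__SEARCH_RESULT__ = has been parsed with json.loads, pulling out job_name and the other fields is ordinary dict handling. Below is a minimal sketch of one way to build one dictionary per job and save them as CSV or JSON. It assumes the job list sits under the key engine_search_result and that the fields company_name, providesalary_text and workarea_text exist next to job_name; these key names are assumptions about 51job's page data, so check them against the printed JSON first. The helper names extract_jobs, save_jobs_csv and save_jobs_json are hypothetical.

```python
import csv
import json
from typing import Any, Dict, List

# ASSUMPTION: key names guessed from 51job's page data; verify against print(data).
JOB_LIST_KEY = 'engine_search_result'
FIELDS = ['job_name', 'company_name', 'providesalary_text', 'workarea_text']


def extract_jobs(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one small dict per job, keeping only the wanted FIELDS."""
    jobs = []
    for item in data.get(JOB_LIST_KEY, []):
        # .get() stores '' when a job lacks a field instead of raising KeyError
        jobs.append({field: item.get(field, '') for field in FIELDS})
    return jobs


def save_jobs_csv(jobs: List[Dict[str, Any]], path: str) -> None:
    # utf-8-sig writes a BOM so Excel displays the Chinese text correctly
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(jobs)


def save_jobs_json(jobs: List[Dict[str, Any]], path: str) -> None:
    # ensure_ascii=False keeps Chinese characters readable in the saved file
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(jobs, f, ensure_ascii=False, indent=2)
```

With data being the dict returned by json.loads, usage would be jobs = extract_jobs(data) followed by save_jobs_csv(jobs, 'jobs.csv') or save_jobs_json(jobs, 'jobs.json'); the file names are only examples. Keeping FIELDS in one list means the CSV header and the per-job dicts always stay in sync.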
https://search.51job.com/list/000000,000000,7501,00,9,99,+,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=
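This is the page whose source code I downloaded.
I hope one of the experts can help out and rescue a web-scraping newbie.
It's driving me crazy...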
```python
import json
import re

import requests

url = 'https://search.51job.com/list/000000,000000,7501,00,9,99,+,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare='
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36 Edg/86.0.622.38'}

response = requests.get(url=url, headers=headers)
# The page is served as GBK, so decode the raw bytes explicitly
html_str = response.content.decode('gbk')

# Escape the dot, tolerate spaces around '=', and let '.' match newlines (re.S)
pattern = re.compile(r'window\.__SEARCH_RESULT__\s*=\s*(.*?)</script>', re.S)
data = json.loads(pattern.findall(html_str)[0])
print(data)
```
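To test the parsing step without hitting the site, it can be wrapped in a function that takes the already-decoded HTML string. The sketch below is a hedged variant of the regex approach above: it captures only the {...} object, allows an optional trailing semicolon before </script>, and raises a clear error when the marker is missing (for example, if the site returns a verification page instead of the search results). The name parse_search_result is hypothetical.

```python
import json
import re
from typing import Any, Dict

# re.S lets '.' cross newlines; the lazy match stops at the first '}'
# that is followed by an optional ';' and then '</script>'.
SEARCH_RESULT_RE = re.compile(
    r'window\.__SEARCH_RESULT__\s*=\s*(\{.*?\})\s*;?\s*</script>', re.S
)


def parse_search_result(html_str: str) -> Dict[str, Any]:
    """Return the object assigned to window.__SEARCH_RESULT__ as a dict."""
    match = SEARCH_RESULT_RE.search(html_str)
    if match is None:
        # Layout changed, or a non-search page (e.g. anti-bot check) came back
        raise ValueError('window.__SEARCH_RESULT__ not found in page')
    return json.loads(match.group(1))
```

In the script above, data = parse_search_result(html_str) would replace the findall(...)[0] line, and a missing marker then produces a readable ValueError rather than an IndexError.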
Attachment: ym.zip (17.76 KB)
The source file is inside the zip archive.