FishC Forum

Views: 687 | Replies: 2

[Solved] How to extract data

Posted on 2020-6-6 12:51:15

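I can't make sense of the data on this page: it isn't a table, it isn't a dictionary, and I couldn't get a regular expression to work on it either. Any advice would be appreciated, thanks everyone.
For example, from the entry {"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"} on that page, I want to extract the final value, "S-GF801638". How should I do this?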

Original page:
http://data.ngtc.com.cn/N/Q?c=S- ... ;callback=fillResul

————————————————————————————————
Page content:
fillResult({

"isExist":1,

"certData":[

{"status":"0","CHName":"证书名称","ENName":"Cert Name","value":"镶嵌钻石分级鉴定证书(对折)MDG CERT-F"},

{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},

{"status":"2","CHName":"检验结论Conclusion","ENName":"Conclusion","value":"18K金钻石戒指"},

{"status":"0","CHName":"总质量Total Mass","ENName":"Total Mass","value":"2.9528g  "},

{"status":"0","CHName":"形状Shape","ENName":"Shape","value":"圆钻形(配镶钻石)"},

{"status":"0","CHName":"颜色级别Color Grade","ENName":"Color Grade","value":"参考:K-L"},

{"status":"0","CHName":"净度级别Clarity Grade","ENName":"Clarity Grade","value":"参考:VS"},

{"status":"0","CHName":"台宽比%Table size","ENName":"Table size","value":"--"},

{"status":"0","CHName":"亭深比%Pavilion depth","ENName":"Pavilion depth","value":"--"},

{"status":"0","CHName":"贵金属检测Precious Metal","ENName":"Precious Metal","value":"18K金"},

{"status":"0","CHName":"备注Remarks","ENName":"Remarks","value":"--"},

{"status":"0","CHName":"备注*Remarks*","ENName":"Remarks*","value":"印记标称:D1.002ct d0.199ct"},

{"status":"0","CHName":"检验依据Normative References","ENName":"Normative References","value":"--"},

{"status":"1","CHName":" ","ENName":" ","value":"http://data.ngtc.com.cn/testPic/201904/S-GF801638.jpg"},

{"status":"0","CHName":"检验人","ENName":"Tester","value":"汪海燕"},

{"status":"0","CHName":"审核人","ENName":"Supervisor","value":"张光辉"},

{"status":"0","CHName":"检测时间","ENName":"Test Date","value":"2019-04-26"},

{"status":"0","CHName":"校验编码","ENName":"Verification Code","value":"14877583"},

{"status":"0","CHName":"检测依据","ENName":"Normative References","value":"GB/T 18043-2013;GB 11887-2012;GB/T 16553-2017;GB/T 16552-2017;GB/T 16554-2017;"}

]

})
——————————————————————————————————————————————————————————————————————————————————————————————


Best answer
2020-6-6 12:51:16
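import requests
import re

url = 'http://data.ngtc.com.cn/N/Q?c=S-GF801638&v=14877583&r=f&callback=fillResul'
# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
response = requests.get(url, headers=headers)
html_data = response.text
# Capture the "value" that follows "ENName":"No"; findall returns a list of all matches
values = re.findall(r'"ENName":"No","value":"(.+?)"', html_data)
print(values[0])  # the certificate number, e.g. S-GF801638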
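To check the pattern without contacting the site, here is a minimal sketch that runs the same regular expression from the answer above on a hard-coded excerpt of the response posted in the question. The helper name `extract_cert_no` and the `SAMPLE` string are illustrative assumptions, not part of the original answer.

```python
import re
from typing import Optional

# Excerpt of the response posted in the question (joined onto one line)
SAMPLE = (
    'fillResult({"isExist":1,"certData":['
    '{"status":"0","CHName":"证书名称","ENName":"Cert Name","value":"镶嵌钻石分级鉴定证书(对折)MDG CERT-F"},'
    '{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"}'
    ']})'
)


def extract_cert_no(text: str) -> Optional[str]:
    """Hypothetical helper: return the value paired with "ENName":"No", or None."""
    # (.+?) is non-greedy, so the capture stops at the first closing quote
    values = re.findall(r'"ENName":"No","value":"(.+?)"', text)
    # findall returns a list of every match; take the first one if there is any
    return values[0] if values else None


print(extract_cert_no(SAMPLE))  # S-GF801638
```

Because the pattern spells out the exact key order and has no room for whitespace after the colons, it stops matching if the server reformats the JSON; parsing the payload as JSON avoids that dependency.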

Posted on 2020-6-6 12:59:25
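Just slice the string and parse it with json, then loop over certData with a step of 2 (using range instead of iterating over it directly), and take the value attribute out of each dict.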
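The slicing + json idea can be sketched as follows, assuming the response is always a single JSON object wrapped as `fillResult( ... )`. The helper names `unwrap_jsonp` and `find_value` and the shortened `SAMPLE` are hypothetical; instead of stepping through `certData` by index, this sketch checks each entry's `ENName` and returns the `value` of the first match.

```python
import json
from typing import Any, Dict, Optional

# Shortened excerpt of the response posted in the question
SAMPLE = '''fillResult({
"isExist":1,
"certData":[
{"status":"0","CHName":"证书名称","ENName":"Cert Name","value":"镶嵌钻石分级鉴定证书(对折)MDG CERT-F"},
{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},
{"status":"0","CHName":"检测时间","ENName":"Test Date","value":"2019-04-26"}
]
})'''


def unwrap_jsonp(text: str) -> Dict[str, Any]:
    """Slice off the callback wrapper and parse the JSON inside it."""
    # The payload sits between the first "(" (after the callback name)
    # and the last ")" (the end of the callback call)
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])


def find_value(data: Dict[str, Any], en_name: str) -> Optional[str]:
    """Return the "value" of the first certData entry whose ENName matches."""
    for item in data.get("certData", []):
        if item.get("ENName") == en_name:
            return item.get("value")
    return None


data = unwrap_jsonp(SAMPLE)
print(find_value(data, "No"))  # S-GF801638
```

With the live page, the input would be `html_data` from the best answer's code. Note that some `ENName` values repeat in the full response (for example "Normative References" appears twice), so `find_value` returns only the first matching entry.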

