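How to extract the data
I can't make sense of the data on this page. It isn't a table, it isn't a dictionary, and I couldn't get a regex to work on it. Please advise, thanks everyone. For example, I want to extract the final "S-GF801638" from {"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"} on this page. How should I do that?
Original page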
http://data.ngtc.com.cn/N/Q?c=S-GF801638&v=14877583&r=f&callback=fillResul
————————————————————————————————
Page content:
fillResult({
"isExist":1,
"certData":[
{"status":"0","CHName":"证书名称","ENName":"Cert Name","value":"镶嵌钻石分级鉴定证书(对折)MDG CERT-F"},
{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},
{"status":"2","CHName":"检验结论Conclusion","ENName":"Conclusion","value":"18K金钻石戒指"},
{"status":"0","CHName":"总质量Total Mass","ENName":"Total Mass","value":"2.9528g"},
{"status":"0","CHName":"形状Shape","ENName":"Shape","value":"圆钻形(配镶钻石)"},
{"status":"0","CHName":"颜色级别Color Grade","ENName":"Color Grade","value":"参考:K-L"},
{"status":"0","CHName":"净度级别Clarity Grade","ENName":"Clarity Grade","value":"参考:VS"},
{"status":"0","CHName":"台宽比%Table size","ENName":"Table size","value":"--"},
{"status":"0","CHName":"亭深比%Pavilion depth","ENName":"Pavilion depth","value":"--"},
{"status":"0","CHName":"贵金属检测Precious Metal","ENName":"Precious Metal","value":"18K金"},
{"status":"0","CHName":"备注Remarks","ENName":"Remarks","value":"--"},
{"status":"0","CHName":"备注*Remarks*","ENName":"Remarks*","value":"印记标称:D1.002ct d0.199ct"},
{"status":"0","CHName":"检验依据Normative References","ENName":"Normative References","value":"--"},
{"status":"1","CHName":" ","ENName":" ","value":"http://data.ngtc.com.cn/testPic/201904/S-GF801638.jpg"},
{"status":"0","CHName":"检验人","ENName":"Tester","value":"汪海燕"},
{"status":"0","CHName":"审核人","ENName":"Supervisor","value":"张光辉"},
{"status":"0","CHName":"检测时间","ENName":"Test Date","value":"2019-04-26"},
{"status":"0","CHName":"校验编码","ENName":"Verification Code","value":"14877583"},
{"status":"0","CHName":"检测依据","ENName":"Normative References","value":"GB/T 18043-2013;GB 11887-2012;GB/T 16553-2017;GB/T 16552-2017;GB/T 16554-2017;"}
]
})
——————————————————————————————————————————————————————————————————————————————————————————————
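Just slice the string and feed it to json, then iterate over certData with a step of 2 (using range rather than iterating directly), and take the "value" field out of each dictionary.
A regex also works on the raw text:
import re
import requests

url = 'http://data.ngtc.com.cn/N/Q?c=S-GF801638&v=14877583&r=f&callback=fillResul'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
# Fetch the response body as text
response = requests.get(url, headers=headers)
html_data = response.text
# Capture the "value" that immediately follows "ENName":"No"
values = re.findall(r'"ENName":"No","value":"(.+?)"', html_data)
# findall returns a list; the certificate number is the first match, if any
print(values[0] if values else None)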
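The slice-plus-json idea mentioned above could look roughly like the sketch below. The response is JSON wrapped in a callback call (fillResult(...)), so cutting away everything outside the outermost parentheses leaves plain JSON. The helper names parse_jsonp and get_value are made up for this example, the sample string is a shortened copy of the response in the question, and the lookup matches on ENName instead of stepping through certData by position.

```python
import json

def parse_jsonp(text):
    """Strip the callback wrapper, e.g. fillResult({...}), and parse the JSON inside."""
    start = text.index("(") + 1   # first "(" opens the callback call
    end = text.rindex(")")        # last ")" closes it
    return json.loads(text[start:end])

def get_value(payload, en_name):
    """Return the "value" of the first certData entry whose ENName matches, else None."""
    for item in payload.get("certData", []):
        if item.get("ENName") == en_name:
            return item.get("value")
    return None

# Shortened copy of the response shown in the question (two entries only).
sample = '''fillResult({
"isExist":1,
"certData":[
{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},
{"status":"0","CHName":"检测时间","ENName":"Test Date","value":"2019-04-26"}
]
})'''

payload = parse_jsonp(sample)
print(get_value(payload, "No"))
```

With the live page, the text returned by requests.get(url, headers=headers).text would be passed to parse_jsonp in place of sample. Once parsed, the data is an ordinary dict holding a list of dicts, so any other field can be looked up the same way.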