python羊 posted on 2020-6-6 12:51:15

How to extract the data

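I can't make sense of the data on this page: it isn't a table, it isn't a dict, and I couldn't get a regex to work on it either. Any pointers would be appreciated, thanks everyone.
For example, I want to extract the final "S-GF801638" from   {"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},   in that page. How should I go about it?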

Original page
http://data.ngtc.com.cn/N/Q?c=S-GF801638&v=14877583&r=f&callback=fillResul

————————————————————————————————
Page content:
fillResult({

"isExist":1,

"certData":[

{"status":"0","CHName":"证书名称","ENName":"Cert Name","value":"镶嵌钻石分级鉴定证书(对折)MDG CERT-F"},

{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},

{"status":"2","CHName":"检验结论Conclusion","ENName":"Conclusion","value":"18K金钻石戒指"},

{"status":"0","CHName":"总质量Total Mass","ENName":"Total Mass","value":"2.9528g"},

{"status":"0","CHName":"形状Shape","ENName":"Shape","value":"圆钻形(配镶钻石)"},

{"status":"0","CHName":"颜色级别Color Grade","ENName":"Color Grade","value":"参考:K-L"},

{"status":"0","CHName":"净度级别Clarity Grade","ENName":"Clarity Grade","value":"参考:VS"},

{"status":"0","CHName":"台宽比%Table size","ENName":"Table size","value":"--"},

{"status":"0","CHName":"亭深比%Pavilion depth","ENName":"Pavilion depth","value":"--"},

{"status":"0","CHName":"贵金属检测Precious Metal","ENName":"Precious Metal","value":"18K金"},

{"status":"0","CHName":"备注Remarks","ENName":"Remarks","value":"--"},

{"status":"0","CHName":"备注*Remarks*","ENName":"Remarks*","value":"印记标称:D1.002ct d0.199ct"},

{"status":"0","CHName":"检验依据Normative References","ENName":"Normative References","value":"--"},

{"status":"1","CHName":" ","ENName":" ","value":"http://data.ngtc.com.cn/testPic/201904/S-GF801638.jpg"},

{"status":"0","CHName":"检验人","ENName":"Tester","value":"汪海燕"},

{"status":"0","CHName":"审核人","ENName":"Supervisor","value":"张光辉"},

{"status":"0","CHName":"检测时间","ENName":"Test Date","value":"2019-04-26"},

{"status":"0","CHName":"校验编码","ENName":"Verification Code","value":"14877583"},

{"status":"0","CHName":"检测依据","ENName":"Normative References","value":"GB/T 18043-2013;GB 11887-2012;GB/T 16553-2017;GB/T 16552-2017;GB/T 16554-2017;"}

]

})
——————————————————————————————————————————————————————————————————————————————————————————————


qiuyouzhi posted on 2020-6-6 12:59:25

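Just slice the string and parse it with json, then loop over certData with a step of 2 (use range rather than iterating directly) and take the "value" entry out of each dict.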
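
A minimal sketch of the slice-then-json idea, assuming the response text has the same shape as the page content quoted in the question: a fillResult(...) callback wrapper around a JSON object. The helper name extract_value and the shortened sample string are made up for illustration. In the quoted sample the blank lines are only whitespace between list items, so after json.loads the list can be iterated directly and no step of 2 is needed.

```python
import json


def extract_value(jsonp_text, en_name):
    """Return the "value" of the certData entry whose ENName matches, or None."""
    # Slice out everything between the first "(" and the last ")".
    # This drops the callback wrapper and leaves plain JSON.
    start = jsonp_text.index("(") + 1
    end = jsonp_text.rindex(")")
    data = json.loads(jsonp_text[start:end])

    # certData is an ordinary list of dicts once parsed.
    for item in data["certData"]:
        if item["ENName"] == en_name:
            return item["value"]
    return None


# Shortened copy of the response quoted in the question (illustrative only).
sample = '''fillResult({

"isExist":1,

"certData":[

{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},

{"status":"0","CHName":"检测时间","ENName":"Test Date","value":"2019-04-26"}

]

})'''

print(extract_value(sample, "No"))
```

With a live request, the text passed in would be the response body (for example the .text attribute of a requests response) instead of the sample string.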

Twilight6 posted on 2020-6-6 12:51:16

import re

import requests

url = 'http://data.ngtc.com.cn/N/Q?c=S-GF801638&v=14877583&r=f&callback=fillResul'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
# Fetch the raw response text (a callback wrapper around JSON).
response = requests.get(url, headers=headers)
html_data = response.text
# Capture the value that follows the "No" (certificate number) field.
values = re.findall(r'"ENName":"No","value":"(.+?)"', html_data)
print(values)
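
The regex approach can also be tried offline against a line copied from the question. This is only a sketch, assuming the field layout "ENName":"...","value":"..." seen in the quoted response; find_value and sample_line are hypothetical names used for illustration. It would not cope with values that contain escaped double quotes.

```python
import re


def find_value(text, en_name):
    """Return the first value paired with the given ENName, or None."""
    # re.escape keeps names with regex metacharacters (e.g. "Remarks*") literal.
    pattern = r'"ENName":"' + re.escape(en_name) + r'","value":"(.*?)"'
    match = re.search(pattern, text)
    return match.group(1) if match else None


# One entry copied from the page content quoted in the question.
sample_line = '{"status":"0","CHName":"证书编号","ENName":"No","value":"S-GF801638"},'
print(find_value(sample_line, "No"))
```

Some ENName values occur more than once in the quoted response (for example "Normative References"); this sketch returns only the first match.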