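hellolouis posted on 2021-3-3 11:15:38

How do I extract the list from inside a string?

This post was last edited by hellolouis on 2021-3-3 11:16

I'm writing a small program that looks up traffic violations. Part of the code is below: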

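import urllib.parse
import urllib.request

# url and data (a dict of form fields) are assumed to be defined earlier
data = urllib.parse.urlencode(data).encode('utf-8')

response = urllib.request.urlopen(url, data)
html = response.read().decode('utf-8')

print(html)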


It returns:
jQuery1102012268154905248219_1614675001322({"msg":"\u6210\u529f",
"status":0,
"data":{"hphm":"\u7ca4S898S9",
"city":"dongguan",
"city_name":"\u4e1c\u839e",
"lists":[{"time":"2021-02-21 11:26:27",
"address":"\u5e7f\u6cb3\u9ad8\u901f2\u516c\u91cc230\u7c73\uff08\u5e7f\u6cb3\u9ad8\u901f\u51e4\u51f0\u5c71\u96a7\u9053\u8def\u6bb5\uff09\uff08\u4e1c\u5f80\u897f\uff09",
"fine":150,
"point":3,
"handled":0,
"violation_type":"\u9a7e\u9a76\u4e2d\u578b\u4ee5\u4e0a\u8f7d\u5ba2\u8f7d\u8d27\u6c7d\u8f66\u3001\u5371\u9669\u7269\u54c1\u8fd0\u8f93\u8f66\u8f86\u4ee5\u5916\u7684\u5176\u4ed6\u673a\u52a8\u8f66\u884c\u9a76\u8d85\u8fc7\u89c4\u5b9a\u65f6\u901f10%25\u672a\u8fbe20%25\u7684"}],
"company":{"show_rate":"0",
"site_name":"\u8f66\u884c\u6613",
"site_logo":"https:\/\/cityservice-hb.cdn.bcebos.com\/amis\/img\/89c5ff278ed0.png",
"site_qrcode":"https:\/\/cityservice-hb.cdn.bcebos.com\/amis\/img\/3f62dee41ae6.png",
"site_qrcode_desc":"\u7528\u767e\u5ea6APP\u626b\u63cf\u4e8c\u7ef4\u7801\uff0c\u624b\u673a\u67e5\u8be2\u4f53\u9a8c\u66f4\u4f73",
"site_url":"http:\/\/m.cx580.com\/bdsearch",
"params":{"xcxParams":{"fromQuery":{"carBrand":"carNumber",
"carEngine":"engineNumber",
"carNumber":"carCode"}},
"h5Params":{"inquirePath":"https:\/\/bl.cx580.com\/illegal?",
"fromQuery":{"carBrand":"carNumber",
"carEngine":"cardrive",
"carNumber":"carcode"}},
"carno":"hphm",
"vin":"body",
"engine":"engine",
"car_type":"hpzl"},
"mip":{"mip_flag":"1",
"mip_host":"\/wishwing\/c\/s\/mys4s.cn\/static\/vio\/xz\/violation.html",
"home_host":"\/wishwing\/c\/s\/mys4s.cn\/static\/vio\/xz\/index.html"},
"cambrian":{"type":"cambrian",
"logo":"http:\/\/t10.baidu.com\/it\/u=2001762891,
2940990159&fm=58",
"title":"\u9f50\u8f66\u5927\u5723\u670d\u52a1",
"wishes":"\u63d0\u4f9b4S\u5e97\u53ca\u6c7d\u8f66\u884c\u4e1a\u8d44\u8baf",
"url":"https:\/\/author.baidu.com\/home\/1570621902200357",
"appid":"1570621902200357",
"des":"\u5173\u6ce8\u9f50\u8f66\u5927\u5723\u718a\u638c\u53f7\uff0c\u968f\u65f6\u67e5\u8be2\u8fdd\u7ae0\u4fe1\u606f"}},
"resource":"chexingyi",
"count":1,
"fine":150,
"point":3,
"jump_token":"Ye6sJfAsXfWRwuhMPhxcw756fjEgeMH9DGd8RSoKKzQsU2tfR7naLKXXjcIg%2B4y3GUFtUdBS0ZGyJdS8JTqcubliKw%2BSCBmOrT9CE%2BMzdmPUM3AsmXrxwEqmayiAO11QUV8B5TuKv93rcuylulfBvg%3D%3D"}})


I have two questions:
How can I print the Unicode escapes as Chinese characters?
How can I extract the list on lines 06-11 of the response?

wp231957 posted on 2021-3-3 12:14:45

Please post the URL.

>>> address="\u5e7f\u6cb3\u9ad8\u901f2\u516c\u91cc230\u7c73\uff08\u5e7f\u6cb3\u9ad8\u901f\u51e4\u51f0\u5c71\u96a7\u9053\u8def\u6bb5\uff09\uff08\u4e1c\u5f80\u897f\uff09"
>>> print(address.encode(encoding="utf-8").decode(encoding="utf-8"))
广河高速2公里230米（广河高速凤凰山隧道路段）（东往西）
>>>

wp231957 posted on 2021-3-3 12:19:39

jsontxt={"msg":"\u6210\u529f",
"status":0,
"data":{"hphm":"\u7ca4S898S9",
"city":"dongguan",
"city_name":"\u4e1c\u839e",
"lists":[{"time":"2021-02-21 11:26:27",
"address":"\u5e7f\u6cb3\u9ad8\u901f2\u516c\u91cc230\u7c73\uff08\u5e7f\u6cb3\u9ad8\u901f\u51e4\u51f0\u5c71\u96a7\u9053\u8def\u6bb5\uff09\uff08\u4e1c\u5f80\u897f\uff09",
"fine":150,
"point":3,
"handled":0,
"violation_type":"\u9a7e\u9a76\u4e2d\u578b\u4ee5\u4e0a\u8f7d\u5ba2\u8f7d\u8d27\u6c7d\u8f66\u3001\u5371\u9669\u7269\u54c1\u8fd0\u8f93\u8f66\u8f86\u4ee5\u5916\u7684\u5176\u4ed6\u673a\u52a8\u8f66\u884c\u9a76\u8d85\u8fc7\u89c4\u5b9a\u65f6\u901f10%25\u672a\u8fbe20%25\u7684"}],
"company":{"show_rate":"0",
"site_name":"\u8f66\u884c\u6613",
"site_logo":"https:\/\/cityservice-hb.cdn.bcebos.com\/amis\/img\/89c5ff278ed0.png",
"site_qrcode":"https:\/\/cityservice-hb.cdn.bcebos.com\/amis\/img\/3f62dee41ae6.png",
"site_qrcode_desc":"\u7528\u767e\u5ea6APP\u626b\u63cf\u4e8c\u7ef4\u7801\uff0c\u624b\u673a\u67e5\u8be2\u4f53\u9a8c\u66f4\u4f73",
"site_url":"http:\/\/m.cx580.com\/bdsearch",
"params":{"xcxParams":{"fromQuery":{"carBrand":"carNumber",
"carEngine":"engineNumber",
"carNumber":"carCode"}},
"h5Params":{"inquirePath":"https:\/\/bl.cx580.com\/illegal?",
"fromQuery":{"carBrand":"carNumber",
"carEngine":"cardrive",
"carNumber":"carcode"}},
"carno":"hphm",
"vin":"body",
"engine":"engine",
"car_type":"hpzl"},
"mip":{"mip_flag":"1",
"mip_host":"\/wishwing\/c\/s\/mys4s.cn\/static\/vio\/xz\/violation.html",
"home_host":"\/wishwing\/c\/s\/mys4s.cn\/static\/vio\/xz\/index.html"},
"cambrian":{"type":"cambrian",
"logo":'''http:\/\/t10.baidu.com\/it\/u=2001762891,
2940990159&fm=58''',
"title":"\u9f50\u8f66\u5927\u5723\u670d\u52a1",
"wishes":"\u63d0\u4f9b4S\u5e97\u53ca\u6c7d\u8f66\u884c\u4e1a\u8d44\u8baf",
"url":'''https:\/\/author.baidu.com\/home\/1570621902200357''',
"appid":"1570621902200357",
"des":"\u5173\u6ce8\u9f50\u8f66\u5927\u5723\u718a\u638c\u53f7\uff0c\u968f\u65f6\u67e5\u8be2\u8fdd\u7ae0\u4fe1\u606f"}},
"resource":"chexingyi",
"count":1,
"fine":150,
"point":3,
"jump_token":"Ye6sJfAsXfWRwuhMPhxcw756fjEgeMH9DGd8RSoKKzQsU2tfR7naLKXXjcIg%2B4y3GUFtUdBS0ZGyJdS8JTqcubliKw%2BSCBmOrT9CE%2BMzdmPUM3AsmXrxwEqmayiAO11QUV8B5TuKv93rcuylulfBvg%3D%3D"}}

print(jsontxt["data"]["lists"])
Note that two of the string literals here are not valid as pasted; fix them up yourself with a simple edit.

Output:

D:\wp>py wp2.py
[{'time': '2021-02-21 11:26:27', 'address': '广河高速2公里230米（广河高速凤凰山隧道路段）（东往西）', 'fine': 150, 'point': 3, 'handled': 0, 'violation_type': '驾驶中型以上载客载货汽车、危险物品运输车辆以外的其他机动车行驶超过规定时速10%25未达20%25的'}]

D:\wp>
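
The same list can also be pulled straight out of the response text without pasting it into the source as a dict. The sketch below assumes the body is a single `callback(...)` wrapper around valid JSON; `parse_jsonp` is a hypothetical helper name, and `html` is a shortened stand-in for the real response, not the full data.

```python
import json

# Shortened stand-in for the response body. A raw string keeps the \uXXXX
# escapes as literal backslash sequences, the way they arrive from the server.
html = r'''jQuery1102012268154905248219_1614675001322({"msg":"\u6210\u529f",
"status":0,
"data":{"hphm":"\u7ca4S898S9",
"lists":[{"time":"2021-02-21 11:26:27",
"fine":150,
"point":3}],
"count":1}})'''


def parse_jsonp(body):
    """Strip the callback(...) wrapper and parse the JSON payload inside it."""
    start = body.index("(") + 1   # first "(" opens the callback call
    end = body.rindex(")")        # last ")" closes it
    return json.loads(body[start:end])


result = parse_jsonp(html)
print(result["msg"])             # json.loads turns the \u escapes into characters
print(result["data"]["lists"])   # the list asked about in the question
```

If the real body contains a raw line break inside a quoted string, as the pasted "logo" value does, `json.loads` will reject it by default; passing `strict=False` makes it tolerate control characters inside strings.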

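wp231957 posted on 2021-3-3 12:21:53

I just noticed that Python decodes these escapes on its own: \uXXXX sequences in a string literal are resolved when the source is parsed.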

>>> address="\u5e7f\u6cb3\u9ad8\u901f2\u516c\u91cc230\u7c73\uff08\u5e7f\u6cb3\u9ad8\u901f\u51e4\u51f0\u5c71\u96a7\u9053\u8def\u6bb5\uff09\uff08\u4e1c\u5f80\u897f\uff09"
>>> print(address.encode(encoding="utf-8").decode(encoding="utf-8"))
广河高速2公里230米（广河高速凤凰山隧道路段）（东往西）
>>> address
'广河高速2公里230米（广河高速凤凰山隧道路段）（东往西）'
>>> print(address)
广河高速2公里230米（广河高速凤凰山隧道路段）（东往西）
>>>
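
A short sketch of why the session above works: the escapes are typed into a string literal, so the Python parser resolves them before any code runs. Text read from the network is different, because the backslashes are real characters in it. The raw string below is only a stand-in for such received text.

```python
literal = "\u5e7f\u6cb3"      # escapes resolved by the Python parser
received = r"\u5e7f\u6cb3"    # raw string: backslashes kept, like text read from a response

print(len(literal))     # two characters
print(len(received))    # twelve characters: backslash, u, and four hex digits, twice

# Encoding to UTF-8 and decoding back returns the same string in both cases,
# so that round trip does not convert anything.
print(literal.encode("utf-8").decode("utf-8") == literal)
print(received.encode("utf-8").decode("utf-8") == received)
print(received)         # still shows the escapes
```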

kogawananari posted on 2021-3-3 13:06:57

wp231957 posted on 2021-3-3 12:21
I just noticed that Python decodes these escapes on its own

>>> address="%u5e7f%u6cb3%u9ad8%u901f2%u516c%u91cc230%u7c7 ...

>>> "\\u5e7f\\u6cb3\\u9ad8\\u901f2\\u516c\\u91cc230\\u7c73\\uff08\\u5e7f\\u6cb3\\u9ad8\\u901f\\u51e4\\u51f0\\u5c71\\u96a7\\u9053\\u8def\\u6bb5\\uff09\\uff08\\u4e1c\\u5f80\\u897f\\uff09".encode('utf8').decode('unicode_escape')
'广河高速2公里230米（广河高速凤凰山隧道路段）（东往西）'
>>>
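
One caveat with the `unicode_escape` approach, shown as a small sketch: it treats every byte that is not part of an escape as Latin-1, so it only gives the right result when the input is pure ASCII. The `mixed` value is a made-up example that combines a real Chinese character with a literal escape; wrapping the text in double quotes and using `json.loads` is one alternative, assuming the text itself holds no unescaped double quotes or raw line breaks.

```python
import json

# Hypothetical mixed input: one real Chinese character followed by a literal escape.
mixed = "粤" + r"\u4e1c"

# unicode_escape decodes the three UTF-8 bytes of "粤" as three Latin-1 characters.
via_codec = mixed.encode("utf8").decode("unicode_escape")

# A JSON string literal is the escaped text wrapped in double quotes.
via_json = json.loads('"' + mixed + '"')

print(repr(via_codec))
print(repr(via_json))
```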

qq1151985918 posted on 2021-3-3 20:00:00

text = """jQuery1102012268154905248219_1614675001322({"msg":"\u6210\u529f",
"status":0,
"data":{"hphm":"\u7ca4S898S9",
"city":"dongguan",
"city_name":"\u4e1c\u839e",
"lists":[{"time":"2021-02-21 11:26:27",
"address":"\u5e7f\u6cb3\u9ad8\u901f2\u516c\u91cc230\u7c73\uff08\u5e7f\u6cb3\u9ad8\u901f\u51e4\u51f0\u5c71\u96a7\u9053\u8def\u6bb5\uff09\uff08\u4e1c\u5f80\u897f\uff09",
"fine":150,
"point":3,
"handled":0,
"violation_type":"\u9a7e\u9a76\u4e2d\u578b\u4ee5\u4e0a\u8f7d\u5ba2\u8f7d\u8d27\u6c7d\u8f66\u3001\u5371\u9669\u7269\u54c1\u8fd0\u8f93\u8f66\u8f86\u4ee5\u5916\u7684\u5176\u4ed6\u673a\u52a8\u8f66\u884c\u9a76\u8d85\u8fc7\u89c4\u5b9a\u65f6\u901f10%25\u672a\u8fbe20%25\u7684"}],
"company":{"show_rate":"0",
"site_name":"\u8f66\u884c\u6613",
"site_logo":"https:\/\/cityservice-hb.cdn.bcebos.com\/amis\/img\/89c5ff278ed0.png",
"site_qrcode":"https:\/\/cityservice-hb.cdn.bcebos.com\/amis\/img\/3f62dee41ae6.png",
"site_qrcode_desc":"\u7528\u767e\u5ea6APP\u626b\u63cf\u4e8c\u7ef4\u7801\uff0c\u624b\u673a\u67e5\u8be2\u4f53\u9a8c\u66f4\u4f73",
"site_url":"http:\/\/m.cx580.com\/bdsearch",
"params":{"xcxParams":{"fromQuery":{"carBrand":"carNumber",
"carEngine":"engineNumber",
"carNumber":"carCode"}},
"h5Params":{"inquirePath":"https:\/\/bl.cx580.com\/illegal?",
"fromQuery":{"carBrand":"carNumber",
"carEngine":"cardrive",
"carNumber":"carcode"}},
"carno":"hphm",
"vin":"body",
"engine":"engine",
"car_type":"hpzl"},
"mip":{"mip_flag":"1",
"mip_host":"\/wishwing\/c\/s\/mys4s.cn\/static\/vio\/xz\/violation.html",
"home_host":"\/wishwing\/c\/s\/mys4s.cn\/static\/vio\/xz\/index.html"},
"cambrian":{"type":"cambrian",
"logo":"http:\/\/t10.baidu.com\/it\/u=2001762891,
2940990159&fm=58",
"title":"\u9f50\u8f66\u5927\u5723\u670d\u52a1",
"wishes":"\u63d0\u4f9b4S\u5e97\u53ca\u6c7d\u8f66\u884c\u4e1a\u8d44\u8baf",
"url":"https:\/\/author.baidu.com\/home\/1570621902200357",
"appid":"1570621902200357",
"des":"\u5173\u6ce8\u9f50\u8f66\u5927\u5723\u718a\u638c\u53f7\uff0c\u968f\u65f6\u67e5\u8be2\u8fdd\u7ae0\u4fe1\u606f"}},
"resource":"chexingyi",
"count":1,
"fine":150,
"point":3,
"jump_token":"Ye6sJfAsXfWRwuhMPhxcw756fjEgeMH9DGd8RSoKKzQsU2tfR7naLKXXjcIg%2B4y3GUFtUdBS0ZGyJdS8JTqcubliKw%2BSCBmOrT9CE%2BMzdmPUM3AsmXrxwEqmayiAO11QUV8B5TuKv93rcuylulfBvg%3D%3D"}})"""

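import json
import re

# re.S lets "." match newlines; findall returns a list of matches
data = re.findall(r"\[.*\]", text, re.S)
print(data)                 # list holding the matched text
print(data[0])              # the matched text itself
print(json.loads(data[0]))  # parse it into a Python list (safer than eval on web data)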
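
The pattern `\[.*\]` is greedy, so it only isolates the list because this response happens to contain a single pair of square brackets. A minimal sketch of what happens otherwise, using a made-up `sample` with a hypothetical second array named "other":

```python
import json
import re

# Made-up payload with two arrays; "other" is not part of the real response.
sample = '{"lists":[{"fine":150}],"other":[1,2]}'

# Greedy matching runs from the first "[" to the last "]".
greedy = re.findall(r"\[.*\]", sample, re.S)
print(greedy)

# Parsing the whole document and indexing by key is not affected.
lists = json.loads(sample)["lists"]
print(lists)
```

Here the regex match covers both arrays and the text between them, which is no longer a valid list on its own, while the key lookup returns only the wanted list.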
