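Pythonnewers posted on 2020-5-6 09:26:56

Can't get the death count or the number treated when scraping Baidu

I've already fetched the whole page source, but the numbers sit inside a bunch of curly braces (?). How do I get them out?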

zltzlt posted on 2020-5-6 09:27:28

This post was last edited by zltzlt on 2020-5-6 09:30

Use the json module to convert the JSON-formatted data into a Python dictionary.
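
A minimal sketch of that conversion, assuming a made-up JSON string (the keys and values below are illustrative and are not Baidu's real payload):

```python
import json

# Hypothetical sample; the real page's keys and values will differ.
raw = '{"area": "Hubei", "confirmed": "100", "died": "5"}'

data = json.loads(raw)        # JSON object -> Python dict
print(type(data).__name__)    # dict
print(data["died"])           # 5
```

json.loads expects the whole string to be JSON, so any surrounding page markup has to be cut away first.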

qiuyouzhi posted on 2020-5-6 09:29:05

If it's a dictionary, use json.

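Pythonnewers posted on 2020-5-6 09:52:23

qiuyouzhi posted on 2020-5-6 09:29
If it's a dictionary, use json.

But it isn't all a dictionary. Everything before it is ordinary page code, and only the important part in the middle is a dictionary. Can json work out by itself which part needs converting?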

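qiuyouzhi posted on 2020-5-6 09:54:56

Pythonnewers posted on 2020-5-6 09:52
But it isn't all a dictionary. Everything before it is ordinary page code, and only the important part in the middle is a dictionary. Can json work out by itself which part needs converting?

I don't know, you could try it.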
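
For what it's worth, a quick experiment suggests the answer is no: json.loads rejects a string that is only partly JSON. One possible workaround, sketched here on a hypothetical snippet of mixed text, is json.JSONDecoder().raw_decode, which parses a single JSON value starting at a given index and ignores whatever follows it:

```python
import json

# Hypothetical mixed text: ordinary markup around one JSON object.
page = '<script id="demo">{"caseList": [{"area": "A", "died": "1"}]}</script><p>other text</p>'

try:
    json.loads(page)          # the whole string is not JSON, so this raises
    parsed_whole = True
except json.JSONDecodeError:
    parsed_whole = False

# Start at the first "{" and let raw_decode report where the JSON value ends.
start = page.index("{")
data, end = json.JSONDecoder().raw_decode(page, start)

print(parsed_whole)                   # False
print(data["caseList"][0]["died"])    # 1
print(page[end:end + 9])              # </script>
```

This only works when you can locate the start of the JSON yourself; here the first "{" happens to be the right place, which will not hold for every page.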

颜栩栩 posted on 2020-5-6 09:56:35

Is this the URL you're scraping? https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3
I wrote a little code for this page a while ago, have a look for reference~

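from selenium import webdriver
import bs4

browser = webdriver.Chrome()
browser.get("https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3")
soup = bs4.BeautifulSoup(browser.page_source, "html.parser")
browser.quit()  # close the browser once the rendered HTML has been captured

# The hashed class names (VirusTable_1-1-257_...) belong to one build of the page and may change.
targets = soup.select("#nationTable tr.VirusTable_1-1-257_3m6Ybq")
for each in targets:
    info = ""
    # select() returns a list of tags, so each result has to be indexed before reading .text.
    # The indices seem to have been lost from the original post; [0] and [1] are assumed.
    targets2 = each.select("span")
    info += targets2[0].text + " "
    targets3 = each.select("td.VirusTable_1-1-257_3x1sDV.VirusTable_1-1-257_2bK5NN")
    info += targets3[0].text + " "
    targets4 = each.select("td.VirusTable_1-1-257_3x1sDV")
    info += targets4[0].text + " "
    info += targets4[1].text + " "
    targets5 = each.select("td.VirusTable_1-1-257_EjGi8c")
    info += targets5[0].text + " "
    info += targets5[1].text + " "
    print(info)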


If this isn't the page you're scraping, please post the URL so I can take a look.

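Pythonnewers posted on 2020-5-6 10:23:29

颜栩栩 posted on 2020-5-6 09:56
Is this the URL you're scraping? https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3
This page ...

Yes, that's the one, but I'm only using requests. I know of the approach you used, but I haven't learned it yet.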

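颜栩栩 posted on 2020-5-6 11:02:56

Pythonnewers posted on 2020-5-6 10:23
Yes, that's the one, but I'm only using requests. I know of the approach you used, but I haven't learned it yet.

Could you paste the code for the request you wrote with requests? I'd like to see what you're getting back.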

Pythonnewers posted on 2020-5-6 11:25:20

颜栩栩 posted on 2020-5-6 11:02
Could you paste the code for the request you wrote with requests? I'd like to see what you're getting back.

import requests
url = "https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_1"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.14 Safari/537.36 Edg/83.0.478.13"}
html = requests.get(url=url,headers=headers).content.decode()
print(html)
Very simple. I know it may not be the right way, but values like the number treated really are in there.

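Pythonnewers posted on 2020-5-6 11:26:56

颜栩栩 posted on 2020-5-6 11:02
Could you paste the code for the request you wrote with requests? I'd like to see what you're getting back.

I originally wanted to look for it in the js, but couldn't find it.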

Pythonnewers posted on 2020-5-6 11:32:58

颜栩栩 posted on 2020-5-6 11:02
Could you paste the code for the request you wrote with requests? I'd like to see what you're getting back.

{"status":0,"message":"\u6210\u529f","result":{"moveInList":[{"city_name":"\u6210\u90fd\u5e02","province_name":"\u56db\u5ddd\u7701","value":2.73,"city_code":"510100"},{"city_name":"\u6df1\u5733\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":2.64,"city_code":"440300"},{"city_name":"\u5e7f\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":2.53,"city_code":"440100"},{"city_name":"\u4e1c\u839e\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":2.23,"city_code":"441900"},{"city_name":"\u4e0a\u6d77\u5e02","province_name":"\u4e0a\u6d77\u5e02","value":2.17,"city_code":"310000"},{"city_name":"\u90d1\u5dde\u5e02","province_name":"\u6cb3\u5357\u7701","value":1.94,"city_code":"410100"},{"city_name":"\u957f\u6c99\u5e02","province_name":"\u6e56\u5357\u7701","value":1.87,"city_code":"430100"},{"city_name":"\u897f\u5b89\u5e02","province_name":"\u9655\u897f\u7701","value":1.83,"city_code":"610100"},{"city_name":"\u4f5b\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.75,"city_code":"440600"},{"city_name":"\u676d\u5dde\u5e02","province_name":"\u6d59\u6c5f\u7701","value":1.74,"city_code":"330100"},{"city_name":"\u82cf\u5dde\u5e02","province_name":"\u6c5f\u82cf\u7701","value":1.67,"city_code":"320500"},{"city_name":"\u5317\u4eac\u5e02","province_name":"\u5317\u4eac\u5e02","value":1.6,"city_code":"110000"},{"city_name":"\u5357\u4eac\u5e02","province_name":"\u6c5f\u82cf\u7701","value":1.24,"city_code":"320100"},{"city_name":"\u5408\u80a5\u5e02","province_name":"\u5b89\u5fbd\u7701","value":1.23,"city_code":"340100"},{"city_name":"\u6d4e\u5357\u5e02","province_name":"\u5c71\u4e1c\u7701","value":1.18,"city_code":"370100"},{"city_name":"\u6b66\u6c49\u5e02","province_name":"\u6e56\u5317\u7701","value":1.11,"city_code":"420100"},{"city_name":"\u91cd\u5e86\u5e02","province_name":"\u91cd\u5e86\u5e02","value":1.01,"city_code":"500000"},{"city_name":"\u60e0\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":0.99,"city_code":"441300"},{"city_name":"\u8d35\u9633\u5e02
","province_name":"\u8d35\u5dde\u7701","value":0.97,"city_code":"520100"},{"city_name":"\u5929\u6d25\u5e02","province_name":"\u5929\u6d25\u5e02","value":0.91,"city_code":"120000"},{"city_name":"\u6606\u660e\u5e02","province_name":"\u4e91\u5357\u7701","value":0.9,"city_code":"530100"},{"city_name":"\u65e0\u9521\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.87,"city_code":"320200"},{"city_name":"\u4e2d\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":0.84,"city_code":"442000"},{"city_name":"\u9752\u5c9b\u5e02","province_name":"\u5c71\u4e1c\u7701","value":0.78,"city_code":"370200"},{"city_name":"\u592a\u539f\u5e02","province_name":"\u5c71\u897f\u7701","value":0.76,"city_code":"140100"},{"city_name":"\u5357\u5b81\u5e02","province_name":"\u5e7f\u897f\u58ee\u65cf\u81ea\u6cbb\u533a","value":0.74,"city_code":"450100"},{"city_name":"\u53a6\u95e8\u5e02","province_name":"\u798f\u5efa\u7701","value":0.72,"city_code":"350200"},{"city_name":"\u6c88\u9633\u5e02","province_name":"\u8fbd\u5b81\u7701","value":0.7,"city_code":"210100"},{"city_name":"\u77f3\u5bb6\u5e84\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.69,"city_code":"130100"},{"city_name":"\u5b81\u6ce2\u5e02","province_name":"\u6d59\u6c5f\u7701","value":0.67,"city_code":"330200"}],"moveOutList":[{"city_name":"\u5e7f\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.64,"city_code":"440100"},{"city_name":"\u6210\u90fd\u5e02","province_name":"\u56db\u5ddd\u7701","value":1.59,"city_code":"510100"},{"city_name":"\u6df1\u5733\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.58,"city_code":"440300"},{"city_name":"\u4e1c\u839e\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.41,"city_code":"441900"},{"city_name":"\u4e0a\u6d77\u5e02","province_name":"\u4e0a\u6d77\u5e02","value":1.34,"city_code":"310000"},{"city_name":"\u897f\u5b89\u5e02","province_name":"\u9655\u897f\u7701","value":1.32,"city_code":"610100"},{"city_name":"\u82cf\u5dde\u5e02","province_name":"\u6c5f\u82cf\u7701","value":1
.2,"city_code":"320500"},{"city_name":"\u676d\u5dde\u5e02","province_name":"\u6d59\u6c5f\u7701","value":1.15,"city_code":"330100"},{"city_name":"\u4f5b\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.1,"city_code":"440600"},{"city_name":"\u90d1\u5dde\u5e02","province_name":"\u6cb3\u5357\u7701","value":1.09,"city_code":"410100"},{"city_name":"\u5317\u4eac\u5e02","province_name":"\u5317\u4eac\u5e02","value":1.05,"city_code":"110000"},{"city_name":"\u60e0\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.04,"city_code":"441300"},{"city_name":"\u957f\u6c99\u5e02","province_name":"\u6e56\u5357\u7701","value":1.04,"city_code":"430100"},{"city_name":"\u91cd\u5e86\u5e02","province_name":"\u91cd\u5e86\u5e02","value":0.9,"city_code":"500000"},{"city_name":"\u5357\u4eac\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.78,"city_code":"320100"},{"city_name":"\u5408\u80a5\u5e02","province_name":"\u5b89\u5fbd\u7701","value":0.75,"city_code":"340100"},{"city_name":"\u8d35\u9633\u5e02","province_name":"\u8d35\u5dde\u7701","value":0.75,"city_code":"520100"},{"city_name":"\u6d4e\u5357\u5e02","province_name":"\u5c71\u4e1c\u7701","value":0.71,"city_code":"370100"},{"city_name":"\u5929\u6d25\u5e02","province_name":"\u5929\u6d25\u5e02","value":0.7,"city_code":"120000"},{"city_name":"\u5eca\u574a\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.7,"city_code":"131000"},{"city_name":"\u4fdd\u5b9a\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.69,"city_code":"130600"},{"city_name":"\u6606\u660e\u5e02","province_name":"\u4e91\u5357\u7701","value":0.68,"city_code":"530100"},{"city_name":"\u54b8\u9633\u5e02","province_name":"\u9655\u897f\u7701","value":0.68,"city_code":"610400"},{"city_name":"\u65e0\u9521\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.67,"city_code":"320200"},{"city_name":"\u5357\u5b81\u5e02","province_name":"\u5e7f\u897f\u58ee\u65cf\u81ea\u6cbb\u533a","value":0.63,"city_code":"450100"},{"city_name":"\u4e34\u6c82\u5e02","province_nam
e":"\u5c71\u4e1c\u7701","value":0.63,"city_code":"371300"},{"city_name":"\u4e2d\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":0.62,"city_code":"442000"},{"city_name":"\u5357\u901a\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.62,"city_code":"320600"},{"city_name":"\u77f3\u5bb6\u5e84\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.6,"city_code":"130100"},{"city_name":"\u5f90\u5dde\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.58,"city_code":"320300"}],"cityCode":0,"time":"20200504"}}
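I found this in the XHR requests! But here's the problem... how do I get the values of the last dictionary?
And what about these \u91cd\ things, are they some kind of encoding format (?)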
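
A small sketch on both questions, using a shortened sample in the same shape as the response above (one entry kept per list). The \uXXXX sequences are standard JSON escapes for non-ASCII characters, and json.loads turns them back into readable text; after that, the "last" values are ordinary dict and list lookups. Which entry counts as the last dictionary is a guess here, so both the last list item and the trailing time field are shown:

```python
import json

# Shortened sample; raw strings keep the backslashes exactly as the server sent them.
raw = (r'{"status":0,"message":"\u6210\u529f","result":{"moveInList":['
       r'{"city_name":"\u6210\u90fd\u5e02","province_name":"\u56db\u5ddd\u7701","value":2.73,"city_code":"510100"}],'
       r'"moveOutList":['
       r'{"city_name":"\u5e7f\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.64,"city_code":"440100"}],'
       r'"cityCode":0,"time":"20200504"}}')

data = json.loads(raw)                  # \uXXXX escapes are decoded here
result = data["result"]
last = result["moveOutList"][-1]        # last dict in the last list

print(data["message"])                  # 成功
print(last["city_name"], last["value"])
print(result["time"])                   # 20200504
```

Note that the fields in this response are city names and a move-in/move-out value; no death or treatment counts appear in it.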

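颜栩栩 posted on 2020-5-6 11:57:38

Pythonnewers posted on 2020-5-6 11:25
Very simple. I know it may not be the right way, but values like the number treated really are in there.

OK, got it. I just had a look and this can be solved: your code has already fetched the information you need~ I'll write something up this afternoon.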

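Pythonnewers posted on 2020-5-6 12:07:57

颜栩栩 posted on 2020-5-6 11:57
OK, got it. I just had a look and this can be solved: your code has already fetched the information you need~ I'll write something up this afternoon.

Thanks a lot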

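颜栩栩 posted on 2020-5-6 14:03:04

Pythonnewers posted on 2020-5-6 11:25
Very simple. I know it may not be the right way, but values like the number treated really are in there.

I rewrote it on top of what you had; it's just a matter of converting to a dict~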
import requests
import bs4
import json

url = "https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_1"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.14 Safari/537.36 Edg/83.0.478.13"}
html = requests.get(url=url, headers=headers).content.decode()
soup = bs4.BeautifulSoup(html, "html.parser")
targets = soup.select_one("#captain-config")  # select() returns a list; select_one() returns one tag or None
info = json.loads(targets.text)
# Key names are as given in the post, including the 'crued' spelling.
# If 'component' turns out to be a list rather than a dict, index it first (assumption).
for city in info['component']['caseList']:
    print(city['area'], city['confirmedRelative'], city['curConfirm'], city['confirmed'], city['crued'], city['died'])

Pythonnewers posted on 2020-5-6 14:25:00

颜栩栩 posted on 2020-5-6 14:03
I rewrote it on top of what you had; it's just a matter of converting to a dict~

D:\Python源码>C:/Users/Administrator/AppData/Local/Programs/Python/Python38/python.exe d:/Python源码/photo/text.py
Traceback (most recent call last):
File "d:/Python源码/photo/text.py", line 10, in <module>
    info=json.loads(targets.text)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Why do I get an error even when I copy it exactly?

颜栩栩 posted on 2020-5-6 14:32:41

Pythonnewers posted on 2020-5-6 14:25
Why do I get an error even when I copy it exactly?

Did the imports at the top load? Then check whether targets.text actually contains any data.
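
One way to run that check, as a sketch: json.loads raises "Expecting value: line 1 column 1 (char 0)" whenever the very first character is not the start of a JSON value, which includes an empty string. The helper below (parse_embedded_json is a hypothetical name, not part of any library) reports what it was given before giving up:

```python
import json

def parse_embedded_json(text):
    """Return the parsed object, or None if text is empty or not valid JSON."""
    if not text or not text.strip():
        print("nothing to parse: the extracted text is empty")
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Show where parsing stopped plus the first few characters of the input.
        print(f"invalid JSON at char {err.pos}: {text[:60]!r}")
        return None

print(parse_embedded_json(""))            # reports empty input, returns None
print(parse_embedded_json("<html>"))      # reports invalid JSON, returns None
print(parse_embedded_json('{"a": 1}'))    # {'a': 1}
```

Passing targets.text through a helper like this would show whether the element was found but empty, or whether it held something other than JSON.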

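Pythonnewers posted on 2020-5-6 15:19:06

颜栩栩 posted on 2020-5-6 14:32
Did the imports at the top load? Then check whether targets.text actually contains any data.

Thanks, it's solved. Your version didn't work for me, but it gave me the idea: I got the dictionary out using json plus a regular expression.
(I really do have a lot of bugs: I kept writing caselist as carrylist and it never got past that.)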
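
The actual regular expression was not posted, so the following is only a guess at what a json-plus-regex approach might look like. The HTML is a hypothetical stand-in for the downloaded page; only the captain-config id and the component/caseList/area/died/crued names come from this thread:

```python
import json
import re

# Hypothetical stand-in for the page source returned by requests.
html = ('<html><body><p>intro</p>'
        '<script type="application/json" id="captain-config">'
        '{"component": {"caseList": [{"area": "X", "died": "3", "crued": "40"}]}}'
        '</script></body></html>')

# Capture everything between the opening and closing tags of the target script.
pattern = re.compile(r'<script[^>]*id="captain-config"[^>]*>(.*?)</script>', re.S)
match = pattern.search(html)

if match is None:
    raise SystemExit("captain-config script not found")

info = json.loads(match.group(1))
for city in info["component"]["caseList"]:
    print(city["area"], city["crued"], city["died"])
```

A misspelled key such as carrylist would raise KeyError at the for line, which is a quick way to tell a typo apart from a parsing problem.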

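颜栩栩 posted on 2020-5-6 15:23:13

Pythonnewers posted on 2020-5-6 15:19
Thanks, it's solved. Your version didn't work for me, but it gave me the idea: I got the dictionary out using json plus a regular expression.
(I really do have a lot of bugs: I kept writing caselist ...

Glad you got it working.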

Pythonnewers posted on 2020-5-6 15:32:07

颜栩栩 posted on 2020-5-6 15:23
Glad you got it working.

I'm going blind
I have my doubts about the eyesight and glasses prescriptions of web-scraping engineers

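Pythonnewers posted on 2020-5-6 23:17:22

颜栩栩 posted on 2020-5-6 15:23
Glad you got it working.

Just remembered: to get the JSON you used .text. What is that? To get a string with bs4 it should be string, strings, or get_text. Isn't text the encoding?!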
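
On that last question: in Beautiful Soup, .text on a tag is a property that returns the same thing as get_text(), so it is not an encoding. A short comparison on a made-up fragment (the markup is illustrative only):

```python
import bs4

soup = bs4.BeautifulSoup("<p>deaths: <b>5</b></p>", "html.parser")
p = soup.p

print(p.text)          # deaths: 5
print(p.get_text())    # deaths: 5
print(p.string)        # None, because <p> has more than one child
print(p.b.string)      # 5
```

So .string is only useful when a tag holds exactly one piece of text, while .text and get_text() join everything inside the tag.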