FishC Forum (鱼C论坛)


[Solved] Can't get the death count or the recovered count when scraping Baidu

Posted on 2020-5-6 09:26:56

I've already fetched the whole page source, but the numbers I want sit inside a bunch of curly braces (?). How should I extract them?
Want to know what 小甲鱼 has been up to lately? Visit -> ilovefishc.com

Posted on 2020-5-6 09:27:28
Last edited by zltzlt on 2020-5-6 09:30

Use the json module to convert the JSON-formatted data into a Python dictionary.
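
A minimal sketch of that suggestion, using a made-up JSON string rather than the real page. The key names `caseList`, `area` and `died` are borrowed from code later in this thread; the values are invented for illustration.

```python
import json

# Hypothetical sample; the real page data is far larger.
raw = '{"caseList": [{"area": "A", "died": "1"}, {"area": "B", "died": "2"}]}'

data = json.loads(raw)  # JSON text -> Python dict

# After parsing, ordinary dict and list indexing applies.
for item in data["caseList"]:
    print(item["area"], item["died"])
```

Once the text is a dict, the curly braces stop being a string-handling problem and become plain key lookups.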

Posted on 2020-5-6 09:29:05
If it's a dictionary, use json.

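OP | Posted on 2020-5-6 09:52:23
qiuyouzhi posted on 2020-5-6 09:29
If it's a dictionary, use json.

But it isn't all a dictionary. Everything before it is ordinary code; only the important part in the middle is a dictionary. Can json work out by itself which part needs converting?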

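Posted on 2020-5-6 09:54:56
Pythonnewers posted on 2020-5-6 09:52
But it isn't all a dictionary. Everything before it is ordinary code; only the important part in the middle is a dictionary. Can json work out by itself which part needs converting?

I don't know. You could try it.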
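
For what it's worth, `json.loads` does not hunt for the JSON part on its own: it expects the whole string to be JSON and raises `json.JSONDecodeError` otherwise. One possible workaround, sketched below on an invented string, is to find where the JSON starts and pass that position to `json.JSONDecoder.raw_decode`, which parses a single value and reports where it stopped. The sample text and the `find("{")` heuristic are assumptions for illustration only.

```python
import json

# Invented sample: ordinary text around one JSON object.
page = 'some ordinary text {"died": 3, "crued": 5} more ordinary text'

# json.loads insists that the entire string is JSON.
try:
    json.loads(page)
    parsed_whole = True
except json.JSONDecodeError:
    parsed_whole = False

# raw_decode parses a single JSON value starting at the given index
# and returns (value, index_just_past_the_value).
start = page.find("{")  # naive: assumes the first "{" opens the JSON
value, end = json.JSONDecoder().raw_decode(page, start)

print(parsed_whole)
print(value)
print(page[end:])
```

On a real page the first `{` may well belong to something else, so a more specific starting point (a tag id, a known prefix) would be needed.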

Posted on 2020-5-6 09:56:35
Is this the URL you're scraping? https://voice.baidu.com/act/newp ... ia/?from=osari_pc_3
I wrote a bit of code for this URL a while ago; have a look for reference~
from selenium import webdriver
import bs4

# Render the page in a real browser so the JavaScript-built table is present
browser = webdriver.Chrome()
browser.get("https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3")
soup = bs4.BeautifulSoup(browser.page_source, "html.parser")
browser.quit()  # the HTML has been captured, so the browser can be closed

# One <tr> per table row; the hashed class names are specific to this page build
targets = soup.select("#nationTable tr.VirusTable_1-1-257_3m6Ybq")
for each in targets:
    info = ""
    targets2 = each.select("span")
    info += targets2[1].text + " "
    targets3 = each.select("td.VirusTable_1-1-257_3x1sDV.VirusTable_1-1-257_2bK5NN")
    info += targets3[0].text + " "
    targets4 = each.select("td.VirusTable_1-1-257_3x1sDV")
    info += targets4[1].text + " "
    info += targets4[2].text + " "
    targets5 = each.select("td.VirusTable_1-1-257_EjGi8c")
    info += targets5[0].text + " "
    info += targets5[1].text + " "
    print(info)

If that's not the page you're scraping, please post the URL and I'll take a look.

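OP | Posted on 2020-5-6 10:23:29
颜栩栩 posted on 2020-5-6 09:56
Is this the URL you're scraping? https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_3
This URL ...

Yes, that's the one, but I'm only using requests. I've heard of the approach you used, but I haven't learned it yet.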

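Posted on 2020-5-6 11:02:56
Pythonnewers posted on 2020-5-6 10:23
Yes, that's the one, but I'm only using requests. I've heard of the approach you used, but I haven't learned it yet.

Could you paste the code you wrote with requests? I'd like to see what you're getting back.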

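OP | Posted on 2020-5-6 11:25:20
颜栩栩 posted on 2020-5-6 11:02
Could you paste the code you wrote with requests? I'd like to see what you're getting back.
import requests
url = "https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_1"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.14 Safari/537.36 Edg/83.0.478.13"}
# Download the page and decode the response bytes (UTF-8 by default) into a string
html = requests.get(url=url, headers=headers).content.decode()
print(html)
It's really simple. I know this may not be the right way to do it, but numbers such as the recovered count really are in there.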

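OP | Posted on 2020-5-6 11:26:56
颜栩栩 posted on 2020-5-6 11:02
Could you paste the code you wrote with requests? I'd like to see what you're getting back.

I originally tried looking for the data in the JS, but couldn't find it.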

OP | Posted on 2020-5-6 11:32:58
颜栩栩 posted on 2020-5-6 11:02
Could you paste the code you wrote with requests? I'd like to see what you're getting back.

{"status":0,"message":"\u6210\u529f","result":{"moveInList":[{"city_name":"\u6210\u90fd\u5e02","province_name":"\u56db\u5ddd\u7701","value":2.73,"city_code":"510100"},{"city_name":"\u6df1\u5733\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":2.64,"city_code":"440300"},{"city_name":"\u5e7f\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":2.53,"city_code":"440100"},{"city_name":"\u4e1c\u839e\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":2.23,"city_code":"441900"},{"city_name":"\u4e0a\u6d77\u5e02","province_name":"\u4e0a\u6d77\u5e02","value":2.17,"city_code":"310000"},{"city_name":"\u90d1\u5dde\u5e02","province_name":"\u6cb3\u5357\u7701","value":1.94,"city_code":"410100"},{"city_name":"\u957f\u6c99\u5e02","province_name":"\u6e56\u5357\u7701","value":1.87,"city_code":"430100"},{"city_name":"\u897f\u5b89\u5e02","province_name":"\u9655\u897f\u7701","value":1.83,"city_code":"610100"},{"city_name":"\u4f5b\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.75,"city_code":"440600"},{"city_name":"\u676d\u5dde\u5e02","province_name":"\u6d59\u6c5f\u7701","value":1.74,"city_code":"330100"},{"city_name":"\u82cf\u5dde\u5e02","province_name":"\u6c5f\u82cf\u7701","value":1.67,"city_code":"320500"},{"city_name":"\u5317\u4eac\u5e02","province_name":"\u5317\u4eac\u5e02","value":1.6,"city_code":"110000"},{"city_name":"\u5357\u4eac\u5e02","province_name":"\u6c5f\u82cf\u7701","value":1.24,"city_code":"320100"},{"city_name":"\u5408\u80a5\u5e02","province_name":"\u5b89\u5fbd\u7701","value":1.23,"city_code":"340100"},{"city_name":"\u6d4e\u5357\u5e02","province_name":"\u5c71\u4e1c\u7701","value":1.18,"city_code":"370100"},{"city_name":"\u6b66\u6c49\u5e02","province_name":"\u6e56\u5317\u7701","value":1.11,"city_code":"420100"},{"city_name":"\u91cd\u5e86\u5e02","province_name":"\u91cd\u5e86\u5e02","value":1.01,"city_code":"500000"},{"city_name":"\u60e0\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":0.99,"city_code":"441300"},{"city_name":"\u8d35\u9633\u5e02
","province_name":"\u8d35\u5dde\u7701","value":0.97,"city_code":"520100"},{"city_name":"\u5929\u6d25\u5e02","province_name":"\u5929\u6d25\u5e02","value":0.91,"city_code":"120000"},{"city_name":"\u6606\u660e\u5e02","province_name":"\u4e91\u5357\u7701","value":0.9,"city_code":"530100"},{"city_name":"\u65e0\u9521\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.87,"city_code":"320200"},{"city_name":"\u4e2d\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":0.84,"city_code":"442000"},{"city_name":"\u9752\u5c9b\u5e02","province_name":"\u5c71\u4e1c\u7701","value":0.78,"city_code":"370200"},{"city_name":"\u592a\u539f\u5e02","province_name":"\u5c71\u897f\u7701","value":0.76,"city_code":"140100"},{"city_name":"\u5357\u5b81\u5e02","province_name":"\u5e7f\u897f\u58ee\u65cf\u81ea\u6cbb\u533a","value":0.74,"city_code":"450100"},{"city_name":"\u53a6\u95e8\u5e02","province_name":"\u798f\u5efa\u7701","value":0.72,"city_code":"350200"},{"city_name":"\u6c88\u9633\u5e02","province_name":"\u8fbd\u5b81\u7701","value":0.7,"city_code":"210100"},{"city_name":"\u77f3\u5bb6\u5e84\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.69,"city_code":"130100"},{"city_name":"\u5b81\u6ce2\u5e02","province_name":"\u6d59\u6c5f\u7701","value":0.67,"city_code":"330200"}],"moveOutList":[{"city_name":"\u5e7f\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.64,"city_code":"440100"},{"city_name":"\u6210\u90fd\u5e02","province_name":"\u56db\u5ddd\u7701","value":1.59,"city_code":"510100"},{"city_name":"\u6df1\u5733\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.58,"city_code":"440300"},{"city_name":"\u4e1c\u839e\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.41,"city_code":"441900"},{"city_name":"\u4e0a\u6d77\u5e02","province_name":"\u4e0a\u6d77\u5e02","value":1.34,"city_code":"310000"},{"city_name":"\u897f\u5b89\u5e02","province_name":"\u9655\u897f\u7701","value":1.32,"city_code":"610100"},{"city_name":"\u82cf\u5dde\u5e02","province_name":"\u6c5f\u82cf\u7701","value":1
.2,"city_code":"320500"},{"city_name":"\u676d\u5dde\u5e02","province_name":"\u6d59\u6c5f\u7701","value":1.15,"city_code":"330100"},{"city_name":"\u4f5b\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.1,"city_code":"440600"},{"city_name":"\u90d1\u5dde\u5e02","province_name":"\u6cb3\u5357\u7701","value":1.09,"city_code":"410100"},{"city_name":"\u5317\u4eac\u5e02","province_name":"\u5317\u4eac\u5e02","value":1.05,"city_code":"110000"},{"city_name":"\u60e0\u5dde\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":1.04,"city_code":"441300"},{"city_name":"\u957f\u6c99\u5e02","province_name":"\u6e56\u5357\u7701","value":1.04,"city_code":"430100"},{"city_name":"\u91cd\u5e86\u5e02","province_name":"\u91cd\u5e86\u5e02","value":0.9,"city_code":"500000"},{"city_name":"\u5357\u4eac\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.78,"city_code":"320100"},{"city_name":"\u5408\u80a5\u5e02","province_name":"\u5b89\u5fbd\u7701","value":0.75,"city_code":"340100"},{"city_name":"\u8d35\u9633\u5e02","province_name":"\u8d35\u5dde\u7701","value":0.75,"city_code":"520100"},{"city_name":"\u6d4e\u5357\u5e02","province_name":"\u5c71\u4e1c\u7701","value":0.71,"city_code":"370100"},{"city_name":"\u5929\u6d25\u5e02","province_name":"\u5929\u6d25\u5e02","value":0.7,"city_code":"120000"},{"city_name":"\u5eca\u574a\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.7,"city_code":"131000"},{"city_name":"\u4fdd\u5b9a\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.69,"city_code":"130600"},{"city_name":"\u6606\u660e\u5e02","province_name":"\u4e91\u5357\u7701","value":0.68,"city_code":"530100"},{"city_name":"\u54b8\u9633\u5e02","province_name":"\u9655\u897f\u7701","value":0.68,"city_code":"610400"},{"city_name":"\u65e0\u9521\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.67,"city_code":"320200"},{"city_name":"\u5357\u5b81\u5e02","province_name":"\u5e7f\u897f\u58ee\u65cf\u81ea\u6cbb\u533a","value":0.63,"city_code":"450100"},{"city_name":"\u4e34\u6c82\u5e02","province_nam
e":"\u5c71\u4e1c\u7701","value":0.63,"city_code":"371300"},{"city_name":"\u4e2d\u5c71\u5e02","province_name":"\u5e7f\u4e1c\u7701","value":0.62,"city_code":"442000"},{"city_name":"\u5357\u901a\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.62,"city_code":"320600"},{"city_name":"\u77f3\u5bb6\u5e84\u5e02","province_name":"\u6cb3\u5317\u7701","value":0.6,"city_code":"130100"},{"city_name":"\u5f90\u5dde\u5e02","province_name":"\u6c5f\u82cf\u7701","value":0.58,"city_code":"320300"}],"cityCode":0,"time":"20200504"}}
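I found this among the XHR requests! But here's the problem... how do I get the values out of the last dictionary?
And what about these \u91cd-style sequences? Is that some kind of encoding (?)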
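
The `\uXXXX` sequences are JSON's standard escapes for non-ASCII characters, not a separate encoding that has to be undone by hand; `json.loads` turns them back into normal text. A sketch using a shortened copy of the response above (only the first `moveInList` entry and the first `moveOutList` entry are kept, an abbreviation made here for readability):

```python
import json

# Raw string literals so Python leaves the \u escapes for the JSON parser.
raw = (
    r'{"status":0,"message":"\u6210\u529f","result":{'
    r'"moveInList":[{"city_name":"\u6210\u90fd\u5e02",'
    r'"province_name":"\u56db\u5ddd\u7701","value":2.73,"city_code":"510100"}],'
    r'"moveOutList":[{"city_name":"\u5e7f\u5dde\u5e02",'
    r'"province_name":"\u5e7f\u4e1c\u7701","value":1.64,"city_code":"440100"}],'
    r'"cityCode":0,"time":"20200504"}}'
)

data = json.loads(raw)

print(data["message"])                    # escapes are decoded automatically
last = data["result"]["moveOutList"][-1]  # last dict in that list
print(last["city_name"], last["value"])
print(data["result"]["time"])
```

With the full response, `[-1]` on either list would pick its final entry in the same way.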

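Posted on 2020-5-6 11:57:38
Pythonnewers posted on 2020-5-6 11:25
It's really simple. I know this may not be the right way to do it, but numbers such as the recovered count really are in there.

OK, got it. I just had a look and this can be solved: your code has already fetched the information you need~ I'll write it up this afternoon.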

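OP | Posted on 2020-5-6 12:07:57
颜栩栩 posted on 2020-5-6 11:57
OK, got it. I just had a look and this can be solved: your code has already fetched the information you need~ I'll write it up this afternoon.

Thanks!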

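Posted on 2020-5-6 14:03:04 | Best answer
Pythonnewers posted on 2020-5-6 11:25
It's really simple. I know this may not be the right way to do it, but numbers such as the recovered count really are in there.

I rewrote it on top of what you had. It's just a matter of converting the data to a dict~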
import requests
import bs4
import json

url = "https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_pc_1"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.14 Safari/537.36 Edg/83.0.478.13"}
html = requests.get(url=url, headers=headers).content.decode()
soup = bs4.BeautifulSoup(html, "html.parser")
targets = soup.select("#captain-config")  # element that holds the page's JSON data
info = json.loads(targets[0].text)  # JSON text -> Python dict
for city in info['component'][0]['caseList']:
    print(city['area'], city['confirmedRelative'], city['curConfirm'], city['confirmed'], city['crued'], city['died'])

OP | Posted on 2020-5-6 14:25:00
颜栩栩 posted on 2020-5-6 14:03
I rewrote it on top of what you had. It's just a matter of converting the data to a dict~
D:\Python源码>C:/Users/Administrator/AppData/Local/Programs/Python/Python38/python.exe d:/Python源码/photo/text.py
Traceback (most recent call last):
  File "d:/Python源码/photo/text.py", line 10, in <module>
    info=json.loads(targets[0].text)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Why do I get an error even when I copy it verbatim?

Posted on 2020-5-6 14:32:41
Pythonnewers posted on 2020-5-6 14:25
Why do I get an error even when I copy it verbatim?


Did you include the imports at the top? Also, check whether targets[0].text actually contains any data.
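
As a hedged aside on that check: `Expecting value: line 1 column 1 (char 0)` is the message `json.loads` gives when there is no JSON value at the very first character, which includes the case of an empty string. A small guard such as the hypothetical `parse_or_explain` helper below (not part of the original code) makes that case visible before parsing.

```python
import json


def parse_or_explain(text):
    """Parse JSON text, or return a short description of why it failed."""
    if not text or not text.strip():
        return None, "nothing to parse: the extracted text is empty"
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, f"invalid JSON at char {err.pos}: {err.msg}"


# An empty string reproduces the message from the traceback.
try:
    json.loads("")
except json.JSONDecodeError as err:
    empty_error = str(err)

print(empty_error)
print(parse_or_explain(""))
print(parse_or_explain('{"died": 1}'))
```

Printing `repr(targets[0].text)` before the `json.loads` call would serve the same purpose in the posted script.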

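OP | Posted on 2020-5-6 15:19:06
颜栩栩 posted on 2020-5-6 14:32
Did you include the imports at the top? Also, check whether targets[0].text actually contains any data.

Thanks, it's solved. Your code didn't work for me, but it gave me the idea: I used json plus a regular expression to get the dictionary out.
(I really do have a lot of bugs: I kept typing caselist as carrylist, so it never got past that point.)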
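
The poster's actual code was not shared, so the following is only a guess at what a json-plus-regex approach could look like. It runs on an invented HTML snippet and assumes the data sits in a `<script>` tag with `id="captain-config"` (the selector from the best answer) under `component[0]['caseList']`; the pattern and the sample values are illustrative.

```python
import json
import re

# Invented stand-in for the downloaded page.
html = (
    '<html><body><p>ordinary markup</p>'
    '<script type="application/json" id="captain-config">'
    '{"component": [{"caseList": [{"area": "A", "crued": "5", "died": "1"}]}]}'
    '</script></body></html>'
)

# Capture everything between the opening tag with that id and </script>.
pattern = re.compile(
    r'<script[^>]*id="captain-config"[^>]*>(.*?)</script>', re.S
)
match = pattern.search(html)
if match is None:
    raise ValueError("captain-config block not found")

info = json.loads(match.group(1))
for city in info["component"][0]["caseList"]:  # the key is spelled caseList
    print(city["area"], city["crued"], city["died"])
```

A mistyped key such as `carrylist` would fail here with a `KeyError`, which names the missing key and makes that kind of slip quick to spot.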

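Posted on 2020-5-6 15:23:13
Pythonnewers posted on 2020-5-6 15:19
Thanks, it's solved. Your code didn't work for me, but it gave me the idea: I used json plus a regular expression to get the dictionary out.
(I really do have a lot of bugs: caselist, I ...

Glad you got it working!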

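OP | Posted on 2020-5-6 15:32:07

I'm going blind.
I have my doubts about the eyesight, and the glasses prescriptions, of people who write scrapers for a living.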

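OP | Posted on 2020-5-6 23:17:22

It just occurred to me that you used .text to get the JSON. What is that? If you want a string out of bs4, shouldn't it be string, strings, or get_text? Isn't text something to do with encoding?!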
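
For reference, a sketch of the accessors in question. In Beautiful Soup, `.text` is a property that returns the same thing as `get_text()`; it has nothing to do with character encoding (the similarly named `Response.text` in requests is a different attribute on a different object). `.string` returns a tag's single child string. How `get_text()` treats the contents of a `<script>` tag has varied between Beautiful Soup releases, which may explain the empty result reported earlier, so this sketch parses from `.string` instead. The HTML snippet is invented.

```python
import json

import bs4

# Invented stand-in for the downloaded page.
html = (
    '<script type="application/json" id="captain-config">'
    '{"component": [{"caseList": [{"area": "A", "died": "1"}]}]}'
    '</script>'
)

soup = bs4.BeautifulSoup(html, "html.parser")
tag = soup.select("#captain-config")[0]

# .text and get_text() are equivalent; their result for <script>
# contents can depend on the installed Beautiful Soup version.
print(repr(tag.text))
print(repr(tag.get_text()))

# .string is the tag's only child string, here the raw JSON.
info = json.loads(tag.string)
print(info["component"][0]["caseList"][0]["area"])
```

If the two `repr` lines print empty strings on a given installation, that would be consistent with the `JSONDecodeError` at char 0 seen above.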