The handwritten digits dataset in scikit-learn
Last edited by 波大大12138 on 2021-7-9 22:52
# The digits handwritten-digit recognition dataset
# Import the required modules
import matplotlib
import matplotlib.pyplot as plt
from sklearn import datasets
Dataset structure
# Load the handwritten digits dataset
digits=datasets.load_digits()
# Inspect the dataset structure
digits.keys()
dict_keys(['data', 'target', 'target_names', 'images', 'DESCR'])
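To see how these fields relate to each other, it can help to print their shapes. This is a minimal sketch, assuming a reasonably recent scikit-learn; newer versions may list extra keys beyond the five shown above.

```python
from sklearn import datasets

digits = datasets.load_digits()

# One flattened row of 64 pixel values per instance.
data_shape = digits['data'].shape
# The same pixels, kept as one 8x8 grid per instance.
images_shape = digits['images'].shape
# One label (the digit 0-9) per instance.
target_shape = digits['target'].shape

print(data_shape, images_shape, target_shape)
```

So `data` and `images` hold the same 1797 samples, once as flat vectors and once as 8x8 matrices, and `target` holds the matching labels.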
# Get the target classes
digits['target_names']
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
digits['images']
array([[[ 0.,  0.,  5., ...,  1.,  0.,  0.],
        [ 0.,  0., 13., ..., 15.,  5.,  0.],
        [ 0.,  3., 15., ..., 11.,  8.,  0.],
        ...,
        [ 0.,  4., 11., ..., 12.,  7.,  0.],
        [ 0.,  2., 14., ..., 12.,  0.,  0.],
        [ 0.,  0.,  6., ...,  0.,  0.,  0.]],
       [[ 0.,  0.,  0., ...,  5.,  0.,  0.],
        [ 0.,  0.,  0., ...,  9.,  0.,  0.],
        [ 0.,  0.,  3., ...,  6.,  0.,  0.],
        ...,
        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
        [ 0.,  0.,  0., ..., 10.,  0.,  0.]],
       [[ 0.,  0.,  0., ..., 12.,  0.,  0.],
        [ 0.,  0.,  3., ..., 14.,  0.,  0.],
        [ 0.,  0.,  8., ..., 16.,  0.,  0.],
        ...,
        [ 0.,  9., 16., ...,  0.,  0.,  0.],
        [ 0.,  3., 13., ..., 11.,  5.,  0.],
        [ 0.,  0.,  0., ..., 16.,  9.,  0.]],
       ...,
       [[ 0.,  0.,  1., ...,  1.,  0.,  0.],
        [ 0.,  0., 13., ...,  2.,  1.,  0.],
        [ 0.,  0., 16., ..., 16.,  5.,  0.],
        ...,
        [ 0.,  0., 16., ..., 15.,  0.,  0.],
        [ 0.,  0., 15., ..., 16.,  0.,  0.],
        [ 0.,  0.,  2., ...,  6.,  0.,  0.]],
       [[ 0.,  0.,  2., ...,  0.,  0.,  0.],
        [ 0.,  0., 14., ..., 15.,  1.,  0.],
        [ 0.,  4., 16., ..., 16.,  7.,  0.],
        ...,
        [ 0.,  0.,  0., ..., 16.,  2.,  0.],
        [ 0.,  0.,  4., ..., 16.,  2.,  0.],
        [ 0.,  0.,  5., ..., 12.,  0.,  0.]],
       [[ 0.,  0., 10., ...,  1.,  0.,  0.],
        [ 0.,  2., 16., ...,  1.,  0.,  0.],
        [ 0.,  0., 15., ..., 15.,  0.,  0.],
        ...,
        [ 0.,  4., 16., ..., 16.,  6.,  0.],
        [ 0.,  8., 16., ..., 16.,  8.,  0.],
        [ 0.,  1.,  8., ..., 12.,  1.,  0.]]])
digits['target']
array([0, 1, 2, ..., 8, 9, 8])
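NumPy abbreviates long arrays when printing them, so the target output only shows the first few and the last few labels. A small sketch for looking at the labels more directly; the variable names here are just for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
y = digits['target']

# The first ten labels and the last three labels, as plain Python lists.
first_ten = y[:10].tolist()
last_three = y[-3:].tolist()

# Which label values occur, and how many samples carry each one.
labels, counts = np.unique(y, return_counts=True)

print(first_ten)
print(last_three)
print(dict(zip(labels.tolist(), counts.tolist())))
```

The values printed after the `...` in the abbreviated output are simply the labels of the last samples in the dataset.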
Instances
# Get the feature data of all instances
x=digits['data']
x
array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ..., 10.,  0.,  0.],
       [ 0.,  0.,  0., ..., 16.,  9.,  0.],
       ...,
       [ 0.,  0.,  1., ...,  6.,  0.,  0.],
       [ 0.,  0.,  2., ..., 12.,  0.,  0.],
       [ 0.,  0., 10., ..., 12.,  1.,  0.]])
len(x)
1797
x[0],x[1]
(array([ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.,  0.,  0., 13., 15., 10.,
        15.,  5.,  0.,  0.,  3., 15.,  2.,  0., 11.,  8.,  0.,  0.,  4.,
        12.,  0.,  0.,  8.,  8.,  0.,  0.,  5.,  8.,  0.,  0.,  9.,  8.,
         0.,  0.,  4., 11.,  0.,  1., 12.,  7.,  0.,  0.,  2., 14.,  5.,
        10., 12.,  0.,  0.,  0.,  0.,  6., 13., 10.,  0.,  0.,  0.]),
 array([ 0.,  0.,  0., 12., 13.,  5.,  0.,  0.,  0.,  0.,  0., 11., 16.,
         9.,  0.,  0.,  0.,  0.,  3., 15., 16.,  6.,  0.,  0.,  0.,  7.,
        15., 16., 16.,  2.,  0.,  0.,  0.,  0.,  1., 16., 16.,  3.,  0.,
         0.,  0.,  0.,  1., 16., 16.,  6.,  0.,  0.,  0.,  0.,  1., 16.,
        16.,  6.,  0.,  0.,  0.,  0.,  0., 11., 16., 10.,  0.,  0.]))
len(x[0])
64
# Convert each instance to an image matrix
image_matrix=x[0].reshape(8,8)
image_matrix
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])
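Reshaping one 64-value row gives the matrix of one digit only. A quick way to check how that relates to `digits['images']` is to compare the two; the sketch below also reshapes every row in one call. The variable names are made up for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
x = digits['data']

# Reshape only the first instance (64 values) into an 8x8 grid.
first_matrix = x[0].reshape(8, 8)
first_matches = np.array_equal(first_matrix, digits['images'][0])

# Reshape every instance at once; -1 lets NumPy infer the number of samples.
all_matrices = x.reshape(-1, 8, 8)
all_match = np.array_equal(all_matrices, digits['images'])

print(first_matches, all_match)
```

If both comparisons hold, `digits['images'][i]` can be used directly in place of `x[i].reshape(8,8)`, for example `plt.imshow(digits['images'][0], cmap=matplotlib.cm.binary)`.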
plt.imshow(image_matrix,cmap=matplotlib.cm.binary)
<matplotlib.image.AxesImage at 0x2997ce08518>
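The pixel values here are grayscale intensities from 0 to 16 rather than two-level values, and how they are drawn depends on the colormap. The `binary` colormap used above maps the lowest value to white and the highest to black, which is why the large numbers in the middle of the matrix appear as dark strokes. A minimal check, assuming only that matplotlib is installed (the `Agg` backend line is there so the snippet runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.cm

# A colormap called with a number in [0, 1] returns an RGBA tuple.
binary_low = matplotlib.cm.binary(0.0)   # colour for the smallest pixel value
binary_high = matplotlib.cm.binary(1.0)  # colour for the largest pixel value

# The 'gray' colormap runs the other way round.
gray_low = matplotlib.cm.gray(0.0)
gray_high = matplotlib.cm.gray(1.0)

print("binary:", binary_low, binary_high)
print("gray:  ", gray_low, gray_high)
```

Passing `cmap=matplotlib.cm.gray` to `plt.imshow` instead would draw the same digit as light strokes on a dark background.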
# Get the target labels
y=digits['target']
y
array([0, 1, 2, ..., 8, 9, 8])
len(y)
1797
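The features and labels line up row by row: `x[i]` is the pixel vector of the sample whose digit is `y[i]`. One way to look at them side by side is a pandas DataFrame. This is only a sketch; the `pixel_<row>_<col>` column names are made up here for illustration, since each feature is just a grid position.

```python
import pandas as pd
from sklearn import datasets

digits = datasets.load_digits()

# Hypothetical column names: one per position in the 8x8 grid, row by row.
columns = [f"pixel_{r}_{c}" for r in range(8) for c in range(8)]

df = pd.DataFrame(digits['data'], columns=columns)
df['target'] = digits['target']  # label of each row as the last column

print(df.shape)
print(df.head())
print(df['target'].value_counts().sort_index())
```

Recent scikit-learn versions also accept `datasets.load_digits(as_frame=True)`, which should return the data already wrapped in pandas objects.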
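Questions:
1. digits['images'] already stores the image matrix for each digit instance, so why do we still convert data into an image matrix just to look at a single digit? Can I work on images directly, and if so, how?
2. # Convert each instance to an image matrix
image_matrix=x[0].reshape(8,8)
image_matrix
The comment says this converts every instance into an image matrix, but as I understand it, it only converts the single instance x[0]. Is that right?
3. y
array([0, 1, 2, ..., 8, 9, 8])
What does the extra 8 at the end of the target output mean?
4. I looked at the binarized values in the matrix for each digit: the numbers are small at the edges and large in the middle, yet in the plot the edges are white and the middle is black. Shouldn't the edges have the larger numbers and the black middle the smaller ones?
5. The datasets I have used before come with feature data, feature names, target data, and target names. Why does this dataset have feature data but no feature names? I would like to use pandas to see how the data in this dataset is laid out.
Last edited by 学渣李某人 on 2021-7-9 11:05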
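1. image is a one-dimensional matrix, while a picture is a two-dimensional matrix, so it has to be reshaped.
2. x[0] is the training set, i.e. the training data; x[1] is the test data, used to measure accuracy.
3. target holds the digit that each sample actually represents. There is no extra 8; if you print them all, you will see the values are basically random.
学渣李某人 posted on 2021-7-9 11:03
1. image is a one-dimensional matrix, while a picture is a two-dimensional matrix, so it has to be reshaped.
2. x[0] is the training set, i.e. the training data; x[1] is the test ...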
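Thanks, I mostly get it now. But I still have a question about the line image_matrix=x[0].reshape(8,8): its result is only 64 binarized values, which correspond to one actual digit rather than all of them. Is images something the library has already assembled, stored as two-dimensional matrices to begin with?
学渣李某人 posted on 2021-7-9 11:03
1. image is a one-dimensional matrix, while a picture is a two-dimensional matrix, so it has to be reshaped.
2. x[0] is the training set, i.e. the training data; x[1] is the test ...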
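I would really like to see how all the digit instances are stored in there.
波大大12138 posted on 2021-7-9 22:37
I would really like to see how all the digit instances are stored in there.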
Hold on, let me run your code.
波大大12138 posted on 2021-7-9 22:19
Thanks, I mostly get it now. But I still have a question about the line image_matrix=x[0].reshape(8,8): its result is only ...
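I misread it. x[0] is a single digit; converting it to 8*8 just makes it easier to check that it looks right. For actual training, the 1*64 form is enough.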
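For a rough idea of what training on the 1*64 form might look like, here is a minimal sketch. The choice of `KNeighborsClassifier`, the 75/25 split, and `random_state=0` are assumptions made for illustration; nothing in this thread specifies a model.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()

# Each row of digits['data'] is already a flat vector of 64 pixel values,
# so it can be fed to a classifier without any reshaping.
X_train, X_test, y_train, y_test = train_test_split(
    digits['data'], digits['target'], test_size=0.25, random_state=0
)

# Assumed model: a k-nearest-neighbours classifier with 3 neighbours.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Fraction of held-out digits that are classified correctly.
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

The 8*8 form is only needed again when a sample has to be displayed, for example `plt.imshow(X_test[0].reshape(8, 8), cmap=matplotlib.cm.binary)`.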