A Summary of Pandas Basics

TommyTimfy · Posted on 2017-8-13 12:47:01

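Since these are study notes, I am not posting them inside a code box. Type them out yourself: hands-on practice pays off!
(Source: China University MOOC)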

import pandas as pd

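# Two core data types: Series and DataFrame
# Pandas: extended data types; the focus is on how data is applied and expressed, and on the relationship between data and its index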

# Series

# 1. Create from a list
a0 = pd.Series([2, 4, 6, 8]) # index: 0 1 2 3
a1 = pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd']) # index: a b c d
a2 = pd.Series(25, index=['a', 'b', 'c']) # to repeat one scalar value across several labels, the index cannot be omitted

# 2. Create from a dict
b0 = pd.Series({'a': 2, 'b': 4, 'c': 6})
# In b1 the index has one extra label, 'd', so its value is NaN; the result follows the order of index
b1 = pd.Series({'a': 2, 'b': 4, 'c': 6}, index=['b', 'c', 'd', 'a'])

import numpy as np

# 3. Create from an ndarray
c1 = pd.Series(np.arange(5)) # values 0~4
c2 = pd.Series(np.arange(5), index=np.arange(9, 4, -1)) # index: 9 8 7 6 5

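# 4. Basic operations
# 4.1 Get the index: c1.index (an Index object)
# 4.2 Get the data: c1.values (a NumPy array)
# 4.3 Single-element access: a1['a'] (by label) or a1[0] (by position); the two indexing schemes coexist but cannot be mixed in one expression. Recent pandas versions deprecate positional a1[0] on a labelled Series; a1.iloc[0] is the explicit form
# 4.4 Slicing: works like string slicing
# 4.5 Exponential e**x: np.exp(a0)
# 4.6 Filtering: a0[a0 > a0.median()]
# 4.7 Membership: 'c' in a1 (True), 2 in a1 (False), 2 in a1.values (True); `in` tests the index labels, not the values
# 4.8 Lookup with a default: a1.get('c', 100) returns the value at 'c' if that label exists, otherwise 100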
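
The operations in 4.1 to 4.8 can be tried out together. The following is only a minimal sketch that rebuilds a0 and a1 from above so it runs on its own; the other variable names are made up for illustration, and positional access uses .iloc rather than a1[0].

```python
import numpy as np
import pandas as pd

a0 = pd.Series([2, 4, 6, 8])
a1 = pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd'])

labels = list(a1.index)            # index labels: ['a', 'b', 'c', 'd']
data = a1.values                   # underlying NumPy array

by_label = a1['a']                 # label-based access -> 2
by_position = a1.iloc[0]           # position-based access -> 2
middle = a1.iloc[1:3]              # positional slice, end excluded: labels b and c

powers = np.exp(a0)                # NumPy ufuncs apply element-wise and keep the index
above_median = a0[a0 > a0.median()]  # median is 5.0, so 6 and 8 remain

has_label = 'c' in a1              # True: `in` looks at the index labels
has_two = 2 in a1                  # False: 2 is a value, not a label
two_in_values = 2 in a1.values     # True

found = a1.get('c', 100)           # 6, because label 'c' exists
fallback = a1.get('z', 100)        # 100, because label 'z' does not exist
```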

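# 4.9 Series alignment
# In arithmetic, Series automatically align data by index label: 'a' is matched with 'a', 'b' with 'b'; labels present in only one operand give NaN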
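
A small sketch of the alignment rule, using two hypothetical Series (s1, s2) whose indexes only partly overlap:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are paired by label, not by position.
# 'a' and 'd' exist in only one operand, so they become NaN.
total = s1 + s2

# Series.add with fill_value treats a missing side as 0 instead.
filled = s1.add(s2, fill_value=0)
```

Here total holds 12.0 at 'b' and 23.0 at 'c', while filled keeps 1.0 at 'a' and 30.0 at 'd'.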

# 5. Modifying a Series: it can be changed at any time, and the change takes effect immediately
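
As an illustration of in-place modification, here is a short sketch on a throwaway Series (the name s and the labels are assumptions, not from the course material):

```python
import pandas as pd

s = pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd'])

s['a'] = 20               # assign to one label
s[['b', 'c']] = 0         # assign to several labels at once
s.name = 'demo'           # the Series itself can be named
s.index.name = 'label'    # and so can its index
```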

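# DataFrame
# 1. Characteristics:
# 1.1 Tabular data: one index is shared by several columns, and each column may hold a different value type
# 1.2 Has both a row index (axis=0) and a column index (axis=1)
# 1.3 Commonly used to represent two-dimensional data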
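
To make the two axes concrete, the sketch below builds a small hypothetical table (column names name, score, count are invented) and reduces it along each axis:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['x', 'y', 'z'],       # text column
     'score': [1.5, 2.5, 3.5],      # float column
     'count': [1, 2, 3]},           # integer column
    index=['r1', 'r2', 'r3'],
)

row_labels = list(df.index)         # axis=0 labels
col_labels = list(df.columns)       # axis=1 labels

numeric = df[['score', 'count']]
col_sums = numeric.sum(axis=0)      # collapse the rows: one total per column
row_sums = numeric.sum(axis=1)      # collapse the columns: one total per row
```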

# 2. Create from an ndarray
d1 = pd.DataFrame(np.arange(10).reshape(2, 5))

# Out[21]:
#    0  1  2  3  4
# 0  0  1  2  3  4
# 1  5  6  7  8  9

# 3. Create from a dict of one-dimensional objects (Series here)
dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
d2 = pd.DataFrame(dt)

# Out[23]:
#    one  two
# a  1.0    4
# b  2.0    5
# c  3.0    6
# d  NaN    7

# A dict of lists works the same way:
dt = {'one': [1, 2, 3, 4],
      'two': [5, 6, 7, 8]}
d3 = pd.DataFrame(dt, index=['a', 'b', 'c', 'd'])

#    one  two
# a    1    5
# b    2    6
# c    3    7
# d    4    8

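# 4. Example 1:
# Column names: 城市 = city, 环比 = month-on-month, 同比 = year-on-year, 定基 = fixed-base
dt = {'城市': ['北京', '上海', '广州', '深圳', '沈阳'],
      '环比': [100, 101, 102, 103, 104],
      '同比': [120, 121, 122, 123, 124],
      '定基': [120, 121, 122, 123, 124]}
d4 = pd.DataFrame(dt, index=['c1', 'c2', 'c3', 'c4', 'c5'])

# Out[28]:
#      同比  城市   定基   环比
# c1  120  北京  120  100
# c2  121  上海  121  101
# c3  122  广州  122  102
# c4  123  深圳  123  103
# c5  124  沈阳  124  104
# Note: this output shows the columns sorted by key, as older pandas versions did; newer versions keep the dict's insertion order (城市, 环比, 同比, 定基)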

# 5. Reindexing
d4 = d4.reindex(index=['c5', 'c4', 'c3', 'c2', 'c1'])
d4 = d4.reindex(columns=['城市', '环比', '同比', '定基'])

# Out[6]:
#     城市   环比   同比   定基
# c5  沈阳  104  124  124
# c4  深圳  103  123  123
# c3  广州  102  122  122
# c2  上海  101  121  121
# c1  北京  100  120  120

new_c = d4.columns.insert(4, '新增') # returns a new Index only; d4 itself is unchanged ('新增' = "new")
new_d = d4.reindex(columns=new_c, fill_value=200)

# Out[10]:
#     城市   环比   同比   定基   新增
# c5  沈阳  104  124  124  200
# c4  深圳  103  123  123  200
# c3  广州  102  122  122  200
# c2  上海  101  121  121  200
# c1  北京  100  120  120  200
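
The point that columns.insert only produces an Index can be checked on a tiny hypothetical frame. This is a sketch with invented column names x, y, z:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# Index objects are immutable: insert() returns a new Index
# and leaves df.columns untouched.
new_cols = df.columns.insert(2, 'z')
unchanged = list(df.columns)

# The new column only exists after reindexing with the new Index;
# fill_value supplies the data for it.
widened = df.reindex(columns=new_cols, fill_value=0)
```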

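# 6. delete, insert
da = d4.columns.delete(2) # remove the column at position 2 (the third column, '同比')
db = d4.index.insert(5, 'c0') # append a new row label 'c0'
# method='ffill' needs a monotonic index, so it is applied to the row axis only
nd = d4.reindex(columns=da).reindex(index=db, method='ffill')

# Out[13]:
#     城市   环比   定基
# c5  沈阳  104  124
# c4  深圳  103  123
# c3  广州  102  122
# c2  上海  101  121
# c1  北京  100  120
# c0  北京  100  120
# The new row c0 has no data of its own, so ffill copies the row before it (c1)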

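# 7. drop
nd.drop('c5') # removes a row by default (axis=0); returns a new DataFrame, nd is unchanged
nd.drop('定基', axis=1) # pass axis=1 to remove a column ('同比' is no longer in nd, so dropping it would raise KeyError)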
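
A quick sketch showing that drop returns a new object instead of changing the original. The frame and its labels here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]},
                  index=['r1', 'r2', 'r3'])

without_row = df.drop('r1')           # axis=0 is the default: drop a row label
without_col = df.drop('y', axis=1)    # axis=1: drop a column label

# df still has all three rows and both columns.
original_shape = df.shape
```

To keep the result, assign it back, for example df = df.drop('r1').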

# 8. Arithmetic: add, sub, mul, div
e1 = pd.DataFrame(np.arange(12).reshape(3, 4))
e2 = pd.DataFrame(np.arange(20).reshape(4, 5))

e2.add(e1, fill_value=100) # a cell missing from one operand is treated as 100 instead of producing NaN
e2.mul(e1, fill_value=0)
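
To see what fill_value changes, the sketch below repeats e1 and e2 and compares the plain + operator with the method form. The names plain, added and multiplied are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

e1 = pd.DataFrame(np.arange(12).reshape(3, 4))   # rows 0-2, columns 0-3
e2 = pd.DataFrame(np.arange(20).reshape(4, 5))   # rows 0-3, columns 0-4

# Plain operator: e1 has no row 3 and no column 4,
# so those cells are NaN in the result.
plain = e2 + e1

# Method form: the missing e1 cells are replaced by fill_value first.
added = e2.add(e1, fill_value=100)      # e.g. cell (0, 4): 4 + 100 = 104
multiplied = e2.mul(e1, fill_value=0)   # e.g. cell (3, 4): 19 * 0 = 0
```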

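# 9. Comparison
# Operands of the same dimension must have the same shape (and labels); a DataFrame compared with a Series broadcasts along axis 1 by default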
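
The comparison rules can be sketched as follows. f1, f2, row and col are hypothetical names; the gt method with axis=0 is used to compare down the columns instead of the default.

```python
import numpy as np
import pandas as pd

f1 = pd.DataFrame(np.arange(12).reshape(3, 4))          # 0..11
f2 = pd.DataFrame(np.arange(12, 0, -1).reshape(3, 4))   # 12..1

# Same shape and labels: compared cell by cell.
same_shape = f1 > f2

# DataFrame vs Series: the Series is matched against the columns
# (axis 1), so every row is compared with [0, 1, 2, 3].
row = pd.Series(np.arange(4))
vs_row = f1 > row

# To compare along axis 0 instead, use the method form with axis=0:
# every column is compared with [1, 5, 9].
col = pd.Series([1, 5, 9])
vs_col = f1.gt(col, axis=0)
```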
