| 
 | 
 
 
pandas DataFrame: creating, adding, deleting, and modifying data
- import pandas as pd
 
 - import time
 
  
- idx = [1,2,3,5,6,7,9,4,8]
 
 - name = ["apple","pearl","orange", "apple","orange","orange","apple","pearl","orange"]
 
 - price = [5.20,3.50,7.30,5.00,7.50,7.30,5.20,3.70,7.30]
 
 - N = 1  # the larger the data, the larger the memory difference
 
 - df = pd.DataFrame({ "fruit": name*N , "price" : price*N}, index = idx*N)
 
 - print (df,"\n")
 
 - print ('memory_usage',df.memory_usage(),"\n")
 
 - print (df.dtypes)
 
 - print ("*" * 20)
 
 - df['fruit'] = df['fruit'].astype('category')
 
 - # The fruit column's dtype changes from object to category; its values are reconstructed from codes and categories

 - # Alternative way to create it:

 - # cat = pd.Categorical(name * N)

 - # df['fruit'] = cat
 
 - print (df)
 
 - print ('memory_usage',df.memory_usage(),"\n")
 
 - print (df.dtypes)
 
 - print('fruit.values:',df.fruit.values)
 
 - print('fruit.values.codes:',df.fruit.values.codes)
 
 - print('fruit.values.categories:',df.fruit.values.categories)
 
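 - # Modifying the categories (rename_categories maps the new names by position)

 - df['fruit'] = df['fruit'].cat.rename_categories(["Pearl", "Orange", "Apple"])  # apple -> Pearl, orange -> Orange, pearl -> Apple

 - df['fruit'] = df['fruit'].cat.rename_categories(["Apple", "Orange", "Pearl"])  # positional again, so the names match the data once more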
 
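 - # Adding to the categories

 - df_new = pd.DataFrame({"fruit":["watermelon"] * 3, 

 -                        "price":[2.75, 2.60, 2.55]},

 -                        index = [11, 12, 13])

 - df['fruit'] = df['fruit'].cat.add_categories("watermelon")

 - df_new['fruit'] = df_new['fruit'].astype(df['fruit'].dtype)  # same dtype on both sides, so concat keeps the column categorical

 - df = pd.concat([df, df_new])

 - # Note that add_categories has to be called before the new rows are cast to the categorical dtype;

 - # otherwise the unknown value becomes NaN and its code is -1.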
 
 - # Removing categories

 - df = df[df.fruit != "Apple"].copy()  # boolean selection drops every "Apple" record

 - df['fruit'] = df['fruit'].cat.remove_categories("Apple")

 - # Removes "Apple" from the categories of the fruit column; if this statement is commented out, codes are still encoded against the original categories.

 - df['fruit'] = df['fruit'].cat.remove_unused_categories()  # drop categories that no longer occur
 
 - print('total number of records:',df.fruit.count())

 - print('occurrences of each category:',df.fruit.value_counts())
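The comment next to `N` says the memory gap grows with the data. Here is a minimal sketch to check that on the same fruit list. The helper `fruit_memory` is a hypothetical name of my own, and passing `deep=True` is my choice (it makes pandas count the Python string objects as well); neither comes from the original post.

```python
import pandas as pd

name = ["apple", "pearl", "orange", "apple", "orange",
        "orange", "apple", "pearl", "orange"]


def fruit_memory(n):
    """Return (object_bytes, category_bytes) for the fruit list repeated n times.

    Hypothetical helper, for illustration only.
    """
    as_object = pd.Series(name * n, dtype="object")
    as_category = as_object.astype("category")
    # deep=True also counts the memory held by the Python string objects
    return (as_object.memory_usage(deep=True),
            as_category.memory_usage(deep=True))


for n in (1, 1000):
    obj_bytes, cat_bytes = fruit_memory(n)
    print(n, obj_bytes, cat_bytes)
```

For very small `n` the category version may not be smaller, since it also has to store the categories themselves; the saving should show up once the same few strings repeat many times.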
 
 |
 
 
 
 |
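The note about calling `add_categories` before inserting data can be checked in isolation. This is a minimal sketch on a small standalone Series rather than the `df` from the post; the variable names (`fruit`, `new_rows`, `too_early`, `ok`, `combined`) are my own, and the behaviour shown is what I would expect from current pandas releases.

```python
import pandas as pd

fruit = pd.Series(["apple", "pearl", "orange"], dtype="category")
new_rows = pd.Series(["watermelon", "watermelon"])

# Cast before the category exists: the unknown value becomes NaN, code -1
too_early = new_rows.astype(fruit.dtype)
print(too_early.cat.codes.tolist())  # [-1, -1]

# Add the category first, then cast: the new value gets a real code
fruit = fruit.cat.add_categories("watermelon")
ok = new_rows.astype(fruit.dtype)
print(ok.cat.codes.tolist())

# Both sides now share one categorical dtype, so concat keeps it
combined = pd.concat([fruit, ok], ignore_index=True)
print(combined.dtype)
```

If the two sides did not share the same categorical dtype, I would expect the concatenated column to fall back to a non-categorical dtype, which is why the cast comes before the concat.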