FishC Forum (鱼C论坛)


[Study Notes] Feature Selection: Using and Comparing PCA and Mutual Information

Posted on 2021-12-16 17:53:26

Last edited by Handsome_zhou on 2021-12-16 18:31
    # load_boston was removed in scikit-learn 1.2; this needs scikit-learn < 1.2
    from sklearn.datasets import load_boston

    d = load_boston()
    x = d.data
    y = d.target
    print(x[:10])
    print('shape:', x.shape)

    from sklearn.decomposition import PCA
    pca = PCA(n_components=10)
    # Keep the PCA output in X; the original code overwrote it with the raw data
    X = pca.fit_transform(x)

    from sklearn.preprocessing import StandardScaler  # standardization
    X = StandardScaler().fit_transform(X)
    # StandardScaler needs a 2-D array, SVR needs a 1-D target
    y = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    from sklearn.svm import SVR
    svr = SVR(gamma='scale')

    svr.fit(X_train, y_train)
    train_score = svr.score(X_train, y_train)  # R^2 on the training split
    test_score = svr.score(X_test, y_test)     # R^2 on the held-out split
    print('train score: {} ; test score: {}'.format(train_score, test_score))

Result:
[Image: 01.jpg]
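
The PCA code above fixes n_components=10 by hand. One way to check how many components are worth keeping is to look at the cumulative explained variance. The following is only a minimal sketch under assumptions: it uses load_diabetes as a stand-in dataset (load_boston is no longer shipped with recent scikit-learn), and the 95% threshold is an arbitrary illustrative choice, not something from the original post.

```python
import numpy as np
from sklearn.datasets import load_diabetes  # stand-in dataset (assumption)
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# PCA is sensitive to feature scale, so standardize before fitting it.
X_std = StandardScaler().fit_transform(X)

# Keep all components so the full variance spectrum can be inspected.
pca = PCA()
pca.fit(X_std)

cumulative = np.cumsum(pca.explained_variance_ratio_)
for i, c in enumerate(cumulative, start=1):
    print('{} components: {:.3f} of variance'.format(i, c))

# Smallest number of components whose cumulative variance reaches 95%
# (0.95 is an illustrative threshold, not a recommendation).
n_95 = int(np.searchsorted(cumulative, 0.95) + 1)
print('components needed for 95%:', n_95)
```

If the curve flattens well before the chosen n_components, the extra components add little information for the downstream SVR.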



    # load_boston was removed in scikit-learn 1.2; this needs scikit-learn < 1.2
    from sklearn.datasets import load_boston
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import mutual_info_regression

    d = load_boston()
    x = d.data
    y = d.target  # 1-D target, as mutual_info_regression and SVR expect

    # Estimated mutual information between each feature and the target
    mi = mutual_info_regression(x, y)
    print(mi)
    # Keep the 10 features with the highest mutual information
    X = SelectKBest(mutual_info_regression, k=10).fit_transform(x, y)

    from sklearn.preprocessing import StandardScaler  # standardization
    X = StandardScaler().fit_transform(X)
    # StandardScaler needs a 2-D array, SVR needs a 1-D target
    y = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    from sklearn.svm import SVR
    svr = SVR(gamma='scale')

    svr.fit(X_train, y_train)
    train_score = svr.score(X_train, y_train)  # R^2 on the training split
    test_score = svr.score(X_test, y_test)     # R^2 on the held-out split
    print('train score: {} ; test score: {}'.format(train_score, test_score))

Result:
[Image: 02.png]
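
The mutual information code above only returns the reduced matrix, so it is not obvious which columns survived. A rough sketch of how the scores and the selected feature names could be inspected follows. It assumes load_diabetes as a stand-in dataset (it ships with scikit-learn and has named features), k=5 is an arbitrary choice, and score_func is a hypothetical helper that pins random_state so the estimate is repeatable.

```python
from sklearn.datasets import load_diabetes  # stand-in dataset (assumption)
from sklearn.feature_selection import SelectKBest, mutual_info_regression

data = load_diabetes()
X, y = data.data, data.target
names = data.feature_names

# The estimator is nearest-neighbour based and slightly random;
# random_state makes the scores repeatable between runs.
mi = mutual_info_regression(X, y, random_state=0)
for name, score in sorted(zip(names, mi), key=lambda t: t[1], reverse=True):
    print('{:>4}: {:.3f}'.format(name, score))


def score_func(X, y):
    """Hypothetical wrapper so SelectKBest uses the same fixed seed."""
    return mutual_info_regression(X, y, random_state=0)


# k=5 is an illustrative value, not taken from the original post.
selector = SelectKBest(score_func, k=5).fit(X, y)
selected = [n for n, keep in zip(names, selector.get_support()) if keep]
print('selected:', selected)
```

Unlike PCA components, the selected columns are original features, so the list can be read directly.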

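Judging by the results on the Boston dataset, the features produced by PCA and by mutual information show no obvious difference in performance with a support vector machine (SVR). Note that PCA is, strictly speaking, feature extraction rather than feature selection: it builds new components from all the original features, whereas mutual information selection keeps a subset of the original columns.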
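
A single random train/test split can hide or exaggerate a difference between the two methods. One possible way to make the comparison steadier is to cross-validate both variants inside pipelines, so that scaling and feature reduction are fitted on the training folds only. This is a sketch under assumptions, not the original author's setup: load_diabetes stands in for the Boston data, k=5 and the 5-fold split are arbitrary, and with_scaled_target is a hypothetical helper name.

```python
from functools import partial

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_diabetes  # stand-in dataset (assumption)
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)
k = 5  # illustrative number of components / features to keep

# Fix the seed of the mutual information estimator for repeatability.
mi_score = partial(mutual_info_regression, random_state=0)


def with_scaled_target(reducer):
    """Hypothetical helper: scale X, reduce it, fit SVR on a standardized y."""
    pipe = make_pipeline(StandardScaler(), reducer, SVR(gamma='scale'))
    return TransformedTargetRegressor(regressor=pipe,
                                      transformer=StandardScaler())


models = {
    'PCA': with_scaled_target(PCA(n_components=k)),
    'mutual information': with_scaled_target(SelectKBest(mi_score, k=k)),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    # For regressors the default score is R^2, one value per fold.
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = scores
    print('{}: mean R^2 = {:.3f} (std {:.3f})'.format(
        name, scores.mean(), scores.std()))
```

Comparing the fold-to-fold spread with the gap between the two means gives a rough idea of whether any difference is more than noise.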
小甲鱼's latest courses -> https://ilovefishc.com

Posted on 2021-12-17 17:36:48
Studying this.
