余小c真的很强 posted on 2020-6-3 18:55:25

Problem with a Boston housing data model

A machine learning exercise I found online:
# Load the Boston housing data
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.DESCR)
Here is the output:

Boston House Prices dataset
===========================

Notes
------
Data Set Characteristics:

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive
   
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
      - CRIM     per capita crime rate by town
      - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
      - INDUS    proportion of non-retail business acres per town
      - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
      - NOX      nitric oxides concentration (parts per 10 million)
      - RM       average number of rooms per dwelling
      - AGE      proportion of owner-occupied units built prior to 1940
      - DIS      weighted distances to five Boston employment centres
      - RAD      index of accessibility to radial highways
      - TAX      full-value property-tax rate per $10,000
      - PTRATIO  pupil-teacher ratio by town
      - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
      - LSTAT    % lower status of the population
      - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
   
**References**

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
   - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)

# Load the Boston housing data into a pandas DataFrame
import pandas as pd
df = pd.DataFrame(boston.data,columns=boston.feature_naMSE)
df['MEDV'] = boston.target  # add the target variable
x = df.RM.to_frame()
y = df.MEDV

Error message:
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in __getattr__(self, key)
   60         try:
---> 61             return self[key]
   62         except KeyError:

KeyError: 'feature_naMSE'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-11-1277f379165e> in <module>()
      1 import pandas as pd
----> 2 df = pd.DataFrame(boston.data,columns=boston.feature_naMSE)
      3 df['MEDV'] = boston.target  # add the target variable
      4 x = df.RM.to_frame()
      5 y = df.MEDV

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in __getattr__(self, key)
   61             return self[key]
   62         except KeyError:
---> 63             raise AttributeError(key)
   64
   65     def __setstate__(self, state):

AttributeError: feature_naMSE
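I'm a beginner here, so any pointers from the experts would be appreciated. Also, are there any good ways to study machine learning?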

余小c真的很强 posted on 2020-6-4 17:27:50

After a day of self-study, I finally found the problem:
import pandas as pd
df = pd.DataFrame(boston.data,columns=boston.feature_naMSE)
df['MEDV'] = boston.target  # add the target variable
x = df.RM.to_frame()
y = df.MEDV
The feature_naMSE here should be feature_names, which is the actual key on the boston object.
Self-study really does pay off.
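
For anyone who hits the same traceback, here is a rough sketch of why a misspelled attribute shows up as a KeyError followed by an AttributeError, and one way to spot the right name. The Bunch class below is a simplified stand-in written for illustration, not scikit-learn's own code, and the tiny dataset values are made up; only the key names mirror the real object. It uses difflib from the standard library to suggest the closest existing key.

```python
import difflib


class Bunch(dict):
    """Simplified stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # Same pattern as in the traceback: the failed dict lookup
            # (KeyError) is re-raised as an AttributeError.
            raise AttributeError(key)


# Hypothetical miniature dataset; the values are illustrative only.
boston = Bunch(
    data=[[6.575], [6.421]],
    target=[24.0, 21.6],
    feature_names=["RM"],
    DESCR="toy example",
)

try:
    boston.feature_naMSE  # the misspelled name from the post
except AttributeError as err:
    missing = str(err)
    # The original KeyError is kept as the context of the AttributeError.
    original_error = err.__context__
    # Ask difflib which existing key looks most like the misspelled one.
    suggestions = difflib.get_close_matches(missing, list(boston.keys()), n=1)

print("available keys:", sorted(boston.keys()))
print("not found:", missing, "| closest key:", suggestions)
print("feature names:", boston.feature_names)
```

In a real session the quickest check is the same idea: print boston.keys() and compare the names against what the code is asking for.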

java2python posted on 2020-6-4 17:32:18

Impressive stuff.