FishC Forum


[Solved] How do I fix this error?

Posted on 2020-4-9 20:07:57

Error output:
0.35633487519024176 4
Traceback (most recent call last):
  File "C:\Users\zmj佳佳佳\Desktop\特征选择.py", line 41, in <module>
    plt.plot(range(1,201),superpa)
  File "C:\Users\zmj佳佳佳\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\pyplot.py", line 2761, in plot
    return gca().plot(
  File "C:\Users\zmj佳佳佳\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axes\_axes.py", line 1646, in plot
    lines = [*self._get_lines(*args, data=data, **kwargs)]
  File "C:\Users\zmj佳佳佳\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axes\_base.py", line 216, in __call__
    yield from self._plot_args(this, kwargs)
  File "C:\Users\zmj佳佳佳\AppData\Local\Programs\Python\Python38\lib\site-packages\matplotlib\axes\_base.py", line 342, in _plot_args
    raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (200,) and (5,)
import pandas as pd  # load the dataset
url=r"C:\Users\zmj佳佳佳\Desktop\第六步离散化测试.csv"
df = pd.read_csv(url, header = None,low_memory=False)  # the CSV has no header row
df.columns=["grade","dti","delinq_2yrs","earliest_cr_line","fico_range_low","inq_last_6mths",
            "mths_since_last_delinq","pub_rec","revol_bal","revol_util","mths_since_last_major_derog",
            "tot_cur_bal","open_acc_6m","open_il_12m","open_il_24m","mths_since_rcnt_il","open_rv_12m",
            "open_rv_24m","max_bal_bc","all_util","inq_last_12m","acc_open_past_24mths","avg_cur_bal",
            "bc_open_to_buy","mo_sin_old_il_acct","mo_sin_old_rev_tl_op","mo_sin_rcnt_rev_tl_op","mo_sin_rcnt_tl",
           "mort_acc","mths_since_recent_bc_dlq","mths_since_recent_inq","mths_since_recent_revol_delinq",
            "num_accts_ever_120_pd","num_actv_bc_tl","num_actv_rev_tl","num_bc_sats","num_bc_tl",
            "num_rev_accts","num_rev_tl_bal_gt_0","num_tl_90g_dpd_24m","num_tl_op_past_12m","pct_tl_nvr_dlq",
            "pub_rec_bankruptcies"]
# split the dataset into training and test sets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
x, y = df.iloc[:, 1:].values, df.iloc[:, 0].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
feat_labels = df.columns[1:]
forest = RandomForestClassifier(min_samples_split=5,min_samples_leaf=3)
forest.fit(x_train, y_train)
#param={"n_estimators":[10,20],"max_depth":[5,8]}
# grid search with cross-validation
#gc=GridSearchCV(forest,param_grid=param,cv=3)
#gc.fit(x_train,y_train)
#print("Accuracy:",gc.score(x_test,y_test))
#print("Selected parameters:",gc.best_params_)


# learning curve for n_estimators

from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
superpa = []
for i in range(5):
    forest = RandomForestClassifier(n_estimators=i+1,n_jobs=-1)
    rfc_s = cross_val_score(forest,x,y,cv=2).mean()
    superpa.append(rfc_s)
print(max(superpa),superpa.index(max(superpa)))
plt.figure(figsize=[20,5])
plt.plot(range(1,201),superpa)
plt.show()


# feature importance evaluation
import numpy as np
importances = forest.feature_importances_
indices = np.argsort(importances)[::-1]
for f in range(x_train.shape[1]):
    print("%2d) %-*s %f" % (f + 1, 30, feat_labels[indices[f]], importances[indices[f]]))
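Best answer
2020-4-9 20:26:05
Your superpa is a list with 5 elements, but the x-axis is range(1,201), which has 200 values. The two lengths do not match, so the plot cannot be drawn.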
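
One way to avoid this mismatch is to derive the x-axis from the length of the score list rather than hard-coding range(1,201). The sketch below is only an illustration of that idea: the helper names (estimator_counts, best_setting, plot_learning_curve) and the example scores are hypothetical and not from the original post. It also assumes, as in the posted loop, that the first score corresponds to n_estimators=1.

```python
def estimator_counts(scores, start=1):
    """Return the x-axis values (tree counts), one per score."""
    return list(range(start, start + len(scores)))


def best_setting(scores, start=1):
    """Return (best_score, n_estimators) rather than a 0-based list index."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return scores[best_index], start + best_index


def plot_learning_curve(scores, start=1):
    """Plot scores against tree counts; x and y always have equal length."""
    import matplotlib.pyplot as plt  # imported here so the helpers work without it

    plt.figure(figsize=[20, 5])
    plt.plot(estimator_counts(scores, start), scores)
    plt.show()


# Hypothetical scores standing in for the five cross-validation means in superpa
example_scores = [0.31, 0.33, 0.32, 0.34, 0.35]
print(estimator_counts(example_scores))  # [1, 2, 3, 4, 5]
print(best_setting(example_scores))      # (0.35, 5)
```

In the posted script this would amount to calling plt.plot(range(1,len(superpa)+1),superpa). Alternatively, keeping range(1,201) and changing the loop to range(200) would also make the lengths agree, at the cost of training 200 forests. Note too that the printed index 4 in the output is 0-based, so it corresponds to n_estimators=5.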

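Original poster | Posted on 2020-4-9 20:42:00
BngThea wrote on 2020-4-9 20:26:
Your superpa is a list with 5 elements, but the x-axis is range(1,201), which has 200 values. The two lengths do not match, so the plot cannot be drawn.

Yes, it works now, thanks.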