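This question comes from the credit card fraud example that shows up in so many tutorials:
pd.DataFrame(index=range(len(c_param_range),2),columns=['C_parameter','Mean Recall score'])
I don't quite understand index=range(len(c_param_range),2) here. Shouldn't the index simply be range(len(c_param_range))? What is the 2 for?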
==================================================================================================
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import KFold

def print_kfold_scores(x_train_data, y_train_data):
    fold = KFold(5, shuffle=False)  # 5-fold cross-validation, no shuffling
    c_param_range = [0.01, 0.1, 1, 10, 100]  # candidate values for the model parameter C
    # Create a DataFrame whose columns are the parameter value and the mean recall.
    # The next line is the one the question is about.
    result_table = pd.DataFrame(index=range(len(c_param_range), 2), columns=['C_parameter', 'Mean Recall score'])
    result_table['C_parameter'] = c_param_range
    j = 0
    for c_param in c_param_range:
        print("==============================")
        print("C parameter:", c_param)
        print("------------------------------")
        recall_accs = []
        # indices[0] holds the training rows of the fold, indices[1] the validation rows
        for iteration, indices in enumerate(fold.split(x_train_data)):
            # Instantiate the logistic regression model; recent scikit-learn
            # versions need a solver that supports the L1 penalty, e.g. liblinear.
            lr = LogisticRegression(C=c_param, penalty='l1', solver='liblinear')
            lr.fit(x_train_data.iloc[indices[0], :], y_train_data.iloc[indices[0], :].values.ravel())
            y_pred_undersample = lr.predict(x_train_data.iloc[indices[1], :].values)
            recall_acc = recall_score(y_train_data.iloc[indices[1], :].values, y_pred_undersample)
            recall_accs.append(recall_acc)
            print("recall score=", recall_acc)
        # The mean value of the recall scores is the metric we want to save and get hold of.
        # .loc replaces .ix, which was removed from newer pandas versions.
        result_table.loc[j, 'Mean Recall score'] = np.mean(recall_accs)
        j += 1  # the original `j=+1` assigned +1 to j on every pass instead of incrementing it
        print('')
        print("Mean Recall score:", np.mean(recall_accs))
    best_c = result_table.loc[result_table['Mean Recall score'].astype('float64').idxmax()]['C_parameter']
    # Finally, we can check which C parameter is the best amongst the chosen.
    print("**************************")
    print("Best model to choose from cross validation is with parameter= ", best_c)
    print("**************************")
    return best_c
This post was last edited by 彩虹七号 on 2019-12-20 09:05
>>> c_param_range = [0.01]
>>> result_table = pd.DataFrame(index=range(len(c_param_range),2),columns=['C_parameter','Mean Recall score'])
>>> result_table
  C_parameter Mean Recall score
1         NaN               NaN
>>> result_table['C_parameter']=c_param_range
>>> result_table
  C_parameter Mean Recall score
1        0.01               NaN
***************************************************************
>>> c_param_range = [0.01,0.1]
>>> result_table = pd.DataFrame(index=range(len(c_param_range),2),columns=['C_parameter','Mean Recall score'])
>>> result_table
Empty DataFrame
Columns: [C_parameter, Mean Recall score]
Index: []
>>> result_table['C_parameter']=c_param_range
>>> result_table
  C_parameter Mean Recall score
0        0.01               NaN
1        0.10               NaN
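range(len(c_param_range),2) is range(start, stop), so the 2 is the stop value, not a step or a column count. With a single candidate parameter it is range(1,2), and the index is [1]. With two or more candidates, start is already >= stop, so the range is empty and the DataFrame starts with no rows; the later column assignment then gives it a default index starting from 0. That is my guess at what it does, and it does not seem to serve much purpose.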
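To make the range(start, stop) behaviour concrete, here is a minimal sketch. The helper name index_for is hypothetical and exists only for illustration, and the explicit-index table at the end is one possible way to write what the question suggests, not necessarily what the original author of the tutorial intended.

```python
import pandas as pd

c_param_range = [0.01, 0.1, 1, 10, 100]


def index_for(n):
    """Return the labels produced by range(n, 2), i.e. start=n, stop=2."""
    return list(range(n, 2))


one = index_for(1)                    # [1]  (start=1, stop=2)
two = index_for(2)                    # []   (start == stop, so the range is empty)
five = index_for(len(c_param_range))  # []   (start > stop, also empty)

# Alternative suggested in the question: one row per candidate parameter,
# labelled 0..n-1 from the start, so no implicit re-indexing is needed.
result_table = pd.DataFrame(
    index=range(len(c_param_range)),
    columns=['C_parameter', 'Mean Recall score'],
)
result_table['C_parameter'] = c_param_range
```

With the explicit index, result_table already has five rows before any scores are written, so row j can be filled in the loop without relying on the column assignment to create the index.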