How to exclude the Id column from mean normalization - FishC Forum (鱼C论坛)

Terence888 posted on 2024-10-26 12:37:09

How to exclude the Id column from mean normalization

# Standardize the non-dummy features
import sklearn.preprocessing as preprocessing
scaler = preprocessing.StandardScaler()

non_dummy_features = full_dataset.select_dtypes(include=['int64', 'float64']).columns.drop('Id') # Identify the non-dummy features, excluding the Id column
full_dataset = scaler.fit_transform(full_dataset) # Standardize the non-dummy features

full_dataset
Result:
Id MSSubClass LotArea OverallQual OverallCond YearBuilt YearRemodAdd 1stFlrSF 2ndFlrSF LowQualFinSF ... Electrical_SBrkr PavedDrive_N PavedDrive_P PavedDrive_Y SaleCondition_Abnorml SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0 -1.731687 0.067355 -0.216437 0.649580 -0.507503 1.047231 0.897702 -0.783214 1.210473 -0.101232 ... True False False True False False False False True False
1 -1.730501 -0.873272 -0.069108 -0.061424 2.187374 0.155606 -0.394865 0.270789 -0.784638 -0.101232 ... True False False True False False False False True False
2 -1.729314 0.067355 0.142276 0.649580 -0.507503 0.981185 0.849829 -0.617065 1.238507 -0.101232 ... True False False True False False False False True False
3 -1.728128 0.302511 -0.075514 0.649580 -0.507503 -1.858805 -0.682102 -0.510627 0.981526 -0.101232 ... True False False True True False False False False False
4 -1.726941 0.067355 0.527891 1.360583 -0.507503 0.948161 0.754083 -0.032951 1.675376 -0.101232 ... True False False True False False False False True False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1454 1.726049 2.418921 -1.050955 -1.483430 1.289082 -0.042533 -0.682102 -1.587994 0.490925 -0.101232 ... True False False True False False False False True False
1455 1.727235 2.418921 -1.056336 -1.483430 -0.507503 -0.042533 -0.682102 -1.587994 0.490925 -0.101232 ... True False False True True False False False False False
1456 1.728422 -0.873272 1.263252 -0.772427 1.289082 -0.372764 0.562592 0.172139 -0.784638 -0.101232 ... True False False True True False False False False False
1457 1.729608 0.655246 0.038633 -0.772427 -0.507503 0.683976 0.371101 -0.487262 -0.784638 -0.101232 ... True False False True False False False False True False
1458 1.730795 0.067355 -0.065649 0.649580 -0.507503 0.716999 0.466846 -0.419764 1.560903 -0.101232 ... True False False True False False False False True False
2917 rows × 155 columns

How should I modify the code above so that the Id column is not normalized while all the other numeric features are?

FishC posted on 2024-10-26 12:43:11

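To exclude the `Id` column and normalize only the numeric features, first separate out the `Id` column and then normalize the other numeric features. Here is the modified code: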

import pandas as pd
import sklearn.preprocessing as preprocessing

# Assume full_dataset is your complete dataset
# Set aside the Id column
id_column = full_dataset['Id']

# Select the numeric features and exclude the Id column
non_dummy_features = full_dataset.select_dtypes(include='number').drop(columns='Id')

# Instantiate the scaler
scaler = preprocessing.StandardScaler()

# Standardize the numeric features (the result is a NumPy array)
scaled_features = scaler.fit_transform(non_dummy_features)

# Rebuild a DataFrame, reusing the original row index so the rows stay aligned
scaled_df = pd.DataFrame(scaled_features, columns=non_dummy_features.columns, index=non_dummy_features.index)

# Add the Id column back
scaled_df['Id'] = id_column

# Reorder the columns if needed
scaled_df = scaled_df[['Id'] + non_dummy_features.columns.to_list()]

scaled_df.head()

This ensures that the `Id` column is left unnormalized while all the other numeric features are standardized.
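The reply above builds a new DataFrame that holds only `Id` and the scaled numeric columns, so the boolean dummy columns are left out of `scaled_df`. If you want to keep every column of `full_dataset`, a minimal alternative sketch is to write the scaled values back into just the selected columns. The small DataFrame below is a hypothetical stand-in for `full_dataset`; its column names and values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for full_dataset: an Id column, two numeric
# features, and one boolean dummy column
full_dataset = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotArea": [8450, 9600, 11250, 9550],
    "OverallQual": [7, 6, 7, 7],
    "PavedDrive_Y": [True, True, False, True],
})

# Numeric columns only ('number' does not match bool dummies), minus Id
num_cols = full_dataset.select_dtypes(include="number").columns.drop("Id")

scaler = StandardScaler()
# Replace only the selected columns; Id and the dummy columns are untouched
full_dataset[num_cols] = scaler.fit_transform(full_dataset[num_cols])

print(full_dataset)
```

Because only `full_dataset[num_cols]` is reassigned, there is no need to set `Id` aside and add it back, and the row index never changes.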

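The above reply is from the FishC AI assistant. If it did not answer your question correctly, please feel free to ask a follow-up question.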

Terence888 posted on 2024-10-26 12:57:15

FishC posted on 2024-10-26 12:43
To exclude the `Id` column and normalize only the numeric features, first separate out the `Id` column and then normalize the other numeric features. ...

# Standardize the non-dummy features
import sklearn.preprocessing as preprocessing
scaler = preprocessing.StandardScaler()

id_column = full_dataset['Id'] # First set aside the Id column
non_dummy_features = full_dataset.drop(columns='Id').select_dtypes(include=['int64', 'float64']) # Identify the non-dummy features
full_dataset = scaler.fit_transform(full_dataset) # Standardize the non-dummy features
full_dataset['Id'] = id_column # Add the Id column back to the DataFrame
full_dataset[['Id'] + non_dummy_features.columns.to_list()]

It raises the following error:
---------------------------------------------------------------------------
ValueError                            Traceback (most recent call last)
/tmp/ipykernel_30/3989952025.py in ?()
   3 scaler = preprocessing.StandardScaler()
   4
   5 id_column = full_dataset['Id'] # First set aside the Id column
   6 non_dummy_features = full_dataset.drop(columns='Id').select_dtypes(include=['int64', 'float64']) # Identify the non-dummy features
----> 7 full_dataset = scaler.fit_transform(full_dataset) # Standardize the non-dummy features
   8 full_dataset['Id'] = id_column # Add the Id column back to the DataFrame
   9 full_dataset[['Id'] + non_dummy_features.columns.to_list()]

/opt/conda/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, key)
4085          return self._getitem_slice(key)
4086
4087       # Do we have a (boolean) DataFrame?
4088       if isinstance(key, DataFrame):
-> 4089          return self.where(key)
4090
4091       # Do we have a (boolean) 1d indexer?
4092       if com.is_bool_indexer(key):

/opt/conda/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, cond, other, inplace, axis, level)
10980                      stacklevel=2,
10981                   )
10982
10983       other = common.apply_if_callable(other, self)
> 10984       return self._where(cond, other, inplace, axis, level)

/opt/conda/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, cond, other, inplace, axis, level, warn)
10669                   raise ValueError(msg.format(dtype=cond.dtype))
10670          else:
10671             for _dt in cond.dtypes:
10672                   if not is_bool_dtype(_dt):
> 10673                      raise ValueError(msg.format(dtype=_dt))
10674             if cond._mgr.any_extension_types:
10675                   # GH51574: avoid object ndarray conversion later on
10676                   cond = cond._constructor(

ValueError: Boolean array expected for the condition, not int64

FishC posted on 2024-10-26 12:57:24

Terence888 posted on 2024-10-26 12:57
# Standardize the non-dummy features
import sklea ...

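The error comes from the standardization step: `fit_transform` is still applied to the entire `full_dataset`, including the `Id` column, instead of to `non_dummy_features`, and its result (a NumPy array by default) then overwrites `full_dataset`. Here is the corrected code: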

import pandas as pd
import sklearn.preprocessing as preprocessing

# Assume full_dataset is your complete dataset
# Set aside the Id column
id_column = full_dataset['Id']

# Select the non-dummy (numeric) features, excluding the Id column
non_dummy_features = full_dataset.drop(columns='Id').select_dtypes(include='number')

# Instantiate the scaler
scaler = preprocessing.StandardScaler()

# Standardize only the non-dummy features
scaled_features = scaler.fit_transform(non_dummy_features)

# Convert the scaled features back to a DataFrame, keeping the original row index
scaled_df = pd.DataFrame(scaled_features, columns=non_dummy_features.columns, index=non_dummy_features.index)

# Add the Id column back
scaled_df['Id'] = id_column

# Reorder the columns if needed
scaled_df = scaled_df[['Id'] + non_dummy_features.columns.to_list()]

scaled_df.head()

This ensures that the `Id` column is not normalized, while the other numeric features are standardized.
