How to fix a loss in the millions

Terence888 · Posted on 2024-10-26 16:20:31

FishC posted on 2024-10-26 16:02
This error occurs because the DataFrame `X_test` still contains data that cannot be converted to a numeric type. We need to further ... `X_test` ...

MSSubClass float64
LotArea float64
OverallQual float64
OverallCond float64
YearBuilt float64
...
SaleCondition_AdjLand bool
SaleCondition_Alloca bool
SaleCondition_Family bool
SaleCondition_Normal bool
SaleCondition_Partial bool
Length: 153, dtype: object


The above is the result. Does it mean all the data types are now numeric?

FishC · Posted on 2024-10-26 16:20:39

Terence888 posted on 2024-10-26 16:20
MSSubClass float64
LotArea ...

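Your output shows that every column is either `float64` or `bool`. `float64` is already a numeric type, and `bool` columns can be cast directly to a numeric type (`True` becomes 1, `False` becomes 0), so no non-numeric columns remain.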
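If the dtype listing is too long to scan by eye, the same check can be done programmatically. The sketch below uses a small hypothetical frame, `X_demo`, as a stand-in for `X_test` (the `Street` column is an assumption, added only to show what an unencoded column looks like) and lists every column that is neither numeric nor boolean:

```python
import pandas as pd

# Hypothetical stand-in for X_test; "Street" is deliberately left unencoded
X_demo = pd.DataFrame({
    "LotArea": [8450.0, 9600.0],
    "SaleCondition_Normal": [True, False],
    "Street": ["Pave", "Grvl"],
})

# Keep only the columns that are neither numeric nor boolean
non_numeric = X_demo.select_dtypes(exclude=["number", "bool"]).columns.tolist()
print(non_numeric)
```

An empty list means no column needs further encoding. The `bool` columns still have to be cast to a numeric dtype before calling `.values`, because a frame that mixes `bool` and `float64` columns yields an object array.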

Before converting the data to a PyTorch tensor, you can process it as follows:

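import pandas as pd
import torch

# Cast every column to float32: bool becomes 0.0/1.0 and floats keep their
# fractional part (astype(int) would truncate them)
X_test = X_test.astype('float32')
# Convert to a PyTorch tensor
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)
# Run the prediction
with torch.no_grad():
    test_output = dnn_model(X_test_tensor)
test_output_numpy = test_output.numpy().flatten()  # convert the tensor to a 1-D NumPy array
# Assumes the index of X_test serves as the unique Id; if it does not, supply the Id values yourself
submission = pd.DataFrame({'Id': X_test.index, 'SalePrice': test_output_numpy})
submission.to_csv('submission.csv', index=False)
print("Submission file created successfully.")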


This way, neither the boolean nor the float columns in your DataFrame will cause type problems.
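One caveat is worth illustrating: casting a whole DataFrame with `astype(int)` also truncates the fractional part of every float column, which matters if the features were standardized. The sketch below compares `astype(int)` with `astype("float32")` on a hypothetical two-column frame (`X_demo` and its values are assumptions for illustration, not data from this thread):

```python
import pandas as pd

# Hypothetical frame: one standardized float feature and one one-hot bool column
X_demo = pd.DataFrame({
    "LotArea": [-0.75, 0.25, 1.5],
    "SaleCondition_Normal": [True, False, True],
})

as_int = X_demo.astype(int)          # truncates floats toward zero
as_float = X_demo.astype("float32")  # keeps fractions, bool becomes 0.0 / 1.0

print(as_int["LotArea"].tolist())
print(as_float["LotArea"].tolist())
print(as_float["SaleCondition_Normal"].tolist())
```

With `astype(int)` the three `LotArea` values collapse to 0, 0 and 1, so a model trained on the original floats would see very different inputs at prediction time.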

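The reply above comes from the FishC AI assistant. If it did not answer your question correctly, please keep asking.

Terence888 · Posted on 2024-10-26 17:01:51

FishC posted on 2024-10-26 16:20
Your output shows that, apart from the `bool` type, everything else is `float64`, and all of these can be converted to numeric types. `bool ...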

test_data['Id']
0 1461
1 1462
2 1463
3 1464
4 1465
...
1454 2915
1455 2916
1456 2917
1457 2918
1458 2919
Name: Id, Length: 1459, dtype: int64


How can I use the Id values from test_data as the Id column of the new DataFrame? The code below raises an error:

# The model expects a PyTorch tensor, so convert the DataFrame to a tensor first
X_test = X_test.astype(int)  # cast boolean columns to integers
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)  # convert to a PyTorch tensor
with torch.no_grad():
    test_output = dnn_model(X_test_tensor)
    test_output_numpy = test_output.numpy().flatten()  # convert the tensor to a NumPy array
submission = pd.DataFrame({'Id': test_data['Id'].values, 'SalePrice': test_output})
#submission.to_csv('submission.csv', index=False)


The error is as follows:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[94], line 9
7 test_output = dnn_model(X_test_tensor)
8 test_output_numpy = test_output.numpy().flatten() # convert the tensor to a NumPy array
----> 9 submission = pd.DataFrame({'Id': test_data['Id'].values, 'SalePrice': test_output})
10 #submission.to_csv('submission.csv', index=False)
File /opt/conda/lib/python3.10/site-packages/pandas/core/frame.py:778, in DataFrame.__init__(self, data, index, columns, dtype, copy)
772 mgr = self._init_mgr(
773 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
774 )
776 elif isinstance(data, dict):
777 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 778 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
779 elif isinstance(data, ma.MaskedArray):
780 from numpy.ma import mrecords
File /opt/conda/lib/python3.10/site-packages/pandas/core/internals/construction.py:503, in dict_to_mgr(data, index, columns, dtype, typ, copy)
499 else:
500 # dtype check to exclude e.g. range objects, scalars
501 arrays = [x.copy() if hasattr(x, "dtype") else x for x in arrays]
--> 503 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File /opt/conda/lib/python3.10/site-packages/pandas/core/internals/construction.py:119, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
116 index = ensure_index(index)
118 # don't force copy because getting jammed in an ndarray anyway
--> 119 arrays, refs = _homogenize(arrays, index, dtype)
120 # _homogenize ensures
121 # - all(len(x) == len(index) for x in arrays)
122 # - all(x.ndim == 1 for x in arrays)
(...)
125
126 else:
127 index = ensure_index(index)
File /opt/conda/lib/python3.10/site-packages/pandas/core/internals/construction.py:629, in _homogenize(data, index, dtype)
626 val = dict(val)
627 val = lib.fast_multiget(val, oindex._values, default=np.nan)
--> 629 val = sanitize_array(val, index, dtype=dtype, copy=False)
630 com.require_length_match(val, index)
631 refs.append(None)
File /opt/conda/lib/python3.10/site-packages/pandas/core/construction.py:633, in sanitize_array(data, index, dtype, copy, allow_2d)
631 else:
632 data = np.array(data, copy=copy)
--> 633 return sanitize_array(
634 data,
635 index=index,
636 dtype=dtype,
637 copy=False,
638 allow_2d=allow_2d,
639 )
641 else:
642 _sanitize_non_ordered(data)
File /opt/conda/lib/python3.10/site-packages/pandas/core/construction.py:659, in sanitize_array(data, index, dtype, copy, allow_2d)
656 subarr = cast(np.ndarray, subarr)
657 subarr = maybe_infer_to_datetimelike(subarr)
--> 659 subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
661 if isinstance(subarr, np.ndarray):
662 # at this point we should have dtype be None or subarr.dtype == dtype
663 dtype = cast(np.dtype, dtype)
File /opt/conda/lib/python3.10/site-packages/pandas/core/construction.py:718, in _sanitize_ndim(result, data, dtype, index, allow_2d)
716 if allow_2d:
717 return result
--> 718 raise ValueError(
719 f"Data must be 1-dimensional, got ndarray of shape {data.shape} instead"
720 )
721 if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
722 # i.e. NumpyEADtype("O")
724 result = com.asarray_tuplesafe(data, dtype=np.dtype("object"))
ValueError: Data must be 1-dimensional, got ndarray of shape (1459, 1) instead


FishC · Posted on 2024-10-26 17:07:21

Terence888 posted on 2024-10-26 17:01
test_data['Id']
0    1461
1    1462
2    146 ...

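The error occurs because `test_output` is a two-dimensional tensor of shape (1459, 1), while a DataFrame column must be one-dimensional. Your code already computes the flattened `test_output_numpy` but then passes `test_output` to `pd.DataFrame`. The corrected version: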

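# The model expects a PyTorch tensor, so convert the DataFrame to a tensor first
# Cast to float32 rather than int so float features keep their fractional part
X_test = X_test.astype('float32')
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)  # convert to a PyTorch tensor
with torch.no_grad():
    test_output = dnn_model(X_test_tensor)
test_output_numpy = test_output.numpy().flatten()  # convert the tensor to a 1-D NumPy array
# Pass the flattened array, not the 2-D tensor, as the SalePrice column
submission = pd.DataFrame({'Id': test_data['Id'].values, 'SalePrice': test_output_numpy})
submission.to_csv('submission.csv', index=False)
print("Submission file created successfully.")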


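### Key fixes:

1. **Flatten the tensor with `flatten()`**:
- This ensures the predictions are one-dimensional once they are converted to a NumPy array.

2. **Use `test_data['Id']` as `Id`**:
- Make sure `test_data['Id']` is a one-dimensional sequence of unique identifiers (for example a Series or a NumPy array) with one entry per prediction.

This way, your `submission` DataFrame maps the `Id` column to the `SalePrice` column correctly.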
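The shape issue can be reproduced without the model. The sketch below is a minimal illustration under stated assumptions: `preds_2d` is a hypothetical NumPy array standing in for `test_output` converted to NumPy, and the prices and three Ids are made up for the example.

```python
import numpy as np
import pandas as pd

ids = pd.Series([1461, 1462, 1463], name="Id")
# Hypothetical stand-in for the model output: one prediction per row, shape (3, 1)
preds_2d = np.array([[120000.0], [150000.0], [180000.0]])

# A 2-D array cannot be used as a single DataFrame column
try:
    pd.DataFrame({"Id": ids.values, "SalePrice": preds_2d})
except ValueError as err:
    print("ValueError:", err)

# Flattening gives shape (3,), which matches the length of the Id column
preds_1d = preds_2d.flatten()
submission = pd.DataFrame({"Id": ids.values, "SalePrice": preds_1d})
print(submission.shape)
```

`preds_2d.reshape(-1)` or `preds_2d.squeeze(1)` would work equally well here; `flatten()` is used only because it is what the reply above uses.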

