How to fix a loss in the millions, Python Discussion, Programming Languages, FishC Forum

Terence888 posted on 2024-10-26 16:20:31

FishC posted on 2024-10-26 16:02
The error occurs because the DataFrame `X_test` still contains data that cannot be converted to a numeric type. We need to do more with `X_test` ...

MSSubClass             float64
LotArea                float64
OverallQual          float64
OverallCond          float64
YearBuilt             float64
                     ...
SaleCondition_AdjLand    bool
SaleCondition_Alloca    bool
SaleCondition_Family    bool
SaleCondition_Normal    bool
SaleCondition_Partial    bool
Length: 153, dtype: object

That is the output. Does it mean all the columns are numeric now?

FishC posted on 2024-10-26 16:20:39

Terence888 posted on 2024-10-26 16:20
MSSubClass float64
LotArea ...

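The columns shown in your output are all either `float64` or `bool`. `float64` is already numeric, and `bool` columns convert directly to `int` (or float) as 0/1, so none of the visible columns should block the conversion.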
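
Because the printed `dtypes` output is truncated (only 10 of the 153 columns are visible), it may be worth checking the hidden columns programmatically. A minimal sketch, using a small made-up DataFrame in place of `X_test` (the name `X_demo`, its columns, and its values are illustrative only):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Toy stand-in for X_test; the real frame has 153 columns
X_demo = pd.DataFrame({
    "LotArea": [8450.0, 9600.0],
    "SaleCondition_Normal": [True, False],
    "Street": ["Pave", "Grvl"],  # deliberately left as text
})

# Collect every column whose dtype is not numeric (bool counts as numeric here)
non_numeric = [col for col in X_demo.columns if not is_numeric_dtype(X_demo[col])]
print(non_numeric)
```

An empty list would mean every column can be handed to `torch.tensor` after a float cast.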

Before converting the data to a PyTorch tensor, you can process it as follows:

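import pandas as pd
import torch

# Cast every column to float32: bool becomes 0.0/1.0 and float values keep
# their fractional part (astype(int) would truncate them)
X_test = X_test.astype('float32')

# Convert to a PyTorch tensor
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)

# Run the prediction
with torch.no_grad():
    test_output = dnn_model(X_test_tensor)
    test_output_numpy = test_output.numpy().flatten()  # convert the tensor to a 1-D numpy array

# Uses the index of X_test as the Id; if the index does not hold the real Ids, supply them yourself
submission = pd.DataFrame({'Id': X_test.index, 'SalePrice': test_output_numpy})
submission.to_csv('submission.csv', index=False)

print("Submission file created successfully.")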

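This ensures that none of the bool or float columns in your DataFrame cause type problems.

The above reply is from the FishC AI assistant. If it did not answer your question correctly, please follow up.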
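
One caveat about the cast itself: `astype(int)` turns `bool` into 0/1 but also truncates the fractional part of `float64` columns, which matters if the features were scaled. A float cast handles both dtypes without losing information. A small illustration with made-up values (the column names are hypothetical):

```python
import pandas as pd

# Made-up values: one scaled float feature and one one-hot bool feature
X_demo = pd.DataFrame({
    "LotArea_scaled": [0.73, -1.25],
    "SaleCondition_Normal": [True, False],
})

as_int = X_demo.astype(int)          # fractional parts are truncated toward zero
as_float = X_demo.astype("float32")  # fractional parts are kept; bool becomes 1.0 / 0.0

print(as_int["LotArea_scaled"].tolist())
print(as_float["SaleCondition_Normal"].tolist())
```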

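Terence888 posted on 2024-10-26 17:01:51

FishC posted on 2024-10-26 16:20
The output you provided shows that, apart from the `bool` columns, all the others are `float64`, and these can all be converted to numeric types. `bool ...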

test_data['Id']
0       1461
1       1462
2       1463
3       1464
4       1465
        ...
1454    2915
1455    2916
1456    2917
1457    2918
1458    2919
Name: Id, Length: 1459, dtype: int64
How do I use the Id values from test_data as the Id column of the new DataFrame? The code below raises an error.
# The model expects a PyTorch tensor as input, so convert the DataFrame to a tensor first

X_test = X_test.astype(int)  # convert bool to int
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)  # convert to a PyTorch tensor

with torch.no_grad():
    test_output = dnn_model(X_test_tensor)
    test_output_numpy = test_output.numpy().flatten()  # convert the tensor to a numpy array
submission = pd.DataFrame({'Id': test_data['Id'].values, 'SalePrice': test_output})
#submission.to_csv('submission.csv', index=False)

The error is as follows:
---------------------------------------------------------------------------
ValueError                            Traceback (most recent call last)
Cell In, line 9
   7 test_output = dnn_model(X_test_tensor)
   8 test_output_numpy = test_output.numpy().flatten() # convert the tensor to a numpy array
----> 9 submission = pd.DataFrame({'Id': test_data['Id'].values, 'SalePrice': test_output})
10 #submission.to_csv('submission.csv', index=False)

File /opt/conda/lib/python3.10/site-packages/pandas/core/frame.py:778, in DataFrame.__init__(self, data, index, columns, dtype, copy)
772 mgr = self._init_mgr(
773       data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
774 )
776 elif isinstance(data, dict):
777 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 778 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
779 elif isinstance(data, ma.MaskedArray):
780 from numpy.ma import mrecords

File /opt/conda/lib/python3.10/site-packages/pandas/core/internals/construction.py:503, in dict_to_mgr(data, index, columns, dtype, typ, copy)
499 else:
500       # dtype check to exclude e.g. range objects, scalars
501       arrays =
--> 503 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)

File /opt/conda/lib/python3.10/site-packages/pandas/core/internals/construction.py:119, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
116       index = ensure_index(index)
118 # don't force copy because getting jammed in an ndarray anyway
--> 119 arrays, refs = _homogenize(arrays, index, dtype)
120 # _homogenize ensures
121 #- all(len(x) == len(index) for x in arrays)
122 #- all(x.ndim == 1 for x in arrays)
(...)
125
126 else:
127 index = ensure_index(index)

File /opt/conda/lib/python3.10/site-packages/pandas/core/internals/construction.py:629, in _homogenize(data, index, dtype)
626       val = dict(val)
627 val = lib.fast_multiget(val, oindex._values, default=np.nan)
--> 629 val = sanitize_array(val, index, dtype=dtype, copy=False)
630 com.require_length_match(val, index)
631 refs.append(None)

File /opt/conda/lib/python3.10/site-packages/pandas/core/construction.py:633, in sanitize_array(data, index, dtype, copy, allow_2d)
631 else:
632       data = np.array(data, copy=copy)
--> 633 return sanitize_array(
634       data,
635       index=index,
636       dtype=dtype,
637       copy=False,
638       allow_2d=allow_2d,
639 )
641 else:
642 _sanitize_non_ordered(data)

File /opt/conda/lib/python3.10/site-packages/pandas/core/construction.py:659, in sanitize_array(data, index, dtype, copy, allow_2d)
656          subarr = cast(np.ndarray, subarr)
657          subarr = maybe_infer_to_datetimelike(subarr)
--> 659 subarr = _sanitize_ndim(subarr, data, dtype, index, allow_2d=allow_2d)
661 if isinstance(subarr, np.ndarray):
662 # at this point we should have dtype be None or subarr.dtype == dtype
663 dtype = cast(np.dtype, dtype)

File /opt/conda/lib/python3.10/site-packages/pandas/core/construction.py:718, in _sanitize_ndim(result, data, dtype, index, allow_2d)
716 if allow_2d:
717       return result
--> 718 raise ValueError(
719       f"Data must be 1-dimensional, got ndarray of shape {data.shape} instead"
720 )
721 if is_object_dtype(dtype) and isinstance(dtype, ExtensionDtype):
722 # i.e. NumpyEADtype("O")
724 result = com.asarray_tuplesafe(data, dtype=np.dtype("object"))

ValueError: Data must be 1-dimensional, got ndarray of shape (1459, 1) instead

FishC posted on 2024-10-26 17:07:21

Terence888 posted on 2024-10-26 17:01
test_data
0    1461
1    1462
2    146 ...

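The error occurs because `test_output` is a two-dimensional tensor of shape (1459, 1), while each DataFrame column must be one-dimensional. Pass the flattened array `test_output_numpy` instead. The fix: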

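import pandas as pd
import torch

# The model expects a PyTorch tensor as input, so convert the DataFrame to a tensor first
X_test = X_test.astype('float32')  # bool becomes 0.0/1.0; floats are not truncated as they would be with astype(int)
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)  # convert to a PyTorch tensor

with torch.no_grad():
    test_output = dnn_model(X_test_tensor)
    test_output_numpy = test_output.numpy().flatten()  # shape (1459, 1) -> (1459,)
submission = pd.DataFrame({'Id': test_data['Id'].values, 'SalePrice': test_output_numpy})
submission.to_csv('submission.csv', index=False)

print("Submission file created successfully.")

### Key fixes:

1. **Flatten the tensor with `flatten()`**:
- Make sure the model output is one-dimensional once converted to a numpy array, and pass that array (`test_output_numpy`), not `test_output`, to the DataFrame.

2. **Use `test_data['Id']` as `Id`**:
- Make sure `test_data['Id']` holds one unique identifier per prediction, in the same row order as `X_test`.

With these changes, the `submission` DataFrame maps the `Id` and `SalePrice` columns correctly.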
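
The shape problem can be reproduced without the model. The sketch below uses a NumPy array of shape (3, 1) as a stand-in for the model output, with made-up Id and price values; it only illustrates why flattening is needed before building the DataFrame:

```python
import numpy as np
import pandas as pd

# Stand-in for test_data['Id'].values
ids = np.array([1461, 1462, 1463])
# Stand-in for the model output: one column per row, shape (3, 1)
preds_2d = np.array([[208500.0], [181500.0], [223500.0]])

preds_1d = preds_2d.flatten()     # shape (3,)
assert len(ids) == len(preds_1d)  # one prediction per Id

submission = pd.DataFrame({"Id": ids, "SalePrice": preds_1d})
print(preds_2d.shape, preds_1d.shape)
print(submission)
```

`squeeze(1)` or `reshape(-1)` would give the same one-dimensional result here.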

The above reply is from the FishC AI assistant. If it did not answer your question correctly, please follow up.
