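rplt posted on 2022-2-25 03:18:19

Problem reading CSV data in deep learning

While learning about LSTM models, I read a CSV data file that contains date-time values, and got this error when reading it with pandas: ValueError: could not convert string to float: '2017-01-01 00:00:00'
What is going on here? Is it because the value contains a space, or some other character? Could someone please help me out?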

rsj0315 posted on 2022-2-25 08:11:35

Please post the code you use to read the file.

rplt posted on 2022-2-25 13:21:47

rsj0315 posted on 2022-2-25 08:11
Please post the code you use to read the file.

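# Imports were not included in the post; these are the ones the code below needs
# (tensorflow.keras is an assumption based on the environment name in the traceback).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from math import sqrt
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Dropout, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model, load_model

df_data_5minute=pd.read_csv('raw_data.csv',error_bad_lines=False,encoding='gbk',lineterminator="\n" ,encoding_errors='ignore')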
df_data_5minute.drop('Wind_plant', axis=1, inplace=True)


df=df_data_5minute
#df.drop(labels=['close'], axis=1,inplace = True)
#df.insert(0, 'close', close)
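data_train = df.iloc[:30000, :]
# The slice was stripped by the forum software; rows from 30000 onward match
# the (5040, 20) test shape printed in the traceback post.
data_test = df.iloc[30000:, :]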
print(data_train.shape, data_test.shape)


scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(data_train)

data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)


output_dim = 1
batch_size = 1000
epochs = 500
seq_len = 5
hidden_size = 128


TIME_STEPS = 5
INPUT_DIM = 14

lstm_units = 64
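# The bracketed expressions were stripped when the code was posted. The lines below are a
# reconstruction that assumes sliding windows of seq_len rows and a target in column 0.
X_train = np.array([data_train[i:i + seq_len, :] for i in range(data_train.shape[0] - seq_len)])
y_train = np.array([data_train[i + seq_len, 0] for i in range(data_train.shape[0] - seq_len)])
X_test = np.array([data_test[i:i + seq_len, :] for i in range(data_test.shape[0] - seq_len)])
y_test = np.array([data_test[i + seq_len, 0] for i in range(data_test.shape[0] - seq_len)])
maxy = y_test.max()
miny = y_test.min()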
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

inputs = Input(shape=(TIME_STEPS, INPUT_DIM))

x = Conv1D(filters = 32, kernel_size = 1, activation = 'relu')(inputs)#, padding = 'same'
x = MaxPooling1D(pool_size = 5)(x)
x = Dropout(0.1)(x)

lstm_out = Bidirectional(LSTM(lstm_units, activation='relu'), name='bilstm')(x)

output = Dense(1, activation='sigmoid')(lstm_out)

model = Model(inputs=inputs, outputs=output)

model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, shuffle=False)
model.save('model.h5')
#model=load_model('model.h5')


y_pred = model.predict(X_test)



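data_train = scaler.inverse_transform(data_train)
data_test = scaler.inverse_transform(data_test)
# Reconstructed like the windowing lines earlier (stripped brackets); target column 0 is an assumption.
y_test = np.array([data_test[i + seq_len, 0] for i in range(data_test.shape[0] - seq_len)])
y_train = np.array([data_train[i + seq_len, 0] for i in range(data_train.shape[0] - seq_len)])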
y_raw=np.hstack((y_train,y_test))

#RMSE
print('MSE Train loss:', model.evaluate(X_train, y_train, batch_size=batch_size))
print('MSE Test loss:', model.evaluate(X_test, y_test, batch_size=batch_size))
Rmse = sqrt(mean_squared_error(y_test, y_pred))
print('RMSE: ', Rmse)


plt.plot(np.arange(len(y_raw)), np.hstack((y_train,y_test)) , 'b', label="Raw Data")
plt.plot(np.arange(len(y_train),len(y_raw)),y_pred,'r', label="Prediction")
plt.legend()
plt.show()
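
For anyone following along, the windowing step in the code above can be sketched as a small helper. This is only a minimal sketch, not the poster's exact code: the function name `make_windows`, the parameter `target_col`, and the choice of column 0 as the target are assumptions made for illustration.

```python
import numpy as np


def make_windows(data, seq_len, target_col=0):
    """Split a 2-D array into overlapping windows and next-step targets.

    make_windows and target_col are hypothetical names used for illustration.
    """
    n_samples = data.shape[0] - seq_len
    # Each sample is seq_len consecutive rows with all feature columns.
    X = np.array([data[i:i + seq_len, :] for i in range(n_samples)])
    # Each target is the chosen column of the row right after the window.
    y = np.array([data[i + seq_len, target_col] for i in range(n_samples)])
    return X, y


# Toy data: 10 time steps, 3 features.
toy = np.arange(30, dtype=float).reshape(10, 3)
X, y = make_windows(toy, seq_len=5)
print(X.shape, y.shape)
```

With 10 rows and seq_len=5 this yields 5 windows, which is why the code subtracts seq_len from the number of rows.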

rplt posted on 2022-2-25 14:54:09

Can any expert help me with this?

isdkz posted on 2022-2-25 14:56:40

rplt posted on 2022-2-25 14:54
Can any expert help me with this?

Please post the full error message.

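cflying posted on 2022-2-25 20:30:25

If you want to read the data directly as date-time values, remember to add the parameter parse_dates=['name of the time column'] (read_csv does not accept np.datetime64 in dtype and points you to parse_dates instead).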
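
A minimal sketch of reading a timestamp column as datetime with the `parse_dates` parameter of `pd.read_csv`. The in-memory CSV and the column names `time` and `power` are made up for illustration; the real file's column names are not shown in the thread.

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the column names are made up for illustration.
csv_text = "time,power\n2017-01-01 00:00:00,1.5\n2017-01-01 00:05:00,2.5\n"

# Default read: the time column comes back as strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the same column is parsed into datetime64 values.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
print(parsed.dtypes)
```

Comparing `raw.dtypes` with `parsed.dtypes` shows whether the column was actually parsed.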

rplt posted on 2022-2-26 01:47:18

isdkz posted on 2022-2-25 14:56
Please post the full error message.

(30000, 20) (5040, 20)
Traceback (most recent call last):
File "C:\Users\lenovo\Desktop\LSTM\Tensorflow-Wind-Power-Prediction-master\Tensorflow-Wind-Power-Prediction-master\main.py", line 32, in <module>
    scaler.fit(data_train)
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\preprocessing\_data.py", line 416, in fit
    return self.partial_fit(X, y)
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\preprocessing\_data.py", line 453, in partial_fit
    X = self._validate_data(
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\base.py", line 566, in _validate_data
    X = check_array(X, **check_params)
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\utils\validation.py", line 746, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
File "E:\anaconda\envs\tensorflow\lib\site-packages\pandas\core\generic.py", line 1993, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: '2017-01-01 00:00:00'
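
The last frames of the traceback show that the error is raised when the whole DataFrame is converted to a float array inside scaler.fit, not while the CSV is being read. A minimal sketch of how one might reproduce that and list the offending columns, using a made-up frame (the column names `time` and `power` are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; column names are made up.
df = pd.DataFrame({
    "time": ["2017-01-01 00:00:00", "2017-01-01 00:05:00"],
    "power": [1.5, 2.5],
})

# Converting the whole frame to float fails because of the string column.
try:
    np.asarray(df, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# List the columns that are not numeric, then keep only the numeric ones.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
numeric_only = df.select_dtypes(include="number")
print(non_numeric)
```

Any column that shows up in `non_numeric` has to be dropped or converted before the frame is handed to a scaler.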

rplt posted on 2022-2-26 13:17:47

rplt posted on 2022-2-25 14:54
Can any expert help me with this?

df_data_5minute=pd.read_csv('raw_data.csv',error_bad_lines=False,encoding='gbk',lineterminator="\n" ,encoding_errors='ignore')   Do I just add it right after this statement?

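cflying posted on 2022-2-26 17:23:23

This post was last edited by cflying on 2022-2-26 17:28

You still haven't got it. For data like "2017-01-01 00:00:00", pandas reads the column as object by default, i.e. as Python str strings. It would be very strange if those could be converted to float for arithmetic.


df_data_5minute=pd.read_csv('raw_data.csv',parse_dates=['name of the time column']) reads that column directly as date-time values. Date columns go through parse_dates rather than dtype; read_csv rejects dtype={'name of the time column':np.datetime64}. Forcing dtype={'name of the time column':float} is not worth trying either: "2017-01-01 00:00:00" cannot be converted to float directly, so it raises an error as well. I suggest reading the column as date-time and processing it from there.

If your data is plain numbers such as 1.111, 2.34 and so on, you do not need dtype at all; it is read as float directly. You can of course also read a column as object with dtype={'name of the time column':object}.

If you don't believe me, call df.info() after reading the file and check the column types.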
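
One possible way to handle the timestamps once they are read as datetime, sketched under assumptions: the CSV content and the column names (`time`, `power`, `speed`) are made up, and moving the timestamps into the index plus deriving `hour` and `minute` features is just one option, not necessarily what the original project does.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV; column names and values are made up for illustration.
csv_text = (
    "time,power,speed\n"
    "2017-01-01 00:00:00,1.5,3.0\n"
    "2017-01-01 00:05:00,2.5,4.0\n"
)
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Move the timestamps into the index so the remaining columns are all numeric.
features = df.set_index("time")

# Optionally derive numeric features from the timestamps.
features["hour"] = features.index.hour
features["minute"] = features.index.minute

# The frame now converts to a float array without the ValueError.
values = np.asarray(features, dtype=float)
print(features.dtypes)
print(values.shape)
```

A frame prepared like this contains only numeric columns, so it can be passed to a scaler such as MinMaxScaler.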

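rplt posted on 2022-2-26 22:35:16

cflying posted on 2022-2-26 17:23
You still haven't got it. For data like "2017-01-01 00:00:00", pandas reads the column as object by default, i.e. as Python s ...

OK, thank you! How should I go about learning this kind of data processing? Should I read books, or look for examples on CSDN?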