Problem encountered when reading CSV data in deep learning
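While learning about LSTM models, I read a CSV data file that contains datetime values, and reading it with pandas produced an error: ValueError: could not convert string to float: '2017-01-01 00:00:00'. What is going on? Is it because the value contains a space, or some other character? Could someone please help?

rsj0315 posted on 2022-2-25 08:11:
Post the code you use to read the file so we can take a look.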
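# Imports assumed from the names used below (the post does not show them)
from math import sqrt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Dropout, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model
df_data_5minute=pd.read_csv('raw_data.csv',error_bad_lines=False,encoding='gbk',lineterminator="\n" ,encoding_errors='ignore')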
df_data_5minute.drop('Wind_plant', axis=1, inplace=True)
df=df_data_5minute
#df.drop(labels=['close'], axis=1,inplace = True)
#df.insert(0, 'close', close)
data_train =df.iloc[:30000, :]
data_test = df.iloc[30000:, :]
print(data_train.shape, data_test.shape)
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)
output_dim = 1
batch_size = 1000
epochs = 500
seq_len = 5
hidden_size = 128
TIME_STEPS = 5
INPUT_DIM = 14
lstm_units = 64
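# Sliding windows of seq_len rows; the target is column 0 of the row after each window.
# The bracketed expressions were lost in the post, so this indexing is an assumed reconstruction.
X_train = np.array([data_train[i:i + seq_len, :] for i in range(data_train.shape[0] - seq_len)])
y_train = np.array([data_train[i + seq_len, 0] for i in range(data_train.shape[0] - seq_len)])
X_test = np.array([data_test[i:i + seq_len, :] for i in range(data_test.shape[0] - seq_len)])
y_test = np.array([data_test[i + seq_len, 0] for i in range(data_test.shape[0] - seq_len)])
maxy = y_test.max()
miny = y_test.min()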
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
x = Conv1D(filters = 32, kernel_size = 1, activation = 'relu')(inputs)#, padding = 'same'
x = MaxPooling1D(pool_size = 5)(x)
x = Dropout(0.1)(x)
lstm_out = Bidirectional(LSTM(lstm_units, activation='relu'), name='bilstm')(x)
output = Dense(1, activation='sigmoid')(lstm_out)
model = Model(inputs=inputs, outputs=output)
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, shuffle=False)
model.save('model.h5')
#model=load_model('model.h5')
y_pred = model.predict(X_test)
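data_train = scaler.inverse_transform(data_train)
data_test = scaler.inverse_transform(data_test)
# Targets in original units (same assumed indexing: column 0 of the row after each window)
y_test = np.array([data_test[i + seq_len, 0] for i in range(data_test.shape[0] - seq_len)])
y_train = np.array([data_train[i + seq_len, 0] for i in range(data_train.shape[0] - seq_len)])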
y_raw=np.hstack((y_train,y_test))
#RMSE
print('MSE Train loss:', model.evaluate(X_train, y_train, batch_size=batch_size))
print('MSE Test loss:', model.evaluate(X_test, y_test, batch_size=batch_size))
Rmse = sqrt(mean_squared_error(y_test, y_pred))
print('RMSE: ', Rmse)
plt.plot(np.arange(len(y_raw)), np.hstack((y_train,y_test)) , 'b', label="Raw Data")
plt.plot(np.arange(len(y_train),len(y_raw)),y_pred,'r', label="Prediction")
plt.legend()
plt.show()

rplt posted on 2022-2-25 14:54:
Can any expert help answer this?
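The lines in the posted code that build X_train, y_train, X_test and y_test follow a common sliding-window pattern. Below is a minimal sketch of that pattern, assuming each sample is `seq_len` consecutive rows and the target is column 0 of the row that follows. The helper name `make_windows`, the `target_col` parameter and the toy array are made up for illustration and are not from the original post.

```python
import numpy as np

def make_windows(data, seq_len, target_col=0):
    """Return (X, y): X[i] holds rows i..i+seq_len-1, y[i] is target_col of row i+seq_len."""
    n = data.shape[0] - seq_len
    X = np.array([data[i:i + seq_len, :] for i in range(n)])
    y = np.array([data[i + seq_len, target_col] for i in range(n)])
    return X, y

# Toy input (an assumption): 10 rows, 2 numeric feature columns
data = np.arange(20, dtype=float).reshape(10, 2)
X, y = make_windows(data, seq_len=5)
print(X.shape, y.shape)
```

Note that this only works on a purely numeric array, which is why a string time column has to be dealt with before this step.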
isdkz posted on 2022-2-25 14:56:
Post the error message in more detail.

If you want to read the data directly as datetime, remember to add the parameter parse_dates=['name of the time column'].
(30000, 20) (5040, 20)
Traceback (most recent call last):
File "C:\Users\lenovo\Desktop\LSTM\Tensorflow-Wind-Power-Prediction-master\Tensorflow-Wind-Power-Prediction-master\main.py", line 32, in <module>
scaler.fit(data_train)
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\preprocessing\_data.py", line 416, in fit
return self.partial_fit(X, y)
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\preprocessing\_data.py", line 453, in partial_fit
X = self._validate_data(
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\base.py", line 566, in _validate_data
X = check_array(X, **check_params)
File "E:\anaconda\envs\tensorflow\lib\site-packages\sklearn\utils\validation.py", line 746, in check_array
array = np.asarray(array, order=order, dtype=dtype)
File "E:\anaconda\envs\tensorflow\lib\site-packages\pandas\core\generic.py", line 1993, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: '2017-01-01 00:00:00'
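The last frames of the traceback show scikit-learn passing the whole DataFrame to np.asarray, so the error comes from the scaling step rather than from read_csv. The sketch below reproduces the same kind of ValueError without scikit-learn, by making the same float conversion on a small frame. The frame and its column names (`time`, `power`) are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the column names are assumptions
df = pd.DataFrame({
    "time": ["2017-01-01 00:00:00", "2017-01-01 00:15:00"],
    "power": [1.5, 2.0],
})

# Converting the whole frame to float fails because the time column holds strings
try:
    np.asarray(df, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The space inside the timestamp is not the cause: any string that is not a number triggers the same failure.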
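df_data_5minute=pd.read_csv('raw_data.csv',error_bad_lines=False,encoding='gbk',lineterminator="\n" ,encoding_errors='ignore')
Do I add it directly at the end of this statement?

cflying posted on 2022-2-26 17:23 (last edited 2022-2-26 17:28):
You still don't get it. For data like "2017-01-01 00:00:00", pandas reads the column as object by default, that is, as Python str strings, so of course it cannot be converted to float for arithmetic.
df_data_5minute=pd.read_csv('raw_data.csv',parse_dates=['name of the time column'])
Adding parse_dates like this reads the column directly as datetime. (read_csv does not accept dtype={'name of the time column':np.datetime64}; it raises a TypeError and asks for parse_dates instead.) Forcing dtype={'name of the time column':float} does not help either: "2017-01-01 00:00:00" is not in a format that can be converted to float, so it still raises an error. I suggest handling the column as datetime.
If your data were plain numbers such as 1.111, 2.34 and so on, it would be read as float without any dtype argument. You can also pass dtype={'name of the time column':object} to read the column explicitly as object.
If you don't believe it, call df.info() after reading and check the column types.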
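The advice above can be sketched as follows. This is only an illustration under stated assumptions: the inline CSV text stands in for raw_data.csv, and the column names `time`, `power` and `wind_speed` are invented. It also moves the parsed timestamps into the index, because a scaler such as MinMaxScaler needs purely numeric columns and would still fail on a datetime column.

```python
import io
import pandas as pd

# Hypothetical stand-in for raw_data.csv; the column names are assumptions
csv_text = """time,power,wind_speed
2017-01-01 00:00:00,1.5,3.2
2017-01-01 00:15:00,2.0,3.9
2017-01-01 00:30:00,1.0,2.7
"""

# Default read: the time column comes back as strings, not numbers
df_raw = pd.read_csv(io.StringIO(csv_text))
print(df_raw.dtypes)

# parse_dates converts the named column to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
print(df.dtypes)

# Move the timestamps into the index so only numeric columns remain as features
features = df.set_index("time")
values = features.to_numpy(dtype=float)
print(values.shape)
```

Dropping the time column with df.drop would work as well; keeping it as the index preserves the timestamps for plotting later.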
Got it, thanks! How should I go about learning this kind of data processing? Should I read books, or look for examples on CSDN?