Thursday, February 15, 2018

Update on RNNs for Predicting Crypto Prices

This is a Recurrent Neural Network diagram from here

I have been working sporadically on this little project, both to learn more about recurrent neural networks and to build something useful for predicting future cryptocurrency prices. As I talked about before, I have been looking into ways of predicting the price on a rolling basis. Right now, I am predicting the next day's price from the previous 6 days of history. Let's take a look at what I did.

Recurrent Neural Networks are a good choice for this type of time series because they carry an internal state from one timestep to the next, so each prediction can draw on the history seen so far as well as the newest value. I am using Keras to create the network, and here is how I built it:

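from keras.models import Sequential
from keras.layers import LSTM, Dense

lookback = 6    # days of history fed in for each prediction
batch_size = 1  # not used in this snippet; presumably passed to model.fit
model = Sequential()
model.add(LSTM(4, input_shape=(lookback, 1)))  # 4 LSTM blocks, 1 feature per timestep
model.add(Dense(1))  # single output: the next day's (scaled) price
model.compile(loss='mse', optimizer='rmsprop', metrics=['mae'])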

As you can see, I used 4 Long Short-Term Memory (LSTM) blocks and a lookback of 6 days. I used "rmsprop" as my optimizer because it is a variant of gradient descent that adapts the step size for each weight, and it is a common default for recurrent networks. The loss function is Mean Squared Error, which is the classic loss function for regression problems. I am also keeping track of Mean Absolute Error, just to confirm the results.
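To make the lookback concrete, here is a minimal sketch of how a series of daily closes could be cut into 6-day windows with the next day's close as the target. The helper name `make_windows` and the toy series are assumptions for illustration, not the code used for this post.

```python
import numpy as np

def make_windows(prices, lookback=6):
    """Turn a 1-D price series into (samples, lookback, 1) inputs and next-day targets."""
    prices = np.asarray(prices, dtype=float)
    X, y = [], []
    for start in range(len(prices) - lookback):
        X.append(prices[start:start + lookback])  # `lookback` consecutive closes
        y.append(prices[start + lookback])        # the close on the following day
    # Keras LSTM layers expect input shaped (samples, timesteps, features)
    X = np.array(X).reshape(-1, lookback, 1)
    return X, np.array(y)

# Toy series standing in for daily closes (not real BTC data)
toy = np.arange(10, dtype=float)
X, y = make_windows(toy, lookback=6)
print(X.shape, y.shape)
```

Each row of `X` lines up with `input_shape=(lookback, 1)` in the model definition, and each entry of `y` is the value the single Dense output is trained to match.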

The data in this example consists of BTC/USD daily closes from January 2016 to February 2018. This is the plot of that data.

Before training, I scale the data to between 0 and 0.9 with the MinMaxScaler from scikit-learn, which leaves some headroom for higher prices in the future. Later on, I may try dividing by 1 million instead, to better account for future prices (I don't see it hitting 1 million any time soon, but it could someday). Then I split the data into training and testing datasets, with 67% used for training.

During training, I also hold out a 20% validation set, just to watch how each iteration of the model performs. I have plotted the training and validation loss over the course of the train. This lets me see at what point the model begins to overfit: it is the point at which the validation loss (MSE) significantly diverges from the training loss. Here is that plot, with the validation loss filtered to discard noise.
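The scaling and the 67% split can be sketched in a few lines. This version uses plain NumPy so the arithmetic is visible. In scikit-learn, `MinMaxScaler(feature_range=(0, 0.9))` performs the same mapping. The helper names and toy prices are assumptions, and the sketch fits the scaling on the whole series before splitting, which follows the order described above but may not match the original code exactly.

```python
import numpy as np

def scale_to_range(values, low=0.0, high=0.9):
    """Min-max scale values into [low, high]; also return the min and max used."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    scaled = low + (values - vmin) * (high - low) / (vmax - vmin)
    return scaled, vmin, vmax

def chronological_split(values, train_fraction=0.67):
    """Split a time-ordered array into train and test parts without shuffling."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

# Toy prices (not real BTC data)
toy = np.array([100.0, 150.0, 200.0, 400.0, 1000.0, 700.0])
scaled, vmin, vmax = scale_to_range(toy)
train, test = chronological_split(scaled)
print(scaled.min(), scaled.max())
print(len(train), len(test))
```

Keeping the split chronological matters for a time series: the test portion should come strictly after the training portion, so the model is never evaluated on days that precede what it was trained on.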


In this example, I trained for 1000 iterations. The divergence is kind of tough to see, but it happens at around 125 iterations. I am curious whether leaving it to train for 10,000 iterations would show a clearer divergence point. Anyway, if we train for about 125 iterations, we get a result that looks like the one below. The green line is the prediction on the training data and the red line is the prediction on the test portion, which the model never saw during training. Although the result on the test data is clearly worse, I am pretty happy with how well it did.
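The divergence point above was read off the plot by eye. One hypothetical way to automate a similar judgment is to smooth the validation loss with a moving average and take the iteration where the smoothed curve is lowest. This is the usual early-stopping idea, not necessarily the filter used for the plot, and the loss curve below is synthetic.

```python
import numpy as np

def smooth(values, window=5):
    """Centered moving average; returns the smoothed curve and the iteration of each point."""
    kernel = np.ones(window) / window
    smoothed = np.convolve(values, kernel, mode="valid")
    centers = np.arange(len(smoothed)) + window // 2
    return smoothed, centers

def best_iteration(val_loss, window=5):
    """Iteration at which the smoothed validation loss is lowest."""
    smoothed, centers = smooth(np.asarray(val_loss, dtype=float), window)
    return int(centers[np.argmin(smoothed)])

# Synthetic validation loss (not the real training history):
# lowest near iteration 125, with a small wiggle standing in for noise
iterations = np.arange(300)
val_loss = 0.1 + (iterations - 125) ** 2 / 1e4 + 0.01 * np.sin(iterations)
stop_at = best_iteration(val_loss)
print(stop_at)
```

Keras also ships an `EarlyStopping` callback that monitors `val_loss` and halts training after a set number of iterations without improvement, which avoids training all the way to 1000 and picking the point afterwards.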


The results are as follows: 
- On training data, the RMSE is 44.67
- On test data, the RMSE is 1342.08
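For reference, here is a minimal sketch of how RMSE figures like these can be computed. It assumes the predictions are first mapped back from the 0 to 0.9 scale into price units, which the size of the numbers suggests but which is not confirmed above. The helper names and the sample values are made up for illustration.

```python
import numpy as np

def inverse_scale(scaled, vmin, vmax, low=0.0, high=0.9):
    """Undo min-max scaling from [low, high] back to the original price range."""
    scaled = np.asarray(scaled, dtype=float)
    return vmin + (scaled - low) * (vmax - vmin) / (high - low)

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up values, not results from the model
actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 190.0, 320.0])
error = rmse(actual, predicted)
print(error)
```

Because RMSE is in the same units as the prices, an error that looks large on the test set should be read against the price level in that period, which was far higher in late 2017 and early 2018 than in 2016.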

The question is, how can I improve this result? My initial thoughts are to experiment with different lookback values and possibly more LSTM blocks. However, I suspect that the most practical way to improve the result is to add opens, highs, and lows as features. This may vastly improve the model because it would be able to see momentum and other patterns at each timestep. This is where I will focus next.
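As a rough sketch of what that change might look like on the data side, the windows would carry 4 features per day instead of 1, and the LSTM layer would take `input_shape=(lookback, 4)`. The helper `make_ohlc_windows`, the column order (open, high, low, close), and the toy array are all assumptions for illustration.

```python
import numpy as np

def make_ohlc_windows(ohlc, lookback=6, target_col=3):
    """Build (samples, lookback, 4) inputs from daily OHLC rows.

    ohlc: array of shape (days, 4), columns assumed to be open, high, low, close.
    The target is the next day's value in `target_col` (the close by default).
    """
    ohlc = np.asarray(ohlc, dtype=float)
    n_samples = len(ohlc) - lookback
    X = np.stack([ohlc[i:i + lookback] for i in range(n_samples)])
    y = ohlc[lookback:, target_col]
    return X, y

# Toy array standing in for 10 days of OHLC data (not real prices)
toy = np.arange(40, dtype=float).reshape(10, 4)
X, y = make_ohlc_windows(toy)
print(X.shape, y.shape)
```

With several features, each column would likely need its own scaling before training, since opens, highs, lows, and closes move together but are not identical.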