Sunday, March 25, 2018

Update To AHRS with Mag!

Since my previous post on AHRS types, I have made some additional changes to the three systems.  Specifically, I have added the magnetometer data as an update to the yaw in each system.  My technique for adding the magnetometer in each system was derived from this paper by Sebastian Madgwick. 

The idea is that the Earth's magnetic field can be described as having one component along a single horizontal axis and one component along the vertical axis.  By rotating this reference field into the sensor frame with the inverse of our quaternion, we can directly compare it to the measured magnetic field vector.  This was especially helpful because it simplified the measurement noise covariance R in the EKF and UKF: I could directly use the square of the RMS noise of each magnetometer axis as the diagonal entries of the matrix. 
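Here is a minimal Python/NumPy sketch of this idea. It follows the convention of Madgwick's paper, where q ⊗ v ⊗ q* takes a vector from the sensor frame to the Earth frame. The helper names (`predicted_mag`, `mag_noise_cov`) are hypothetical, and the measured field `m` is assumed to be normalized already. This is an illustration of the reference-field trick, not the code in the repository.

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product p ⊗ q for quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])

def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rotate(q, v):
    """Vector part of q ⊗ [0, v] ⊗ q*."""
    return quat_mult(quat_mult(q, np.concatenate(([0.0], v))), quat_conj(q))[1:]

def predicted_mag(q, m):
    """Predicted magnetometer reading in the sensor frame (Madgwick-style).

    q: orientation estimate (sensor -> Earth), m: normalized magnetometer reading.
    """
    h = rotate(q, m)                                  # measured field in the Earth frame
    b = np.array([np.hypot(h[0], h[1]), 0.0, h[2]])   # one horizontal + one vertical component
    return rotate(quat_conj(q), b)                    # back to the sensor frame via the inverse

def mag_noise_cov(rms_noise):
    """Diagonal R built from the squared RMS noise of each magnetometer axis."""
    return np.diag(np.asarray(rms_noise, dtype=float) ** 2)
```

The innovation for the EKF or UKF would then be `m - predicted_mag(q, m)`, weighted by `R = mag_noise_cov(...)`. Because the horizontal part of the field is collapsed onto one axis, this comparison only carries heading (yaw) information.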

The toughest part of this problem was deriving each of the equations for the Madgwick filter, the EKF and the UKF.  I would post my work, but it is extremely tedious.  However, the results are very good, and you can see that the magnetometer update clearly improved the yaw compared with the previous post in all three cases.

This was an extremely exciting result for me to see, because I worked very hard on the solutions.  The one area that I need to work on is when to provide updates to the EKF and UKF.  Currently, I use an accelerometer threshold, just to make sure there is no movement.  However, this causes the system to be very sensitive to slight movements.  Preferably, I would like to make a more robust system for performing updates.  That's all for now!

Tuesday, March 20, 2018

Testing Some AHRS Algorithms
In recent weeks, I have spent some time brushing up on as many types of Attitude and Heading Reference Systems (AHRS) as I can.  I wanted to code them, compare them in a meaningful way, and eventually implement them on an Arduino.  My initial goal has been to work with three types: the Madgwick filter, an Extended Kalman Filter (EKF) and an Unscented Kalman Filter (UKF).  As of right now, I have each of them working and am able to play back a few different types of CSV datasets.  All of my code is on my GitHub.

I should note that I collected the data from my phone, a Google Pixel, which has a BMI160 IMU (3-axis accelerometer and 3-axis gyroscope) along with a 3-axis magnetometer, a barometer and a temperature sensor.  I am sampling the data using an app called HyperIMU (available on the Google Play Store).  As of right now, I have only integrated the gyroscope and the accelerometer, and am working on the magnetometer updates.

All of my implementations are very simple models that do not yet include the gyroscope bias as a state in the filter.  This will be added later, because it complicates the model.  My states are the quaternion components w, x, y and z.  All operations are done in quaternion space to avoid the gimbal-lock singularity that Euler angles have when the pitch reaches ±90 degrees.
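Because the state is a quaternion, the gyroscope prediction step shared by all three filters can be written without any trigonometry. The sketch below assumes the body-rate convention from Madgwick's paper, q̇ = ½ q ⊗ [0, ω]. It uses simple first-order integration followed by renormalization. `integrate_gyro` is a hypothetical helper name.

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product p ⊗ q for quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])

def integrate_gyro(q, gyro, dt):
    """One prediction step: q_dot = 0.5 * q ⊗ [0, omega], then renormalize.

    q: orientation [w, x, y, z]; gyro: body rates in rad/s; dt: seconds.
    """
    q_dot = 0.5 * quat_mult(q, np.concatenate(([0.0], gyro)))
    q_new = q + q_dot * dt                 # first-order (Euler) integration
    return q_new / np.linalg.norm(q_new)   # keep it a unit quaternion
```

Nothing in this update blows up at ±90 degrees of pitch, which is the reason for keeping the states as a quaternion.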

The Madgwick Filter

The Madgwick Filter is based on this paper by Sebastian Madgwick.  Remarkably, it is a relatively new algorithm, but it has been widely used across many systems.  The idea of this filter is to correct the classic gyroscope integration by solving a small optimization problem at each step.  The first correction addresses drift in the pitch and roll directions by taking advantage of the direction of gravity measured by the accelerometer.  Essentially, the algorithm forms an objective function between two vectors in the sensor frame: the gravity vector rotated into that frame, and the measured acceleration vector.  The assumption is that at all times the acceleration approximates gravity, even though there may be some acceleration due to movement and noise.  The optimization is solved with a gradient descent step, so the filter is always working to correct any gyroscope drift in the gravity-related directions.  
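Here is a sketch of one accelerometer-only Madgwick step in Python. It is written from the objective function and Jacobian given in Madgwick's paper, with the quaternion ordered as [w, x, y, z]. The function name and the choice to normalize the accelerometer reading inside the function are assumptions for illustration.

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product p ⊗ q for quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])

def madgwick_imu_step(q, gyro, accel, beta, dt):
    """One Madgwick update using gyroscope + accelerometer only (no magnetometer)."""
    w, x, y, z = q
    a = np.asarray(accel, dtype=float)
    a = a / np.linalg.norm(a)
    # Objective f: gravity [0, 0, 1] rotated into the sensor frame minus the measured direction
    f = np.array([
        2 * (x * z - w * y) - a[0],
        2 * (w * x + y * z) - a[1],
        2 * (0.5 - x * x - y * y) - a[2],
    ])
    # Jacobian of f with respect to [w, x, y, z]
    J = np.array([
        [-2 * y, 2 * z, -2 * w, 2 * x],
        [2 * x, 2 * w, 2 * z, 2 * y],
        [0.0, -4 * x, -4 * y, 0.0],
    ])
    grad = J.T @ f
    norm = np.linalg.norm(grad)
    if norm > 0:
        grad = grad / norm  # normalized gradient descent direction
    # Gyroscope rate of change, pulled toward the accelerometer's "down" by beta
    q_dot = 0.5 * quat_mult(q, np.concatenate(([0.0], gyro))) - beta * grad
    q_new = q + q_dot * dt
    return q_new / np.linalg.norm(q_new)
```

`beta` scales the normalized gradient, so it sets how fast the estimate is pulled toward the accelerometer's idea of "down". That is the trade-off discussed in the results below.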

Here is an image of the results of the Madgwick filter when applied to my phone spinning about each of the three axes.  This particular run uses the recommended beta gain value from the paper.  However, I have found that setting beta between 0.04 and 0.2 allows it to converge faster and more accurately.   
As you can see in the image, the estimates of roll, pitch and yaw work well.  In the roll and pitch directions, you can see that the filter is slowly converging back to 0 degrees.  If I increase the beta value, I can speed up that convergence, but at the cost of also factoring in any acceleration that is not due to gravity.  To see the divergence, I decided to compare the residual between the estimate and the measurement.  What is interesting is what happens when the phone is actually moving, and how that movement causes the filter values to diverge. 
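For reference, here is one way the plotted quantities could be computed from the quaternion state. The first is the Euler angles in the aerospace Z-Y-X convention. The second is the residual between the normalized accelerometer reading and the gravity direction predicted by the estimate. Both function names are hypothetical. The sign convention assumes the accelerometer reads roughly +1 g on its z axis when lying flat, as Android phones do.

```python
import numpy as np

def quat_to_euler(q):
    """Roll, pitch, yaw in degrees (aerospace Z-Y-X sequence) from [w, x, y, z]."""
    w, x, y, z = q
    roll = np.arctan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = np.arcsin(np.clip(2 * (w * y - z * x), -1.0, 1.0))  # clip guards rounding past ±1
    yaw = np.arctan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return np.degrees([roll, pitch, yaw])

def gravity_residual(q, accel):
    """Normalized accelerometer reading minus the gravity direction predicted from q."""
    w, x, y, z = q
    g_pred = np.array([2 * (x * z - w * y), 2 * (w * x + y * z), 1 - 2 * (x * x + y * y)])
    a = np.asarray(accel, dtype=float)
    return a / np.linalg.norm(a) - g_pred
```

While the phone is still, the residual should hover around zero. Any extra linear acceleration shows up directly as a non-zero residual.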

The Extended Kalman Filter

The EKF is the standard approach for most nonlinear estimation problems, and it fits the AHRS well, too.  Essentially, the EKF is a typical Kalman filter that linearizes the prediction and update equations in order to propagate the uncertainty (covariance) of each of the states.  That uncertainty is used to weight the measurement updates in order to shrink the overall error of the system.  When the sensor is moving with extra acceleration, the gravity updates are far more damaging than they are in the Madgwick filter.  To mitigate this problem, I decided to only apply updates when the change in acceleration along all three axes is less than a threshold.  This way, we can be fairly confident that the phone is stationary during that period.  In the future, I will work on a more robust way to find allowable update times.
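Here is a compact Python/NumPy sketch of the EKF pieces described above. It covers three parts: a linearized quaternion prediction, a gravity measurement update whose Jacobian matches the Madgwick objective, and the accelerometer-change gate. The noise matrices `Q` and `R`, the threshold value, and all function names are assumptions. The real filter may differ in its details.

```python
import numpy as np

def omega_matrix(gyro):
    """4x4 matrix such that q_dot = 0.5 * Omega(gyro) @ q for q = [w, x, y, z]."""
    gx, gy, gz = gyro
    return np.array([
        [0.0, -gx, -gy, -gz],
        [gx, 0.0, gz, -gy],
        [gy, -gz, 0.0, gx],
        [gz, gy, -gx, 0.0],
    ])

def ekf_predict(q, P, gyro, dt, Q):
    """Propagate the quaternion and its covariance through the linearized model."""
    F = np.eye(4) + 0.5 * dt * omega_matrix(gyro)  # first-order state transition
    q = F @ q
    return q / np.linalg.norm(q), F @ P @ F.T + Q

def gravity_in_sensor(q):
    """Gravity direction [0, 0, 1] expressed in the sensor frame."""
    w, x, y, z = q
    return np.array([2 * (x * z - w * y), 2 * (w * x + y * z), 1 - 2 * (x * x + y * y)])

def gravity_jacobian(q):
    """d(gravity_in_sensor)/dq, the measurement matrix H."""
    w, x, y, z = q
    return np.array([
        [-2 * y, 2 * z, -2 * w, 2 * x],
        [2 * x, 2 * w, 2 * z, 2 * y],
        [0.0, -4 * x, -4 * y, 0.0],
    ])

def is_stationary(accel, prev_accel, threshold):
    """Gate: allow an update only if every axis changed by less than `threshold`."""
    diff = np.abs(np.asarray(accel, dtype=float) - np.asarray(prev_accel, dtype=float))
    return bool(np.all(diff < threshold))

def ekf_accel_update(q, P, accel, R):
    """Standard EKF measurement update using the normalized accelerometer reading."""
    a = np.asarray(accel, dtype=float)
    a = a / np.linalg.norm(a)
    H = gravity_jacobian(q)
    innovation = a - gravity_in_sensor(q)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    q = q + K @ innovation
    P = (np.eye(4) - K @ H) @ P
    return q / np.linalg.norm(q), P
```

In a loop, `ekf_predict` would run on every gyroscope sample. `ekf_accel_update` would run only when `is_stationary(accel, prev_accel, threshold)` returns `True`.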

Here are the results of the EKF.  The Euler plot shows fast convergence back to 0 degrees in pitch and roll after rotations.  We can see a bit more noise in the solution than in the Madgwick filter, but faster convergence.  This is probably because the linearized model does not approximate the uncertainty distribution as well.
I have only plotted the residuals at the times when I applied updates.  As you can see, the residual is zero-mean and has a nice error distribution after the updates, which suggests that the filter is doing its job.

The Unscented Kalman Filter

The UKF was a curious addition to this batch of algorithms.  Typically, a UKF is used when the system is strongly nonlinear.  In that case, linearizing it (as the EKF does) would poorly capture how the uncertainty distribution is transformed.  It works by approximating the distribution with a few "sigma points", which are the state estimate plus and minus a scaled fraction of its uncertainty (the matrix square root of the covariance).  This creates a pseudo space that approximates the distribution of the uncertainty of each state.  

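This image was very helpful in my understanding of the UKF.  Basically, a scaled portion of the standard deviation of the uncertainty is added to and subtracted from each state.  The resulting points are then either projected forward in time by the state transition function or rotated into the measurement frame.  Finally, the transformed points are recombined with weights to give a new mean and covariance.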
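The sigma-point idea can be sketched generically with the scaled unscented transform. It uses Van der Merwe's parameterization, with `alpha`, `beta` and `kappa` as the usual tuning knobs. This is a standard textbook version rather than necessarily the exact weighting in my filter. For a quaternion state, the transformed mean would also need to be renormalized.

```python
import numpy as np

def sigma_points(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Scaled sigma points (Van der Merwe) and their mean/covariance weights."""
    n = x.size
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)   # matrix square root: S @ S.T = (n + lam) * P
    pts = np.vstack([x, x + S.T, x - S.T])  # 2n + 1 points, one per row
    wm = np.full(2 * n + 1, 0.5 / (n + lam))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha ** 2 + beta)
    return pts, wm, wc

def unscented_transform(pts, wm, wc, f, noise_cov):
    """Push every sigma point through f and recombine into a mean and covariance."""
    y = np.array([f(p) for p in pts])
    mean = wm @ y
    d = y - mean
    cov = d.T @ (wc[:, None] * d) + noise_cov
    return mean, cov
```

For a linear function the transform reproduces the ordinary Kalman prediction exactly. For nonlinear functions such as quaternion rotation, it captures curvature that the EKF's Jacobian ignores.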
Here are the results of the UKF.  Again, we are seeing good results in terms of convergence back to zero after large movements in pitch and roll.  We also see that the error in the residual is roughly Gaussian.  

Here are the histograms of the roll and pitch error.  Both look very Gaussian and have almost exactly the same mean and spread as the EKF.  

Now that I have this done, I have a few other things that I want to do to improve my results.  These things include:
  • Yaw correction via Magnetometer
  • Gyroscope bias correction, also with the Magnetometer and possibly the Temperature Sensor
  • Process and Measurement noise improvement via adaptive EKF
  • Implementation in C for realtime estimation on Arduino

Thursday, February 15, 2018

Update On RNNs for Predicting Crypto Prices

This is a Recurrent Neural Network diagram from here

Sporadically, I have been working on this little project to both learn more about recurrent neural networks and build something useful to predict future cryptocurrency prices.  As I talked about before, I have been looking into ways of predicting the price on a rolling basis.  As of right now, I am predicting the next day's price from a history of 6 days before.  Let's take a look at what I did.

Recurrent Neural Networks are a good choice for this type of time series because they can incorporate new values while keeping track of history in order to make new predictions.  I am using Keras to create the network, and here is how I built it:

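from keras.models import Sequential
from keras.layers import LSTM, Dense
lookback = 6    # days of history per sample
batch_size = 1  # batch size passed to model.fit (not used in this snippet)
model = Sequential()
model.add(LSTM(4, input_shape=(lookback, 1)))  # 4 LSTM units over 6 timesteps, 1 feature
model.add(Dense(1))  # single output: the next day's (scaled) close
model.compile(loss='mse', optimizer='rmsprop', metrics=['mae'])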

As you can see, I used 4 Long Short-Term Memory (LSTM) units and a lookback of 6 days.  I used "rmsprop" as my optimizer because it is a gradient descent variant with per-parameter adaptive learning rates.  It is usually a good choice for recurrent networks and works fine for regression tasks.  The loss function is Mean Squared Error, the classic loss for regression problems.  I am also keeping track of Mean Absolute Error, just to confirm the results.
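The network expects samples shaped as (samples, timesteps, features). So the daily closes have to be cut into windows of 6 days, each paired with the following day's close. Here is a NumPy sketch of that preparation, including the 0 to 0.9 scaling and the chronological 67% split described in the next paragraph. scikit-learn's `MinMaxScaler(feature_range=(0, 0.9))` performs the same scaling. The helper names here are hypothetical.

```python
import numpy as np

def min_max_scale(prices, upper=0.9):
    """Map prices linearly onto [0, upper]; also return (lo, hi) to undo it later."""
    prices = np.asarray(prices, dtype=float)
    lo, hi = prices.min(), prices.max()
    return (prices - lo) / (hi - lo) * upper, (lo, hi)

def make_windows(series, lookback=6):
    """Each sample is `lookback` consecutive values; the target is the next value."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = np.array(series[lookback:])
    return X.reshape(-1, lookback, 1), y  # Keras LSTM input: (samples, timesteps, features)

def chronological_split(X, y, train_frac=0.67):
    """Split without shuffling so every test sample comes after every training sample."""
    n_train = int(len(X) * train_frac)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]
```

Splitting without shuffling matters for forecasting, since a shuffled split would let the model train on days that come after the ones it is tested on.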

The data in this example consists of BTC/USD daily closes from January 2016 to February 2018.  This is the plot of that data.
Before training, I scale the data to the range 0 to 0.9 with a MinMaxScaler from scikit-learn, leaving headroom for higher prices in the future.  In the future, I may try dividing by 1 million instead, to better account for future prices (I don't see it hitting 1 million any time soon, but it could).  Then I split the data into training and testing datasets with a 67% training split.  During training, I also monitor a 20% validation set, just to watch how each iteration of the model performs.  I plotted the training and validation loss over the course of training, which shows me the point at which the model begins to overfit.  We can see this by looking at the point at which the validation loss (MSE) significantly diverges from the training loss.  This is an image of that plot, with the validation loss filtered to discard noise:

In this example, I trained for 1000 iterations.  It is somewhat tough to see the divergence, but it happens around 125 iterations.  I am curious whether training for 10,000 iterations would show a clearer divergence point.  Anyway, if we train to about 125 iterations, we get a result that looks like the one below.  The green line is the prediction on the training data and the red line is the prediction on the held-out test portion.  Although the result on the test portion is clearly worse, I am pretty happy with how well it did.  
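One simple choice for smoothing the validation loss is an exponential moving average. This is an assumption; other filters would work too. The sketch below smooths a loss curve and returns the epoch where the smoothed validation loss is lowest, which is one way to pick a stopping point like the one above. `ema` and `best_epoch` are hypothetical helpers, and `alpha` is an assumed smoothing factor.

```python
import numpy as np

def ema(values, alpha=0.1):
    """Exponential moving average of a 1-D sequence (first output equals the first value)."""
    values = np.asarray(values, dtype=float)
    out = np.empty(len(values))
    acc = values[0]
    for i, v in enumerate(values):
        acc = alpha * v + (1 - alpha) * acc
        out[i] = acc
    return out

def best_epoch(val_loss, alpha=0.1):
    """0-based index of the lowest point of the smoothed validation loss."""
    return int(np.argmin(ema(val_loss, alpha)))
```

An EMA lags the raw curve by roughly (1 - alpha) / alpha epochs, so the minimum of the smoothed curve sits a little after the true minimum. With alpha = 0.1, that is about 9 epochs.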

The results are as follows: 
- On training data, the RMSE is 44.67
- On test data, the RMSE is 1342.08
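If these RMSE figures are in US dollars (an assumption), they would be computed after undoing the 0 to 0.9 scaling. Here is a minimal sketch of both steps, with hypothetical helper names:

```python
import numpy as np

def inverse_scale(scaled, lo, hi, upper=0.9):
    """Undo the [0, upper] min-max scaling to get prices back in original units."""
    return np.asarray(scaled, dtype=float) / upper * (hi - lo) + lo

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Computing the error in dollars rather than in scaled units makes the training and test numbers directly comparable to the price plot.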

The question is, how can I improve this result?  My initial thoughts are to experiment with different lookback values, and possibly more LSTM units.  However, I suspect that the most practical way to improve the result is to also add in the opens, highs and lows as features.  This may vastly improve the model because it will be able to see momentum and other patterns at each timestep.  This is where I will focus next.

Tuesday, February 13, 2018

Some Computer Vision For Kicks

Over the past few weeks, I have been working on a few different projects to broaden my skills and learn about some new technologies.  One area that I have been jumping into is computer vision.  Recently, I have been working my way through the book "OpenCV with Python Blueprints".  Some of the projects I have done so far include building a Cartoonizer and some Feature Matching.  Let me show you!

This is me just after finishing the Cartoonizer.  As you can see, the program cartoonizes live video and shows the result in real-time.  The process involves:
- Applying a bilateral filter in order to smooth flat areas while keeping edges sharp
- Converting the original to grayscale
- Applying a median blur to reduce image noise
- Using an adaptive threshold to detect edges
- Combining the color image with the edge mask

The result is really great!

Another project I just completed is the Feature Matching project.  The idea here is to find robust features of an image, match them in another image and find the transform that converts between the two images.

Here is an example of what that looks like in practice.  On the left is the still image that I took from my computer, and on the right is a live image of the scene.  The red lines show where each feature in the first frame is located in the second frame.  This seemed to work pretty well for very similar scenes, but had some trouble when I changed the scene significantly.  However, it is not unexpected that it would fail on different scenes, because the features are not very similar at all.

Here is how I did it:
- First I used SURF (Speeded-Up Robust Features) to find some distinctive keypoints.
- Then, I used FLANN (Fast Library for Approximate Nearest Neighbors) to check whether the other frame has similar keypoints and then match them.
- Then, using outlier rejection, I was able to pare down the number of features to only the good ones.
-  Finally, using a perspective transform, I was able to warp the second image so that it matched the first (not shown here).

I am currently in the middle of the 3D scene reconstruction project.  This is something I have been meaning to do for a long time, and I am really enjoying working on it.

Tuesday, January 30, 2018

Cryptocurrency Analysis

It's been a while since I last posted, because I was working hard over at Navisens.  After about 3 years, I am now back on the market looking for a new position.  In the meantime, I have started working on another project that has fascinated me for a while.

For about a year now, I have been looking into and trading cryptocurrencies.  I find the whole market exciting to follow and very lucrative (if you do it right).  This is where my new project comes in.   I am building a tool for querying historical cryptocurrency price data so that I can analyze it and use it to make future price predictions.  My current progress is on my GitHub.

To get started, I have built a simple API that uses the phenomenal ccxt library to query price data from tons of exchanges and build up a dataset. Then, once I have a significant amount of data, I will test some machine learning algorithms on it.
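Here is a minimal sketch of how the historical data could be paged in with ccxt's `fetch_ohlcv`, which returns candles as [timestamp, open, high, low, close, volume]. The helper name and the paging strategy are assumptions. Exchanges cap how many candles one call returns, so the loop keeps asking for candles after the last timestamp it has seen.

```python
def fetch_all_ohlcv(exchange, symbol, timeframe="1d", since=None, limit=500):
    """Page through exchange.fetch_ohlcv until no newer candles come back.

    `exchange` is assumed to be a ccxt exchange instance (or anything with the same
    fetch_ohlcv(symbol, timeframe, since, limit) method). Each candle is
    [timestamp_ms, open, high, low, close, volume].
    """
    candles = []
    while True:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since, limit)
        if candles:
            # Drop anything we already have, in case the exchange overlaps pages
            batch = [c for c in batch if c[0] > candles[-1][0]]
        if not batch:
            break
        candles.extend(batch)
        since = batch[-1][0] + 1  # next page starts just after the last candle
    return candles
```

With ccxt installed, a call might look like `fetch_all_ohlcv(ccxt.binance(), 'BTC/USDT', '1d')`. Any object with a matching `fetch_ohlcv` method works, which also makes the function easy to test offline.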

Here are some questions that I am starting to think about:

- How is one cryptocurrency related to another?  Can I use the data from one crypto to train a classifier/regressor for predicting a different crypto?

- What type of Machine Learning algorithms will work best on this time-series data? Neural Networks? Recurrent Neural Networks?  Decision Trees? Bayesian Estimators?

- What features should I use as inputs to the ML algorithms?  Do I need scaling? (probably)  How many features will be sufficient?

- What should I predict? A new price (regressor)? Whether it will go up or down (classifier)?

As I start to look at these problems more carefully, I will continue to write about the conclusions that I come to.  If you have questions or thoughts, I would love to hear them! Feel free to comment on this post or send me an email!