This video shows the completion of work to split the tracking code into 3 threads – video capture, fast analysis and long analysis.
If the projected pose of an object doesn’t line up with the LEDs where we expect it to be, the frame is sent off for more expensive analysis in another thread. That way, it doesn’t block tracking of other objects – the fast analysis thread can continue with the next frame.
When a new blob is detected in a video frame, it is assigned an ID and tracked between frames using motion flow. When the analysis results are available at some point in the future, the ID lets us find blobs that still exist in the most recent video frame. If those blobs are still unknowns in the new frame, the code labels them with the LED IDs it found – and then hopefully in the next frame, the fast analysis is locked onto the object again.
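To make that concrete, here is a rough sketch (in plain C, with hypothetical names rather than the actual structures in my branch) of how labels from a deferred analysis result get applied back to whatever blobs still exist in the latest frame:

```c
/* Rough sketch, not the actual code: apply labels from a deferred
 * (possibly stale) analysis result back to the blobs in the latest frame.
 * All names here are hypothetical. */
#include <stddef.h>

#define LED_ID_UNKNOWN -1

struct blob {
    int id;       /* stable ID assigned when the blob first appears      */
    int led_id;   /* LED this blob maps to, or LED_ID_UNKNOWN            */
    float x, y;   /* centre in the current frame, updated by motion flow */
};

struct analysis_result {
    int blob_id;  /* which blob the long analysis matched */
    int led_id;   /* the LED it decided that blob is      */
};

static void apply_deferred_labels(struct blob *blobs, size_t n_blobs,
                                  const struct analysis_result *results,
                                  size_t n_results)
{
    for (size_t r = 0; r < n_results; r++) {
        for (size_t b = 0; b < n_blobs; b++) {
            /* Only label blobs that are still unknown, so a fresher
             * assignment from the fast pass is never overwritten. */
            if (blobs[b].id == results[r].blob_id &&
                blobs[b].led_id == LED_ID_UNKNOWN)
                blobs[b].led_id = results[r].led_id;
        }
    }
}
```

Because the blob IDs are stable across frames, it doesn't matter that the result was computed from an older frame: as long as motion flow is still tracking the blob, the label lands on the right object.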
There are some obvious next things to work on:
It’s hard to decide what constitutes a ‘good’ pose match, especially around partially visible LEDs at the edges. More experimentation and refinement are needed.
The IMU dead-reckoning between frames is bad – the accelerometer biases, especially on the controllers, tend to make them zoom off very quickly and lose tracking. More filtering, bias extraction and investigation should improve that, and help with staying locked onto fast-moving objects.
The code that decides whether an LED is expected to be visible in a given pose could use some improvement.
Often the orientation of a device is good, but the position is wrong – a matching mode that only searches for translational matches could help with that.
Taking the gravity vector of the device into account can help reject invalid poses, as could some tests of plausible locations based on the limits of human movement.
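For the gravity-vector idea in that last point, the check can be as simple as rotating the local "up" axis by a candidate orientation and comparing it with the gravity direction the IMU reports, rejecting poses that disagree by more than some threshold. A standalone sketch, assuming a Y-up convention and plain structs rather than OpenHMD's math types:

```c
/* Standalone illustration: rotate the local "up" axis by the candidate
 * orientation and compare it against the gravity-derived up direction
 * from the IMU. Types, axis convention and threshold are assumptions
 * made for this sketch, not OpenHMD's API. */
#include <math.h>
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } quat;

/* Rotate vector v by unit quaternion q: v' = v + w*t + q_vec x t,
 * where t = 2 * (q_vec x v). */
static vec3 quat_rotate(quat q, vec3 v)
{
    vec3 u = { q.x, q.y, q.z };
    vec3 t = { 2.0f * (u.y * v.z - u.z * v.y),
               2.0f * (u.z * v.x - u.x * v.z),
               2.0f * (u.x * v.y - u.y * v.x) };
    vec3 r = { v.x + q.w * t.x + (u.y * t.z - u.z * t.y),
               v.y + q.w * t.y + (u.z * t.x - u.x * t.z),
               v.z + q.w * t.z + (u.x * t.y - u.y * t.x) };
    return r;
}

/* Accept the pose only if its idea of "up" agrees with the IMU's measured
 * gravity direction to within max_angle_rad. */
static bool pose_agrees_with_gravity(quat candidate_orient,
                                     vec3 imu_up_world,   /* unit vector */
                                     float max_angle_rad)
{
    vec3 up_local = { 0.0f, 1.0f, 0.0f };  /* Y-up assumed */
    vec3 up_cand = quat_rotate(candidate_orient, up_local);

    float dot = up_cand.x * imu_up_world.x +
                up_cand.y * imu_up_world.y +
                up_cand.z * imu_up_world.z;
    if (dot >  1.0f) dot =  1.0f;   /* clamp before acosf */
    if (dot < -1.0f) dot = -1.0f;

    return acosf(dot) <= max_angle_rad;
}
```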
This is another in my series of updates on developing positional tracking for the Oculus Rift CV1 in OpenHMD.
In the last post I ended with a TODO list. Since then I’ve crossed off a few things from that, and fixed a handful of very important bugs that were messing things up. I took last week off work, which gave me some extra hacking hours and enthusiasm too, and really helped push things forward.
Here’s the updated list:
The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. More work is needed to avoid that expensive path as much as possible.
Multiple cameras interfere with each other.
Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed. Partially Fixed
At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting. Partially Fixed
More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test. Much Improved
Need to take the position and orientation of the IMU within each device into account. This information is in the firmware but ignored right now. Fixed
Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well and how quickly the pose of devices is extracted from the computer vision, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it (there’s a simplified sketch of the basic predict/correct structure after this list).
Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
Hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
Other things I can’t think of right now.
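On the filtering item above: the real thing needs a full pose state with orientation and IMU biases, but the underlying predict/correct loop is the same one you can see in a toy 1-D constant-velocity Kalman filter. This is purely an illustration of the structure, not the filter I'm actually writing:

```c
/* Toy 1-D constant-velocity Kalman filter, purely to illustrate the
 * predict/correct structure. The real filter tracks full pose, orientation
 * and IMU biases; nothing here is the actual OpenHMD code. */
struct kf1d {
    float p, v;              /* state: position and velocity            */
    float Ppp, Ppv, Pvv;     /* covariance P = [[Ppp, Ppv],[Ppv, Pvv]]  */
    float q;                 /* process noise (acceleration variance)   */
    float r;                 /* measurement noise (position variance)   */
};

/* Predict: integrate the state forward by dt (the motion-model step). */
static void kf1d_predict(struct kf1d *f, float dt)
{
    f->p += f->v * dt;

    /* P = F P F^T + Q, with F = [[1, dt], [0, 1]] and white-accel Q */
    float Ppp = f->Ppp + 2.0f * dt * f->Ppv + dt * dt * f->Pvv;
    float Ppv = f->Ppv + dt * f->Pvv;
    f->Ppp = Ppp + 0.25f * dt * dt * dt * dt * f->q;
    f->Ppv = Ppv + 0.5f  * dt * dt * dt * f->q;
    f->Pvv = f->Pvv + dt * dt * f->q;
}

/* Correct: fold in a position observation from the camera. */
static void kf1d_correct(struct kf1d *f, float z)
{
    float y  = z - f->p;          /* innovation                  */
    float s  = f->Ppp + f->r;     /* innovation covariance       */
    float kp = f->Ppp / s;        /* Kalman gain, position part  */
    float kv = f->Ppv / s;        /* Kalman gain, velocity part  */

    f->p += kp * y;
    f->v += kv * y;

    /* P = (I - K H) P with H = [1 0] */
    float Ppp = (1.0f - kp) * f->Ppp;
    float Ppv = (1.0f - kp) * f->Ppv;
    float Pvv = f->Pvv - kv * f->Ppv;
    f->Ppp = Ppp; f->Ppv = Ppv; f->Pvv = Pvv;
}
```

The predict step is what the IMU drives between camera frames; the correct step is where a (possibly late) position observation from the computer vision gets folded in.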
As you can see, a few of the top-level items have been fixed, or mostly so. I split the computer vision for the tracking into several threads:
1 thread shared between all sensors to capture USB packets and assemble them into frames
1 thread per sensor to analyse the frame and update poses
The goal with that initial split was to prevent the processing of multiple sensors from interfering with each other, but I found that it also has a strong benefit even with a single sensor. I realised something in the last week that I probably should have noted earlier: the Rift sensors capture a video frame every 19.2ms, but that frame then takes a full 17ms to deliver across the USB – which means that when everything was in one thread, even with 1 sensor there was only about 2.2ms for the full analysis to take place before we'd miss a packet of the next frame and have to throw that frame away. With the analysis now happening in a separate thread and a ping-pong double buffer in place, the analysis can take quite a bit longer without losing any video frames.
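The ping-pong buffer itself is nothing exotic: the capture thread always has somewhere to write the incoming frame, and the analysis thread takes the newest completed frame, with a frame being dropped rather than stalling USB capture if analysis falls behind. A simplified pthreads sketch (names and sizes are illustrative, not the actual OpenHMD structures):

```c
/* Simplified ping-pong buffer between the USB capture thread and a
 * per-sensor analysis thread. Names and sizes are illustrative, not the
 * actual OpenHMD structures. */
#include <pthread.h>
#include <string.h>
#include <stdint.h>

#define FRAME_SIZE (1280 * 960)   /* one sensor frame, for illustration */

struct frame_pingpong {
    uint8_t buf[2][FRAME_SIZE];
    int ready;         /* index of the newest completed frame, or -1 */
    int in_analysis;   /* index the analysis thread holds, or -1     */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

/* Capture thread: copy the assembled frame into a buffer the analysis
 * thread isn't reading, then publish it. */
void capture_publish(struct frame_pingpong *pp, const uint8_t *frame)
{
    pthread_mutex_lock(&pp->lock);
    int idx;
    if (pp->in_analysis >= 0)
        idx = 1 - pp->in_analysis;  /* don't touch the buffer being analysed */
    else if (pp->ready >= 0)
        idx = 1 - pp->ready;        /* keep the pending frame intact         */
    else
        idx = 0;
    if (pp->ready == idx)
        pp->ready = -1;             /* dropping an un-collected frame        */
    pthread_mutex_unlock(&pp->lock);

    memcpy(pp->buf[idx], frame, FRAME_SIZE);

    pthread_mutex_lock(&pp->lock);
    pp->ready = idx;
    pthread_cond_signal(&pp->cond);
    pthread_mutex_unlock(&pp->lock);
}

/* Analysis thread: wait for a completed frame and take ownership of it. */
const uint8_t *analysis_acquire(struct frame_pingpong *pp)
{
    pthread_mutex_lock(&pp->lock);
    while (pp->ready < 0)
        pthread_cond_wait(&pp->cond, &pp->lock);
    pp->in_analysis = pp->ready;
    pp->ready = -1;
    int idx = pp->in_analysis;
    pthread_mutex_unlock(&pp->lock);
    return pp->buf[idx];
}

/* Analysis thread: release the buffer once the frame has been processed. */
void analysis_release(struct frame_pingpong *pp)
{
    pthread_mutex_lock(&pp->lock);
    pp->in_analysis = -1;
    pthread_mutex_unlock(&pp->lock);
}
```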
I plan to add a 2nd per-sensor thread that will divide the analysis further. The current thread will do only fast pass validation of any existing tracking lock, and will defer any longer term analysis to the other thread. That means that if we have a good lock on the HMD, but can’t (for example) find one of the controllers, searching for the controller will be deferred and the fast pass thread will move onto the next frame and keep tracking lock on the headset.
I fixed some bugs in the calculations that move between frames of reference – converting to/from the global position and orientation in the world to the position and orientation relative to each camera sensor when predicting what the appearance of the LEDs should be. I also added in the IMU offset and orientation of the LED models from the firmware, to make the predictions more accurate when devices move in the time between camera exposures.
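The core of that conversion is just expressing the device's world-space pose relative to the camera's own world-space pose. A standalone sketch using plain structs rather than OpenHMD's math types:

```c
/* Sketch of the world-to-camera conversion used when predicting LED
 * positions: express a device pose (given in world coordinates) relative
 * to a camera whose own pose in the world is known. Plain structs rather
 * than OpenHMD's math types; treat it as illustration only. */
typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } quat;

/* Conjugate is the inverse for a unit quaternion. */
static quat quat_conjugate(quat q)
{
    quat r = { -q.x, -q.y, -q.z, q.w };
    return r;
}

/* Hamilton product a * b */
static quat quat_mul(quat a, quat b)
{
    quat r = {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z
    };
    return r;
}

/* Rotate vector v by unit quaternion q. */
static vec3 quat_rotate(quat q, vec3 v)
{
    vec3 u = { q.x, q.y, q.z };
    vec3 t = { 2.0f * (u.y * v.z - u.z * v.y),
               2.0f * (u.z * v.x - u.x * v.z),
               2.0f * (u.x * v.y - u.y * v.x) };
    vec3 r = { v.x + q.w * t.x + (u.y * t.z - u.z * t.y),
               v.y + q.w * t.y + (u.z * t.x - u.x * t.z),
               v.z + q.w * t.z + (u.x * t.y - u.y * t.x) };
    return r;
}

/* Device pose in world space -> device pose in the camera's frame. */
static void world_to_camera(vec3 cam_pos, quat cam_orient,   /* camera in world */
                            vec3 dev_pos, quat dev_orient,   /* device in world */
                            vec3 *dev_pos_cam, quat *dev_orient_cam)
{
    quat cam_inv = quat_conjugate(cam_orient);

    vec3 rel = { dev_pos.x - cam_pos.x,
                 dev_pos.y - cam_pos.y,
                 dev_pos.z - cam_pos.z };

    *dev_pos_cam = quat_rotate(cam_inv, rel);
    *dev_orient_cam = quat_mul(cam_inv, dev_orient);
}
```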
Yaw Correction: when a device is observed by a sensor, the orientation is incorporated into what the IMU is measuring. The IMU can sense gravity and knows which way is up or down, but not which way is forward. The observation from the camera now corrects for that yaw drift, to keep things pointing the way you expect them to.
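In essence, only the heading component needs correcting, since gravity already pins pitch and roll. A much-simplified sketch of that idea, assuming a Y-up convention and a simple blending gain (the real correction works on full quaternions, so treat this as illustration only):

```c
/* Much-simplified illustration of yaw correction: extract the heading of
 * the IMU orientation and the camera-observed orientation, and blend a
 * fraction of the difference back in about the world up axis. Y-up and
 * the blending gain are assumptions of this sketch, not the real code. */
#include <math.h>

typedef struct { float x, y, z, w; } quat;

static const float PI_F = 3.14159265f;

/* Hamilton product a * b */
static quat quat_mul(quat a, quat b)
{
    quat r = {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z
    };
    return r;
}

/* Heading of the device's forward (-Z) axis about +Y, Y-up convention. */
static float quat_yaw(quat q)
{
    return atan2f(2.0f * (q.w * q.y + q.x * q.z),
                  1.0f - 2.0f * (q.y * q.y + q.x * q.x));
}

static quat quat_from_yaw(float yaw)
{
    quat r = { 0.0f, sinf(yaw * 0.5f), 0.0f, cosf(yaw * 0.5f) };
    return r;
}

/* Pull the IMU orientation's heading towards the camera observation,
 * leaving pitch/roll to the IMU's own gravity reference. */
static quat correct_yaw_drift(quat imu_orient, quat camera_orient, float gain)
{
    float err = quat_yaw(camera_orient) - quat_yaw(imu_orient);

    /* wrap to [-pi, pi] so the correction takes the short way around */
    while (err >  PI_F) err -= 2.0f * PI_F;
    while (err < -PI_F) err += 2.0f * PI_F;

    /* pre-multiply = rotate about the world (not body) up axis */
    return quat_mul(quat_from_yaw(gain * err), imu_orient);
}
```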
Some other bits:
Fixing numerical overflow issues in the OpenHMD maths routines
Capturing the IMU orientation and prediction that most closely corresponds to the moment each camera image is recorded, instead of when the camera image finishes transferring to the PC (which is 17ms later)
Improving the annotated debug view, to help understand what’s happening in the tracking computer vision steps
A 1st order estimate of device velocity to help improve the next predicted position
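That last item is just a finite difference of recent position observations, extrapolated to the next exposure time. Roughly like this (field names are illustrative):

```c
/* Minimal sketch of the first-order velocity estimate used to improve the
 * predicted position for the next frame. Field names are illustrative. */
typedef struct { float x, y, z; } vec3;

struct pos_history {
    vec3 prev_pos;
    vec3 vel;          /* estimated velocity, m/s              */
    double prev_time;  /* timestamp of prev_pos, in seconds    */
};

/* Update the velocity estimate from a fresh position observation. */
static void update_velocity(struct pos_history *h, vec3 pos, double t)
{
    double dt = t - h->prev_time;
    if (dt > 0.0) {
        h->vel.x = (float)((pos.x - h->prev_pos.x) / dt);
        h->vel.y = (float)((pos.y - h->prev_pos.y) / dt);
        h->vel.z = (float)((pos.z - h->prev_pos.z) / dt);
    }
    h->prev_pos = pos;
    h->prev_time = t;
}

/* Predict where the device will be at the next camera exposure. */
static vec3 predict_position(const struct pos_history *h, double t_next)
{
    double dt = t_next - h->prev_time;
    vec3 p = { h->prev_pos.x + h->vel.x * (float)dt,
               h->prev_pos.y + h->vel.y * (float)dt,
               h->prev_pos.z + h->vel.z * (float)dt };
    return p;
}
```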
I posted a longer-form video walkthrough of the current code in action, discussing some of the remaining shortcomings.