Oculus Rift CV1 progress

In my last post here, I gave some background on how Oculus Rift devices work, and promised to talk more about Rift S internals. I’ll do that another day – today I want to provide an update on implementing positional tracking for the Rift CV1.

I was working on CV1 support quite a lot earlier in the year, and then I took a detour to get basic Rift S support in place. Now that the Rift S works as a 3DOF device, I’ve gone back to plugging away at getting full positional support on the older CV1 headset.

So, what have I been doing on that front? Back in March, I posted this video of a new tracking algorithm I was working on to match LED constellations to object models to get the pose of the headset and controllers:

The core of this matching is a brute-force search that (somewhat cleverly) takes permutations of observed LEDs in the video footage and tests them against permutations of LEDs from the device models. It then uses an implementation of the Lambda Twist P3P algorithm (courtesy of pH5) to compute the possible poses for each combination of LEDs. Next, it projects the device's LED model using each candidate pose and counts how many of the other observed LED blobs line up with model LEDs, and how closely. Finally, it computes a fitness metric and tracks the ‘best’ pose match for each tracked object.
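To make that a bit more concrete, here's a rough sketch of the search in C. Everything in it – the types, the `p3p_solve` and `project_led` helpers, the pixel threshold – is illustrative only, not the real OpenHMD code, which prunes the search much harder than this:

```c
/* Illustrative sketch of the brute-force pose search. The types, helper
 * functions and pruning here are mine, not the real OpenHMD implementation. */
#include <float.h>

typedef struct { double x, y; } blob2d;   /* observed LED blob, undistorted image coords */
typedef struct { double x, y, z; } vec3;  /* LED position from the device's 3D model */
typedef struct { double orient[4]; vec3 pos; } posef;

/* Assumed wrapper around the Lambda Twist P3P solver: returns up to 4 poses */
int  p3p_solve(const blob2d obs[3], const vec3 model[3], posef poses_out[4]);
/* Assumed projection of a model LED into the camera image under a candidate pose */
void project_led(const posef *pose, const vec3 *led, blob2d *out);

/* Fitness: how many model LEDs land near an observed blob under this pose,
 * and how closely – more matched LEDs wins, lower error breaks ties. */
static double pose_fitness(const posef *pose, const vec3 *leds, int n_leds,
                           const blob2d *blobs, int n_blobs, double max_px)
{
    int matched = 0;
    double err = 0.0;

    for (int i = 0; i < n_leds; i++) {
        blob2d p;
        double best = DBL_MAX;

        project_led(pose, &leds[i], &p);
        for (int j = 0; j < n_blobs; j++) {
            double dx = p.x - blobs[j].x, dy = p.y - blobs[j].y;
            double d2 = dx * dx + dy * dy;
            if (d2 < best)
                best = d2;
        }
        if (best <= max_px * max_px) {
            matched++;
            err += best;
        }
    }
    return matched - err / (matched + 1);
}

/* Brute force: every triple of observed blobs against every ordered triple of
 * model LEDs. Permuting only the model side is enough to cover all pairings. */
static double search_pose(const blob2d *blobs, int n_blobs,
                          const vec3 *leds, int n_leds, posef *best_pose)
{
    double best_score = -DBL_MAX;

    for (int b0 = 0; b0 < n_blobs; b0++)
     for (int b1 = b0 + 1; b1 < n_blobs; b1++)
      for (int b2 = b1 + 1; b2 < n_blobs; b2++) {
        blob2d obs[3] = { blobs[b0], blobs[b1], blobs[b2] };

        for (int m0 = 0; m0 < n_leds; m0++)
         for (int m1 = 0; m1 < n_leds; m1++)
          for (int m2 = 0; m2 < n_leds; m2++) {
            if (m0 == m1 || m1 == m2 || m0 == m2)
                continue;
            vec3 model[3] = { leds[m0], leds[m1], leds[m2] };
            posef candidates[4];
            int n_poses = p3p_solve(obs, model, candidates);

            for (int k = 0; k < n_poses; k++) {
                double s = pose_fitness(&candidates[k], leds, n_leds,
                                        blobs, n_blobs, 5.0 /* px, arbitrary */);
                if (s > best_score) {
                    best_score = s;
                    *best_pose = candidates[k];
                }
            }
          }
      }
    return best_score;
}
```

Even trimmed like this, the cost grows very quickly with the number of blobs and model LEDs, which is why re-running the full search is so expensive (more on that in the TODO list below).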

For the video above, I had the algorithm implemented as a GStreamer plugin that ran offline to process a recording of the device movements. I’ve now merged it back into OpenHMD, so it runs against the live camera data. When it runs in OpenHMD, it also has access to the IMU motion stream, which lets it predict motion between frames – which can help with retaining the tracking lock when the devices move around.
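As a rough illustration of what that prediction looks like, here's a simplified dead-reckoning step: integrate the gyro into the orientation, rotate the accelerometer reading into the world frame, subtract gravity and integrate twice. This is a minimal version I've written for the post, not the actual fusion code in OpenHMD:

```c
/* Simplified dead-reckoning step between camera frames – for illustration
 * only, the real IMU fusion is more involved than raw integration. */
#include <math.h>

typedef struct { double x, y, z; } vec3;
typedef struct { double w, x, y, z; } quat;

static quat quat_mul(quat a, quat b)
{
    quat r = {
        a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
        a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
        a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
        a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w
    };
    return r;
}

static vec3 quat_rotate(quat q, vec3 v)
{
    /* v' = q * (0, v) * q^-1, with q a unit quaternion */
    quat p  = { 0, v.x, v.y, v.z };
    quat qi = { q.w, -q.x, -q.y, -q.z };
    quat r  = quat_mul(quat_mul(q, p), qi);
    vec3 out = { r.x, r.y, r.z };
    return out;
}

/* Advance orientation/velocity/position by dt using the latest IMU sample.
 * gyro is rad/s in the device frame, accel is assumed to be m/s^2. */
static void imu_predict(quat *orient, vec3 *pos, vec3 *vel,
                        vec3 gyro, vec3 accel, double dt)
{
    /* Integrate angular velocity: q += 0.5 * q * (0, w) * dt, then renormalise */
    quat w = { 0, gyro.x, gyro.y, gyro.z };
    quat dq = quat_mul(*orient, w);
    orient->w += 0.5 * dq.w * dt;
    orient->x += 0.5 * dq.x * dt;
    orient->y += 0.5 * dq.y * dt;
    orient->z += 0.5 * dq.z * dt;
    double n = sqrt(orient->w*orient->w + orient->x*orient->x +
                    orient->y*orient->y + orient->z*orient->z);
    orient->w /= n; orient->x /= n; orient->y /= n; orient->z /= n;

    /* Rotate the accelerometer reading into the world frame and remove gravity
     * (assumes a Z-up world frame here) */
    vec3 a_world = quat_rotate(*orient, accel);
    a_world.z -= 9.81;

    /* Integrate into velocity and position */
    pos->x += vel->x * dt + 0.5 * a_world.x * dt * dt;
    pos->y += vel->y * dt + 0.5 * a_world.y * dt * dt;
    pos->z += vel->z * dt + 0.5 * a_world.z * dt * dt;
    vel->x += a_world.x * dt;
    vel->y += a_world.y * dt;
    vel->z += a_world.z * dt;
}
```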

This weekend, I used that code to close the loop between the camera and the IMU. It's a little simple for now, but it's starting to work. What's in place at the moment (with a rough code sketch after the list) is:

  • At startup, each device is tracked only as a 3DOF device, with a default orientation and position.
  • When a camera view gets the first “good” tracking lock on the HMD, it calls that the zero position for the headset, and uses it to compute the pose of the camera relative to the playing space.
  • Camera observations of the position and orientation are now fed back into the IMU fusion to update the position and correct for yaw drift on the IMU (vertical orientation is still taken directly from the IMU's detection of gravity).
  • Between camera frames, the IMU fusion interpolates the orientation and position.
  • When a new camera frame arrives, the current interpolated pose is transformed back into the camera's reference frame and used to test whether we still have a visual lock on the device's LEDs, and to label any newly appearing LEDs if they match the tracked pose.
  • The device’s pose is refined using all visible LEDs and fed back to the IMU fusion.
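In C-flavoured pseudocode, each camera frame is handled roughly like this. The structs and helper names are stand-ins I've made up for the post, not the actual OpenHMD functions:

```c
/* Rough sketch of the per-frame loop described above. All of the structs and
 * helper functions here are illustrative stand-ins, not the OpenHMD API. */
#include <stdbool.h>

typedef struct { float orient[4]; float pos[3]; } posef;

struct camera;   /* one tracking camera, plus its world pose once it's known */
struct device;   /* HMD or controller: LED model + IMU fusion state */
struct frame;    /* one analysed camera frame: the extracted LED blobs */

/* Assumed helpers */
bool device_is_hmd(const struct device *dev);
bool camera_has_pose(const struct camera *cam);
void camera_set_pose_from_hmd(struct camera *cam, const posef *hmd_pose_in_cam);
void fusion_get_predicted_pose(struct device *dev, posef *world_pose);
void pose_world_to_camera(const struct camera *cam, const posef *world, posef *in_cam);
void pose_camera_to_world(const struct camera *cam, const posef *in_cam, posef *world);
bool check_visual_lock(struct frame *f, struct device *dev, const posef *expected);
void label_new_blobs(struct frame *f, struct device *dev, const posef *expected);
bool full_search_pose(struct frame *f, struct device *dev, posef *found_in_cam);
void refine_pose_from_blobs(struct frame *f, struct device *dev, posef *in_cam);
void fusion_update_from_camera(struct device *dev, const posef *world);

static void handle_camera_frame(struct camera *cam, struct frame *f,
                                struct device **devices, int n_devices)
{
    for (int i = 0; i < n_devices; i++) {
        struct device *dev = devices[i];
        posef predicted, in_cam, world;

        /* The IMU fusion has been interpolating orientation + position
         * since the last camera frame */
        fusion_get_predicted_pose(dev, &predicted);

        if (!camera_has_pose(cam)) {
            /* This camera doesn't know where it is yet. The first good lock
             * on the HMD is treated as the headset's zero pose and used to
             * place the camera in the playing space. */
            if (device_is_hmd(dev) && full_search_pose(f, dev, &in_cam))
                camera_set_pose_from_hmd(cam, &in_cam);
            continue;
        }

        /* Transform the predicted world pose into this camera's frame and
         * check whether the LEDs are still roughly where we expect them */
        pose_world_to_camera(cam, &predicted, &in_cam);
        if (!check_visual_lock(f, dev, &in_cam)) {
            /* Lock lost: fall back to the expensive full model search */
            if (!full_search_pose(f, dev, &in_cam))
                continue; /* nothing this frame; keep relying on the IMU */
        }

        /* Label any newly visible LEDs that match the pose, refine the pose
         * using all visible LEDs, and feed it back into the IMU fusion */
        label_new_blobs(f, dev, &in_cam);
        refine_pose_from_blobs(f, dev, &in_cam);
        pose_camera_to_world(cam, &in_cam, &world);
        fusion_update_from_camera(dev, &world);
    }
}
```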

With this simple loop in place, OpenHMD can now track multiple devices, and can do it using multiple cameras – somewhat. The first time the tracking block associated with a camera thinks it has a good lock on the HMD, it uses that to compute the pose of that camera. As long as the lock is genuinely good at that point, and the pose the IMU fusion is tracking is accurate, the relative poses between all the cameras are consistent and the tracking is OK. However, it's easy for that to go wrong and end up with an inconsistency between different camera views that leads to jittery or jumpy tracking…

In the best case, it looks like this:

Which I am pretty happy with 🙂

In that test, I was using a single tracking camera and had the controller sitting on the desk where the camera couldn't see it, which is why it was just floating there. Although SteamVR draws it with a Vive controller model, the various controller buttons and triggers work. There's still something weird going on with the controller tracking, though.

What next? I have a list of known problems and TODO items to tackle:

  • The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. More work is needed to avoid that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into one thread per camera doing capture and analysis of the ‘quick’ case with a good tracking lock, and a second thread that does the more expensive analysis when it's needed – there's a rough sketch of that split after this list.
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting.
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD's, and I used the HMD as the primary test case.
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware, but is ignored right now.
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well and how quickly the pose of each device is extracted by the computer vision, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and have started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras, and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.
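For the threading split mentioned in the list above, the rough shape I have in mind is one capture thread per camera that only ever runs the quick check, plus a single shared worker that does the expensive full model search. Here's a minimal pthreads sketch – the names, structures and queue handling are hypothetical, not the current code:

```c
/* Rough sketch of the planned threading split (not the current code):
 * one thread per camera does capture + the quick tracked-pose check, and
 * hands frames that need the expensive full search to a shared worker. */
#include <pthread.h>
#include <stdbool.h>

struct camera;
struct frame;

/* Assumed helpers */
struct frame *camera_capture_frame(struct camera *cam);      /* blocks on USB */
bool          quick_pose_check(struct camera *cam, struct frame *f);
void          full_model_search(struct camera *cam, struct frame *f);
void          frame_release(struct frame *f);

/* Small queue of frames waiting for the slow path */
#define SLOW_QUEUE_LEN 4
static struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    struct { struct camera *cam; struct frame *f; } items[SLOW_QUEUE_LEN];
    int head, count;
    bool shutdown;
} slow_q = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };

static void slow_queue_push(struct camera *cam, struct frame *f)
{
    pthread_mutex_lock(&slow_q.lock);
    if (slow_q.count < SLOW_QUEUE_LEN) {
        int tail = (slow_q.head + slow_q.count) % SLOW_QUEUE_LEN;
        slow_q.items[tail].cam = cam;
        slow_q.items[tail].f = f;
        slow_q.count++;
        pthread_cond_signal(&slow_q.cond);
    } else {
        frame_release(f); /* drop rather than stall the capture thread */
    }
    pthread_mutex_unlock(&slow_q.lock);
}

/* One of these per camera: never does anything slow, so USB packets aren't missed */
static void *capture_thread(void *arg)
{
    struct camera *cam = arg;
    for (;;) {
        struct frame *f = camera_capture_frame(cam);
        if (!f)
            break;
        if (quick_pose_check(cam, f))
            frame_release(f);          /* fast path: lock held, pose refined */
        else
            slow_queue_push(cam, f);   /* slow path: needs a full model search */
    }
    return NULL;
}

/* Single shared worker that runs the expensive search off the capture path */
static void *slow_search_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&slow_q.lock);
        while (slow_q.count == 0 && !slow_q.shutdown)
            pthread_cond_wait(&slow_q.cond, &slow_q.lock);
        if (slow_q.shutdown && slow_q.count == 0) {
            pthread_mutex_unlock(&slow_q.lock);
            break;
        }
        struct camera *cam = slow_q.items[slow_q.head].cam;
        struct frame *f    = slow_q.items[slow_q.head].f;
        slow_q.head = (slow_q.head + 1) % SLOW_QUEUE_LEN;
        slow_q.count--;
        pthread_mutex_unlock(&slow_q.lock);

        full_model_search(cam, f);
        frame_release(f);
    }
    return NULL;
}
```

Dropping frames when the slow queue is full is deliberate in this sketch – better to lose a frame of analysis than to stall capture and miss USB packets.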

A nice side effect of all this work is that it can all feed into Rift S support later. The Rift S uses inside-out tracking to determine the headset's position in the world – but the filtering to combine those observations with the IMU data will be largely the same, and once you know where the headset is, finding and tracking the controller LED constellations still looks a lot like the CV1's system.

If you want to try it out or take a look at the code, it's up on GitHub. I'm working in the rift-correspondence-search branch of my OpenHMD repository at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search