It’s been a while since my last post about tracking support for the Oculus Rift in February. There’s been big improvements since then – working really well a lot of the time. It’s gone from “If I don’t make any sudden moves, I can finish an easy Beat Saber level” to “You can’t hide from me!” quality.
Equally, there are still enough glitches and corner cases that I think I’ll still be at this a while.
Here’s a video from 3 weeks ago of (not me) playing Beat Saber on Expert+ setting showing just how good things can be now:
Strap in. Here’s what I’ve worked on in the last 6 weeks:
Pose Matching improvements
Most of the biggest improvements have come from improving the computer vision algorithm that’s matching the observed LEDs (blobs) in the camera frames to the 3D models of the devices.
I split the brute-force search algorithm into 2 phases. It now does a first pass looking for ‘obvious’ matches. In that pass, it does a shallow graph search of blobs and their nearest few neighbours against LEDs and their nearest neighbours, looking for a match using a “Strong” match metric. A match is considered strong if expected LEDs match observed blobs to within 1.5 pixels.
Coupled with checks on the expected orientation (matching the Gravity vector detected by the IMU) and the pose prior (expected position and orientation are within predicted error bounds) this short-circuit on the search is hit a lot of the time, and often completes within 1 frame duration.
In the remaining tricky cases, where a deeper graph search is required in order to recover the pose, the initial search reduces the number of LEDs and blobs under consideration, speeding up the remaining search.
I also added an LED size model to the mix – for a candidate pose, it tries to work out how large (in pixels) each LED should appear, and use that as a bound on matching blobs to LEDs. This helps reduce mismatches as devices move further from the camera.
When a brute-force search for pose recovery completes, the system now knows the identity of various blobs in the camera image. One way it avoids a search next time is to transfer the labels into future camera observations using optical-flow tracking on the visible blobs.
The problem is that even sped-up the search can still take a few frame-durations to complete. Previously LED labels would be transferred from frame to frame as they arrived, but there’s now a unique ID associated with each blob that allows the labels to be transferred even several frames later once their identity is known.
IMU Gyro scale
One of the problems with reverse engineering is the guesswork around exactly what different values mean. I was looking into why the controller movement felt “swimmy” under fast motions, and one thing I found was that the interpretation of the gyroscope readings from the IMU was incorrect.
The touch controllers report IMU angular velocity readings directly as a 16-bit signed integer. Previously the code would take the reading and divide by 1024 and use the value as radians/second.
From teardowns of the controller, I know the IMU is an Invensense MPU-6500. From the datasheet, the reported value is actually in degrees per second and appears to be configured for the +/- 2000 °/s range. That yields a calculation of Gyro-rad/s = Gyro-°/s * (2000 / 32768) * (?/180) – or a divisor of 938.734.
The 1024 divisor was under-estimating rotation speed by about 10% – close enough to work until you start moving quickly.
If we don’t find a device in the camera views, the fusion filter predicts motion using the IMU readings – but that quickly becomes inaccurate. In the worst case, the controllers fly off into the distance. To avoid that, I added a limit of 500ms for ‘coasting’. If we haven’t recovered the device pose by then, the position is frozen in place and only rotation is updated until the cameras find it again.
I implemented a 1-Euro exponential smoothing filter on the output poses for each device. This is an idea from the Project Esky driver for Project North Star/Deck-X AR headsets, and almost completely eliminates jitter in the headset view and hand controllers shown to the user. The tradeoff is against introducing lag when the user moves quickly – but there are some tunables in the exponential filter to play with for minimising that. For now I’ve picked some values that seem to work reasonably.
Communications with the touch controllers happens through USB radio command packets sent to the headset. The main use of radio commands in OpenHMD is to read the JSON configuration block for each controller that is programmed in at the factory. The configuration block provides the 3D model of LED positions as well as initial IMU bias values.
Unfortunately, reading the configuration block takes a couple of seconds on startup, and blocks everything while it’s happening. Oculus saw that problem and added a checksum in the controller firmware. You can read the checksum first and if it hasn’t changed use a local cache of the configuration block. Eventually, I’ll implement that caching mechanism for OpenHMD but in the meantime it still reads the configuration blocks on each startup.
As an interim improvement I rewrote the radio communication logic to use a state machine that is checked in the update loop – allowing radio communications to be interleaved without blocking the regularly processing of events. It still interferes a bit, but no longer causes a full multi-second stall as each hand controller turns on.
The hand controllers have haptic feedback ‘rumble’ motors that really add to the immersiveness of VR by letting you sense collisions with objects. Until now, OpenHMD hasn’t had any support for applications to trigger haptic events. I spent a bit of time looking at USB packet traces with Philipp Zabel and we figured out the radio commands to turn the rumble motors on and off.
In the Rift CV1, the haptic motors have a mode where you schedule feedback events into a ringbuffer – effectively they operate like a low frequency audio device. However, that mode was removed for the Rift S (and presumably in the Quest devices) – and deprecated for the CV1.
With that in mind, I aimed for implementing the unbuffered mode, with explicit ‘motor on + frequency + amplitude’ and ‘motor off’ commands sent as needed. Thanks to already having rewritten the radio communications to use a state machine, adding haptic commands was fairly easy.
The big question mark is around what API OpenHMD should provide for haptic feedback. I’ve implemented something simple for now, to get some discussion going. It works really well and adds hugely to the experience. That code is in the https://github.com/thaytan/OpenHMD/tree/rift-haptics branch, with a SteamVR-OpenHMD branch that uses it in https://github.com/thaytan/SteamVR-OpenHMD/tree/controller-haptics-wip
Unexpected tracking losses
I’d say the biggest problem right now is unexpected tracking loss and incorrect pose extractions when I’m not expecting them. Especially my right controller will suddenly glitch and start jumping around. Looking at a video of the debug feed, it’s not obvious why that’s happening:
To fix cases like those, I plan to add code to log the raw video feed and the IMU information together so that I can replay the video analysis frame-by-frame and investigate glitches systematically. Those recordings will also work as a regression suite to test future changes.
Sensor fusion efficiency
The Kalman filter I have implemented works really nicely – it does the latency compensation, predicts motion and extracts sensor biases all in one place… but it has a big downside of being quite expensive in CPU. The Unscented Kalman Filter CPU cost grows at O(n^3) with the size of the state, and the state in this case is 43 dimensional – 22 base dimensions, and 7 per latency-compensation slot. Running 1000 updates per second for the HMD and 500 for each of the hand controllers adds up quickly.
At some point, I want to find a better / cheaper approach to the problem that still provides low-latency motion predictions for the user while still providing the same benefits around latency compensation and bias extraction.
To generate a convincing illusion of objects at a distance in a headset that’s only a few centimetres deep, VR headsets use some interesting optics. The LCD/OLED panels displaying the output get distorted heavily before they hit the users eyes. What the software generates needs to compensate by applying the right inverse distortion to the output video.
Everyone that tests the CV1 notices that the distortion is not quite correct. As you look around, the world warps and shifts annoyingly. Sooner or later that needs fixing. That’s done by taking photos of calibration patterns through the headset lenses and generating a distortion model.
Camera / USB failures
The camera feeds are captured using a custom user-space UVC driver implementation that knows how to set up the special synchronisation settings of the CV1 and DK2 cameras, and then repeatedly schedules isochronous USB packet transfers to receive the video.
Occasionally, some people experience failure to re-schedule those transfers. The kernel rejects them with an out-of-memory error failing to set aside DMA memory (even though it may have been running fine for quite some time). It’s not clear why that happens – but the end result at the moment is that the USB traffic for that camera dies completely and there’ll be no more tracking from that camera until the application is restarted.
Often once it starts happening, it will keep happening until the PC is rebooted and the kernel memory state is reset.
Tracking generally works well when the cameras get a clear shot of each device, but there are cases like sighting down the barrel of a gun where we expect that the user will line up the controllers in front of one another, and in front of the headset. In that case, even though we probably have a good idea where each device is, it can be hard to figure out which LEDs belong to which device.
If we already have a good tracking lock on the devices, I think it should be possible to keep tracking even down to 1 or 2 LEDs being visible – but the pose assessment code will have to be aware that’s what is happening.
April 14th marks 2 years since I first branched off OpenHMD master to start working on CV1 tracking. How hard can it be, I thought? I’ll knock this over in a few months.
Since then I’ve accumulated over 300 commits on top of OpenHMD master that eventually all need upstreaming in some way.
One thing people have expressed as a prerequisite for upstreaming is to try and remove the OpenCV dependency. The tracking relies on OpenCV to do camera distortion calculations, and for their PnP implementation. It should be possible to reimplement both of those directly in OpenHMD with a bit of work – possibly using the fast LambdaTwist P3P algorithm that Philipp Zabel wrote, that I’m already using for pose extraction in the brute-force search.
I’ve picked the top issues to highlight here. https://github.com/thaytan/OpenHMD/issues has a list of all the other things that are still on the radar for fixing eventually.
At some point soon, I plan to put a pin in the CV1 tracking and look at adapting it to more recent inside-out headsets like the Rift S and WMR headsets. I implemented 3DOF support for the Rift S last year, but getting to full positional tracking for that and other inside-out headsets means implementing a SLAM/VIO tracking algorithm to track the headset position.
Once the headset is tracking, the code I’m developing here for CV1 to find and track controllers will hopefully transfer across – the difference with inside-out tracking is that the cameras move around with the headset. Finding the controllers in the actual video feed should work much the same.
This development happens mostly in my spare time and partly as open source contribution time at work at Centricular. I am accepting funding through Github Sponsorships to help me spend more time on it – I’d really like to keep helping Linux have top-notch support for VR/AR applications. Big thanks to the people that have helped get this far.