Oculus Rift CV1 progress

In my last post here, I gave some background on how Oculus Rift devices work, and promised to talk more about Rift S internals. I’ll do that another day – today I want to provide an update on implementing positional tracking for the Rift CV1.

I was working on CV1 support quite a lot earlier in the year, and then I took a detour to get basic Rift S support in place. Now that the Rift S works as a 3DOF device, I’ve gone back to plugging away at getting full positional support on the older CV1 headset.

So, what have I been doing on that front? Back in March, I posted this video of a new tracking algorithm I was working on to match LED constellations to object models to get the pose of the headset and controllers:

The core of this matching is a brute-force search that (somewhat cleverly) takes permutations of observed LEDs in the video footage and tests them against permutations of LEDs from the device models. It then uses an implementation of the Lambda Twist P3P algorithm (courtesy of pH5) to compute the possible poses for each combination of LEDs. Next, it projects the points of the candidate pose to count how many other LED blobs will match LEDs in the 3D model and how closely. Finally, it computes a fitness metric and tracks the ‘best’ pose match for each tracked object.
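
To make that more concrete, here’s a rough sketch of the search in Python – not the OpenHMD code itself, just the shape of it, with lambda_twist_p3p standing in for pH5’s solver and a plain inlier count standing in for the fitness metric:

import itertools
import numpy as np

def project(R, t, points, K):
    # Project 3D model points into the image with pose (R, t) and intrinsics K
    cam = points @ R.T + t                         # model frame -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]                  # perspective divide
    return uv * [K[0, 0], K[1, 1]] + [K[0, 2], K[1, 2]]

def brute_force_match(blobs, model_leds, K, lambda_twist_p3p, max_px=3.0):
    # blobs: Nx2 observed LED centres, model_leds: Mx3 points from the firmware
    blobs = np.asarray(blobs)
    model_leds = np.asarray(model_leds)
    best_inliers, best_pose = 0, None
    for obs in itertools.permutations(range(len(blobs)), 3):
        for mod in itertools.permutations(range(len(model_leds)), 3):
            # Each 3-to-3 correspondence yields up to 4 candidate poses
            for R, t in lambda_twist_p3p(blobs[list(obs)], model_leds[list(mod)], K):
                proj = project(R, t, model_leds, K)
                # Count how many projected model LEDs land near an observed blob
                d = np.linalg.norm(proj[:, None] - blobs[None, :], axis=2)
                inliers = int((d.min(axis=1) < max_px).sum())
                if inliers > best_inliers:
                    best_inliers, best_pose = inliers, (R, t)
    return best_pose, best_inliers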

For the video above, I had the algorithm implemented as a GStreamer plugin that ran offline to process a recording of the device movements. I’ve now merged it back into OpenHMD, so it runs against the live camera data. When it runs in OpenHMD, it also has access to the IMU motion stream, which lets it predict motion between frames – helping to retain the tracking lock as the devices move around.

This weekend, I used that code to close the loop between the camera and the IMU. It’s a little simple for now, but it’s starting to work. What’s in place at the moment (sketched in code after this list) is:

  • At startup time, the devices track their movement only as 3DOF devices with default orientation and position.
  • When a camera view gets the first “good” tracking lock on the HMD, it takes that as the zero position for the headset, and uses it to compute the pose of the camera relative to the playing space.
  • Camera observations of the position and orientation are now fed back into the IMU fusion to update the position and correct for yaw drift on the IMU (vertical orientation is still taken directly from the IMU detection of gravity).
  • Between camera frames, the IMU interpolates the orientation and position.
  • When a new camera frame arrives, the current interpolated pose is transformed back into the camera’s reference frame and used to test if we still have a visual lock on the device’s LEDs, and to label any newly appearing LEDs if they match the tracked pose.
  • The device’s pose is refined using all visible LEDs and fed back to the IMU fusion.
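
Roughly, the per-frame part of that loop looks like this sketch – the names here are illustrative, not the actual OpenHMD API:

def on_camera_frame(frame, camera, devices):
    for dev in devices:
        # Predict where the device should be now from the IMU-integrated
        # pose, and map it into this camera's reference frame
        predicted = camera.world_to_camera(dev.fusion.predict(frame.timestamp))
        # Test the prediction against the visible blobs, labelling any
        # newly appearing LEDs that match the predicted pose
        if dev.match_leds(frame.blobs, predicted):
            # Refine the pose using all visible LEDs, then feed it back
            # into the IMU fusion to correct position and yaw drift
            refined = dev.refine_pose(frame.blobs, predicted)
            dev.fusion.correct(camera.camera_to_world(refined), frame.timestamp)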

With this simple loop in place, OpenHMD can now track multiple devices, and can do it using multiple cameras – somewhat. The first time the tracking block associated with a camera thinks it has a good lock on the HMD, it uses that to compute the pose of that camera. As long as the lock is genuinely good at that point, and the pose the IMU fusion is tracking is good, then the relative pose between all the cameras is consistent and the tracking is OK. However, it’s easy for that to go wrong and end up with an inconsistency between different camera views that leads to jittery or jumpy tracking…

In the best case, it looks like this:

Which I am pretty happy with 🙂

In that test, I was using a single tracking camera, and had the controller sitting on the desk where the camera couldn’t see it, which is why it’s just floating there. SteamVR draws it with a Vive controller model, but the various controller buttons and triggers work. There’s still something weird going on with the controller tracking, though.

What next? I have a list of known problems and TODO items to tackle:

  • The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. More work is needed to avoid that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed.
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting.
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test.
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware data but is ignored right now.
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well and how quickly the pose of devices is extracted from the computer vision, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it (a toy sketch of the basic structure follows this list).
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.
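
On the filtering item above, here’s a toy 1D constant-velocity Kalman filter showing the predict/correct structure those papers build on – the real filter tracks full 3D position and orientation, but the shape is the same:

import numpy as np

dt = 1 / 60.0                       # camera frame interval (value assumed for illustration)
F = np.array([[1, dt], [0, 1]])     # state transition for [position, velocity]
H = np.array([[1.0, 0.0]])          # we only observe position
Q = np.eye(2) * 1e-4                # process noise (IMU integration error)
R = np.array([[1e-2]])              # measurement noise (camera jitter)

x = np.zeros((2, 1))                # state estimate
P = np.eye(2)                       # state covariance

def predict():
    # Between camera frames: dead-reckon the state forward
    global x, P
    x = F @ x
    P = F @ P @ F.T + Q

def correct(z):
    # When a camera observation arrives: blend it in according to its uncertainty
    global x, P
    y = z - H @ x                   # innovation: observation minus prediction
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P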

A nice side effect of all this work is that it can all feed in later to Rift S support. The Rift S uses inside-out tracking to determine the headset’s position in the world – but the filtering to combine those observations with the IMU data will be largely the same, and once you know where the headset is, finding and tracking the controller LED constellations still looks a lot like the CV1’s system.

If you want to try it out, or take a look at the code – it’s up on Github. I’m working in the rift-correspondence-search branch of my OpenHMD repository at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

OpenHMD and the Oculus Rift

For some time now, I’ve been involved in the OpenHMD project, working on building an open driver for the Oculus Rift CV1, and more recently the newer Rift S VR headsets.

This post is a bit of an overview of how the 2 devices work, from a high level, for people who might have used them or seen them but might not know much about the implementation. I also want to talk about OpenHMD and how it fits into the evolving Linux VR/AR API stack.

OpenHMD

http://www.openhmd.net/

In short, OpenHMD is a project providing open drivers for various VR headsets through a single simple API. I don’t know of any other project that provides support for as many different headsets as OpenHMD, so it’s the logical place to contribute for the largest effect.

OpenHMD is supported as a backend in Monado, and in SteamVR via the SteamVR-OpenHMD plugin. Working drivers in OpenHMD open up a range of VR games – as well as non-gaming applications like Blender. I think it’s important that Linux and friends not get left behind in what is basically a Windows-only activity right now.

One downside is that it does come with the usual disadvantages of an abstraction API, in that it doesn’t fully expose the varied capabilities of each device, but instead the common denominator. I hope we can fix that in time by extending the OpenHMD API, without losing its simplicity.

Oculus Rift S

I bought an Oculus Rift S in April, to supplement my original consumer Oculus Rift (the CV1) from 2017. At that point, the only way to use it was in Windows via the official Oculus driver as there was no open source driver yet. Since then, I’ve largely reverse engineered the USB protocol for it, and have implemented a basic driver that’s upstream in OpenHMD now.

I find the Rift S a somewhat interesting device. It’s not entirely an upgrade over the older CV1. The build quality and some of the specifications are actually worse than the original device’s – but one area where it is a clear improvement is the tracking system.

CV1 Tracking

The Rift CV1 uses what is called an outside-in tracking system, which has 2 major components. The first is input from Inertial Measurement Units (IMUs) on each device – the headset and the 2 hand controllers. The 2nd component is infrared cameras (Rift Sensors) that you space around the room; a calibration procedure lets the driver software calculate their positions relative to the play area.

IMUs provide readings of linear acceleration and angular velocity, which can be used to determine the orientation of a device, but don’t provide absolute position information. You can derive relative motion from a starting point using an IMU, but only over a short time frame as the integration of the readings is quite noisy.
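
A toy illustration of why: even a tiny constant bias in the accelerometer, double-integrated, becomes a large position error within seconds (the numbers are made up, but the quadratic growth is the point):

dt, bias = 0.001, 0.01          # 1kHz IMU samples, 0.01 m/s^2 residual bias
vel = pos = 0.0
for _ in range(int(10 / dt)):   # integrate for 10 simulated seconds
    vel += bias * dt            # acceleration -> velocity
    pos += vel * dt             # velocity -> position
print(pos)                      # ~0.5m of position drift after just 10 seconds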

This is where the Rift Sensors get involved. The cameras observe constellations of infrared LEDs on the headset and hand controllers, and use those in concert with the IMU readings to position the devices within the playing space – so that as you move, the virtual world accurately reflects your movements. The cameras and LEDs synchronise to a radio pulse from the headset, and the camera exposure time is kept very short. That means the picture from the camera is completely black, except for very bright IR sources. Hopefully that means only the LEDs are visible, although light bulbs and open windows can inject noise and make the tracking harder.

Rift Sensor view of the CV1 headset and 2 controllers.

If you have both IMU and camera data, you can build what we call a 6 Degrees of Freedom (6DOF) driver. With only IMUs, a driver is limited to providing 3DOF – allowing you to stand in one place and look around, but not to move.

OpenHMD provides a 3DOF driver for the CV1 at this point, with experimental 6DOF work in a branch in my fork. Getting to a working 6DOF driver is a real challenge. The official drivers from Oculus still receive regular updates to tweak the tracking algorithms.

I have given several presentations about the progress on implementing positional tracking for the CV1 – most recently at Linux.conf.au 2020 in January. There’s a recording at https://www.youtube.com/watch?v=PTHE-cdWN_s if you’re interested, and I plan to talk more about that in a future post.

Rift S Tracking

The Rift S uses inside-out tracking, which inverts the tracking process by putting the cameras on the headset instead of around the room. With the cameras in fixed positions on the headset, the cameras and their view of the world move as the user’s head moves. For the Rift S, there are 5 individual cameras pointing outward in different directions to provide (overall) a very wide-angle view of the surroundings.

The role of the tracking algorithm in the driver in this scenario is to use the cameras to look for visual landmarks in the play area, and to combine that information with the IMU readings to find the position of the headset. This is called Visual Inertial Odometry.

There is then a 2nd part to the tracking – finding the position of the hand controllers. This part works the same as on the CV1 – looking for constellations of LED lights on the controllers and matching what you see to a model of the controllers.

This is where I think the tracking gets particularly interesting. Finding where the headset is in the room and finding the controllers require 2 different types of camera view!

To find the landmarks in the room, the vision algorithm needs to be able to see everything clearly and you want a balanced exposure from the cameras. To identify the controllers, you want a very fast exposure synchronised with the bright flashes from the hand controller LEDs – the same as when doing CV1 tracking.

The Rift S satisfies both requirements by capturing alternating video frames with fast and normal exposures. Each time, it captures the 5 cameras simultaneously and stitches them together into 1 video frame to deliver over USB to the host computer. The driver then needs to split each frame according to whether it is a normal or fast exposure and dispatch it to the appropriate part of the tracking algorithm.
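
The dispatch step might look something like this sketch – the side-by-side split and the brightness test are assumptions on my part; the real driver will more likely use the frame metadata mentioned below:

import numpy as np

NUM_CAMERAS = 5

def split_views(composite):
    # Split the stitched USB frame back into the 5 per-camera views,
    # assuming a simple side-by-side layout
    return np.hsplit(composite, NUM_CAMERAS)

def dispatch(composite, vio, controller_tracker):
    views = split_views(composite)
    # Fast exposures are nearly black apart from the LEDs, so mean
    # brightness is a crude way to tell the two frame types apart
    if composite.mean() < 10:
        controller_tracker.process(views)   # LED constellation matching
    else:
        vio.process(views)                  # visual-inertial odometry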

Rift S – normal room exposure for Visual Inertial Odometry.
Rift S – fast exposure with IR LEDs for controller tracking.

There are a bunch of interesting things to notice in these camera captures:

  • Each camera view is inserted into the frame in some native orientation, and external information is required to make use of the image contents.
  • The cameras have a lot of fisheye distortion that will need correcting.
  • In the fast exposure frame, the light bulbs on my ceiling are hard to tell apart from the hand controller LEDs – another challenge for the computer vision algorithm.
  • The cameras are infrared-only, which is why the Rift S passthrough view (if you’ve ever seen it) is in grey-scale.
  • The top 16 pixels of each frame contain some binary data to help with frame identification. I don’t know how to interpret the contents of that data yet.

Status

This blog post is already too long, so I’ll stop here. In part 2, I’ll talk more about deciphering the Rift S protocol.

Thanks for reading! If you have any questions, hit me up at thaytan@noraisin.net or @thaytan on Twitter.

New gst-rpicamsrc features

I’ve pushed some new changes to my Raspberry Pi camera GStreamer wrapper, at https://github.com/thaytan/gst-rpicamsrc/

These bring the GStreamer element up to date with new features added to raspivid since I first started the project, such as adding text annotations to the video, support for the 2nd camera on the compute module, intra-refresh and others.

You can now dynamically update any of the properties, where the firmware supports it. So you can implement digital zoom by adjusting the region-of-interest (roi) properties on the fly, or update the annotation, or change video effects and colour balance, for example.
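
Digital zoom from Python might look something like this sketch – I’m assuming the roi-x/roi-y/roi-w/roi-h property names here; check gst-inspect-1.0 rpicamsrc for the exact names on your build:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    'rpicamsrc name=src ! h264parse ! matroskamux ! filesink location=out.mkv')
src = pipeline.get_by_name('src')
pipeline.set_state(Gst.State.PLAYING)

# While recording, zoom in to the centre quarter of the sensor by
# shrinking the region of interest on the fly
src.set_property('roi-x', 0.25)
src.set_property('roi-y', 0.25)
src.set_property('roi-w', 0.5)
src.set_property('roi-h', 0.5)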

The timestamps produced are now based on the internal STC of the Raspberry Pi, so the audio/video sync is tighter. Although it was never terrible, it’s now more correct and slightly less jittery.

The one major feature I haven’t enabled yet is stereoscopic handling. Stereoscopic capture requires 2 cameras attached to a Raspberry Pi Compute Module, so at the moment I have no way to test that it works.

I’m also working on GStreamer stereoscopic handling in general (which is coming along). I look forward to releasing some of that code soon.


Network clock examples

Way back in 2006, Andy Wingo wrote some small scripts for GStreamer 0.10 to demonstrate what was (back then) a fairly new feature in GStreamer – the ability to share a clock across the network and use it to synchronise playback of content across different machines.

Since GStreamer 1.x has been out for over 2 years, and we get a lot of questions about how to use the network clock functionality, it’s a good time for an update. I’ve ported the simple examples for API changes and to use the gobject-introspection based Python bindings and put them up on my server.

To give it a try, fetch play-master.py and play-slave.py onto 2 or more computers with GStreamer 1 installed. You need a media file accessible via some URI to all machines, so they have something to play.

Then, on one machine run play-master.py, passing a URI for it to play and a port to publish the clock on:

./play-master.py http://server/path/to/file 8554

The script will print out a command line like so:

Start slave as: python ./play-slave.py http://server/path/to/file [IP] 8554 1071152650838999

On another machine(s), run the printed command, substituting the IP address of the machine running the master script.

After a moment or two, the slaved machine should start playing the file in sync with the master:

Network Synchronised Playback

If they’re not in sync, check that you have the port you chose open for UDP traffic so the clock synchronisation packets can be transferred.
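
For the curious, the heart of the two scripts is only a handful of calls. Here’s a condensed sketch of both sides, assuming GStreamer 1.x with the GstNet introspection bindings (in the real scripts the base time travels via the printed command line, of course):

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstNet', '1.0')
from gi.repository import Gst, GstNet

Gst.init(None)
URI, PORT = 'http://server/path/to/file', 8554

# Master: publish the system clock on a UDP port and play
clock = Gst.SystemClock.obtain()
provider = GstNet.NetTimeProvider.new(clock, None, PORT)  # keep a reference while sharing
master = Gst.ElementFactory.make('playbin', None)
master.set_property('uri', URI)
master.use_clock(clock)
base_time = clock.get_time()             # printed for the slaves to reuse
master.set_start_time(Gst.CLOCK_TIME_NONE)
master.set_base_time(base_time)
master.set_state(Gst.State.PLAYING)

# Slave: sync to the master's clock and reuse its base time
slave_clock = GstNet.NetClientClock.new('netclock', 'master-ip', PORT, 0)
slave = Gst.ElementFactory.make('playbin', None)
slave.set_property('uri', URI)
slave.use_clock(slave_clock)
slave.set_start_time(Gst.CLOCK_TIME_NONE)
slave.set_base_time(base_time)           # the number printed by the master
slave.set_state(Gst.State.PLAYING)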

This basic technique is the core of my Aurena home media player system, which builds on top of the network clock mechanism to provide file serving and a simple shuffle playlist.

For anyone still interested in GStreamer 0.10 – Andy’s old scripts can be found on his server: play-master.py and play-slave.py

2014 GStreamer Conference

I’ve been home from Europe over a week, after heading to Germany for the annual GStreamer conference and Linuxcon Europe.

We had a really great turnout for the GStreamer conference this year, as well as an amazing schedule of talks.

GstConf2k14

All the talks were recorded by Ubicast, who got all the videos edited and uploaded in record time. The whole conference is available for viewing at http://gstconf.ubicast.tv/channels/#gstreamer-conference-2014

I gave one of the last talks of the schedule – about my current work adding support for describing and handling stereoscopic (3D) video. That support should land upstream sometime in the next month or two, so more on that in a bit.


There were too many great talks to mention them individually, but I was excited by 3 strong themes across the talks:

  • WebRTC/HTML5/Web Streaming support
  • Improving performance and reducing resource usage
  • Building better development and debugging tools

I’m looking forward to us collectively making progress on all those things and more in the upcoming year.

Mysterious Parcel

I received a package in the mail today!
Mysterious Package

Everything arrived all nicely packaged up in a hobby box and ready for assembly.
Opening the box

Lots of really interesting goodies in the box!
Out on the table

After a little while, I’ve got the first part together.
First part assembled

The rest will have to wait for another day. In the meantime, have fun guessing what it is, and enjoy this picture of a cake I baked on the weekend:
Strawberry Sponge Cake

See you later!

DVD playback in GStreamer 1.0

Some time in 2012, the GStreamer team was busy working toward the GStreamer 1.0 major release. Along the way, I did my part and ported the DVD playback components from 0.10. DVD is a pretty complex playback scenario (let’s not talk about Blu-ray).

I gave a talk about it at the GStreamer conference way back in 2010 – video here. Apart from the content of that talk, the thing I liked most was that I used Totem as my presentation tool 🙂

With all the nice changes that GStreamer 1.0 brought, DVD playback worked better than ever. I was able to delete a bunch of hacks and workarounds from the 0.10 days. There have been some bugs, but mostly minor things. Recently though, I became aware of a whole class of DVDs that didn’t work for a very silly reason. The symptom was that particular discs would error out at the start with a cryptic “The stream is in the wrong format” message.

It turns out that these are DVDs that begin with a piece of video that has no sound.

Sometimes that’s implemented on a disc as a video track with accompanying silence, but in the broken case the DVDs have no audio track at all for that initial section. For a normal file, GStreamer would handle that by not creating any audio decoder chain or audio sink output element, and just decode and play video. For DVD though, there are very few discs that are entirely without audio – so we’re going to need the audio decoder chain sooner or later, and there’s no point creating and destroying it as the audio track appears and disappears.

Accordingly, we create an audio output pad and GStreamer plugs in a suitable audio output sink – and then nothing happens, because the pipeline can’t get to the Playing state; it’s stuck in the Paused state. Before a pipeline can start playing, it has to progress through the Ready and Paused states and then to Playing. The key to getting from Paused to Playing is that each output element – the video sink and audio sink, in our case – has to receive some data and be ready to output it, a process called pre-roll. Pre-rolling the pipeline avoids stuttering at the start, because otherwise the decoders would have to race to try and deliver something in time for it to get on screen.

With no audio track, there are no actual audio packets to deliver, and the audio sink can’t pre-roll. The solution in GStreamer 1.0 is a GAP event, sent to indicate that there is a space in the data, and that elements should do whatever is needed to skip or fill it. In the audio sink’s case, it should handle it by considering itself pre-rolled and allowing the pipeline to go to Playing, starting the ring buffer and the audio clock – from which the rest of the pipeline will be timed.
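
In code terms, the gap is just an event pushed downstream in place of buffers. A sketch of the idea, assuming the GStreamer 1.x Python bindings:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

def push_gap(srcpad, start_ns, duration_ns):
    # Tell downstream there is no data for [start, start + duration), so
    # sinks can consider that span handled and pre-roll or keep running
    return srcpad.push_event(Gst.Event.new_gap(start_ns, duration_ns))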

Everything up to that point was working OK – the sink received the GAP event… and then errored out. It expects to be told what format the audio samples it’s receiving are, so it knows how to fill in the gap. With no audio track and no audio data, it was never being told.

In the end, the fix was to make the dummy place-holder audio decoder choose an audio sample format if it gets a GAP event and hasn’t received any data yet – any format, it doesn’t really matter as long as it’s reasonable. It’ll be discarded and a new format selected and propagated when some audio data really is encountered later in playback.

That fix is #c24a12 – later fixed up a bit by thiagoss to add the ‘sensible’ part to format selection. The initial commit liked to choose a samplerate of 1Hz 🙂

If you have any further bugs in your GStreamer DVD playback, please let us know!

Proof of life – A New Adventure!

Hi world! It’s been several years since I used this blog, and a lot of things have happened to us since then. I don’t even live on the same continent as I did.

More on that in a future post. Today, I have an announcement to make – a new Open Source company! Together with fellow GStreamer hackers Tim-Philipp Müller and Sebastian Dröge, I have founded a new company: Centricular Ltd.

From 2007 until July, I was working at Oracle on Sun Ray thin client firmware. Oracle shut down the project in July, and my job along with it – opening up this excellent opportunity to try something I’ve wanted for a while and start a business, while getting back to Free Software full time.

Our website has more information about the Open Source technologies and services we plan to offer. This list is not complete and we will try to broaden it over time, so if you have anything interesting that is not listed there but you think we can help with, get in touch.

As Centricular’s first official contribution to the software pool, here’s my Raspberry Pi Camera GStreamer module. It wraps code from Raspivid to allow direct capture from the official camera module and hardware encoding to H.264 in a GStreamer pipeline – without the shell pipe and fdsrc hack people have been using to date. Take a look at the README for more information.

Raspberry Pi Camera GStreamer element

Sebastian, Tim and I will be at the GStreamer Conference in Edinburgh next week.

A glimpse of audio nirvana

This post is basically a love letter to the Pulseaudio and Gnome Bluetooth developers.

I upgraded my laptop to Ubuntu Karmic recently, which brought with it the ability to use my Bluetooth A2DP headphones natively. Getting them running is now as simple as using the Bluetooth icon in the panel to pair the laptop with the headphones, and then selecting them in the Sound Preferences applet, on the Output tab.

As soon as the headphones are connected, they show up as a new audio device. Selecting it instantly (and seamlessly) migrates my sounds and music from the laptop sound device onto the headphones. The Play/Pause, Next Track and Previous Track buttons all generate media key keypresses – so Rhythmbox and Totem behave like they’re supposed to. It’s lovely.

If that were all, it would already be pretty sweet in my opinion, but wait – there’s more!

A few days after starting to use my bluetooth headphones, my wife and I took a trip to Barcelona (from Dublin, where we live for the next few weeks… more on that later). When we got to the airport, the first thing we learned was that our flight had been delayed by 3 hours. Since I occasionally hack on multimedia related things, I typically have a few DVDs with me for testing. In this case, I had Vicky Cristina Barcelona on hand, and we hadn’t watched it yet – a perfect choice for 2 people on their way to Barcelona.

Problem! There were four ears wanting to listen to the DVD, and only one place the sound could go. I could send it to the built-in sound device and listen on the earbuds my wife had, or I could send it to my bluetooth headphones – but not both.

Pulseaudio to the rescue! With a bit of command-line fu (no GUI for this, but that’s totally do-able), I created a virtual audio device, using Pulseaudio’s “combine” module. Like the name suggests, it combines multiple other audio devices into a single one. It can do more complex combinations (such as sending some channels hither and others thither), but I just needed a straight mirroring of the devices. In a terminal, I ran:

pactl load-module module-combine sink_name=shared_play adjust_time=3 slaves="alsa_output.pci-0000_00_1b.0.analog-stereo,bluez_sink.00_15_0F_72_70_E1"

Hey presto! Now there’s a third audio device available in the Sound Preferences to send the sound to, and it comes out both the wired ear buds and my bluetooth headphones (with a very slight sync offset, but close enough for my purposes).

Also, for those interested – the names of the 2 audio devices in my pactl command line came from the output of ‘pactl list’.

This kind of seamless migration of running audio streams really isn’t possible to do without something like Pulseaudio that can manage stream routing on the fly. I’m well aware that Pulseaudio integration into the distributions has been a bumpy ride for lots of people, but I think the end goal justifies the painful process of fixing all the sound drivers. I hope you do too!

Edit: Lennart points out that the extra paprefs application has an “Add virtual output device for simultaneous output on all local sound cards” check-box that does the same thing as loading the combine module manually, but also handles hot-plugging of devices as they appear and disappear.

OSSbarcamp 2 – GNOME 3.0 talk

I gave a talk at the second Dublin OSSbarcamp yesterday. My goal was to provide some insight into the goals for GNOME 3.0 for people who didn’t attend GCDS.

Actually, the credit for the entire talk goes to Vincent and friends, who gave the GNOME 3.0 overview during the GUADEC opening at GCDS and to Owen for his GNOME Shell talk. I stole content from their slides shamelessly.

The slides are available in ODP form, or as a PDF.