OpenHMD and the Oculus Rift

For some time now, I’ve been involved in the OpenHMD project, working on building an open driver for the Oculus Rift CV1 and, more recently, the newer Rift S VR headset.

This post is a high-level overview of how the 2 devices work, for people who might have used or seen them but don’t know much about the implementation. I also want to talk about OpenHMD and how it fits into the evolving Linux VR/AR API stack.

OpenHMD

http://www.openhmd.net/

In short, OpenHMD is a project providing open drivers for various VR headsets through a single simple API. I don’t know of any other project that provides support for as many different headsets as OpenHMD, so it’s the logical place to contribute for the largest effect.

OpenHMD is supported as a backend in Monado, and in SteamVR via the SteamVR-OpenHMD plugin. Working drivers in OpenHMD open up a range of VR games, as well as non-gaming applications like Blender. I think it’s important that Linux and friends not get left behind in what is basically a Windows-only activity right now.

One downside is that it does come with the usual disadvantages of an abstraction API, in that it doesn’t fully expose the varied capabilities of each device, but instead the common denominator. I hope we can fix that in time by extending the OpenHMD API, without losing its simplicity.
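
To give a feel for how simple the API is, here’s roughly what reading the headset orientation looks like in C, in the style of the examples that ship with OpenHMD (error handling trimmed):

#include <stdio.h>
#include <openhmd.h>

int main(void)
{
    ohmd_context *ctx = ohmd_ctx_create();

    /* Enumerate attached devices and open the first one */
    if (ohmd_ctx_probe(ctx) < 1) {
        printf("No HMD detected\n");
        return 1;
    }
    ohmd_device *hmd = ohmd_list_open_device(ctx, 0);

    /* Pump the context and read back the orientation quaternion */
    for (int i = 0; i < 1000; i++) {
        float quat[4];
        ohmd_ctx_update(ctx);
        ohmd_device_getf(hmd, OHMD_ROTATION_QUAT, quat);
        printf("orientation: %f %f %f %f\n", quat[0], quat[1], quat[2], quat[3]);
    }

    ohmd_ctx_destroy(ctx);
    return 0;
}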

Oculus Rift S

I bought an Oculus Rift S in April, to supplement my original consumer Oculus Rift (the CV1) from 2017. At that point, the only way to use it was in Windows via the official Oculus driver as there was no open source driver yet. Since then, I’ve largely reverse engineered the USB protocol for it, and have implemented a basic driver that’s upstream in OpenHMD now.

I find the Rift S a somewhat interesting device. It’s not entirely an upgrade over the older CV1. The build quality and some of the specifications are actually worse than the original device – but one area where it is a clear improvement is the tracking system.

CV1 Tracking

The Rift CV1 uses what is called an outside-in tracking system, which has 2 major components. The first is input from Inertial Measurement Units (IMUs) on each device – the headset and the 2 hand controllers. The 2nd component is infrared cameras (Rift Sensors) that you space around the room; a calibration procedure then lets the driver software calculate their positions relative to the play area.

IMUs provide readings of linear acceleration and angular velocity, which can be used to determine the orientation of a device, but don’t provide absolute position information. You can derive relative motion from a starting point using an IMU, but only over a short time frame as the integration of the readings is quite noisy.
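
As a rough illustration of why that is (my own simplified sketch, not the actual OpenHMD fusion code): integrating the gyro gives orientation directly, but position needs the accelerometer integrated twice, so any bias or noise in the readings turns into an error that grows with the square of time.

#include <math.h>

typedef struct { float w, x, y, z; } quatf;
typedef struct { float x, y, z; } vec3f;

/* Integrate one gyro sample (rad/s) over dt seconds into the orientation.
 * dq/dt = 0.5 * q * (0, wx, wy, wz) */
void integrate_gyro(quatf *q, vec3f gyro, float dt)
{
    quatf dq = {
        0.5f * (-q->x * gyro.x - q->y * gyro.y - q->z * gyro.z),
        0.5f * ( q->w * gyro.x + q->y * gyro.z - q->z * gyro.y),
        0.5f * ( q->w * gyro.y - q->x * gyro.z + q->z * gyro.x),
        0.5f * ( q->w * gyro.z + q->x * gyro.y - q->y * gyro.x),
    };

    q->w += dq.w * dt; q->x += dq.x * dt;
    q->y += dq.y * dt; q->z += dq.z * dt;

    /* Re-normalise to keep it a unit quaternion */
    float n = sqrtf(q->w * q->w + q->x * q->x + q->y * q->y + q->z * q->z);
    q->w /= n; q->x /= n; q->y /= n; q->z /= n;
}

/* Naive position by double integration. A constant accelerometer bias b
 * produces a position error of 0.5 * b * t^2, which is why this drifts
 * badly within seconds without an external reference. */
void integrate_accel(vec3f *pos, vec3f *vel, vec3f world_accel, float dt)
{
    vel->x += world_accel.x * dt;
    vel->y += world_accel.y * dt;
    vel->z += world_accel.z * dt;

    pos->x += vel->x * dt;
    pos->y += vel->y * dt;
    pos->z += vel->z * dt;
}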

This is where the Rift Sensors get involved. The cameras observe constellations of infrared LEDs on the headset and hand controllers, and use those in concert with the IMU readings to position the devices within the playing space – so that as you move, the virtual world accurately reflects your movements. The cameras and LEDs synchronise to a radio pulse from the headset, and the camera exposure time is kept very short. That means the picture from the camera is completely black, except for very bright IR sources. Hopefully that means only the LEDs are visible, although light bulbs and open windows can inject noise and make the tracking harder.

Rift Sensor view of the CV1 headset and 2 controllers.

If you have both IMU and camera data, you can build what we call a 6 Degree of Freedom (6DOF) driver. With only IMUs, a driver is limited to providing 3 DOF – allowing you to stand in one place and look around, but not to move.

OpenHMD provides a 3DOF driver for the CV1 at this point, with experimental 6DOF work in a branch in my fork. Getting to a working 6DOF driver is a real challenge. The official drivers from Oculus still receive regular updates to tweak the tracking algorithms.

I have given several presentations about the progress on implementing positional tracking for the CV1, most recently at Linux.conf.au 2020 in January. There’s a recording at https://www.youtube.com/watch?v=PTHE-cdWN_s if you’re interested, and I plan to talk more about that in a future post.

Rift S Tracking

The Rift S uses Inside Out tracking, which inverts the tracking process by putting the cameras on the headset instead of around the room. With the cameras fixed to the headset, their view of the world moves as the user’s head moves. For the Rift S, there are 5 individual cameras pointing outward in different directions to provide (overall) a very wide-angle view of the surroundings.

In this scenario, the role of the tracking algorithm in the driver is to use the cameras to look for visual landmarks in the play area, and to combine that information with the IMU readings to find the position of the headset. This is called Visual Inertial Odometry.

There is then a 2nd part to the tracking – finding the position of the hand controllers. This part works the same as on the CV1 – looking for constellations of LED lights on the controllers and matching what you see to a model of the controllers.

This is where I think the tracking gets particularly interesting. Finding where the headset is in the room and finding the controllers require 2 different types of camera view!

To find the landmarks in the room, the vision algorithm needs to be able to see everything clearly and you want a balanced exposure from the cameras. To identify the controllers, you want a very fast exposure synchronised with the bright flashes from the hand controller LEDs – the same as when doing CV1 tracking.

The Rift S satisfies both requirements by capturing alternating video frames with fast and normal exposures. Each time, it captures the 5 cameras simultaneously and stitches them together into 1 video frame to deliver over USB to the host computer. The driver then needs to split each frame according to whether it is a normal or fast exposure and dispatch it to the appropriate part of the tracking algorithm.
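
As a toy illustration of that dispatch step, you could classify each stitched frame by its mean brightness, since the fast exposures are almost entirely black. The real driver presumably keys off the metadata embedded in the frame (mentioned below) rather than anything this crude – this is just a sketch:

#include <stdint.h>
#include <stddef.h>

typedef enum { FRAME_EXPOSURE_FAST, FRAME_EXPOSURE_NORMAL } frame_exposure;

/* Placeholder stubs for the two halves of the tracking pipeline */
static void track_controller_leds(const uint8_t *pixels, size_t n) { (void)pixels; (void)n; }
static void update_visual_inertial_odometry(const uint8_t *pixels, size_t n) { (void)pixels; (void)n; }

static frame_exposure classify_frame(const uint8_t *pixels, size_t n_pixels)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n_pixels; i++)
        sum += pixels[i];

    /* Fast exposures are nearly black; 16/255 is an arbitrary cut-off */
    return (sum / n_pixels < 16) ? FRAME_EXPOSURE_FAST : FRAME_EXPOSURE_NORMAL;
}

void handle_stitched_frame(const uint8_t *pixels, size_t n_pixels)
{
    if (classify_frame(pixels, n_pixels) == FRAME_EXPOSURE_FAST)
        track_controller_leds(pixels, n_pixels);
    else
        update_visual_inertial_odometry(pixels, n_pixels);
}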

Rift S – normal room exposure for Visual Inertial Odometry.
Rift S – fast exposure with IR LEDs for controller tracking.

There are a bunch of interesting things to notice in these camera captures:

  • Each camera view is inserted into the combined frame in its own native orientation, and external information is needed to make sense of the image data.
  • The cameras have a lot of fisheye distortion that will need correcting.
  • In the fast exposure frame, the light bulbs on my ceiling are hard to tell apart from the hand controller LEDs – another challenge for the computer vision algorithm.
  • The cameras are infrared only, which is why the Rift S passthrough view (if you’ve ever seen it) is in grey-scale.
  • The top 16 pixels of each frame contain some binary data to help with frame identification. I don’t know how to interpret the contents of that data yet.

Status

This blog post is already too long, so I’ll stop here. In part 2, I’ll talk more about deciphering the Rift S protocol.

Thanks for reading! If you have any questions, hit me up at thaytan@noraisin.net or @thaytan on Twitter.

New gst-rpicamsrc features

I’ve pushed some new changes to my Raspberry Pi camera GStreamer wrapper, at https://github.com/thaytan/gst-rpicamsrc/

These bring the GStreamer element up to date with new features added to raspivid since I first started the project, such as adding text annotations to the video, support for the 2nd camera on the compute module, intra-refresh and others.

You can now dynamically update any of the properties, where the firmware supports it. So you can implement digital zoom by adjusting the region-of-interest (roi) properties on the fly, or update the annotation, or change video effects and colour balance, for example.
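
For example, a digital zoom might look something like this sketch. The roi property names here are from memory, so check gst-inspect-1.0 rpicamsrc for the authoritative list on your version:

#include <gst/gst.h>

/* Sketch: 2x digital zoom on a running rpicamsrc pipeline by shrinking the
 * region-of-interest to the centre quarter of the sensor. Property names
 * (roi-x/roi-y/roi-w/roi-h) are assumptions - verify with gst-inspect-1.0.
 * Build with: gcc zoom.c $(pkg-config --cflags --libs gstreamer-1.0) */
int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    GError *err = NULL;
    GstElement *pipeline = gst_parse_launch(
        "rpicamsrc name=src preview=false ! h264parse ! "
        "matroskamux ! filesink location=zoomed.mkv", &err);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", err->message);
        return 1;
    }

    GstElement *src = gst_bin_get_by_name(GST_BIN(pipeline), "src");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    /* Record 2 seconds at full view, then zoom into the centre quarter -
     * the firmware scales the cropped region back up to the output size */
    g_usleep(2 * G_USEC_PER_SEC);
    g_object_set(src, "roi-x", 0.25, "roi-y", 0.25,
                      "roi-w", 0.5, "roi-h", 0.5, NULL);
    g_usleep(5 * G_USEC_PER_SEC);

    /* Send EOS and wait for it so the muxer can finish the file cleanly */
    gst_element_send_event(pipeline, gst_event_new_eos());
    GstBus *bus = gst_element_get_bus(pipeline);
    gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
        GST_MESSAGE_EOS | GST_MESSAGE_ERROR);

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(src);
    gst_object_unref(pipeline);
    return 0;
}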

The timestamps produced are now based on the internal STC of the Raspberry Pi, so the audio/video sync is tighter. Although it was never terrible, it’s now more correct and slightly less jittery.

The one major feature I haven’t enabled as yet is stereoscopic handling. Stereoscopic capture requires 2 cameras attached to a Raspberry Pi Compute Module, so at the moment I have no way to test that it works.

I’m also working on GStreamer stereoscopic handling in general (which is coming along). I look forward to releasing some of that code soon.

 

Network clock examples

Way back in 2006, Andy Wingo wrote some small scripts for GStreamer 0.10 to demonstrate what was (back then) a fairly new feature in GStreamer – the ability to share a clock across the network and use it to synchronise playback of content across different machines.

Since GStreamer 1.x has been out for over 2 years, and we get a lot of questions about how to use the network clock functionality, it’s a good time for an update. I’ve ported the simple examples for API changes and to use the gobject-introspection based Python bindings and put them up on my server.

To give it a try, fetch play-master.py and play-slave.py onto 2 or more computers with GStreamer 1 installed. You need a media file accessible via some URI to all machines, so they have something to play.

Then, on one machine run play-master.py, passing a URI for it to play and a port to publish the clock on:

./play-master.py http://server/path/to/file 8554

The script will print out a command line like so:

Start slave as: python ./play-slave.py http://server/path/to/file [IP] 8554 1071152650838999

On another machine (or machines), run the printed command, substituting the IP address of the machine running the master script.

After a moment or two, the slaved machine should start playing the file in synch with the master:

Network Synchronised Playback

If they’re not in sync, check that you have the port you chose open for UDP traffic so the clock synchronisation packets can be transferred.
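
If you’d rather do the same thing from your own application than from the example scripts, the slave side boils down to something like this C sketch (the same idea as play-slave.py): slave a network client clock to the master, make the pipeline use it, and reuse the base time the master printed.

#include <stdlib.h>
#include <gst/gst.h>
#include <gst/net/gstnet.h>

/* Sketch of the slave side: slave the pipeline clock to the master's
 * published network clock and reuse the master's base time, so both
 * machines render the same buffer at the same moment.
 * Build with: gcc slave.c $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-net-1.0) */
int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    if (argc < 5) {
        g_printerr("Usage: %s <uri> <master-ip> <port> <base-time>\n", argv[0]);
        return 1;
    }

    GstElement *playbin = gst_element_factory_make("playbin", NULL);
    g_object_set(playbin, "uri", argv[1], NULL);

    /* Clock slaved over the network to the one the master publishes */
    GstClock *clock = gst_net_client_clock_new("netclock", argv[2], atoi(argv[3]), 0);
    gst_pipeline_use_clock(GST_PIPELINE(playbin), clock);

    /* Don't pick our own base time - use the one the master printed */
    gst_element_set_start_time(playbin, GST_CLOCK_TIME_NONE);
    gst_element_set_base_time(playbin, g_ascii_strtoull(argv[4], NULL, 10));

    gst_element_set_state(playbin, GST_STATE_PLAYING);

    GstBus *bus = gst_element_get_bus(playbin);
    gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
        GST_MESSAGE_ERROR | GST_MESSAGE_EOS);

    gst_element_set_state(playbin, GST_STATE_NULL);
    gst_object_unref(bus);
    gst_object_unref(clock);
    gst_object_unref(playbin);
    return 0;
}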

This basic technique is the core of my Aurena home media player system, which builds on top of the network clock mechanism to provide file serving and a simple shuffle playlist.

For anyone still interested in GStreamer 0.10, Andy’s old scripts can be found on his server: play-master.py and play-slave.py.

2014 GStreamer Conference

I’ve been home from Europe for over a week now, after heading to Germany for the annual GStreamer conference and Linuxcon Europe.

We had a really great turnout for the GStreamer conference this year, as well as an amazing schedule of talks.

GstConf2k14

All the talks were recorded by Ubicast, who got all the videos edited and uploaded in record time. The whole conference is available for viewing at http://gstconf.ubicast.tv/channels/#gstreamer-conference-2014

I gave one of the last talks of the schedule – about my current work adding support for describing and handling stereoscopic (3D) video. That support should land upstream sometime in the next month or two, so more on that in a bit.


There were too many great talks to mention them individually, but I was excited by 3 strong themes across the talks:

  • WebRTC/HTML5/Web Streaming support
  • Improving performance and reducing resource usage
  • Building better development and debugging tools

I’m looking forward to us collectively making progress on all those things and more in the upcoming year.

Mysterious Parcel

I received a package in the mail today!
Mysterious Package

Everything arrived all nicely packaged up in a hobby box and ready for assembly.
Opening the box

Lots of really interesting goodies in the box!
Out on the table

After a little while, I’ve got the first part together.
First part assembled

The rest will have to wait for another day. In the meantime, have fun guessing what it is, and enjoy this picture of a cake I baked on the weekend:
Strawberry Sponge Cake

See you later!

DVD playback in GStreamer 1.0

Some time in 2012, the GStreamer team was busy working toward the GStreamer 1.0 major release. Along the way, I did my part and ported the DVD playback components from 0.10. DVD is a pretty complex playback scenario (let’s not talk about Blu-ray).

I gave a talk about it at the GStreamer conference way back in 2010 – video here. Apart from the content of that talk, the thing I liked most was that I used Totem as my presentation tool 🙂

With all the nice changes that GStreamer 1.0 brought, DVD playback worked better than ever. I was able to delete a bunch of hacks and workarounds from the 0.10 days. There have been some bugs, but mostly minor things. Recently though, I became aware of a whole class of DVDs that didn’t work for a very silly reason. The symptom was that particular discs would error out at the start with a cryptic “The stream is in the wrong format” message.

It turns out that these are DVDs that begin with a piece of video that has no sound.

Sometimes that’s implemented on a disc as a video track with accompanying silence, but in the broken case the DVDs have no audio track at all for that initial section. For a normal file, GStreamer would handle that by not creating any audio decoder chain or audio sink output element, and just decode and play the video. For DVD though, very few discs are entirely without audio – so we’re going to need the audio decoder chain sooner or later, and there’s no point creating and destroying it as the audio track appears and disappears.

Accordingly, we create an audio output pad and GStreamer plugs in a suitable audio output sink – and then nothing happens, because the pipeline can’t get to the Playing state; it’s stuck in Paused. Before a pipeline can start playing, it has to progress through the Ready and Paused states and then to Playing. The key to getting from Paused to Playing is that each output element (the video sink and the audio sink, in our case) has to receive some data and be ready to output it – a process called pre-roll. Pre-rolling the pipeline avoids stuttering at the start, because otherwise the decoders would have to race to deliver something in time for it to get on screen.

With no audio track, there are no audio packets to deliver, so the audio sink can’t pre-roll. The solution in GStreamer 1.0 is a GAP event, sent to indicate that there is a gap in the data and that elements should do whatever they need to skip or fill it. The audio sink should handle it by considering itself pre-rolled and allowing the pipeline to go to Playing, starting the ring buffer and the audio clock – from which the rest of the pipeline will be timed.

Everything up to that point was working OK – the sink received the GAP event… and then errored out. It expects to be told what format the audio samples it’s receiving are in, so it knows how to fill in the gap – but when there’s no audio track and no audio data, it was never told.

In the end, the fix was to make the dummy place-holder audio decoder choose an audio sample format if it gets a GAP event and hasn’t received any data yet – any format, it doesn’t really matter as long as it’s reasonable. It’ll be discarded and a new format selected and propagated when some audio data really is encountered later in playback.

That fix is #c24a12 – later fixed up a bit by thiagoss to add the ‘sensible’ part to format selection. The initial commit liked to choose a samplerate of 1Hz 🙂
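
For the curious, the shape of that fix is roughly the following pad event handler – an illustrative sketch only, not the literal commit:

#include <gst/gst.h>

/* Sketch of the idea: when the placeholder decoder's sink pad receives a GAP
 * event before any format has been negotiated, pick a sensible default and
 * send it downstream first, so the audio sink knows how to fill the gap and
 * can pre-roll. */
static gboolean
placeholder_sink_event (GstPad *pad, GstObject *parent, GstEvent *event)
{
  if (GST_EVENT_TYPE (event) == GST_EVENT_GAP) {
    GstPad *srcpad = gst_element_get_static_pad (GST_ELEMENT (parent), "src");
    GstCaps *caps = gst_pad_get_current_caps (srcpad);

    if (caps == NULL) {
      /* Nothing negotiated yet - any reasonable format will do. It gets
       * replaced when real audio data is encountered later in playback. */
      caps = gst_caps_new_simple ("audio/x-raw",
          "format", G_TYPE_STRING, "S16LE",
          "layout", G_TYPE_STRING, "interleaved",
          "rate", G_TYPE_INT, 48000,
          "channels", G_TYPE_INT, 2, NULL);
      gst_pad_push_event (srcpad, gst_event_new_caps (caps));
    }

    gst_caps_unref (caps);
    gst_object_unref (srcpad);
  }

  /* Forward the GAP (and everything else) as usual */
  return gst_pad_event_default (pad, parent, event);
}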

If you have any further bugs in your GStreamer DVD playback, please let us know!

Proof of life – A New Adventure!

Hi world! It’s been several years since I used this blog, and a lot has happened to us since then. I don’t even live on the same continent as I did back then.

More on that in a future post. Today, I have an announcement to make – a new Open Source company! Together with fellow GStreamer hackers Tim-Philipp Müller and Sebastian Dröge, I have founded a new company: Centricular Ltd.

From 2007 until July, I was working at Oracle on Sun Ray thin client firmware. Oracle shut down the project in July, and my job along with it – opening up this excellent opportunity to try something I’ve wanted for a while and start a business, while getting back to Free Software full time.

Our website has more information about the Open Source technologies and services we plan to offer. This list is not complete and we will try to broaden it over time, so if you have anything interesting that is not listed there but you think we can help with, get in touch.

As Centricular’s first official contribution to the software pool, here’s my Raspberry Pi Camera GStreamer module. It wraps code from Raspivid to allow direct capture from the official camera module and hardware encoding to H.264 in a GStreamer pipeline – without the shell pipe and fdsrc hack people have been using to date. Take a look at the README for more information.

Raspberry Pi Camera GStreamer element

Sebastian, Tim and I will be at the GStreamer Conference in Edinburgh next week.

A glimpse of audio nirvana

This post is basically a love letter to the Pulseaudio and Gnome Bluetooth developers.

I upgraded my laptop to Ubuntu Karmic recently, which brought with it the ability to use my Bluetooth A2DP headphones natively. Getting them running is now as simple as using the Bluetooth icon in the panel to pair the laptop with the headphones, and then selecting them in the Sound Preferences applet, on the Output tab.

As soon as the headphones are connected, they show up as a new audio device. Selecting it instantly (and seamlessly) migrates my sounds and music from the laptop sound device onto the headphones. The Play/Pause, Next Track and Previous Track buttons all generate media key keypresses – so Rhythmbox and Totem behave like they’re supposed to. It’s lovely.

If that were all, it would already be pretty sweet in my opinion, but wait – there’s more!

A few days after starting to use my Bluetooth headphones, my wife and I took a trip to Barcelona (from Dublin, where we live for the next few weeks… more on that later). When we got to the airport, the first thing we learned was that our flight had been delayed by 3 hours. Since I occasionally hack on multimedia related things, I typically have a few DVDs with me for testing. In this case, I had Vicky Cristina Barcelona on hand, and we hadn’t watched it yet – a perfect choice for 2 people on their way to Barcelona.

Problem! There are four ears wanting to listen to the DVD, and only one audio device to send the 2 output channels to. I could send the sound to the built-in sound device and listen on the earbuds my wife had, or I could send it to my Bluetooth headphones, but not both.

Pulseaudio to the rescue! With a bit of command-line fu (no GUI for this, but that’s totally do-able), I created a virtual audio device, using Pulseaudio’s “combine” module. Like the name suggests, it combines multiple other audio devices into a single one. It can do more complex combinations (such as sending some channels hither and others thither), but I just needed a straight mirroring of the devices. In a terminal, I ran:

pactl load-module module-combine sink_name=shared_play adjust_time=3 slaves="alsa_output.pci-0000_00_1b.0.analog-stereo,bluez_sink.00_15_0F_72_70_E1"

Hey presto! Now there’s a third audio device available in the Sound Preferences to send the sound to, and it comes out both the wired ear buds and my bluetooth headphones (with a very slight sync offset, but close enough for my purposes).

Also, for those interested – the names of the 2 audio devices in my pactl command line came from the output of ‘pactl list’.

This kind of seamless migration of running audio streams really isn’t possible to do without something like Pulseaudio that can manage stream routing on the fly. I’m well aware that Pulseaudio integration into the distributions has been a bumpy ride for lots of people, but I think the end goal justifies the painful process of fixing all the sound drivers. I hope you do too!

Edit: Lennart points out that the extra paprefs application has an “Add virtual output device for simultaneous output on all local sound cards” check-box that does the same thing as loading the combine module manually, but also handles hot-plugging of devices as they appear and disappear.

OSSbarcamp 2 – GNOME 3.0 talk

I gave a talk at the second Dublin OSSbarcamp yesterday. My goal was to provide some insight into the goals for GNOME 3.0 for people who didn’t attend GCDS.

Actually, the credit for the entire talk goes to Vincent and friends, who gave the GNOME 3.0 overview during the GUADEC opening at GCDS, and to Owen for his GNOME Shell talk. I stole content from their slides shamelessly.

The slides are available in ODP form, or as a PDF.

Towards GStreamer 1.0 talk

 GStreamer logo

I gave my talk titled “Towards GStreamer 1.0” at the Gran Canaria Desktop Summit on Sunday. The slides are available here.

My intention with the talk was to present some of the history and development of the GStreamer project as a means to look at where we might go next. I talked briefly about the origins of the project, its growth, and some of my personal highlights from the work we’ve done in the last year. To prepare the talk, I extracted some simple statistics from our commit history. In those, it’s easy to see both the general growth of the project, in terms of development energy/speed, as well as the increase in the number of contributors. It’s also possible to see the large hike in productivity that switching to Git in January has provided us.

The second part of the talk discussed some of the pros and cons of embarking on a new major GStreamer release cycle leading up to a 1.0 release. We’ve maintained the 0.10 GStreamer release series with backwards-compatible ABI and API (with some minor glitches) for 3.5 years now, and been very successful at adding features and improving the framework while doing so.

After 3.5 years of stable development, it’s clear to me that when we made GStreamer 0.10, it really ought to have been 1.0. Nevertheless, there are some parts of GStreamer 0.10 that we’re collectively not entirely happy with and would like to fix, but can’t without breaking backwards compatibility – so I think that even if we had called it 1.0 at that point, I’d want to be doing 1.2 by now.

Some examples of things that are hard to do in 0.10:

  • Replace ugly or hard to use API
  • ABI mistakes, such as structure members that should be private having been accidentally exposed in some release
  • Running out of padding members in public structures, preventing further expansion
  • Deprecated API (and associated dead code paths) we’d like to remove

There are also some enhancements that fall into a more marginal category, in that they are technically possible to achieve in incremental steps during the 0.10 cycle, but are made more difficult by the need to preserve backwards compatibility. These include things like adding per-buffer metadata to buffers (for extensible timestamping/timecode information, pan & scan regions and others), variable strides in video buffers and creating/using more base classes for common element types.

In the cons category are considerations like the obvious migration pain that breaking ABI will cause our applications, and the opportunity cost of starting a new development cycle. The migration cost is mitigated somewhat by the ability to have parallel installations of GStreamer. GStreamer 0.10 applications will be able to coexist with GStreamer 1.0 applications.

The opportunity cost is a bit harder to ignore. When making the 0.9 development series, we found that the existing 0.8 branch became essentially unmaintained for 1.5 years, which is a phenomenon we’d all like to avoid with a new release series. I think we can avoid that this time around, because I expect a much smaller scope of change between 0.10 and 1.0. Apart from the few exceptions above, GStreamer 0.10 has turned out really well, and has become a great framework that’s used in all sorts of exciting ways and doesn’t need large changes.

Weighing up the pros and cons, it’s my opinion that it’s worth making GStreamer 1.0. With that in mind, I made the following proposal at the end of my talk:

  • We should create a shared Git playground and invite people to use it for experimental API/ABI branches
  • Merge from 0.10 master into the playground regularly, and rebase/fix the experimental branches
  • Keep developing most things in 0.10, relying on the regular merges to get them into the playground
  • After accumulating enough interesting features, pull the experimental branches together as a 0.11 branch and make some releases
  • Target GStreamer 1.0 to come out in time for GNOME 3.0 in March 2010

This approach wasn’t really possible the last time around when everything was stored in CVS – it’s having a fast revision control system with easy merging and branch management that will allow it.

GStreamer Summit

On Thursday, we’re having a GStreamer summit in one of the rooms at the university. We’ll be discussing my proposal above, as well as talking about some of the problems people have with 0.10, and what they’d like to see in 1.0. If we can, I’d like to draw up a list of features and changes that define GStreamer 1.0 that we can start working towards.

Please come along if you’d like to help us push GStreamer forward to the next level. You’ll need to turn up at the university GCDS venue and then figure out on your own which room we’re in. We’ve been told there is one organised, but not where – so we’ll all be in the same boat.

The summit starts at 11am.