Augmented Reality Frameworks
I have been on holidays lately and it's been so good. My wife, our four kids and I flew to the US to spend Christmas with our families in North Dakota. Besides the time with my family and extended family, having a few weeks off has let me spend a good amount of time on my other interests, mainly robotics and connected systems.
Intel has a lot of sway in the industry, and if they are throwing their weight behind something, I want to know about it. I started digging into what RealSense was by reading articles, blogs, everything. They have great docs online and I started to pore over their SDKs; the more I looked, the more interested I became.
RealSense is Intel's push to build new abilities into laptops, tablets and phones. These abilities cover a wide range of things: depth-sensing cameras, augmented reality, voice recognition, object scanning, re-focusing after a picture is taken, hand tracking and facial recognition, to name a few.
Front facing and world facing?
It took me a bit to understand why there are multiple products within the RealSense line. Currently there are two cameras: a front-facing unit and a "world"-facing unit. Finding out why that's the case and what each one does took some digging. Eventually I came across a page that sums it up nicely. The front-facing camera is for:
- Interact naturally
- Immersive collaboration
- Windows-based devices
The world-facing camera is for:
- Augmented reality
- Enhanced photography
- Scene perception
- 3D scanning
Initially I thought it was odd to have such a clear distinction. I expected a single RealSense camera to have all these capabilities, but as I read more I realised the "why". The field and the hardware are both so new that Intel needs to run a fairly public R&D effort. They need to iterate on their hardware quickly, and that is easier if they divide up the major functions and have different teams working on each. The hardware is different enough that, if it were all wrapped up in one unit, I imagine it would be an unwieldy camera.
My guess is that towards the end of the year there will be just one RealSense camera covering all the functions of the current front- and world-facing cameras.
Currently they have two dev kits out, the F200 and R200. Intel is also co-developing a very exciting prototype phone with Google; unfortunately it is not yet available to the developer community, so I have put in a pre-order.
The front-facing F200 is the short-range camera that would be embedded in laptops and LCD monitors. As mentioned above, it's mainly aimed at gesture, face and hand recognition.
The R200 is a long-range HD camera focused on the outside world. It is used for augmented reality, spatial mapping and enhanced photography, and would be embedded in phones and tablets.
Let's get started
I ordered the F200 & R200 straight away. The F200 was immediately available and shipped the next day. The R200 was only listed as a pre-order, and I still haven't heard when it will ship. I am hoping it's within the next week or so, while I am still in the US.
While I was waiting for the F200 to arrive, I downloaded the SDK and started reading the API. The following features have gold status (that is, they are fully supported by the SDK):
- SDK essential interfaces and colour/depth/IR data streaming
- Face Tracking
- Hand Tracking
- Speech Recognition and Synthesis
- Unity Toolkit
- Object Tracking
- 3D Scan (except Object)
Of most interest to me were the Object Tracking and 3D Scan functions. I wanted to explore how a real-world object could be tracked in real time while the user's interactions were also being tracked. This could allow for novel games or interactive UIs made up of everyday objects.
Both of the above cameras are USB 3-tethered affairs, which limits their portability and use cases. The real magic comes when this technology is married to mobile. That is where the ZR300 comes into play (the mobile phone that Intel is releasing with Google). I ordered one of those as well, but it is not yet released, so I am waiting patiently.
I also came across Google's vision for AR, Project Tango. I dug deeper into it and it looks great, but for me it doesn't hold the same interest. I can definitely see where Google is going with Tango: it is all about spatial mapping and augmented directions.
I use a Mac and all the SDKs are currently Windows-only :( A quick search confirmed what I was hoping: I could simply use Boot Camp on OS X. I did run into one issue, though: my Mac would not let me create a new partition. The error message it gave was particularly unhelpful; essentially it said, "It didn't work". Thanks Apple :) A few Google searches later, I found the issue.
After downloading several gigs of applications, everything installed without a hitch. I was then able to run the example programs. None of them stood out except the app that allowed for object recognition. It had a few modes, but the two main paths were first to record an object in 3D space and then to find that object in a scene.
I will post a follow-up article on how this went, but as a quick preview: not great. I put this down to not having the best lighting or a stable way to rotate the object back and forth to get an accurate read. I used my phone to add some extra light and just turned a piece of paper back and forth. So yeah, not too good...
I think I will build some sort of turntable with my Mindstorms kit or a simple Arduino-controlled servo. More to come.
What do you think?
What are your thoughts on Augmented Reality, RealSense, Tango? I would love to hear your opinion or get some feedback, so please add a comment or drop me a line on Twitter: @davidseth.