Let's handle virtual objects, bare-handed!

Natural User Interface

Posted by gduhamel at 8:36 pm

Natural user interaction with a virtual environment is another subject I had been planning to focus on. Today, touch screens and motion-sensing devices bring us closer to information, making it tangible.
However, the display is still the same: information remains captive of a two-dimensional screen.

When information is overlaid on the user’s perception, it seems logical to take the user’s point of view as the basis for interaction.

The objective in coding this application was to enable handling of augmented reality from video eyewear devices. A camera senses the user’s actions, which lets the user interact with the virtual display.

Point of reference

I used the shape recognition algorithm I wrote last month to detect interaction attempts.
Advantage: a fair tolerance of variation. The hand may be slightly tilted to one side or the other, up or down, or appear longer or shorter; it will still be detected.

This method is well suited to use by multiple users. The only condition is that each of them repeats the same predefined gesture.


The tricky part, though the shortest.
Each virtual element displayed is known to the system as an array of coordinates. Given the position of the point of reference, we know which area is “touched”. The remaining problem is a basic geometry exercise:

  • Interaction direction/vector calculation
  • Looking for intersections with known surfaces
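
The two steps above can be sketched as follows. This is a minimal illustration, not the code used in the application: it assumes each virtual surface is a flat rectangle given by one corner and two perpendicular edge vectors, and that the interaction direction runs from the user’s viewpoint through the detected point of reference. The names `touched_point`, `eye` and `hand` are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def touched_point(eye: Vec3, hand: Vec3, corner: Vec3,
                  edge_u: Vec3, edge_v: Vec3,
                  eps: float = 1e-9) -> Optional[Vec3]:
    """Return where the eye-to-hand ray hits the rectangle, or None."""
    # Step 1: interaction direction, from the viewpoint through the hand.
    direction = sub(hand, eye)

    # Step 2: intersect the ray with the plane carrying the rectangle.
    normal = cross(edge_u, edge_v)
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None  # ray is parallel to the surface
    t = dot(normal, sub(corner, eye)) / denom
    if t < 0:
        return None  # surface lies behind the user

    hit = (eye[0] + t * direction[0],
           eye[1] + t * direction[1],
           eye[2] + t * direction[2])

    # Keep the hit only if it falls inside the rectangle's bounds.
    # Projecting on each edge is valid because the edges are perpendicular.
    rel = sub(hit, corner)
    u = dot(rel, edge_u) / dot(edge_u, edge_u)
    v = dot(rel, edge_v) / dot(edge_v, edge_v)
    return hit if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 else None


# A 2 x 2 panel facing the user, five units away.
panel = ((-1.0, -1.0, 5.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
print(touched_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), *panel))
```

With several surfaces, one would presumably run this test on each of them and keep the hit with the smallest distance from the viewpoint.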

Interaction types

I wanted at least two possible interactions with virtual items: the simplest one, a “click”-like selection, and a simplified drag’n’drop. Both rely on the same gesture.

Weight matters

Unlike a mouse click, visual shape recognition is imprecise. False positives can occur and mislead the system about the user’s intentions. That is why I introduced a “weight” mechanism that gates every interaction.

  • When an interaction is spotted, the virtual item gains weight.
  • When no interaction is spotted, the virtual item loses weight.

=> An item is considered clicked when it reaches a defined weight.
=> In a drag’n’drop context, the weight defines how much the user’s movement influences the targeted item.
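
As an illustration, the weight rules could look like the sketch below. The class name, the gain and decay values and the click threshold are all hypothetical, as is the choice of scaling drag movement by the ratio of current weight to click weight; the post does not give the actual figures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class WeightedItem:
    click_weight: float = 3.0  # weight at which the item counts as clicked (assumed)
    gain: float = 1.0          # weight added per frame with a detection (assumed)
    decay: float = 0.5         # weight lost per frame without one (assumed)
    weight: float = 0.0

    def update(self, touched: bool) -> bool:
        """Feed one frame's detection result; return True once clicked."""
        if touched:
            self.weight = min(self.weight + self.gain, self.click_weight)
        else:
            self.weight = max(self.weight - self.decay, 0.0)
        return self.weight >= self.click_weight

    def drag(self, position: Tuple[float, float],
             movement: Tuple[float, float]) -> Tuple[float, float]:
        """Move the item by a share of the user's movement set by its weight."""
        influence = self.weight / self.click_weight
        return (position[0] + influence * movement[0],
                position[1] + influence * movement[1])


item = WeightedItem()
# One missed frame in the middle delays the click but does not cancel it.
print([item.update(seen) for seen in (True, True, False, True, True)])
```

An isolated false positive only adds one step of weight, which then decays away, so it never reaches the click threshold on its own.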


In conclusion, allowing somebody to “touch” virtual space is quite simple. The method used above can recognize more than just a hand, and can handle multiple gestures.
It is an easy way to bring natural user interaction into augmented reality.

Posted by gduhamel at 6:51 pm

After several attempts to detect gestures based on skin color, I tried object detection through the famous Viola & Jones method, as implemented in the OpenCV library.

The experiment was about detecting a gesture used to select a displayed item. Of all possible hand gestures, the one shown opposite seemed the most intuitive and recognizable.

Picture Samples

The Viola & Jones method is well known for detecting faces efficiently. But there is nothing for hands, even less for the back of a hand. The software must be trained to identify this particular shape.

To compute the shape’s specific Haar-like features used by the detection algorithm, many samples are required: a set of positive pictures focused on the object, and a set of negative pictures showing random scenes that do not contain the shape to be detected.

Negative images are plentiful on the internet. To mass-produce positive images, I wrote a small program.

  • This software uses the webcam stream.
  • A ten-second countdown allows the user to fit an object into the capture area.
  • At the end of the countdown, a snapshot is resized and saved.
  • Positive pictures are rectangular; this is not a problem.
  • The picture is then processed by an OpenCV utility (named createsamples), which creates more images, slightly distorted: modified contrast, scale or rotation, negative colours, …
  • For every supplied picture, 20 are created.

Reaching my goal of 1500 positive images took about 10 minutes.
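
A quick back-of-the-envelope check of those figures, assuming each snapshot costs exactly one ten-second countdown and yields exactly 20 usable images (the function name is hypothetical):

```python
import math


def capture_plan(target_images: int, images_per_snapshot: int,
                 countdown_s: float) -> tuple:
    """Return (snapshots needed, minutes spent waiting on countdowns)."""
    snapshots = math.ceil(target_images / images_per_snapshot)
    return snapshots, snapshots * countdown_s / 60


snapshots, minutes = capture_plan(1500, 20, 10)
print(snapshots, minutes)
```

Under these assumptions only 75 real snapshots are needed, and the countdowns add up to roughly the same order of time as the figure quoted above.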


Supervised learning

1500 positive images and 3500 negatives were enough for OpenCV to compute the Haar-like features. The OpenCV library provides the “haartraining” utility, which does the job for us.
The goal was to detect the shape with wide tolerances, allowing a few false positives. It took the computer 8 hours to work out a 10-stage detector.
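
For reference, the training utility reads the negative set from a plain text file listing one image path per line. The sketch below builds such a file from a folder of negatives; the function name, the folder layout, the accepted extensions and the choice of absolute paths are assumptions made for the example, not details from the original setup.

```python
from pathlib import Path

# File extensions treated as images (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_background_list(negatives_dir, list_path) -> int:
    """Write one negative image path per line; return how many were listed."""
    root = Path(negatives_dir)
    images = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Absolute paths avoid depending on the directory training is run from.
    Path(list_path).write_text("".join(f"{p.resolve()}\n" for p in images))
    return len(images)


if __name__ == "__main__":
    # "negatives" and "negatives.txt" are placeholder names.
    if Path("negatives").is_dir():
        print(write_background_list("negatives", "negatives.txt"), "images listed")
```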

Below, some features involved in detection:



Results are quite interesting, given the short training time. The initial shape is correctly and efficiently detected, while false positives seem to be rare.


This shape recognition may be one part of a larger detection process, used simply to locate features that the software then investigates to identify a user interaction.
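
A minimal sketch of how a trained detector could feed such a process, using OpenCV’s Python bindings. It assumes the cascade was saved as an XML file that the installed OpenCV version can load; the function names, the detection parameters and the choice of the largest rectangle’s centre as the point of interest are illustrative assumptions.

```python
from typing import Iterable, List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def detect_shapes(frame_bgr, cascade_path: str) -> List[Rect]:
    """Run a trained cascade on one BGR frame and return the detections."""
    import cv2  # imported here so the helper below works without OpenCV

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError(f"could not load cascade: {cascade_path}")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # reduce the effect of lighting changes
    found = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    return [tuple(int(v) for v in rect) for rect in found]


def reference_point(rects: Iterable[Rect]) -> Optional[Tuple[float, float]]:
    """Centre of the largest detection, or None when nothing was found."""
    rects = list(rects)
    if not rects:
        return None
    x, y, w, h = max(rects, key=lambda r: r[2] * r[3])
    return (x + w / 2, y + h / 2)


# Example with two made-up detections; the larger one wins.
print(reference_point([(0, 0, 10, 10), (100, 50, 40, 20)]))
```

Raising `minNeighbors` would trade some detections for fewer false positives, which may or may not suit a detector trained with wide tolerances.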


OpenCV Library documentation
Naotoshi Seo’s Tutorial on his blog