[05] An Architecture for Gesture-Based Control of Mobile Robots (Iba – 1999)

27 January 2008

Current Mood: uber-humored

Blogs I Commented On:


Rant:
At first, I didn’t think it was worth the effort to use a haptics approach for controlling mobile robots compared to using a conventional remote controller, especially for a single robot. Then I thought about it for a moment and imagined it from a different perspective. Suppose I had control of an army of obedient and competent toddlers. If I had to use a mechanism to control them, would it be more practical to use gesture-based hand-motion controls or a conventional controller? It’s a silly question anyway. Where could I possibly find enough diapers to supply such an army?

Summary:
The idea in this paper is to transfer the burden of programming manipulator systems from robot experts to task experts, who have extensive task knowledge but limited robotics knowledge. The goal is to enable new users to interact with robots through an intuitive interface that can interpret sometimes vague user specifications. The challenge is for robots to interpret user intent instead of simply mimicking the user’s motions. Their approach is to use hand gestures as input to mobile robots via a data glove and position sensing, allowing for richer interaction. The current system is limited by its cable connections, a constraint that may be overcome by advances in wearable computing systems.

The system is composed of a data glove, a position sensor, and a geolocation system that tracks the position and orientation of the mobile robot. The data glove and position sensor feed a gesture recognition module that spots and interprets gestures. Waving in a direction moves the robot (local control mode), while pointing at a desired location issues a ‘point-and-go’ command (global control mode). Gestures themselves are recognized with an HMM-based recognizer: data preprocessing is first applied to the data glove’s joint angle measurements to improve the speed and performance of gesture classification, and gesture spotting then classifies the data as one of six gestures (OPENING, OPENED, CLOSING, POINT, WAVING LEFT, WAVING RIGHT) or “none of the above” (to prevent inadvertent robot actions).
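To make the spotting pipeline concrete, here is a rough Python sketch of the idea. The six gesture names come from the paper; the min-max scaling and the nearest-template classifier with a rejection threshold are my own stand-ins for the paper’s preprocessing and HMM-based recognizer, just to show where the “none of the above” rejection fits in.

    import numpy as np

    GESTURES = ["OPENING", "OPENED", "CLOSING", "POINT", "WAVING_LEFT", "WAVING_RIGHT"]

    def preprocess(joint_angles, lo, hi):
        """Scale raw glove joint-angle readings to [0, 1] before classification."""
        return (np.asarray(joint_angles, dtype=float) - lo) / (hi - lo)

    def spot_gesture(features, templates, reject_threshold=2.0):
        """Return the best-matching gesture, or None ("none of the above")."""
        distances = {g: np.linalg.norm(features - templates[g]) for g in GESTURES}
        best = min(distances, key=distances.get)
        # Reject uncertain matches so noise does not trigger inadvertent robot actions.
        return best if distances[best] < reject_threshold else None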

The HMM algorithm differs from Rabiner and Juang’s standard forward-backward technique in two ways. First, the observation sequence is limited to the n most recent observations, since the probability of an ever-lengthening observation sequence under the HMM decays toward zero. Second, a “wait state” is added that transitions to the first state of every gesture model, and back to itself, with equal probability. The reasoning is that if the observation sequence corresponds to a gesture, the final state of that gesture’s model will end up with the highest probability; otherwise the probability mass stays trapped in the “wait state” while subsequent observations raise the correct gesture’s probability. This replaces the standard scheme, in which one of the six gestures is always selected unless a threshold is set high enough to reject them all, which the authors find unacceptable since such a threshold may also exclude gestures performed slightly differently from the training gestures.
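Here is a rough Python sketch of how I picture the modified forward evaluation. Only the two modifications (a sliding window of the n most recent observations, and a shared “wait state” that feeds the first state of every gesture model with equal probability) come from the paper; the discrete observation symbols, the matrix shapes, and the final comparison against the wait state’s probability are my own assumptions.

    import numpy as np

    def forward_with_wait_state(obs_window, gesture_models, emit_wait):
        """obs_window: the n most recent (discrete) observation symbols.
        gesture_models: list of (A, B) pairs, one left-to-right HMM per gesture,
                        where A is the transition matrix and B the emission matrix.
        emit_wait: emission distribution of the wait state."""
        n_gestures = len(gesture_models)
        # The wait state moves to itself and to each gesture's first state
        # with equal probability.
        p_enter = 1.0 / (n_gestures + 1)

        # Forward probabilities: one value for the wait state, one vector per model.
        alphas = [np.zeros(A.shape[0]) for A, _ in gesture_models]
        alpha_wait = 1.0  # all probability mass starts in the wait state

        for o in obs_window:
            new_alphas = []
            for (A, B), alpha in zip(gesture_models, alphas):
                a = alpha @ A                    # mass flowing within the gesture model
                a[0] += alpha_wait * p_enter     # mass entering from the wait state
                new_alphas.append(a * B[:, o])
            alpha_wait = alpha_wait * p_enter * emit_wait[o]   # wait state's self-loop
            alphas = new_alphas

        # A gesture is spotted only if the final state of its model ends up more
        # probable than the wait state; otherwise nothing is reported.
        finals = np.array([a[-1] for a in alphas])
        best = int(np.argmax(finals))
        return best if finals[best] > alpha_wait else None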

Discussion:
The idea of controlling a group of mobile robots (or an army of toddlers, your choice) using a haptics approach seems more feasible than current methods, since hand commands feel like a natural thing to do. Therefore, I think the paper’s work toward that goal has a lot of merit. As for the execution, I found their modified HMM-based implementation quite intriguing. The first modification, limiting the observation sequence to only the n most recent observations, feels like it should have been part of the original standard forward-backward technique. I still can’t come up with a benefit to leaving the observation sequence unrestricted.

Their second modification, employing a "wait state," was also a novel way to recognize gestures performed slightly differently from the training ones. The authors reasoned that an observation sequence which didn’t correspond to any gesture would be held in the "wait state" until further observations raised the correct gesture’s probability. There’s one part of this second modification that confuses me, though. Suppose that during the execution of a gesture, the user makes a posture different enough from the training data to confuse the system. According to this second modification, the observations up to that posture would leave the gesture in the "wait state" until later observations connected to that gesture raised the intended gesture’s probability. But what happens if the user finishes the gesture while the model is still in the "wait state"? Subsequent observations would come from the next gesture the user executes. From what I read, I would imagine the probability of the previous gesture would never rise, so the system would either never classify the previous gesture (if n, the number of recent observations, is small enough) or classify the current gesture incorrectly (if n is large enough).
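Here is a toy illustration (not from the paper) of the scenario I’m worried about: if a POINT gesture ends while the recognizer is still "waiting," observations from the next gesture can push the POINT samples out of the length-n window before its model ever wins. The window size, labels, and stream are entirely made up.

    from collections import deque

    n = 5                                        # hypothetical window size
    window = deque(maxlen=n)

    stream = ["point"] * 4 + ["wave_left"] * 6   # POINT ends, WAVING LEFT begins
    for t, obs in enumerate(stream):
        window.append(obs)
        if t == 7:
            # Only one POINT observation is left in the window, so the POINT
            # model can no longer accumulate enough probability to be spotted.
            print(list(window))   # ['point', 'wave_left', 'wave_left', 'wave_left', 'wave_left']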

2 comments:

Brandon said...

i think the authors confirmed your suspicions when they mentioned that the HMM with the wait state missed 2 gestures but the HMM without the wait state spotted every gesture correctly. obviously the wait state caused these 2 gestures to be missed, but why? it would have been nice if the authors could have provided some insight into why the HMM with the wait state missed these gestures while the HMM without it didn't. were they performed more slowly or quickly than the others? which gestures were missed? was it a wave? a point? some more explanation would have been beneficial.

Marimba said...

I think the idea of controlling an army of mobile robots is interesting, though I'm not sure the controls they used in the paper would be entirely useful. I have a hard time imagining a scenario where all the robots should move in exactly the same way -- to control many robots, it seems like they'd almost have to have some individual pathfinding capability of their own. On the other hand, adding some new gestures might be useful for controlling a group: maybe using two hands to signal that they should spread out, group up, or split into two groups.