Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control (Sawada & Hashimoto – 1997)

02 April 2008

Summary:
The focus of this paper is gesture recognition using simple three-dimensional acceleration sensors with weights, where the domain is conducting. From these sensors, the accelerations in the x, y, and z directions are obtained. Motion features are extracted from three vectors, each of which is a combination of two of the directions. Four kinetic parameters are then extracted from those motion features: the intensity of motion, the rotational direction, the direction of the main motion, and the distribution of the acceleration data over eight principal directions. For a series of acceleration data, a kind of membership function was used as a fuzzy representation of the closeness to each principal direction; it's defined as an isolated triangular function whose value decreases linearly from the principal direction down to zero at the neighboring principal directions. Each kinetic parameter is then calculated from the time-series vectors, and gestures are recognized from a total of 33 kinetic features.
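To make that fuzzy direction encoding concrete, here's a minimal sketch of the triangular membership function, assuming the eight principal directions are spaced every 45 degrees within one of the three 2-D acceleration planes. The layout and function names are my reading of the paper, not its published code:

```python
import math

# Eight principal directions, every 45 degrees: 0, 45, ..., 315.
PRINCIPAL_DIRS = [i * 45.0 for i in range(8)]

def memberships(ax: float, ay: float) -> list[float]:
    """Fuzzy closeness of the acceleration vector (ax, ay) to each
    principal direction: 1.0 on the direction itself, decreasing
    linearly to 0.0 at the neighboring directions 45 degrees away."""
    angle = math.degrees(math.atan2(ay, ax)) % 360.0
    values = []
    for d in PRINCIPAL_DIRS:
        diff = abs(angle - d)
        diff = min(diff, 360.0 - diff)          # wrap-around angular distance
        values.append(max(0.0, 1.0 - diff / 45.0))
    return values

# e.g. a vector at 30 degrees is 1/3 "0-degree" and 2/3 "45-degree":
print(memberships(math.cos(math.radians(30)), math.sin(math.radians(30))))
```

Because the triangles overlap only with their immediate neighbors, a vector always activates at most two directions, which is what makes the distribution over the eight directions a compact feature.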

To recognize gestures, their start is first identified from the magnitude of the acceleration. Before the system is used, an individual's motion feature patterns are constructed during training, and a gesture's start is detected when the acceleration grows larger than the fluctuation observed at rest. During training, samples of each motion are input M times, and standard pattern data for each gesture are constructed and stored. In recognition mode, weighted errors for each feature parameter of an unknown motion's acceleration data series are computed, and the dissimilarity to each stored pattern is determined.

For musical tempo recognition, the rotational component is ignored, and the rhythm is extracted from the pattern of changes in the magnitude and phase of the vector. The instant the conductor's arm reaches the lowest point of the motion space is recognized as a rhythm point, and by periodically detecting these maxima of the acceleration magnitude, rhythm points can be identified in real time. To evaluate the system, the authors tested 10 kinds of gestures used for performance control. Two users repeated the gestures 10 times each for standard pattern construction. The system recognized the gestures perfectly for the first user, whose data it was trained on, but not completely for the second user; the authors nonetheless concluded that it still does better than vision-based and other approaches.
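Since that summary compresses two separate mechanisms (nearest-pattern matching and beat detection), here's a short, hedged sketch of both. The weighted squared-error form, the peak test, and the threshold value are illustrative assumptions on my part; the paper's 33 actual kinetic features and its exact weighting scheme are not reproduced here:

```python
import numpy as np

def dissimilarity(features: np.ndarray,
                  pattern_mean: np.ndarray,
                  weights: np.ndarray) -> float:
    """Weighted error between an unknown motion's feature vector and one
    stored standard pattern (built from M training samples per gesture).
    Assumed form; the paper's exact weighting is not shown here."""
    return float(np.sum(weights * (features - pattern_mean) ** 2))

def recognize(features, patterns, weights):
    """Pick the gesture whose stored standard pattern is least dissimilar.
    `patterns` maps gesture name -> mean feature vector."""
    return min(patterns, key=lambda g: dissimilarity(features, patterns[g], weights))

class RhythmDetector:
    """Flags a rhythm point at each local maximum of the acceleration
    magnitude, which peaks when the conductor's arm hits its lowest point."""
    def __init__(self, threshold: float = 1.0):   # threshold is a placeholder
        self.threshold = threshold
        self.prev, self.prev2 = 0.0, 0.0

    def update(self, magnitude: float) -> bool:
        # The previous sample is a beat if it exceeds both neighbors
        # and the noise threshold (detected with a one-sample delay).
        is_beat = (self.prev > self.prev2 and self.prev > magnitude
                   and self.prev > self.threshold)
        self.prev2, self.prev = self.prev, magnitude
        return is_beat
```

The one-sample delay in the detector is the price of confirming a maximum online; for tempo tracking at sensor sampling rates, that delay is negligible.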

Discussion:
Before I discuss the technical content of this paper, I just want to say that the paper's formatting sucked. The related work section was slapped into the middle of the paper out of left field, and the figure and table numbers were sometimes hard to locate quickly. Okay, back on topic. My thoughts:

  • I’m glad that this paper did not spend two pages talking about music theory.

  • The shape of the conducting gestures in Aaron’s user study looked pretty complex. It seemed like the authors tackled recognition of the conducting gestures by simplifying them into a collection of directional gestures. I’m no expert on conducting gestures, but I wonder if the authors’ approximation is sufficient to separate all the gestures in Aaron’s domain. I will say yes for now.

  • I know people in our class will criticize this paper for its lame evaluation section. This seems so common in the GR papers we’ve been reading that I just gave up and accepted it as the norm for this field.

  • I love reading translated research papers written by Japanese authors. They’re funny.

1 comment:

Grandmaster Mash said...

A lack of an evaluation is more of a nuisance at this point than an infuriating factor. And their system uses gestures, but mine will probably not include a typical "gesture" since I'd like people to be able to conduct in any time.