Gesture Recognition Using an Acceleration Sensor and Its Application to Musical Performance Control (Sawada & Hashimoto – 1997)

02 April 2008

Current Mood:

Blogs I Commented On:


Summary:
The focus of this paper is gesture recognition using simple three-dimensional acceleration sensors with weights, where the domain is conducting. The sensors provide the accelerations in the x, y, and z directions. Motion features are extracted from three vectors, each of which is a combination of two of the directions. Kinetic parameters are then extracted from those motion features: intensity of motion, rotational direction, direction of the main motion, and the distribution of the acceleration data over eight principal directions. For a series of acceleration data, a kind of membership function is used as a fuzzy representation of the closeness to each principal direction; it is defined as an isolated triangular function whose value decreases linearly from the principal direction down to zero at the neighboring principal directions. Each kinetic parameter is then calculated from the time-series vectors, and gestures are recognized from a total of 33 kinetic features.
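
Here's a minimal Python sketch of how I picture the triangular membership function over the eight principal directions; the 45-degree spacing and the magnitude weighting in the distribution are my assumptions, not something taken from the paper:

```python
import numpy as np

# Eight principal directions spaced 45 degrees apart (assumed layout).
PRINCIPAL_DIRS = np.deg2rad(np.arange(8) * 45.0)
HALF_WIDTH = np.deg2rad(45.0)  # membership reaches zero at the neighboring principal direction

def membership(theta, k):
    """Fuzzy closeness of an acceleration vector's angle theta to principal direction k:
    1 at the principal direction, falling linearly to 0 at its neighbors."""
    diff = np.angle(np.exp(1j * (theta - PRINCIPAL_DIRS[k])))  # wrap to [-pi, pi]
    return max(0.0, 1.0 - abs(diff) / HALF_WIDTH)

def direction_distribution(ax, ay):
    """Distribute a series of 2-D acceleration samples over the eight directions,
    weighting each sample by its magnitude (my guess at the feature)."""
    hist = np.zeros(8)
    for x, y in zip(ax, ay):
        theta = np.arctan2(y, x)
        for k in range(8):
            hist[k] += membership(theta, k) * np.hypot(x, y)
    return hist / (hist.sum() + 1e-9)
```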

To recognize gestures, their start is first identified from the magnitude of the acceleration. Before the system is used, an individual's motion feature patterns are constructed during training, and a gesture start is detected when an acceleration larger than the normal fluctuation is observed. During training, samples of each motion are given as input M times, and standard pattern data for each gesture are constructed and stored. During recognition mode, weighted errors for each feature parameter of an unknown motion's acceleration data series are computed, and the dissimilarity to the stored patterns is determined. For musical tempo recognition, the rotational component is ignored, and the rhythm is extracted from the pattern of changes in the magnitude and phase of the vector. The instant when the conductor's arm reaches the lowest point of the motion space is recognized as the rhythm point. By detecting these maxima periodically, the rhythm points can be identified in real time. To evaluate the system, the authors tested 10 kinds of gestures used for performance control. Two users repeated the gestures 10 times each for standard pattern construction. The system recognized the gestures perfectly for the first user, whose data it was trained on, but not completely for the second user; the authors nonetheless concluded that it still does better than vision-based or other approaches.
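
To make the recognition step concrete, here's a rough Python sketch of how the dissimilarity comparison could look; the inverse-standard-deviation weighting and the data layout are my own assumptions, since the paper only says weighted errors are computed per feature against the stored standard patterns:

```python
import numpy as np

def dissimilarity(features, pattern_mean, pattern_std):
    # Weighted squared error; down-weighting features that varied a lot during
    # training is my assumption about what the weighting means.
    w = 1.0 / (pattern_std + 1e-6)
    return float(np.sum(w * (features - pattern_mean) ** 2))

def recognize(features, standard_patterns):
    """standard_patterns: dict mapping gesture name -> (mean_vector, std_vector),
    built from the M training samples of the 33 kinetic features."""
    return min(standard_patterns,
               key=lambda g: dissimilarity(features, *standard_patterns[g]))
```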

Discussion:
Before I discuss the technical content of this paper, I just want to say that the paper's formatting kind of sucked. The related work section was slapped into the middle of the paper out of left field, and the figures and tables were sometimes hard to locate quickly. Okay, back on topic. My thoughts:

  • I’m glad that this paper did not spend two pages talking about music theory.

  • The shape of the conducting gestures in Aaron’s user study looked pretty complex. It seemed like the authors tackled recognition of the conducting gestures by simplifying them into a collection of directional gestures. I’m no expert on conducting gestures, but I wonder if the authors’ approximation is sufficient to separate all the gestures in Aaron’s domain. I will say yes for now.

  • I know people in our class will criticize this paper for its lame evaluation section. This seems so common in the GR papers we’ve been reading that I just gave up and accepted it as being the norm for this field.

  • I love reading translated research papers written by Japanese authors. They’re pretty funny.

Activity Recognition using Visual Tracking and RFID (Krahnstoever, et al – 2005)

01 April 2008

Current Mood:

Blogs I Commented On:


Summary:
This paper discusses how RFID technology can augment a traditional vision system with very specific object information to significantly extend its capabilities. The system consists of two modules: visual human motion tracking and RFID tag tracking. For the former, the human motion tracker estimates a person’s head and hand locations in a 3D world coordinate frame using a camera system. The likelihood for a frame is estimated by summing image evidence over the bounding box of either the head or the hand in a specific view. The tracker continuously estimates proposal body part locations from the image data, which are used as initialization and recovery priors. It also follows a sequential Monte Carlo filtering approach and performs partitioned sampling to reduce the number of particles needed for tracking. For the latter, the RFID tracking unit detects the presence, movement, and orientation of RFID tags in 3D space; the paper also gives an algorithm for articulated upper body tracking. Combining both trackers yields an articulated motion tracker, which outputs a time series of mean estimates of a subject’s head and hand locations, along with visibility flags that express whether the respective body parts are currently visible in the optical field of view. By observing subsequent articulated movements, the authors claim the tracker can estimate what the person is doing. These interactions are encoded as a set of scenarios using rules in an agent-based architecture. To test their system, the authors built a prototype consisting of a shelf-type rack made to hold objects of varying sizes and shapes. When a person interacts with RFID-equipped objects, the system was able to detect which item the user was interacting with (difficult for a vision-only system) and the type of interaction the user was performing on the object (difficult for an RFID-only system).
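
For my own reference, here's a toy Python skeleton of a particle filter with partitioned sampling in the spirit of what the paper describes; the state layout, noise level, and stub likelihood are all placeholders of mine, not the authors' implementation:

```python
import numpy as np

def predict(particles, noise=0.05):
    # Simple random-walk motion model (placeholder).
    return particles + np.random.normal(0.0, noise, particles.shape)

def image_likelihood(particles, frame):
    # Placeholder: the paper sums image evidence over the projected bounding
    # box of the head or hand in each camera view.
    return np.ones(len(particles))

def resample(particles, weights):
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

def track_step(head_particles, hand_particles, frame):
    # Partition 1: update the head first...
    head_particles = predict(head_particles)
    w = image_likelihood(head_particles, frame)
    head_particles = resample(head_particles, w / w.sum())
    # Partition 2: ...then the hands, so each partition is sampled on its own
    # and far fewer particles are needed than for the full joint state.
    hand_particles = predict(hand_particles)
    w = image_likelihood(hand_particles, frame)
    hand_particles = resample(hand_particles, w / w.sum())
    return head_particles, hand_particles
```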

Discussion:
I was familiar with RFID tags from other applications, but I didn't know or think that they could be used for haptics. That's why I wasn't too sure at first about the relevance of this paper to our class. Josh P. fortunately enlightened our class that RFID tags can nicely supplement the existing devices we have in the labs. While the potential is definitely there for the type of things our class is doing, the paper itself did a pretty poor job of presenting that potential. Most of that can be attributed to a weak example application which, of course, happened to have no numerical results to speak of. But that's been pretty much the norm in the papers we've been reading.