Overview and File Structure

An overview of the existing Gesture Recognition Subsystem implementation, as well as the code for the subsystem, is provided here. (Note: unit tests have not yet been written for the code base, nor has the code been tested extensively as laid out in the test plans, due to restrictions imposed by COVID-19.) The application's GUI was implemented in Python using the Kivy framework and its kv layout language. Further documentation for this package can be found here. Several class files were also created to handle various aspects of the project. A brief description of each relevant file and class is provided below:

  • app.py: defines the layout and behavior of the main page of the GUI application. Several classes in this file define different aspects of the GUI. This is also the file to run to launch the application.
  • gesture.kv: per the conventions of the Kivy kv language, this file interacts directly with app.py and more explicitly defines the layout of the elements on the page. (A minimal sketch of the app.py/gesture.kv split is provided after this list.)
  • human.py: defines the human object used to encapsulate the attributes of humans in the environment. In the current version of the application (as of May 2020) these attributes are identity, current pose, and predicted gesture.
  • robot.py: handles all interactions between the robot and the Batcave. Due to limitations imposed by COVID-19, the majority of this class has yet to be implemented, though a basic layout is provided in the source code.
  • gestureClassification.py: handles all tasks related to adding features to the gesture queue, normalizing features, and providing predictions.
  • gestureSensor.py: interfaces with the available sensors to return an input image. The two currently supported sensor types are a Kinect (version 1) and a web camera. Documentation for integrating the Kinect can be found here.
  • faceRecognition.py: handles creating facial embeddings as well as identifying users in the environment at runtime. This application relies heavily on the face_recognition Python library, documentation for which can be found here.
  • posePrediction.py: contains the OpenPose prediction model. This class also handles the initial assignment of identities and poses to the human object, as well as updating these values for subsequent frames. Documentation for the OpenPose implementation used in this project is provided here.
  • constants.py: contains global constants used throughout the code.
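
As referenced above, app.py and gesture.kv follow the standard Kivy split between behavior (Python classes) and layout (kv rules). The sketch below illustrates that convention only; the class and widget names it uses (GestureApp, MainScreen, on_start_button) are hypothetical placeholders and are not taken from the actual source files.

```python
# Minimal Kivy skeleton illustrating the conventional app.py / gesture.kv split.
# The class and widget names below are placeholders, not those used in app.py.
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout


class MainScreen(BoxLayout):
    """Root widget; its layout would be declared in gesture.kv."""

    def on_start_button(self):
        # Behavior stays in Python; gesture.kv binds a widget event
        # (for example a Button's on_release) to this method.
        print("Gesture recognition started")


class GestureApp(App):
    # Because the App subclass is named GestureApp, Kivy automatically loads
    # gesture.kv, which would contain a <MainScreen> rule such as:
    #
    #   <MainScreen>:
    #       orientation: "vertical"
    #       Button:
    #           text: "Start"
    #           on_release: root.on_start_button()
    def build(self):
        return MainScreen()


if __name__ == "__main__":
    GestureApp().run()
```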

Additionally, the current application (as of May 2020) contains both a user database and a gesture database, used for storing users and gestures, respectively. The application supports adding, deleting, and editing users and gestures through the GUI.
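
The on-disk format of these databases is not documented here, so the following is only a hypothetical sketch of how user and gesture entries might be organized; the names, dimensions, and pickle-based persistence are all assumptions rather than the format used by the application.

```python
# Hypothetical sketch of the user and gesture stores; the databases in the
# repository may use a different format and persistence mechanism entirely.
import pickle
import numpy as np

# User database: maps a user's name to their stored facial embedding(s).
# face_recognition embeddings are 128-dimensional vectors.
user_db = {"example_user": [np.zeros(128)]}

# Gesture database: maps a gesture label to one or more recorded feature
# sequences (each sequence being an array of per-frame feature vectors).
gesture_db = {"wave": [np.zeros((15, 36))]}  # 15 frames; 36 is a placeholder feature length


def save(path, db):
    """Adding, editing, or deleting entries reduces to dict operations plus a save."""
    with open(path, "wb") as f:
        pickle.dump(db, f)


save("users.pkl", user_db)
save("gestures.pkl", gesture_db)
```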


Main Loop Overview

The process begins when the Batcave compute resource is powered on. This subsystem should be connected directly to a wall outlet and should not draw power from the remainder of the system. Incoming images are streamed from the Kinect or web camera. These images are passed through a mobile implementation of the OpenPose model, which returns a sequence of 2D coordinates for each person in the environment. The input image is also passed to faceRecognition.py, which uses Python's face_recognition library to predict users' identities from facial embeddings (users must be registered in the system prior to runtime). The skeletal output from OpenPose is then correlated with the identities returned by faceRecognition.py and stored as a human object. A separate human object is created for each person on screen and is kept in memory as long as that person remains on screen. Each human object also stores a feature queue, which holds multiple frame-level feature containers.
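
The per-frame flow described above can be summarized as follows. This is a simplified sketch rather than the code in app.py: the face_recognition calls are the library's real entry points, but predict_poses, Human, and the pose-to-identity pairing step are illustrative placeholders.

```python
# Simplified sketch of one pass through the main loop.  The face_recognition
# calls are the library's real API; everything else is a placeholder.
import face_recognition


class Human:
    """Placeholder for the human object defined in human.py."""

    def __init__(self, identity):
        self.identity = identity
        self.pose = None

    def update_pose(self, pose):
        self.pose = pose  # the real class also pushes frame-level features to its queue


def predict_poses(frame):
    """Placeholder for the OpenPose call in posePrediction.py."""
    return []  # would return one 2D-keypoint array per person in the frame


def process_frame(frame, known_names, known_encodings, humans):
    # 1. Skeletal keypoints for every person in view.
    poses = predict_poses(frame)

    # 2. Identity of every detected face, via facial embeddings.
    locations = face_recognition.face_locations(frame)
    encodings = face_recognition.face_encodings(frame, locations)
    identities = []
    for enc in encodings:
        matches = face_recognition.compare_faces(known_encodings, enc)
        identities.append(known_names[matches.index(True)] if True in matches else "unknown")

    # 3. Correlate skeletons with identities (here naively by order) and update
    #    or create the corresponding human object.
    for pose, identity in zip(poses, identities):
        human = humans.setdefault(identity, Human(identity))
        human.update_pose(pose)
    return humans
```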

The contents of the frame-level feature container are pushed to the multi-frame feature queue, of fixed length n; here n is set to 15. Note that the contents of the feature container are treated as a single real-valued vector for the purposes of later analysis. (In the current version of the system the feature container is largely unnecessary, as only one feature is produced; however, this architecture supports the addition of further features at a later date, such as facial expression or hand pose, which would be combined in the feature container.) Frame-level features are added to the feature queue until it is full. From that point on, each newly added feature ejects the oldest element from the queue in a first-in, first-out (FIFO) fashion. After each update to the queue, the current queue contents are compared to the contents of the Gesture Database using a KNN classifier with dynamic time warping as the distance metric. When a gesture is successfully recognized, the system outputs the gesture to the robot via robot.py.
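
The queue-and-classify step can be sketched as follows, using collections.deque for the fixed-length FIFO queue and a basic DTW implementation as the distance metric. This is an illustration of the technique, not the code in gestureClassification.py: for brevity the nearest-neighbour search is shown with k = 1, and the gesture-database format, threshold value, and names are assumptions.

```python
# Sketch of the fixed-length feature queue and DTW-based nearest-neighbour
# matching.  Names and the gesture-database format are illustrative.
from collections import deque
import numpy as np

QUEUE_LENGTH = 15  # n = 15 frames

# deque with maxlen automatically drops the oldest item (FIFO) once full.
feature_queue = deque(maxlen=QUEUE_LENGTH)


def dtw_distance(a, b):
    """Basic dynamic-time-warping distance between two feature sequences."""
    a, b = np.asarray(a), np.asarray(b)
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[len(a), len(b)]


def classify(queue, gesture_db, threshold=10.0):
    """Return the label of the nearest stored gesture, or None if none is close enough."""
    if len(queue) < QUEUE_LENGTH:
        return None  # wait until the queue is full before classifying
    best_label, best_dist = None, np.inf
    for label, examples in gesture_db.items():
        for example in examples:
            dist = dtw_distance(queue, example)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label if best_dist < threshold else None


# Per frame: push the newest feature container, then re-classify.
# feature_queue.append(frame_features)
# gesture = classify(feature_queue, gesture_db)
```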


Codebase Download Instructions

GitHub Repository: https://github.com/mxj5897/alfrd-msd.git

Zip File: https://drive.google.com/open?id=1gY7iwc3blzQGS4yGWpQaKnM0ul1DMr2j

Tar File: https://drive.google.com/open?id=1tTK87sj9Cqs2ABBiepAbo1MbbnsaG3sw



Note: The files attached below do NOT represent the entire code base (it is too large to upload here). For access to the full code base, see the GitHub repository linked above.

Attached files (each modified Apr 21, 2020 by Michael Johnson):

  • app.py
  • constants.py
  • faceRecognition.py
  • gesture.kv
  • gestureClassification.py
  • gestureSensor.py
  • human.py
  • posePrediction.py
  • requirements.txt
  • robot.py