A common use of the Affdex asset is to process previously captured video files. To do this, you will need to decode the video file and pass Frame objects to a Detector. The VideoFileInput script included in the Affdex Unity package helps streamline this effort.

Adding a Detector to a scene

The first step is to add a detector to your scene’s Main Camera (Add Component -> Scripts -> Affdex -> Detector):

Configuring the Detector

In the “Detector” section of the Inspector pane, you can now configure the emotions and expressions you are interested in (enabling more classifiers increases the processing cost, so select only the ones you need):

Alternatively, you can enable and disable emotion and expression classifiers programmatically. Here is an example that enables the smile classifier by calling the Detector class's SetExpressionState method (where detector is a reference to the Detector component):

detector.SetExpressionState(Expressions.Smile, true);

By default, all classifiers are disabled. Every classifier you enable will take a bit more system resources.
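
For example, a minimal sketch of a startup script that enables only the classifiers it needs might look like this (it assumes the script is attached to the same GameObject as the Detector, such as the Main Camera):

using Affdex;
using UnityEngine;

public class ClassifierSetup : MonoBehaviour
{
    void Start()
    {
        // Assumes the Detector component has already been added to this GameObject.
        Detector detector = GetComponent<Detector>();

        // Enable only the expression classifiers the application actually needs;
        // passing false disables a classifier again.
        detector.SetExpressionState(Expressions.Smile, true);
        detector.SetExpressionState(Expressions.BrowFurrow, true);
    }
}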

The Affdex classifier data files are used during frame analysis. These files are supplied as part of the asset and must remain at the following location on disk:

Assets/StreamingAssets/affdex-data*

Feeding video frames to the Detector

During processing, the VideoFileInput script decodes and processes frames as fast as possible, and actual processing times will depend on CPU speed. Please see this list of accepted file types and recommended video codecs that are compatible with the detector.

The VideoFileInput script is intended as an example or testing aid rather than for use in a shipping game. Android and iOS do not support the MovieTexture that the script relies on, so it cannot be used on those platforms. After adding it to a scene, you can set a default video and a sample rate. The sample rate defines how many times per second video frames are passed to the Detector for processing. For example, if the video is 60 frames per second (the highest frame rate YouTube currently supports) and the sample rate is set to 20, then 20 of the 60 frames in each second are processed. If the video has no camera cuts and a single, consistent face, a sample rate as low as 5 should be sufficient.
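
If you want to feed frames to the Detector yourself instead of relying on VideoFileInput, the same sample-rate throttling can be applied by hand. The sketch below is illustrative only: it uses a WebCamTexture as a convenient pixel source, and it assumes the Detector's frame-submission method is named ProcessFrame; check the VideoFileInput source shipped with the asset for the exact call it uses.

using Affdex;
using UnityEngine;

public class CustomFrameFeeder : MonoBehaviour
{
    public Detector detector;       // Assign the Detector from the Main Camera in the Inspector.
    public float sampleRate = 20f;  // Maximum number of frames submitted per second.

    private WebCamTexture camTexture;
    private float lastSampleTime;

    void Start()
    {
        camTexture = new WebCamTexture();
        camTexture.Play();
    }

    void Update()
    {
        // Throttle submissions so at most sampleRate frames per second reach the Detector.
        if (Time.realtimeSinceStartup - lastSampleTime < 1f / sampleRate)
            return;
        lastSampleTime = Time.realtimeSinceStartup;

        Color32[] pixels = camTexture.GetPixels32();
        Frame frame = new Frame(pixels, camTexture.width, camTexture.height, Time.realtimeSinceStartup);

        // Assumed method name; the asset's own input scripts show the exact call to use.
        detector.ProcessFrame(frame);
    }
}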

Getting results

The Detector uses callback methods to communicate events and results. The ImageResultsListener abstract class defines methods that are invoked when the Detector starts or stops tracking a face, and when detection results are available for a face. The onFaceFound, onFaceLost, and onImageResults methods must be overridden in a class attached as a component in your Unity scene. Here is an example:

using Affdex;
using System.Collections.Generic;
using UnityEngine;

public class PlayerEmotions : ImageResultsListener
{
    public float currentSmile;
    public float currentInterocularDistance;
    public float currentContempt;
    public float currentValence;
    public float currentAnger;
    public float currentFear;
    public FeaturePoint[] featurePointsList;

    public override void onFaceFound(float timestamp, int faceId)
    {
        Debug.Log("Found the face");
    }

    public override void onFaceLost(float timestamp, int faceId)
    {
        Debug.Log("Lost the face");
    }

    public override void onImageResults(Dictionary<int, Face> faces)
    {
        Debug.Log("Got face results");

        foreach (KeyValuePair<int, Face> pair in faces)
        {
            int faceId = pair.Key;  // Unique id of the tracked face.
            Face face = pair.Value; // Face instance containing emotion and expression values.

            //Retrieve the Emotions Scores
            face.Emotions.TryGetValue(Emotions.Contempt, out currentContempt);
            face.Emotions.TryGetValue(Emotions.Valence, out currentValence);
            face.Emotions.TryGetValue(Emotions.Anger, out currentAnger);
            face.Emotions.TryGetValue(Emotions.Fear, out currentFear);

            //Retrieve the Smile Score
            face.Expressions.TryGetValue(Expressions.Smile, out currentSmile);


            //Retrieve the Interocular distance, the distance between two outer eye corners.
            currentInterocularDistance = face.Measurements.interOcularDistance;


            //Retrieve the coordinates of the facial landmarks (face feature points)
            featurePointsList = face.FeaturePoints;

        }
    }
}

onImageResults delivers the detection results via a dictionary of Face objects, each of which contains the values of the enabled expression and emotion classifiers for a face in the frame. It also gives you access to the interocular distance, the facial feature point locations, and the orientation of the face. Note that in the current release the dictionary contains at most one face; multiple faces are not yet supported.

For a fully implemented sample, check out EmoSurvival. You can use onFaceLost to pause a game; note that if you pause by setting Time.timeScale, the camera input script will pause as well, since it uses the same time values.
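
As a minimal sketch of that idea, the following listener pauses and resumes the game from the face-tracking callbacks (keeping the Time.timeScale caveat above in mind):

using Affdex;
using System.Collections.Generic;
using UnityEngine;

public class FacePause : ImageResultsListener
{
    public override void onFaceFound(float timestamp, int faceId)
    {
        Time.timeScale = 1f;  // Resume the game once a face is tracked again.
    }

    public override void onFaceLost(float timestamp, int faceId)
    {
        Time.timeScale = 0f;  // Pause the game; note that camera input pauses too.
    }

    public override void onImageResults(Dictionary<int, Face> faces) { }
}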

Data Structures

Frame

The Frame class is used for passing images to the Detector. Its constructor takes the image data as an array of Color32 pixels (which defines the color format of the incoming image), the width and height of the frame, and a timestamp. If the source of the image content provides a timestamp, use it; otherwise, if the images come from a real-time source, you can use Time.realtimeSinceStartup.

There are two versions of the Frame constructor. The first expects an upright image:

Frame(Color32[] rgba, int width, int height, float timestamp);

The second requires the orientation of the image:

Frame(Color32[] rgba, int width, int height, Orientation orientation, float timestamp);
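
For illustration, here is a small sketch of wrapping decoded video pixels in an upright Frame. It assumes the pixel data is already available as a Color32 array; for a video file, the frame's own timestamp (frame index divided by frame rate) is a better choice than the real-time clock, since processing may run faster or slower than playback:

using Affdex;
using UnityEngine;

public static class FrameBuilder
{
    // Build an upright Frame from decoded video pixels.
    public static Frame FromVideoFrame(Color32[] pixels, int width, int height,
                                       int frameIndex, float frameRate)
    {
        float timestamp = frameIndex / frameRate;
        return new Frame(pixels, width, height, timestamp);
    }
}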

Face

The Face class represents a face found within a processed frame. It contains results for detected expressions, emotions, and head measurements.

Face.Expressions
Face.Emotions
Face.Measurements

The Face object also enables users to retrieve the feature points associated with a face:

Face.FeaturePoints

Expressions

Expressions is a representation of the probabilities of the facial expressions detected. Each value represents a probability between 0 and 100 of the presence of the expression in the analyzed frame:

struct Expressions
{
  float Smile;
  float InnerEyeBrowRaise;
  float BrowRaise;
  float BrowFurrow;
  float NoseWrinkler;
  float UpperLipRaiser;
  float LipCornerDepressor;
  float ChinRaiser;
  float LipPucker;
  float LipPress;
  float LipSuck;
  float MouthOpen;
  float Smirk;
  float EyeClosure;
  float Attention;
};

Emotions

Emotions is a representation of the probabilities of the emotions detected. Each value represents a probability between 0 and 100 of the presence of the emotion in the analyzed frame. Valence, a measure of the positivity or negativity of the expressions, ranges from -100 to 100:

struct Emotions
{
  float Joy;
  float Fear;
  float Disgust;
  float Sadness;
  float Anger;
  float Surprise;
  float Contempt;
  float Valence;
  float Engagement;
};

Measurements

Measurements is a representation of the head and face measurements. The interocular distance is defined as the distance, in pixels, between the two outer eye corners:

struct Measurements
{
  Orientation orientation;
  float interOcularDistance;
};


Orientation

Orientation is a representation of the orientation of the head in a 3-D space using Euler angles (pitch, yaw, roll):

struct Orientation
{
  float pitch;
  float yaw;
  float roll;
};
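
As a sketch, the head pose of a tracked face can be read from its measurements, following the field names listed above (the exact property casing may differ in the asset):

using Affdex;
using UnityEngine;

public static class HeadPoseUtil
{
    // Log the head pose (Euler angles) of a face delivered by onImageResults.
    public static void LogHeadPose(Face face)
    {
        Orientation pose = face.Measurements.orientation;
        Debug.Log("pitch: " + pose.pitch + ", yaw: " + pose.yaw + ", roll: " + pose.roll);
    }
}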

FeaturePoint

FeaturePoint represents the Cartesian coordinates of a facial feature point on the source image and is defined as follows:

struct FeaturePoint
{
  int id;
  float x;
  float y;
};

See the feature point indices table for a full list of feature points.
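
As a sketch, the feature points returned by Face.FeaturePoints can be iterated like this, logging each landmark's id and its coordinates on the source image:

using Affdex;
using UnityEngine;

public static class FeaturePointUtil
{
    // Log every facial landmark returned for a face; the meaning of each id
    // is given in the feature point indices table.
    public static void LogFeaturePoints(Face face)
    {
        foreach (FeaturePoint point in face.FeaturePoints)
        {
            Debug.Log("Point " + point.id + ": (" + point.x + ", " + point.y + ")");
        }
    }
}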