Using a camera is a common way to obtain images for facial expression detection. In a Unity app, you can use a WebCamTexture to get camera frames, then create Affdex Frame objects from the frame data, and feed those Frames to a Detector object.

A Detector tracks expressions in a sequence of real-time frames. It expects each frame to carry a timestamp indicating when the frame was captured. Timestamps must arrive in increasing order, which is why pausing the game via Time.timeScale can interfere with processing. The Detector detects a face in each frame and delivers information about it to you.

Adding a Detector to a scene

The first step is to add a Detector component to your scene’s Main Camera (Add Component -> Scripts -> Affdex -> Detector):

Configuring the Detector

In the “Detector” section of the Inspector pane, you can now configure the emotions and expressions you are interested in. The more classifiers you select, the greater the performance cost, so enable only the ones you need:

Alternatively, you can enable and disable emotion and expression classifiers programmatically. Here is an example of enabling the smile classifier by calling the Detector class’s SetExpressionState method:

detector.SetExpressionState(Expressions.Smile, true);  // detector is a reference to the Detector component

By default, all classifiers are disabled. Every classifier you enable will take a bit more system resources.
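
For example, here is a minimal sketch that enables a couple of expression classifiers at startup. It assumes a Detector component is attached to the same GameObject (the Main Camera); the SetEmotionState call is an assumption that mirrors SetExpressionState, so check the Detector component for the exact emotion counterpart.

using Affdex;
using UnityEngine;

public class ClassifierSetup : MonoBehaviour
{
    void Start()
    {
        // Assumes the Detector component is attached to this GameObject (the Main Camera).
        Detector detector = GetComponent<Detector>();

        // Enable only the classifiers you need; each one consumes additional resources.
        detector.SetExpressionState(Expressions.Smile, true);
        detector.SetExpressionState(Expressions.BrowRaise, true);

        // Assumption: emotion classifiers are toggled with a matching SetEmotionState method.
        detector.SetEmotionState(Emotions.Joy, true);
    }
}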

The Affdex classifier data files are used during frame analysis. They are supplied as part of the asset and must remain at the following location on disk:

Assets/StreamingAssets/affdex-data*

Feeding camera frames to the Detector

To get camera frames and deliver them to the Detector, you can either use the CameraInput script provided with the Affdex SDK or write your own. To use ours, add the Camera Input component to your scene’s Main Camera (Add Component -> Scripts -> Affdex -> CameraInput):

Set the camera rate, camera location, width and height:

Affdex performs best with a 4:3 resolution (e.g. 320x240, 640x480, 800x600, or 1024x768) and a sample rate between 5 and 20. You can reduce CPU usage by lowering the resolution and sample rate.

To create your own script for getting images, take a look at the Frame data structure below. You can also refer to the CameraInput script as an example.
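
For reference, here is a minimal sketch of a custom input script that grabs frames from a WebCamTexture, wraps them in Affdex Frames, and hands them to the Detector. The name of the Detector’s frame-submission method (ProcessFrame) is an assumption; check the Detector class, and treat the bundled CameraInput script as the authoritative example.

using Affdex;
using UnityEngine;

public class MyCameraInput : MonoBehaviour
{
    public Detector detector;       // assign the Detector component in the Inspector
    public float sampleRate = 10f;  // frames per second to submit for analysis

    private WebCamTexture webcamTexture;
    private float lastSampleTime;

    void Start()
    {
        // Use the default camera at a 4:3 resolution.
        webcamTexture = new WebCamTexture(640, 480);
        webcamTexture.Play();
    }

    void Update()
    {
        if (!webcamTexture.didUpdateThisFrame)
            return;

        // Throttle submissions to the chosen sample rate.
        if (Time.realtimeSinceStartup - lastSampleTime < 1f / sampleRate)
            return;
        lastSampleTime = Time.realtimeSinceStartup;

        // Wrap the pixel data in a Frame with an increasing timestamp.
        Frame frame = new Frame(webcamTexture.GetPixels32(),
                                webcamTexture.width,
                                webcamTexture.height,
                                Time.realtimeSinceStartup);

        // Assumption: the Detector exposes a ProcessFrame method for submitting frames.
        detector.ProcessFrame(frame);
    }
}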

Changing the Camera

If the device has multiple cameras, you may want to give the user the option of selecting which camera to use. The CameraInput class automatically selects the device’s front-facing camera in its Start method, but you can switch to a different one by calling CameraInput.SelectCamera with a camera name as the second argument. You can get a list of connected cameras using Unity’s WebCamTexture.devices API (a short snippet follows the example below). Once you know the name of the camera you want, you can add code similar to the following to one of your scripts:

using UnityEngine;
using System.Collections;
using Affdex;

public class ExampleClass : MonoBehaviour {
    Transform mainCamera;
    CameraInput cameraInput;
    public string cameraName;       // set from the Inspector or another script to request a camera switch
    string currentCameraName = "";

    void Awake () {
        // Cache the CameraInput component attached to the Main Camera.
        mainCamera = GameObject.FindGameObjectWithTag ("MainCamera").transform;
        cameraInput = mainCamera.GetComponent <CameraInput>();
    }

    // Update is called once per frame
    void Update () {
        // Switch cameras whenever a new camera name has been requested.
        if (!string.IsNullOrEmpty(cameraName) && currentCameraName != cameraName)
        {
            cameraInput.SelectCamera(true, cameraName);
            currentCameraName = cameraName;
        }
    }
}
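
To populate cameraName, you can list the connected cameras using Unity’s WebCamTexture API, for example:

// Log the names of all connected cameras (Unity's WebCamTexture API).
foreach (WebCamDevice device in WebCamTexture.devices)
{
    Debug.Log("Camera available: " + device.name);
}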

When you switch scenes, you need to destroy and respawn the Detector and CameraInput components. If you do not, Unity’s camera interface delivers a frozen image after the reload, and the metrics will continue to come from the image captured at the scene transition.
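
For example, here is a minimal sketch of a scene switch that destroys both components before loading the next scene; it assumes the next scene’s Main Camera carries its own, freshly configured Detector and CameraInput.

using Affdex;
using UnityEngine;
using UnityEngine.SceneManagement;

public class SceneSwitcher : MonoBehaviour
{
    public void LoadScene(string sceneName)
    {
        GameObject mainCamera = GameObject.FindGameObjectWithTag("MainCamera");

        // Destroy the old components so they do not keep delivering metrics
        // from the frozen image captured at the scene transition.
        Destroy(mainCamera.GetComponent<CameraInput>());
        Destroy(mainCamera.GetComponent<Detector>());

        SceneManager.LoadScene(sceneName);
    }
}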

Getting results

The Detector uses callback methods to communicate events and results: the ImageResultsListener base class defines methods that are invoked when the Detector has started or stopped tracking a face, and when it has detection results for a face. The onFaceFound, onFaceLost, and onImageResults methods must be overridden in a class attached as a component in your Unity scene. Here is an example:

using Affdex;
using System.Collections.Generic;
using UnityEngine;

public class PlayerEmotions : ImageResultsListener
{
    public float currentSmile;
    public float currentInterocularDistance;
    public float currentContempt;
    public float currentValence;
    public float currentAnger;
    public float currentFear;
    public FeaturePoint[] featurePointsList;

    public override void onFaceFound(float timestamp, int faceId)
    {
        Debug.Log("Found the face");
    }

    public override void onFaceLost(float timestamp, int faceId)
    {
        Debug.Log("Lost the face");
    }

    public override void onImageResults(Dictionary<int, Face> faces)
    {
        Debug.Log("Got face results");

        foreach (KeyValuePair<int, Face> pair in faces)
        {
            int FaceId = pair.Key;  // The Face Unique Id.
            Face face = pair.Value;    // Instance of the face class containing emotions, and facial expression values.

            //Retrieve the Emotions Scores
            face.Emotions.TryGetValue(Emotions.Contempt, out currentContempt);
            face.Emotions.TryGetValue(Emotions.Valence, out currentValence);
            face.Emotions.TryGetValue(Emotions.Anger, out currentAnger);
            face.Emotions.TryGetValue(Emotions.Fear, out currentFear);

            //Retrieve the Smile Score
            face.Expressions.TryGetValue(Expressions.Smile, out currentSmile);


            //Retrieve the Interocular distance, the distance between two outer eye corners.
            currentInterocularDistance = face.Measurements.interOcularDistance;


            //Retrieve the coordinates of the facial landmarks (face feature points)
            featurePointsList = face.FeaturePoints;

        }
    }
}

onImageResults delivers the detection results via a dictionary of Face objects, which contain the values of all expressions and emotions for a face in a frame. It also gives you the interocular distance, the facial feature point locations, and the orientation of the face. Note that in the current release the dictionary contains at most one face; multiple faces are not yet supported.

For a fully implemented sample, check out EmoSurvival. You can use onFaceLost to pause a game, as sketched below. If you use Time.timeScale to pause, note that the camera script will also pause, since it uses the same time values.
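
Here is a minimal sketch that pauses gameplay through a flag instead of Time.timeScale (the gamePaused flag is hypothetical; your own gameplay scripts would read it). Setting Time.timeScale to zero would also pause CameraInput, so no new frames would be analyzed while paused.

using Affdex;
using System.Collections.Generic;
using UnityEngine;

public class PauseOnFaceLost : ImageResultsListener
{
    public bool gamePaused;   // hypothetical flag for your gameplay scripts to check

    public override void onFaceFound(float timestamp, int faceId)
    {
        Debug.Log("Face found, resuming");
        gamePaused = false;
    }

    public override void onFaceLost(float timestamp, int faceId)
    {
        Debug.Log("Face lost, pausing");
        gamePaused = true;
    }

    public override void onImageResults(Dictionary<int, Face> faces)
    {
        // No per-frame metrics needed for this example.
    }
}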

Data Structures

Frame

The Frame class is used for passing images to the Detector. Its constructor requires the color data of the incoming image (as a Color32 array), the width and height of the frame, and a timestamp. If the source of the image content has an associated timestamp, use that; otherwise, if the images come from a real-time source, you can use Time.realtimeSinceStartup.

There are two versions of the Frame constructor. The first expects an upright image:

Frame(Color32[] rgba, int width, int height, float timestamp);

The second requires the orientation of the image:

Frame(Color32[] rgba, int width, int height, Orientation orientation, float timestamp);

Face

The Face class represents a face found within a processed frame. It contains results for detected expressions, emotions, and head measurements.

Face.Expressions
Face.Emotions
Face.Measurements

The Face object also enables users to retrieve the feature points associated with a face:

Face.FeaturePoints

Expressions

Expressions is a representation of the probabilities of the facial expressions detected. Each value represents the probability, from 0 to 100, of the presence of that expression in the analyzed frame:

struct Expressions
{
  float Smile;
  float InnerEyeBrowRaise;
  float BrowRaise;
  float BrowFurrow;
  float NoseWrinkler;
  float UpperLipRaiser;
  float LipCornerDepressor;
  float ChinRaiser;
  float LipPucker;
  float LipPress;
  float LipSuck;
  float MouthOpen;
  float Smirk;
  float EyeClosure;
  float Attention;
};

Emotions

Emotions is a representation of the probabilities of the emotions detected. Each value represents the probability, from 0 to 100, of the presence of that emotion in the analyzed frame. Valence, a measure of the positivity or negativity of the expressions, ranges from -100 to 100:

struct Emotions
{
  float Joy;
  float Fear;
  float Disgust;
  float Sadness;
  float Anger;
  float Surprise;
  float Contempt;
  float Valence;
  float Engagement;
};

Measurements

Measurements is a representation of the head and face measurements. The interocular distance is defined as the distance between the two outer eye corners, in pixels:

struct Measurements
{
  Orientation orientation;
  float interOcularDistance;
};

Orientation

Orientation is a representation of the orientation of the head in a 3-D space using Euler angles (pitch, yaw, roll):

struct Orientation
{
  float pitch;
  float yaw;
  float roll;
};
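
For example, assuming face is a Face instance delivered to onImageResults, the head pose can be read from the measurements (field names follow the structs above):

Orientation headPose = face.Measurements.orientation;
Debug.Log("Pitch: " + headPose.pitch + ", Yaw: " + headPose.yaw + ", Roll: " + headPose.roll);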

FeaturePoint

FeaturePoint represents the Cartesian coordinates of a facial feature point on the source image and is defined as follows:

struct FeaturePoint
{
  int id;
  float x;
  float y;
};

See the feature point indices table for a full list of feature points.
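
For example, assuming face is a Face instance delivered to onImageResults, you can iterate over its feature points like this:

foreach (FeaturePoint point in face.FeaturePoints)
{
    Debug.Log("Feature point " + point.id + ": (" + point.x + ", " + point.y + ")");
}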