A common use of the Affdex asset is to process previously captured video files. To do this, you will need to decode the video file and pass Frame objects to a Detector. The VideoFileInput script included in the Affdex Unity package helps streamline this effort.
The first step is to add a detector to your scene’s Main Camera (Add Component -> Scripts -> Affdex -> Detector):
In the “Detector” section of the Inspector pane, you can now configure the emotions and expressions you are interested in (each additional classifier reduces performance, so select only the ones you need):
Alternatively, you can enable and disable emotion and expression classifiers programmatically. For example, the following enables the Smile classifier by calling the Detector class’s SetExpressionState method:

```csharp
Detector detector = Camera.main.GetComponent<Detector>(); // the Detector added to the Main Camera
detector.SetExpressionState(Expressions.Smile, true);
```

By default, all classifiers are disabled. Each classifier you enable consumes additional system resources.
The Affdex classifier data files used during frame analysis are supplied as part of the asset. Their location on disk must remain as follows:
During processing, the VideoFileInput script decodes and processes frames as fast as possible, and actual processing times will depend on CPU speed. Please see this list of accepted file types and recommended video codecs that are compatible with the detector.
The VideoFileInput script is intended as an example or for testing rather than for use in a shipping game. Android and iOS don’t support MovieTexture, which this script relies on, so it cannot be used on those platforms. After adding it to a scene, you can set a default video and a sample rate. The sample rate defines how many frames per second are passed to the Detector for processing. For example, if the video runs at 60 frames per second (a frame rate YouTube supports) and the sample rate is set to 20, then 20 of every 60 frames (one in three) will be processed. If the video has no camera cuts and shows one consistent face, then a sample rate as low as 5 should be sufficient.
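The 20-of-60 example above can be sketched in plain C#. SampledFrameIndices is a hypothetical helper, not part of the Affdex package, written with top-level statements so it runs outside Unity. It assumes integer frame and sample rates and spreads the selected frames evenly across each second, which may differ from how VideoFileInput actually schedules frames.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: indices (within one second of video) of the frames that
// would be passed to the Detector when a videoFps video is sampled at sampleRate.
static List<int> SampledFrameIndices(int videoFps, int sampleRate)
{
    if (videoFps <= 0 || sampleRate <= 0)
        throw new ArgumentException("Frame rate and sample rate must be positive.");

    // Sampling faster than the video's own frame rate cannot produce extra frames.
    int rate = Math.Min(sampleRate, videoFps);
    var indices = new List<int>();
    for (int i = 0; i < videoFps; i++)
    {
        // Select frame i when i * rate / videoFps crosses an integer boundary,
        // which yields exactly `rate` evenly spaced picks per second.
        if ((i * rate) % videoFps < rate)
            indices.Add(i);
    }
    return indices;
}

List<int> picked = SampledFrameIndices(60, 20);
// 20 frames: every third one (0, 3, 6, ..., 57)
Console.WriteLine($"{picked.Count} of 60 frames: {string.Join(", ", picked)}");
```

Inside Unity, the same function could live as a static method on a MonoBehaviour; the even spacing keeps the processed frames from bunching together within a second.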
The Detector uses callback methods to communicate events and results:
The ImageResultsListener base class defines methods that are invoked when the Detector starts or stops tracking a face, and when it has detection results for a face. Your subclass must override onFaceFound, onFaceLost, and onImageResults, and it must be attached as a component within Unity. Here is an example:
```csharp
using System.Collections.Generic;
using UnityEngine;
using Affdex;

public class PlayerEmotions : ImageResultsListener
{
    public float currentSmile;
    public float currentInterocularDistance;
    public float currentContempt;
    public float currentValence;
    public float currentAnger;
    public float currentFear;
    public FeaturePoint[] featurePointsList;

    public override void onFaceFound(float timestamp, int faceId)
    {
        Debug.Log("Found the face");
    }

    public override void onFaceLost(float timestamp, int faceId)
    {
        Debug.Log("Lost the face");
    }

    public override void onImageResults(Dictionary<int, Face> faces)
    {
        Debug.Log("Got face results");

        foreach (KeyValuePair<int, Face> pair in faces)
        {
            int faceId = pair.Key;  // The face's unique ID.
            Face face = pair.Value; // Emotion and facial expression values for this face.

            // Retrieve the emotion scores.
            face.Emotions.TryGetValue(Emotions.Contempt, out currentContempt);
            face.Emotions.TryGetValue(Emotions.Valence, out currentValence);
            face.Emotions.TryGetValue(Emotions.Anger, out currentAnger);
            face.Emotions.TryGetValue(Emotions.Fear, out currentFear);

            // Retrieve the smile score.
            face.Expressions.TryGetValue(Expressions.Smile, out currentSmile);

            // Retrieve the interocular distance (the distance between the two outer eye corners).
            currentInterocularDistance = face.Measurements.interOcularDistance;

            // Retrieve the coordinates of the facial landmarks (face feature points).
            featurePointsList = face.FeaturePoints;
        }
    }
}
```
onImageResults delivers the detection results as a dictionary of Face objects keyed by face ID. Each Face contains the values of all expressions and emotions for that face in the frame, as well as the interocular distance, the facial feature point locations, and the head orientation. Note that in the current release, the dictionary contains at most one face; multiple faces are not supported yet.
For a fully implemented sample, check out EmoSurvival. You can use onFaceLost to pause a game. If you pause by setting Time.timeScale, the camera script will also pause, because it uses the same time values.
The Frame class is used for passing images to the Detector. The Frame constructor requires the image pixels as Color32 (RGBA) values, the width and height of the frame, and a timestamp. If the image source provides its own timestamp, use that; otherwise, if the images are coming from a real-time source, you can use Time.realtimeSinceStartup.
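A minimal sketch of the timestamp rule described above, assuming the source timestamp arrives as a nullable float. ChooseTimestamp is a hypothetical name, and a System.Diagnostics.Stopwatch stands in for Time.realtimeSinceStartup so the code runs outside Unity. For a video file, a frame's own timestamp is simply its index divided by the frame rate.

```csharp
using System;
using System.Diagnostics;

// Stand-in for Unity's Time.realtimeSinceStartup so the sketch runs outside the
// engine; inside Unity you would read Time.realtimeSinceStartup instead.
var realtimeClock = Stopwatch.StartNew();

// Hypothetical helper: prefer the timestamp that came with the image and fall
// back to elapsed real time (in seconds) when the source provides none.
float ChooseTimestamp(float? sourceTimestamp) =>
    sourceTimestamp ?? (float)realtimeClock.Elapsed.TotalSeconds;

// Frame 90 of a 30 fps video is 90 / 30 = 3 seconds into the file.
float fromVideo = ChooseTimestamp(90 / 30f);
float fromLive = ChooseTimestamp(null);
Console.WriteLine($"video frame: {fromVideo} s, live frame: {fromLive} s");
```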
There are two versions of the Frame constructor. The first expects an upright image:
```csharp
Frame(Color32[] rgba, int width, int height, float timestamp);
```
The second requires the orientation of the image:
```csharp
Frame(Color32[] rgba, int width, int height, Orientation orientation, float timestamp);
```
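Both constructors take the pixel data together with the frame dimensions, so a mismatch between the two is an easy mistake to make. The hypothetical guard below checks them before a Frame is built. It assumes the pixel data is a flat array with one entry per pixel, which is the layout Unity's Texture2D.GetPixels32 returns; the SDK's own requirements may differ. The method is generic so the same code accepts Color32[] inside Unity and, as here, a plain int[] outside it.

```csharp
using System;

// Hypothetical guard: the pixel array should hold width * height entries.
static void ValidateFrameBuffer<T>(T[] pixels, int width, int height)
{
    if (pixels == null)
        throw new ArgumentNullException(nameof(pixels));
    if (width <= 0 || height <= 0)
        throw new ArgumentOutOfRangeException(nameof(width), "Frame dimensions must be positive.");

    long expected = (long)width * height;
    if (pixels.Length != expected)
        throw new ArgumentException(
            $"Expected {expected} pixels for a {width}x{height} frame, got {pixels.Length}.",
            nameof(pixels));
}

// In Unity the element type would be Color32; int stands in for it here.
ValidateFrameBuffer(new int[640 * 480], 640, 480); // sizes agree: no exception

try
{
    ValidateFrameBuffer(new int[640 * 480], 640, 479); // height off by one
}
catch (ArgumentException e)
{
    Console.WriteLine(e.Message);
}
```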
The Face class represents a face found within a processed frame. It contains results for detected expressions, emotions, and head measurements.
The Face object also enables users to retrieve the feature points associated with a face:
Expressions contains the probabilities of the detected facial expressions. Each value is a probability between 0 and 100 that the expression is present in the analyzed frame:
Emotions contains the probabilities of the detected emotions. Each value is a probability between 0 and 100 that the emotion is present in the analyzed frame. Valence, a measure of how positive or negative the expressions are, ranges from -100 to 100:
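When these scores drive UI or gameplay values, it is often convenient to rescale them. A minimal sketch, assuming the documented ranges (0 to 100 for expressions and emotions, -100 to 100 for valence). The helper names are hypothetical, and clamping out-of-range input is a defensive choice rather than something the SDK is known to require.

```csharp
using System;

// Expression and emotion scores are documented as 0..100: map them to 0..1,
// e.g. for filling a UI bar.
static float NormalizeScore(float score) => Math.Clamp(score, 0f, 100f) / 100f;

// Valence is documented as -100..100: map it to -1..1 so the sign still
// separates negative from positive expressions.
static float NormalizeValence(float valence) => Math.Clamp(valence, -100f, 100f) / 100f;

Console.WriteLine(NormalizeScore(75f));
Console.WriteLine(NormalizeValence(-40f));
```

Keeping valence signed, instead of squeezing it into 0..1, lets game logic test `valence < 0` directly.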
Measurements contains the head and face measurements. The interocular distance is the distance, in pixels, between the two outer eye corners:
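Because the interocular distance is measured in pixels, it grows as the face moves closer to the camera. One common use, sketched here, is to divide another pixel distance by it so the result does not depend on how large the face appears in the image. The function names and the mouth-corner coordinates are hypothetical, and tuples stand in for feature points.

```csharp
using System;

// Euclidean distance between two image points, in pixels.
static double PixelDistance((double x, double y) a, (double x, double y) b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return Math.Sqrt(dx * dx + dy * dy);
}

// Express a pixel distance in multiples of the interocular distance, so the
// value stays roughly constant as the face moves toward or away from the camera.
static double InInterocularUnits(double pixels, double interocularDistance) =>
    interocularDistance > 0 ? pixels / interocularDistance : double.NaN;

// Hypothetical mouth-corner positions and interocular distance for one frame.
double mouthWidth = PixelDistance((100, 200), (160, 200)); // 60 px
Console.WriteLine(InInterocularUnits(mouthWidth, 120));
```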
<img src="../images/graphic3.png" align=right>
Orientation represents the orientation of the head in 3-D space as Euler angles (pitch, yaw, roll):
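A hypothetical use of these angles is deciding whether the player is roughly facing the camera. This sketch assumes the angles are reported in degrees, and the 15-degree tolerance is only illustrative; IsFacingCamera is not part of the SDK.

```csharp
using System;

// The head counts as facing the camera when both yaw (turning left/right) and
// pitch (nodding up/down) are within the tolerance. Roll (tilting toward a
// shoulder) keeps the face pointed at the camera, so it is not checked.
static bool IsFacingCamera(float pitch, float yaw, float toleranceDegrees = 15f) =>
    Math.Abs(pitch) <= toleranceDegrees && Math.Abs(yaw) <= toleranceDegrees;

Console.WriteLine(IsFacingCamera(pitch: 5f, yaw: -10f)); // True
Console.WriteLine(IsFacingCamera(pitch: 2f, yaw: 40f));  // False
```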
FeaturePoint represents the Cartesian coordinates of a facial feature in the source image and is defined as follows:
See the feature point indices table for a full list of feature points.
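As an example of working with feature point coordinates, the sketch below computes an axis-aligned bounding box around a face. It assumes each point exposes x and y coordinates in image pixels; the BoundingBox helper is hypothetical, and tuples stand in for FeaturePoint so the code runs outside Unity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Smallest axis-aligned rectangle, in image pixels, containing every point.
static (float minX, float minY, float maxX, float maxY) BoundingBox(
    IReadOnlyList<(float x, float y)> points)
{
    if (points == null || points.Count == 0)
        throw new ArgumentException("At least one feature point is required.", nameof(points));

    return (points.Min(p => p.x), points.Min(p => p.y),
            points.Max(p => p.x), points.Max(p => p.y));
}

var box = BoundingBox(new List<(float x, float y)> { (120f, 80f), (200f, 95f), (160f, 210f) });
Console.WriteLine($"x: {box.minX}..{box.maxX}, y: {box.minY}..{box.maxY}"); // x: 120..200, y: 80..210
```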