
EMMA

The Extensible Multimodal Annotation language (EMMA), a W3C Recommendation, is “an XML markup language for containing and annotating the interpretation of user input”. As such, it is a wrapper language that can carry various kinds of payload representing the interpretation of user input. At its core, the EMMA language provides the <emma:interpretation> element, containing all information about a single interpretation of user behaviour. Several such elements can be enclosed within an <emma:one-of> element in cases where more than one interpretation is present. An interpretation can have an emma:confidence attribute, indicating how confident the source of the annotation is that the interpretation is correct; time-related information such as emma:start, emma:end, and emma:duration, indicating the time span for which the interpretation is provided; information about the modality upon which the interpretation is based, through the emma:medium and emma:mode attributes; and many more.

The following listing shows an example EMMA document carrying an interpretation of user behaviour represented using EmotionML. The interpretation refers to a start time. Note that the EMMA wrapper elements and the EmotionML content are in different XML namespaces, so it is unambiguous which element belongs to which part of the annotation.

<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
  <emma:interpretation emma:start="123456789">
    <emotion xmlns="http://www.w3.org/2009/10/emotionml" dimension-set="http://www.example.com/emotion/dimension/FSRE.xml">
      <dimension name="arousal" value="0.23"/>
      <dimension name="valence" value="0.62"/>
    </emotion>
  </emma:interpretation>
</emma:emma>
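
Where several competing interpretations of the same input are available, they can be wrapped in an <emma:one-of> element as described above. The following is a minimal sketch; the ids, tokens and confidence values are invented for illustration.

<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75" emma:tokens="flights to boston"/>
    <emma:interpretation id="int2" emma:confidence="0.25" emma:tokens="flights to austin"/>
  </emma:one-of>
</emma:emma>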

EMMA can also be used to represent Automatic Speech Recognition (ASR) output, either as the single most probable word chain or as a word lattice, using the <emma:lattice> element.
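
A minimal sketch of a lattice, following the structure defined in the EMMA 1.0 Recommendation (node numbers and words are invented for illustration):

<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
  <emma:interpretation id="lat1">
    <emma:lattice initial="1" final="3">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="2" to="3">for</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>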

Details

Skeleton for all EMMA documents:

All EMMA documents MUST have a top-level <emma:emma> element, and SHOULD have at least one <emma:interpretation> element. That interpretation SHOULD have a time stamp, given in its attribute "emma:offset-to-start", and MAY have a confidence, given in the attribute "emma:confidence".

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.3">
      my annotation
    </emma:interpretation>
</emma:emma>

If the document contains a single annotation, the <emma:interpretation> element SHOULD be a direct child of <emma:emma>. A sequence of interpretations can be represented using the <emma:sequence> element, as for keywords spotted. A collection of interpretations with different probabilities can be represented using the <emma:one-of> element, as e.g. for interest.

For the individual types of content / payload, we use by default the same representation as for the "current best guess" user state, unless there are reasons against it.

We distinguish verbal information, emotion-related information, and non-verbal information.

Verbal information

Type of information | Topic
keywords spotted | state.user.emma.words

Keywords

<emma:emma version="1.0"
  xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:sequence emma:offset-to-start="12345" emma:duration="110">
    <emma:interpretation 
      emma:offset-to-start="12345"
      emma:tokens="bla" 
      emma:confidence="0.3"/>
    <emma:interpretation 
      emma:offset-to-start="12390"
      emma:tokens="bloo" 
      emma:confidence="0.4"/>
  </emma:sequence>
</emma:emma>

Emotion-related information

Type of information | Topic
emotion | state.user.emma.emotion.(modality)
interest | state.user.emma.emotion.(modality)

Emotion

The global user emotion is represented using five values: intensity, arousal, valence, unpredictability, and potency. Note that in the EmotionML payload below, intensity is encoded with its own <intensity> element, while the other four are encoded as <dimension> elements.

<?xml version="1.0" encoding="UTF-8"?>
<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
   <emma:interpretation>
      <emotion xmlns="http://www.w3.org/2009/10/emotionml" dimension-set="http://www.example.com/emotion/dimension/FSRE.xml">
         <intensity confidence="0.30086732" value="0.4115755"/>
         <dimension confidence="0.9518124" name="arousal" value="0.1852386"/>
         <dimension confidence="0.2734806" name="valence" value="0.7791835"/>
         <dimension confidence="0.22194415" name="unpredictability" value="0.09359175"/>
         <dimension confidence="0.2912501" name="potency" value="0.050632834"/>
      </emotion>
   </emma:interpretation>
</emma:emma>

Interest

User interest is represented using a custom vocabulary of interest-related category labels: bored, neutral, and interested. The confidence is used to indicate the extent to which each of the three categories is recognised.

<?xml version="1.0" encoding="UTF-8"?>
<emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
   <emma:interpretation>
      <emotion xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.semaine-project.eu/emo/category/interest.xml">
         <category confidence="0.6955442" name="bored"/>
      </emotion>
      <emotion xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.semaine-project.eu/emo/category/interest.xml">
         <category confidence="0.24825269" name="neutral"/>
      </emotion>
      <emotion xmlns="http://www.w3.org/2009/10/emotionml" category-set="http://www.semaine-project.eu/emo/category/interest.xml">
         <category confidence="0.6315944" name="interested"/>
      </emotion>
   </emma:interpretation>
</emma:emma>

Non-verbal information

Type of information | Topic
head movement | state.user.emma.nonverbal.head
user speaking | state.user.emma.nonverbal.voice
pitch direction | state.user.emma.nonverbal.voice
gender | state.user.emma.nonverbal.voice
nonverbal vocalizations | state.user.emma.nonverbal.voice
face presence | state.user.emma.nonverbal.face
action units | state.user.emma.nonverbal.face

Head movement

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:duration="444" emma:confidence="0.3">

      <bml:bml xmlns:bml="http://www.mindmakers.org/projects/BML">
          <bml:head type="NOD" start="12.345" end="12.789"/>
      </bml:bml>

    </emma:interpretation>
</emma:emma>

The payload format is the same as for the user state: here it appears below emma:interpretation, there it appears below semaine:user-state.
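
For comparison, the same payload below semaine:user-state might look as follows (a sketch; any attributes of the user-state element itself are omitted here):

<semaine:user-state xmlns:semaine="http://www.semaine-project.eu/semaineml">
    <bml:bml xmlns:bml="http://www.mindmakers.org/projects/BML">
        <bml:head type="NOD" start="12.345" end="12.789"/>
    </bml:bml>
</semaine:user-state>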

Note that the proposal includes a redundant specification of time: the start time is given in emma:interpretation/@emma:offset-to-start (in milliseconds) and in bml:head/@start (in seconds); the end time is given indirectly by emma:interpretation/@emma:duration (in milliseconds), and directly through bml:head/@end (in seconds). Experience will tell whether this double representation is useful.
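
In the example above, the two representations agree: emma:offset-to-start="12345" corresponds to bml:head/@start="12.345", and 12345 ms + 444 ms = 12789 ms corresponds to bml:head/@end="12.789".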

Possible values for /emma:emma/emma:interpretation/bml:bml/bml:head/@type: NOD, SHAKE, TILT-LEFT, TILT-RIGHT, APPROACH, RETRACT. Left and right are defined subject-centred (i.e. left is the user's left).

User speaking

The output of the voice activity detection (VAD) / speaking detector looks like this; it needs no confidence. The Speaking Analyser (part of the TumFeatureExtractor) outputs messages when the user starts or stops speaking. These are low-level messages, created directly from the VAD output and smoothed only over 3 frames; other components must therefore apply thresholds to reliably detect continuous segments in which the user is really speaking.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.3">

        <semaine:speaking xmlns:semaine="http://www.semaine-project.eu/semaineml" statusChange="start"/>

    </emma:interpretation>
</emma:emma>

Possible values for /emma:emma/emma:interpretation/semaine:speaking/@statusChange : start, stop

Pitch direction

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:duration="444" emma:confidence="0.3">

        <semaine:pitch xmlns:semaine="http://www.semaine-project.eu/semaineml" direction="rise"/>

    </emma:interpretation>
</emma:emma>

The core difference from the user state representation is that here we have a start time and a duration.

Possible values for /emma:emma/emma:interpretation/semaine:pitch/@direction : rise, fall, rise-fall, fall-rise, high, mid, low

Gender

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.3">

      <semaine:gender name="female" xmlns:semaine="http://www.semaine-project.eu/semaineml"/>

    </emma:interpretation>
</emma:emma>

Possible values of /emma:emma/emma:interpretation/semaine:gender/@name : male, female, unknown

Nonverbal vocalisations

Any non-verbal vocalizations produced by the user.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.3">

        <semaine:vocalization xmlns:semaine="http://www.semaine-project.eu/semaineml" name="(laughter)"/>

    </emma:interpretation>
</emma:emma>

Possible values for /emma:emma/emma:interpretation/semaine:vocalization/@name : (laughter), (sigh), (breath)

Face presence

Whether there is a face currently present.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.3">

        <semaine:face-present xmlns:semaine="http://www.semaine-project.eu/semaineml" statusChange="start"/>

    </emma:interpretation>
</emma:emma>

Possible values for /emma:emma/emma:interpretation/semaine:face-present/@statusChange : start, stop

Action units

Any action units recognised from the user's face.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:group>
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.3">
        <bml:bml xmlns:bml="http://www.mindmakers.org/projects/BML">
            <bml:face au="1"/>
        </bml:bml>
    </emma:interpretation>
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.4">
        <bml:bml xmlns:bml="http://www.mindmakers.org/projects/BML">
            <bml:face au="2"/>
        </bml:bml>
    </emma:interpretation>
    <emma:interpretation  emma:offset-to-start="12345" emma:confidence="0.2">
        <bml:bml xmlns:bml="http://www.mindmakers.org/projects/BML">
            <bml:face au="4"/>
        </bml:bml>
    </emma:interpretation>
  </emma:group>
</emma:emma>

Possible values for /emma:emma/emma:interpretation/bml:bml/bml:face/@au : a single integer number
