Act Instruction Format

This specification defines the syntax and semantics of a behavior definition language. A service may send instructions encoded in this format for clients such as 3D game engines to realize. These messages include instructions about positioning objects, activating or deactivating systems, changing scenes or contexts, controlling visual properties, animating characters, and playing audio.

Message Format

This section is normative.

Behavior Interface

The atomic unit that a client can realize immediately is called a behavior. Consider the sentence, "The boy says hello." Its grammatical subject is "boy", its verb is "says", and its direct object is "hello". The Behavior interface mirrors this structure: the subject attribute names the agent, the action attribute names the verb, and the params attribute carries the data the action operates on.

        [Exposed]
        interface Behavior {
          attribute long id;
          attribute DOMString subject;
          attribute DOMString action;
          attribute Params params;
          attribute double delay;
          attribute Dictionary start;
          attribute SequenceOfStrings cc;
        };

        typedef sequence<DOMString> SequenceOfStrings;
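As an illustration of how a client might hold a decoded message, the following Python sketch mirrors the Behavior interface with a dataclass and decodes a JSON array of behaviors. The class layout, the `from_dict` and `parse_block` helpers, and the defaults used for omitted members are assumptions made for this example; this specification does not state which members are required.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Behavior:
    """Client-side mirror of the Behavior interface (illustrative only)."""
    subject: str
    action: str
    id: Optional[int] = None
    params: Dict[str, Any] = field(default_factory=dict)
    delay: float = 0.0  # assumed default: no delay
    start: Optional[Dict[str, Any]] = None
    cc: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Behavior":
        # Assumption: subject and action are always present in a message.
        return cls(
            subject=data["subject"],
            action=data["action"],
            id=data.get("id"),
            params=data.get("params", {}),
            delay=float(data.get("delay", 0.0)),
            start=data.get("start"),
            cc=list(data.get("cc", [])),
        )


def parse_block(text: str) -> List[Behavior]:
    """Decode one JSON-encoded array of behaviors."""
    return [Behavior.from_dict(item) for item in json.loads(text)]


block = parse_block(
    '[{"subject": "goose", "action": "say",'
    ' "params": {"name": "GO_0020", "ssml": "I can help you with that!"}}]'
)
print(block[0].subject, block[0].action, block[0].delay)
```

Keeping the decoded form close to the IDL makes it easy to check a message against the attribute descriptions that follow.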

id attribute

The id attribute specifies an identifier that is unique within a block. Note that ids are not unique across different blocks.

subject attribute

The subject attribute is the name of the agent that is performing an action. In grammar, this is often called the subject of a verb. Examples include:

              "boy"

And even abstract nouns such as the following.

              "system"

action attribute

Each subject affords a set of actions. The action attribute specifies one of those available actions. For example, if the subject is "boy", a possible action may look as follows.

            "say"

And if the subject is "system", a possible action may be an abstract concept as follows.

            "initialize"

params attribute

The params attribute specifies the data structure containing all data to be received and manipulated by the client. See the Params interface below.

          [Exposed]
          interface Params {
            attribute String intent;
            attribute String ssml;
            attribute String name;
            attribute String url;
            attribute Phonemes phonemes;
            attribute String language;
            attribute Dictionary context;
          };

intent attribute

The intent attribute specifies what a user is trying to accomplish. An intent is specified as a String.

                "statement.i-can-help"

ssml attribute

The ssml attribute specifies the Speech Synthesis Markup Language (SSML) text for the specified intent. The ssml is specified as a String.

                "I can help you with that!"

name attribute

The name attribute specifies the name of the audio file to be played.

                "GO_0020"

url attribute

The url attribute specifies the source location of the audio file to be played. The url is specified as a String.

                "http://audioURL..."

language attribute

The language attribute specifies the language in which any text or audio will be communicated. The language is specified by a three-letter ISO 639 code. The following example shows the three-letter code for the English language.

              "eng"

phonemes attribute

The phonemes attribute is an object containing the phonetic translation and total number of frames belonging to the corresponding ssml text. See Phonemes below.

            {
              "segments": [
                {
                  "phonemeLabel": "IY",
                  "startFrame": 3
                }
              ],
              "framecount": 202
            }

            typedef sequence< Dictionary > dictionarySequence;
                
            [Exposed]
            interface Phonemes {
              attribute dictionarySequence segments;
              attribute long framecount;
            };

segments attribute

The dictionarySequence typedef denotes a sequence of Dictionary objects. Each Dictionary in the segments attribute contains two key:value members. The first, phonemeLabel, is a String holding a phonetic label. The second, startFrame, is a double holding the frame at which that phonetic label starts.

              {
                  "phonemeLabel": "IY",
                  "startFrame": 3
              }

framecount attribute

The framecount attribute specifies the total number of feature frames of the transcribed ssml text.
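To show how segments and framecount fit together, the following Python sketch derives the length in frames of each phoneme. It assumes that a phoneme lasts until the next segment begins and that the last one ends at framecount; neither rule is stated by this specification, and the function name `segment_durations` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def segment_durations(phonemes: Dict[str, Any]) -> List[Tuple[str, int, int]]:
    """Return (label, start_frame, length_in_frames) for each segment."""
    segments = sorted(phonemes["segments"], key=lambda s: s["startFrame"])
    total = phonemes["framecount"]
    result = []
    for index, segment in enumerate(segments):
        start = segment["startFrame"]
        # Assumption: a segment ends where the next one begins,
        # and the last segment ends at framecount.
        if index + 1 < len(segments):
            end = segments[index + 1]["startFrame"]
        else:
            end = total
        result.append((segment["phonemeLabel"], start, end - start))
    return result


example = {
    "segments": [{"phonemeLabel": "IY", "startFrame": 3}],
    "framecount": 202,
}
print(segment_durations(example))  # [('IY', 3, 199)]
```

A client driving lip-sync would still need a frame rate to convert these lengths to seconds; that value is not part of the message format described here.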

context attribute

The context attribute specifies a Dictionary as defined in [[WebIDL-1]]. The dictionary members are specified as key:value pairs where both key and value are of type String.

            {
              "category": "colors",
              "gamestate" : "1",
              "target" : "blue"
            }

delay attribute

The delay attribute specifies, as a double, the number of seconds to wait before performing the corresponding action.

              0.0

start attribute

The start attribute is an optional Dictionary argument that specifies when the Behavior should begin its execution, specified by a trigger and an id.

trigger key

The trigger key specifies the point in the execution of the referenced Behavior (identified by the id key) at which this Behavior should begin its execution. The trigger value may be "start" or "end".

id key

The id key specifies the Behavior whose start or end, as selected by the trigger value, this Behavior waits for.

              {
                  "trigger" : "end",
                  "id": 0
              }
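The interaction between start and delay can be sketched as a small timing calculation. The Python below is a minimal illustration, not a definitive scheduler. It assumes that a behavior without a start member begins with its block, that delay is added after the trigger fires, and that a behavior without an explicit id is addressed by its position in the block. The `durations` argument is a hypothetical input that a real client would obtain from its animation or audio system.

```python
from typing import Any, Dict, List


def resolve_start_times(block: List[Dict[str, Any]],
                        durations: List[float]) -> List[float]:
    """Compute the start time in seconds of each behavior in one block."""
    # Assumption: a missing id falls back to the position in the block.
    index_of = {int(b.get("id", i)): i for i, b in enumerate(block)}
    starts: Dict[int, float] = {}

    def start_of(i: int, seen: frozenset = frozenset()) -> float:
        if i in starts:
            return starts[i]
        if i in seen:
            raise ValueError("circular start reference")
        behavior = block[i]
        base = 0.0  # no start member: begin together with the block
        ref = behavior.get("start")
        if ref is not None:
            target = index_of[int(ref["id"])]
            base = start_of(target, seen | {i})
            if ref["trigger"] == "end":
                base += durations[target]
        starts[i] = base + float(behavior.get("delay", 0.0))
        return starts[i]

    return [start_of(i) for i in range(len(block))]


block = [
    {"id": 0, "subject": "goose", "action": "say"},
    {"id": 1, "subject": "fox", "action": "say"},
    {"subject": "goose", "action": "animate",
     "start": {"trigger": "end", "id": 0}},
    {"subject": "fox", "action": "animate",
     "start": {"trigger": "start", "id": 1}, "delay": 0.5},
]
# Hypothetical playback lengths in seconds, one per behavior.
print(resolve_start_times(block, [1.0, 1.5, 0.8, 0.8]))  # [0.0, 0.0, 1.0, 0.5]
```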

cc attribute

The cc attribute is a SequenceOfStrings, a custom typedef for a sequence of Strings, where the String data type is the one defined in [[WebIDL-1]]. Each String follows the DCMP captioning standard and contains the textual representation of any corresponding events within the message.

The following example describes the closed captioning for the event where a character named Frog excitedly says 'Hi there!'.

              [
                  "[Frog excited]",
                  "Hi there!"
              ]

The following example describes the closed captioning for the event where a character named Pig says 'Welcome back!' in a Southern Accent.

              [
                  "[Pig Southern Accent]",
                  "Welcome back!"
              ]
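A client that renders captions may want to style descriptions differently from spoken text. The Python sketch below separates the two using only the square-bracket convention visible in the two examples above; it is an assumption of this example, not a rule taken from the DCMP guidelines, and the function name `split_captions` is hypothetical.

```python
import re
from typing import List, Tuple

# An entry wrapped entirely in square brackets is treated as a description
# (speaker, manner, or sound); anything else is treated as spoken text.
_DESCRIPTION = re.compile(r"^\[(.+)\]$")


def split_captions(cc: List[str]) -> Tuple[List[str], List[str]]:
    """Split a cc sequence into (descriptions, spoken lines)."""
    descriptions, spoken = [], []
    for entry in cc:
        match = _DESCRIPTION.match(entry.strip())
        if match:
            descriptions.append(match.group(1))
        else:
            spoken.append(entry)
    return descriptions, spoken


print(split_captions(["[Frog excited]", "Hi there!"]))
# (['Frog excited'], ['Hi there!'])
```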

The following example shows the closed captioning for the event where three characters are simultaneously cheering.

              "[Frog, Pig, Goose cheering]"

The following example shows the closed captioning for the event where a Cow is Mooing.

              "[Cow moo's]"

Blocks

This section is informative.

To synchronize behaviors (especially the "say" and "animate" actions), we propose the "message block". A message block is an ordered sequence of Behavior objects, each of which conforms to the Behavior message format specified in this document. The server sends a sequence of blocks, and the client (e.g. Unity) must fully process one block of Behaviors before reading the next block. While processing a single block, the client should schedule its Behaviors by putting them on "tracks" (every action of every object has a track) and then start all the tracks simultaneously.
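The block and track model described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: a track is keyed by the (subject, action) pair, and `run_tracks` is a hypothetical client-supplied callback that starts all tracks together and returns only once every track has finished.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

BehaviorDict = Dict[str, Any]
Track = Tuple[str, str]  # (subject, action)
Tracks = Dict[Track, List[BehaviorDict]]


def build_tracks(block: List[BehaviorDict]) -> Tracks:
    """Group the behaviors of one block into per-(subject, action) tracks."""
    tracks: Tracks = defaultdict(list)
    for behavior in block:
        tracks[(behavior["subject"], behavior["action"])].append(behavior)
    return dict(tracks)


def process_blocks(blocks: List[List[BehaviorDict]],
                   run_tracks: Callable[[Tracks], None]) -> None:
    """Process blocks strictly in order, one fully before the next."""
    for block in blocks:
        run_tracks(build_tracks(block))


blocks = [
    [{"subject": "goose", "action": "say", "params": {"name": "GO_0020"}},
     {"subject": "fox", "action": "say", "params": {"name": "FX_0020"}}],
    [{"subject": "goose", "action": "animate", "params": {"name": "jump"}},
     {"subject": "fox", "action": "animate", "params": {"name": "jump"}}],
]
order: List[List[Track]] = []
process_blocks(blocks, lambda tracks: order.append(sorted(tracks)))
print(order)
```

Here the callback only records which tracks it was given, which is enough to see that both "say" tracks are handed over together before either "animate" track.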

Examples

This section is informative.

Suppose a humanoid agent named goose stands near a landmark labelled Landmark_1.

Goose walks towards a landmark named Landmark_1.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "walk",
              "context": ["Landmark_1"]
            }
          }
        ]

Goose points at a landmark named Landmark_1.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "point",
              "context": ["Landmark_1", "enable"]
            }
          }
        ]

Goose points at a landmark named Landmark_1, and stops after 2 seconds.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "point",
              "context": ["Landmark_1", "enable"]
            }
          },
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "point",
              "context": ["Landmark_1", "disable"]
            },
            "delay": 2.0
          }
        ]

Goose looks at a landmark named Landmark_1.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "look",
              "context": ["Landmark_1", "enable"]
            }
          }
        ]

Goose looks at a landmark named Landmark_1, and stops after 2 seconds.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "look",
              "context": ["Landmark_1", "enable"]
            }
          },
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "look",
              "context": ["Landmark_1", "disable"]
            },
            "delay": 2.0
          }
        ]

Goose turns body to a landmark named Landmark_1.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "turn",
              "context": ["Landmark_1"]
            }
          }
        ]

Goose walks towards a landmark named Landmark_1 while pointing at it.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "point",
              "context": ["Landmark_1", "enable"]
            }
          },
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "walk",
              "context": ["Landmark_1"]
            }
          }
        ]

Goose walks towards a landmark named Landmark_1 while pointing at it, then stops pointing.

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "point",
              "context": ["Landmark_1", "enable"]
            }
          },
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "walk",
              "context": ["Landmark_1"]
            }
          }
        ]

        [
          {
            "subject": "goose",
            "action": "do",
            "params": {
              "name": "point",
              "context": ["Landmark_1", "disable"]
            }
          }
        ]

In the following example we examine events A and B: we want to enforce that event B can't start until event A has started.

Right when Goose starts speaking, Fox jumps.

        [
          {
            "subject": "goose",
            "action": "say",
            "params": {"name": "GO_0020" , "ssml": "I can help you with that!"}
          },
          {
            "subject": "fox",
            "action": "animate",
            "params": {"name": "FX_Jump"}
          }
        ]

In this example we want to run the statements in a particular sequence. Specifically, B can't start until A has finished. In order to achieve the desired order, we place each behavior in a separate block.

Fox jumps only after Goose is done speaking.

        [
          {
            "subject": "goose",
            "action": "say",
            "params": {"name": "GO_0020" ,"intent": "statement.i-can-help","ssml": "I can help you with that!"}
          }
        ]

        [
          {
            "subject": "fox",
            "action": "animate",
            "params": {"name": "FX_Jump"}
          }
        ]

In this example we show how to sequence actions together.

Goose waves hands as it starts speaking. After the hand-waving animation completes, a sound effect plays.

        [
          {"id": 0, "subject": "goose", "action": "animate", "params": {"name": "wave-hands"}},
          {"subject": "goose", "action": "say", "params": {"name": "GO_0020" ,"intent": "greeting"}},
          {"subject": "sound", "action": "play", "params": {"name": "sfx.boom" ,"intent": "sfx.boom"}, "start": {"trigger" : "end", "id":0}}
        ]

In this example we show how to sequence actions together.

Goose waves hands as it starts speaking. After the audio of the utterance completes, the sound effect plays.

      [
        {"subject": "goose", "action": "animate", "params": {"name": "wave-hands"}},
        {"id": 1, "subject": "goose", "action": "say", "params": {"name": "GO_0020", "intent": "greeting"}},
        {"subject": "sound", "action": "play", "params": {"name": "sfx.boom"}, "start": {"trigger" : "end", "id":1}}
      ]

In this example we show how to control different subjects in parallel.

Goose and fox cheer "yay!" at the same time.

      [
        {"subject": "goose", "action": "say", "params": {"name": "GO_0020","intent": "yay"}},
        {"subject": "fox", "action": "say", "params": {"name": "FX_0020","intent": "yay"}}
      ]

In this example we show how to control different subjects in parallel.

Goose and fox cheer together, and then jump together.

      [
        {"subject": "goose", "action": "say", "params": {"name": "GO_0020" ,"intent": "yay"}},
        {"subject": "fox", "action": "say", "params": {"name": "FX_0020" ,"intent": "yay"}}
      ]

      [
        {"subject": "goose", "action": "animate", "params": {"name": "jump"}},
        {"subject": "fox", "action": "animate", "params": {"name": "jump"}}
      ]

In the following examples we show how to control different objects in parallel.

Goose and fox cheer together (goose finishes before fox). Goose's jump animation starts right after cheering. Same for Fox.

      [
        {"id": 0, "subject": "goose", "action": "say", "params": {"name": "GO_0020", "intent": "yay"}},
        {"id": 1, "subject": "fox", "action": "say", "params": {"name": "FX_0020", "intent": "yay"}},
        {"subject": "goose", "action": "animate", "params": {"name": "jump"}, "start": {"trigger" : "end", "id":0}},
        {"subject": "fox", "action": "animate", "params": {"name": "jump"}, "start": {"trigger" : "end", "id":1}}
      ]

The Goose says 'hello' and waves a hand at the Fox. The Fox says 'hi' back (without interrupting the goose's utterance).

      [
        {"subject": "goose", "action": "say", "params": {"name": "GO_0020" ,"intent": "greeting"}},
        {"subject": "goose", "action": "animate", "params": {"name": "wave-hands"}},
        {"subject": "fox", "action": "say", "params": {"name": "FX_0020" ,"intent": "greeting"}, "start": {"trigger" : "start", "id":1}, "delay": 0.5}
      ]

More Examples

This section is normative.

The following are examples of objects and their associated behaviors. Note that the specified parameters for each of the following actions are normative.

`Animate`

An animate payload specifies an object to be animated by the client.

        {
          "subject": "goose",
          "action": "animate",
          "delay": 0,
          "params": { "name": "FR_Maracas_1" }
        }

`Say`

A say payload specifies what the desired subject should say.

        {
          "subject": "frog",
          "action": "say",
          "delay": 0,
          "params": {
            "intent": "statement.i-can-help",
            "ssml": "I can help you with that!",
            "name": "GO_0020",
            "url": "http://audioURL...",
            "phonemes": {
              "segments": [
                {
                  "phonemeLabel": "IY",
                  "startFrame": 3
                }
              ],
              "framecount": 202
            }
          }
        }

`Do`

A do payload specifies the name of the action the specified subject should perform.

            {
              "subject": "goose",
              "action": "do",
              "delay": 0,
              "params": { "name": "walk", "context" : ["pig"]}
            }

`Play`

A play payload specifies the audio clip the client should play.

            {
              "subject": "sound",
              "action": "play",
              "delay": 0,
              "params": { "name": "sfx.bonusround.wav" }
            }

`Spawn`

A spawn payload notifies the client to display the specified object.

        {
          "subject": "system",
          "action": "spawn",
          "delay": 0,
          "params": {"name": "frog", "context": ["character2"]}
        }

`Destroy`

A destroy payload notifies the client to remove the specified object.

        {
          "subject": "system",
          "action": "destroy",
          "delay": 0,
          "params": {"name": "frog"}
        }

Interruption handling

The client may store blocks in a queue and realize the behaviors at its own pace. In the following example, the queue is cleared by sending the clear action to the system subject.

        {
          "subject": "system",
          "action": "clear"
        }

Clearing the queue only removes future actions; it doesn't interrupt the current actions. The following example shows how to immediately terminate actions that are currently being performed. The params.ignore field is an optional list of subjects to skip.

        {
          "subject": "system",
          "action": "interrupt",
          "params": {
            "ignore": ["music"]
          }
        }

Clearing the queue and interrupting current actions are two behaviors that tend to be performed together. The following block shows how to send both behaviors at once.

        [
          {
            "subject": "system",
            "action": "clear"
          },
          {
            "subject": "system",
            "action": "interrupt",
            "params": {
              "ignore": ["music"]
            }
          }
        ]
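One way a client could realize this queue behavior is sketched below in Python. The `BlockQueue` class, its `pending` and `active` members, and the choice to apply clear and interrupt on receipt instead of queueing them are all assumptions of this example. The sketch tracks only which subjects are currently active; actually stopping an animation or audio clip is left to the client.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Set


class BlockQueue:
    """Illustrative client-side queue of pending blocks."""

    def __init__(self) -> None:
        self.pending: Deque[List[Dict[str, Any]]] = deque()
        self.active: Set[str] = set()  # subjects currently performing actions

    def receive(self, block: List[Dict[str, Any]]) -> None:
        """Apply system clear/interrupt behaviors; queue everything else."""
        rest = []
        for behavior in block:
            is_system = behavior["subject"] == "system"
            if is_system and behavior["action"] == "clear":
                self.pending.clear()  # drops future blocks only
            elif is_system and behavior["action"] == "interrupt":
                ignore = set(behavior.get("params", {}).get("ignore", []))
                # Stop every active subject except those listed in ignore.
                self.active &= ignore
            else:
                rest.append(behavior)
        if rest:
            self.pending.append(rest)


queue = BlockQueue()
queue.receive([{"subject": "goose", "action": "say", "params": {"name": "GO_0020"}}])
queue.active = {"goose", "music"}  # pretend these subjects are mid-action
queue.receive([
    {"subject": "system", "action": "clear"},
    {"subject": "system", "action": "interrupt", "params": {"ignore": ["music"]}},
])
print(len(queue.pending), queue.active)  # 0 {'music'}
```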

Related Technologies

This section is informative.

BML

The following BML snippet simultaneously starts an animation and a speech utterance.

        <bml character="Alice">
            <pointing target="blueBox" mode="RIGHT_HAND" start="speech1:start"/>
            <speech id="speech1">
                <text>Look there!</text>
            </speech>
        </bml>

The equivalent behavior can be represented as follows.

      [
        {"subject": "Alice", "action": "do", "params": {"name": "point", "context": {"target": "blueBox", "mode": "RIGHT_HAND"}}},
        {"subject": "Alice", "action": "say", "params": {"intent": "look"}}
      ]

BML documentation recommends using <wait> to align behavior with a condition or an event.

      <bml character="Alice">
        <gesture id="g1" type="point" target="object1"/>
        <body id="b1" posture="sit"/>
        <wait id="w1" condition="g1:end AND b1:end"/>
        <gaze target="object2" start="w1:end"/>
      </bml>

The <wait> is unnecessary since we can synchronize behaviors by placing them in different blocks.

      [
        {"subject": "Alice", "action": "do", "params": {"name": "point", "context": ["object1"]}},
        {"subject": "Alice", "action": "do", "params": {"name": "body", "context": ["sit"]}}
      ]

      [
        {"subject": "Alice", "action": "animate", "params": {"name": "gaze"}}
      ]

In BML, multi-party behavior synchronization is limited. It is non-trivial to have two characters say something at the same time.

      <bml character="Alice">
          <speech><text>Yay!</text></speech>
      </bml>

      <bml character="Bob">
          <speech><text>Yay!</text></speech>
      </bml>

The Act message format allows fine control of multi-party behaviors with a natural syntax.

      [
        {"subject": "Alice", "action": "say", "params": {"intent": "yay"}},
        {"subject": "Bob", "action": "say", "params": {"intent": "yay"}}
      ]
