Proposed strategies for annotating datasets

The goal of annotating this data is to create a shared database of multimodal sensor data for high-level naturalistic home behaviors to aid in the development of context-aware technologies. Our overall goal is to provide high-level labels of activity for as much data as is practical.

Issue 1: What is multitasking?
The operational definition of multitasking involves switching attentional focus back and forth between multiple activities within a short period of time. While some activities are performed independently (e.g., showering), many are undertaken in concert with others. For example, watching television often accompanies other activities.
  • There are aspects of attention, intention, and focus involved
    • Attention to one activity in favor of another
    • Intention to return to a prior activity is assumed, but may be overridden by interruptions or attentional capture
  • One potential model is a mental "to do" list of activities underway or to be undertaken shortly
  • Certain activities (e.g., using a dishwasher, or doing laundry) are explicitly "background" tasks when multitasking, but may feature a cue (e.g., end-of-cycle alarm) that attracts the participant's attention when appropriate

When annotating, multiple activity labels can be active simultaneously. A layered annotation strategy can indicate where the subject's attention is focused. As priority switches from one activity to another, the alternately rising and falling levels illustrate the dynamic shifts in processing resources while multitasking. Figure 0.1 illustrates how attention flows between multiple tasks.

Figure 0.1

Figure 0.2

In this scenario, a subject enters the office (21:52:48.481) and sits down at the computer, where a beverage was previously placed (21:52:49.647). The subject grasps the mouse and begins interacting with the computer (21:52:50.647). After roughly 25 seconds, the subject picks up the beverage (21:53:15.814) and takes a drink (21:53:18.481). After finishing the beverage (21:53:21.314), he/she gets up from the computer and heads to the kitchen (21:54:54.481). Once the subject places the beverage can in the trash, he/she is definitely no longer drinking (21:56:22.314).

Issue 2: Start/Stop time markers
In the past, annotation strategy focused exclusively on marking the beginning and end of activities.
  • Pros
    • Easy and efficient for annotator to apply
  • Cons
    • Difficult within streams of natural behavior to determine the precise moment when an activity has begun or ended
    • Does not provide additional information about what labels an activity recognition algorithm should weight heavily
    • Does not provide information about quality of annotation that may change over the course of the event

Issue 3: Ambiguity of Data
There is a certain amount of ambiguity inherent in data collected in natural environments (e.g., camera blind spots, privacy issues, etc.). When translated to annotations, this ambiguity leads to uncertainty on the part of the annotator and potential errors for future applications.
  • Multiple sources of ambiguity:
    • Starting/stopping/unclear separation of activities
    • Definitions of activities may not be clear or easily applied to observed video
    • Granularity of labels (specific or broad) affects ease of application. Fine-grained labels are easy to apply, but more work-intensive
    • Presence/absence of sensor data should be noted or inferable from data stream
    • Behaviors that indicate preparation for an activity but are not intrinsic to the common sense definition of the activity might be included in the annotation, but at lower certainty values
    • If an individual is engaged in an ongoing activity but briefly focuses on another task, the pause may be reflected by marking the first activity with lower certainty
    • Long pauses that clearly interrupt the current activity may result in a rating of “Not happening” for that activity

Issue 4: Foreground/Background labels
Common sense definitions of home behavior suggest that there is a subset of activities that involve both foreground components (tasks involving active attention or interaction) and background components (tasks that continue without overt attention to the activity). Background components represent a specific subset of processes that must be actively initiated, but continue until a predefined cycle is completed or further action is taken by the participant.

These activities typically involve appliances that carry out a specific function without continuous direct manipulation. Such activities (e.g., cooking, washing dishes, listening to music) will be annotated with a '[background]' notation. The background label should be applied when the process or cycle is initiated by the user (e.g., turning on the stove, setting the washing machine) and left open until the user discontinues the process or there is an obvious signal that a predefined cycle has terminated automatically (e.g., the washing machine has beeped, the user opens the microwave oven). When the user is actively manipulating the controls of such an appliance or otherwise interacting with it, the standard label without [background] should be applied in addition to the background label.

Foreground and Background Activity Labels:
  • dishwashing
    • dishwashing [background]
  • information
    • listening to music/radio/audio content [background]
    • watching TV/movie/video content [background]
  • laundry
    • doing laundry [background]
  • meal preparation
    • cooking or warming food [background]
    • preparing a drink [background]

The certainty ratings (possibly, likely, definitely) represent a separate scale from foreground/background. Including FG/BG ratings in the same UI element would make it impossible to annotate certainty for background activities. Therefore, these two rating systems must be distinct entities so that certainty can be applied to all activities, whether they occur in the foreground or the background.
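One way to picture the separation is a record that carries the background marker and the certainty rating as two independent fields. The following is a minimal sketch only: the `Annotation` class, its field names, and the `display` format are hypothetical and are not taken from the annotation tool.

```python
from dataclasses import dataclass

# Rating names as used in this document; treating them as plain strings
# is an assumption made for illustration.
CERTAINTY_LEVELS = ("not happening", "possibly", "likely", "definitely")


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record keeping FG/BG and certainty as separate fields."""
    activity: str      # e.g. "doing laundry"
    background: bool   # True corresponds to the "[background]" notation
    certainty: str     # one of CERTAINTY_LEVELS

    def __post_init__(self):
        if self.certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"unknown certainty: {self.certainty!r}")

    def display(self):
        suffix = " [background]" if self.background else ""
        return f"{self.certainty}: {self.activity}{suffix}"


# A background cycle and a foreground interaction can each carry a certainty.
running = Annotation("doing laundry", background=True, certainty="likely")
loading = Annotation("doing laundry", background=False, certainty="definitely")
print(running.display())
print(loading.display())
```

Because the two fields are independent, any certainty level can be combined with either a foreground or a background label.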

Issue 5: Prospective vs. Retrospective Annotations
Annotators can approach the rating process from one of two temporal perspectives: prospective or retrospective. Prospective annotators use the confidence ratings to express the potential for actions to happen as they watch the data move forward in real time. This is similar to the perspective that potential context-detection algorithms will employ: taking activity input in real time and predicting the likelihood of future actions.

Retrospective annotation, on the other hand, allows annotators to freely move back and forth in time to identify and label the actions associated with a particular activity.

Activity narrative, with the prospective and retrospective annotations for each step:

  • Subject A enters the kitchen holding an empty glass and approaches the sink
    • Prospective annotations: preparing a beverage (possibly); rinsing or hand washing dishes (possibly); loading the dishwasher (possibly)
    • Retrospective annotations: rinsing or hand washing dishes (possibly); loading the dishwasher (possibly)
  • Subject A reaches to turn on water
    • Prospective annotations: preparing a beverage (likely); rinsing or hand washing dishes (likely); loading the dishwasher (possibly)
    • Retrospective annotations: rinsing or hand washing dishes (likely); loading the dishwasher (possibly)
  • Subject A places cup under faucet
    • Prospective annotations: preparing a beverage (likely); rinsing or hand washing dishes (definitely); loading the dishwasher (possibly)
    • Retrospective annotations: rinsing or hand washing dishes (definitely); loading the dishwasher (likely)
  • Subject A turns off the faucet after emptying the cup of all water
    • Prospective annotations: preparing a beverage (not happening); rinsing or hand washing dishes (likely); loading the dishwasher (possibly)
    • Retrospective annotations: rinsing or hand washing dishes (likely); loading the dishwasher (likely)
  • Subject A reaches to open the dishwasher
    • Prospective annotations: rinsing or hand washing dishes (possibly); loading the dishwasher (likely)
    • Retrospective annotations: rinsing or hand washing dishes (possibly); loading the dishwasher (likely)
  • Subject A opens the top rack of the dishwasher and places the cup in
    • Prospective annotations: rinsing or hand washing dishes (not happening); loading the dishwasher (definitely)
    • Retrospective annotations: rinsing or hand washing dishes (not happening); loading the dishwasher (definitely)
  • Subject A closes the dishwasher and turns to exit the kitchen
    • Prospective annotations: loading the dishwasher (not happening)
    • Retrospective annotations: loading the dishwasher (not happening)
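The difference between the two perspectives can be made concrete by comparing the ratings step by step. The sketch below transcribes the first four narrative steps above and lists where the two perspectives disagree. The `disagreements` helper is hypothetical, and treating a label that is absent from one perspective as "not happening" is an assumption made for the comparison.

```python
B = "preparing a beverage"
R = "rinsing or hand washing dishes"
D = "loading the dishwasher"

# One dict per narrative step, transcribed from the first four steps above.
prospective = [
    {B: "possibly", R: "possibly", D: "possibly"},
    {B: "likely", R: "likely", D: "possibly"},
    {B: "likely", R: "definitely", D: "possibly"},
    {B: "not happening", R: "likely", D: "possibly"},
]
retrospective = [
    {R: "possibly", D: "possibly"},
    {R: "likely", D: "possibly"},
    {R: "definitely", D: "likely"},
    {R: "likely", D: "likely"},
]


def disagreements(first, second):
    """Yield (step, label, first_rating, second_rating) where ratings differ.

    Assumption: a label missing from one perspective counts as "not happening".
    """
    for step, (a, b) in enumerate(zip(first, second), start=1):
        for label in sorted(set(a) | set(b)):
            rating_a = a.get(label, "not happening")
            rating_b = b.get(label, "not happening")
            if rating_a != rating_b:
                yield step, label, rating_a, rating_b


for step, label, pro, retro in disagreements(prospective, retrospective):
    print(f"step {step}: {label}: prospective={pro}, retrospective={retro}")
```

In this excerpt the disagreements involve only the beverage label, which the retrospective annotator never applies, and the dishwasher label, which the retrospective annotator raises to "likely" earlier.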

Proposed Strategy
Our current strategy addresses the issues presented above as follows:

Issue 1: Multitasking
  • Naturalistic behaviors have a tendency to deviate from their prototypical definitions. This most commonly occurs during multitasking, as the shifting of focus from one task to another leaves some ambiguity as to which activity the subject is currently engaged in. Multiple activities being performed simultaneously can obscure the delineation between tasks.
  • To address these potential deviations, we assign each activity one of four confidence levels: "not happening", "possibly", "likely" and "definitely". These levels provide a structure to differentiate between the essential and peripheral/preparatory actions associated with observed higher-order activities. Ratings of "possibly", "likely" or "definitely" indicate how closely the observed behavior fits the prototypical definition of the activity. “Not happening” marks the end of an activity that was in progress.
  • These values are time stamped and stored as integers (0-3) in the annotation data file along with the activity labels. While an activity is ongoing, annotators are encouraged to modify their certainty ratings according to whether the observed behavior increases or decreases the likelihood that the activity is occurring.
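As a rough illustration of such records, the sketch below stores each rating change as a time-stamped integer alongside its activity label and looks up which labels are active at a given moment. The assignment 0 = "not happening" through 3 = "definitely" is an assumed ordering, and the `RatingChange` record, the `active_labels` helper, the activity names, and the example timestamps are all hypothetical rather than the actual file format.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    """Four-level rating; the integer assignment is an assumed ordering."""
    NOT_HAPPENING = 0
    POSSIBLY = 1
    LIKELY = 2
    DEFINITELY = 3


@dataclass(frozen=True)
class RatingChange:
    """One time-stamped rating for one activity label (hypothetical record)."""
    timestamp: str          # zero-padded "HH:MM:SS.mmm" within a single day
    activity: str
    confidence: Confidence


def active_labels(changes, at):
    """Return {activity: confidence} for every label active at time `at`.

    Zero-padded HH:MM:SS.mmm strings from one day compare correctly as text.
    """
    latest = {}
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.timestamp <= at:
            latest[change.activity] = change.confidence
    return {a: c for a, c in latest.items() if c != Confidence.NOT_HAPPENING}


# Illustrative values only; these are not taken from the dataset.
example = [
    RatingChange("10:00:00.000", "using a computer", Confidence.DEFINITELY),
    RatingChange("10:00:25.000", "drinking", Confidence.POSSIBLY),
    RatingChange("10:00:28.000", "drinking", Confidence.DEFINITELY),
    RatingChange("10:03:32.000", "drinking", Confidence.NOT_HAPPENING),
]
print(active_labels(example, "10:01:00.000"))
```

Several labels can be active at once under this representation, and a rating of 0 simply removes a label from the active set.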

Issue 2: Start/Stop Markers
  • Our system of confidence ratings allows a focus either on the prototypical actions associated with an activity (i.e., times marked "definitely") or on the behaviors associated with the preparation and resolution of activities (the progression from "possibly" through "likely" to "definitely").
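Both views can be recovered from the same stream of ratings by choosing a threshold. The sketch below is illustrative only: it assumes the integers 0-3 map in order to "not happening", "possibly", "likely" and "definitely", and the `intervals` helper and example values are hypothetical.

```python
def intervals(ratings, end_time, minimum):
    """Return (start, end) spans during which the rating is at least `minimum`.

    `ratings` is a time-ordered list of (seconds, level) changes for a single
    activity, with level an integer 0-3; `end_time` closes any open span.
    """
    spans = []
    start = None
    for time, level in ratings:
        if level >= minimum and start is None:
            start = time                      # span opens
        elif level < minimum and start is not None:
            spans.append((start, time))       # span closes
            start = None
    if start is not None:
        spans.append((start, end_time))
    return spans


# Illustrative rating changes only (seconds, level); not taken from the dataset.
ratings = [(0, 1), (4, 2), (9, 3), (30, 2), (36, 0)]

core = intervals(ratings, end_time=40, minimum=3)      # "definitely" only
extended = intervals(ratings, end_time=40, minimum=1)  # preparation through resolution
print(core, extended)
```

A high threshold yields conventional start/stop markers around the prototypical portion of the activity, while a low threshold also captures the surrounding preparatory and concluding behavior.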

Issue 3: Ambiguity
  • Currently, our system of annotation uses the confidence ratings ("possibly," "likely," etc.) to communicate two pieces of information:
    1. How closely the activity matches the definition
    2. Whether some form of ambiguity (e.g., poor camera angle, subject out of view) is present in the data
  • A potential future solution could be a system of "flagging" annotations where this uncertainty is present. Attached to each flag could be a metadata description indicating the source of the ambiguity, and possibly a rating of its degree. Annotator-generated notes could be applied to the label where further explanation is required.
  • Our goal of creating as much quality data as possible is best served by separating suspect annotations from labels of definite quality. Therefore, if ambiguity becomes apparent in the middle of a quality annotation, end the current label and begin a new one with a flag.
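The flagging idea and the end-and-restart rule could look roughly like the sketch below. Since flagging is described only as a potential future solution, everything here is hypothetical: the `Label` record, its field names, and the `split_at_ambiguity` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical annotation label with an optional ambiguity flag."""
    activity: str
    start: float                             # seconds from start of recording
    end: float
    flagged: bool = False
    ambiguity_source: Optional[str] = None   # e.g. "poor camera angle"
    note: Optional[str] = None               # free-text annotator note


def split_at_ambiguity(label, onset, source, note=None):
    """End `label` at `onset` and open a flagged label for the remainder."""
    if not label.start < onset < label.end:
        raise ValueError("onset must fall strictly inside the label")
    clean = replace(label, end=onset)
    suspect = replace(label, start=onset, flagged=True,
                      ambiguity_source=source, note=note)
    return clean, suspect


# Illustrative values only; times are not taken from the dataset.
original = Label("doing laundry", start=120.0, end=300.0)
clean, suspect = split_at_ambiguity(original, onset=210.0, source="out of view")
print(clean)
print(suspect)
```

Keeping the unflagged portion as its own label means that quality data can be selected simply by filtering out flagged labels.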

Issue 4: Foreground vs. Background
  • Our current strategy involves separating the activity labels into their foreground and background components. This eliminates the complication of integrating them into the annotation UI and helps dispel some of the ambiguity regarding what defines an activity as FG vs. BG.

Issue 5: Prospective vs. Retrospective
  • Currently, we are using retrospective annotation to accurately create datasets that can serve as a basis for training algorithms.