Making Home Activity Datasets a Shared Resource

Researchers in computer science, engineering, user-interface design, and telemedicine are developing technologies for the home that leverage new sensing capabilities to create context-aware environments. Context-aware environments can proactively deliver information to occupants wherever and whenever it is appropriate based on the occupants’ activities. Context is detected using inference algorithms that process data from sensors distributed throughout the environment, worn on the body, or both. Systems that can detect context in the home may enable novel computer applications for communication, education, work, entertainment, and health.
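The inference step can be illustrated with a minimal sketch: a naive Bayes classifier that guesses the current activity from the set of sensors that fired. All sensor names and probabilities below are invented for illustration; real systems in this area use far richer models trained on actual data.

```python
# Minimal sketch (assumed, illustrative): infer a home activity from which
# sensors fired, using naive Bayes. Sensor names and probabilities are
# invented; they are not drawn from any real dataset.
import math

# P(sensor fires | activity), assumed independent given the activity.
LIKELIHOOD = {
    "cooking":     {"stove": 0.9, "fridge": 0.7, "tv_remote": 0.05},
    "watching_tv": {"stove": 0.05, "fridge": 0.2, "tv_remote": 0.9},
}
PRIOR = {"cooking": 0.5, "watching_tv": 0.5}

def infer_activity(fired):
    """Return the most probable activity given the set of sensors that fired."""
    best, best_lp = None, -math.inf
    for activity, lik in LIKELIHOOD.items():
        lp = math.log(PRIOR[activity])
        for sensor, p in lik.items():
            # Multiply in P(fired) or P(not fired) for each known sensor.
            lp += math.log(p if sensor in fired else 1.0 - p)
        if lp > best_lp:
            best, best_lp = activity, lp
    return best

print(infer_activity({"stove", "fridge"}))  # -> cooking
```

In practice the research challenge is exactly what this toy model glosses over: learning such probabilities from realistic, longitudinal home data rather than assuming them.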

The Context-Aware Environment

For example, context-aware environments may be able to provide medical support and monitoring for people in their own homes as they age. If technology can be used to provide more care in the home and reduce or shorten hospital visits, more effective and preventive care might be delivered at lower cost. Recent work in academic and corporate labs (much of it funded by organizations such as the National Science Foundation) has demonstrated the viability of using sensors to automatically detect and monitor health-related activity in the home setting. Once researchers have proven feasibility with small datasets they generate themselves, however, they have difficulty testing their context-detection algorithms longitudinally with realistic data from home settings. Deploying sensors in homes for longitudinal research is costly and time-consuming. For instance, even if the technology being studied is relatively simple to deploy, most research also requires privacy-invasive audio-visual observational sensors so that what the algorithms infer can be compared with what actually happened. To perform this comparison, the audio-visual data must be painstakingly annotated by trained experts. Researchers must develop procedures for, and dedicate resources to, recruiting participants, instrumenting spaces, synchronizing data streams, protecting the privacy of participants, collecting data, and annotating datasets. Because of these barriers, most work on context-awareness in the home published to date has used datasets of only a few hours or days of activity, often collected in the homes of the researchers themselves or using a “simulated protocol,” which may not reflect the complexity of behavior in most homes.
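The comparison against annotated ground truth can be sketched as a simple agreement score between the algorithm's inferred activity labels and the expert annotations, with both resampled to a common time step. The labels and sequences here are invented for illustration; real evaluations typically also report per-activity precision and recall.

```python
# Illustrative sketch (assumed): score inferred labels against expert
# annotations of the audio-visual record, one label per time step.

def agreement(inferred, annotated):
    """Fraction of time steps where the inferred label matches the annotation."""
    assert len(inferred) == len(annotated), "sequences must be aligned first"
    matches = sum(i == a for i, a in zip(inferred, annotated))
    return matches / len(annotated)

inferred  = ["cooking", "cooking", "idle", "cooking"]
annotated = ["cooking", "idle",    "idle", "cooking"]
print(agreement(inferred, annotated))  # 0.75
```

The expensive part is producing the `annotated` sequence, which is exactly the expert labeling burden the proposed shared resource would amortize across the research community.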

The proposed work will reduce this research barrier. Over three years, a new community resource will be created consisting of six datasets of dense, multi-modal sensor records capturing four months of activity of volunteers who live in their own homes with sensor instrumentation unobtrusively installed by the researchers. This sensor infrastructure is presently being stress-tested by researchers from multiple fields. Pilot tests have already produced a set of tools and procedures for working with participants, maintaining sensing technologies, managing rich datasets, and supporting iterative annotation.

Dataset-Sharing within the Scientific Community

The community resource will include high-quality, synchronized data streams from most of the sensor types currently used in active research (object usage, RFID, wearable accelerometers, video, audio, etc.). The intellectual merit of this proposal will result from shared resources that allow researchers to focus on developing and testing in-home techniques for context detection without being stalled by the steep requirements of data collection. Researchers will be able to quickly assess the viability of the inference algorithms they propose on datasets that capture realistic home behavior. Furthermore, since the datasets generated through this work will contain most of the commonly used sensor types, researchers will be able to compare algorithms across different behaviors of interest and sensor modalities. The datasets created through this work will allow researchers to determine the minimal set of sensors required to reliably detect specific activities, thereby making larger-scale testing of prototypes in other homes more practical. Rather than spending months collecting data to test a single algorithm on data from one home, researchers using the community resource will be able to try different approaches quickly. Shared datasets in fields such as speech recognition and computer vision have accelerated research on problems such as speech and face recognition. No such datasets currently exist for researchers studying home behavior and activity detection.
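The minimal-sensor-set question can be sketched as a search over sensor subsets: enumerate subsets in order of size and keep the smallest one whose detection accuracy against annotated ground truth clears a threshold. The sensor names, accuracy numbers, and `evaluate` function below are placeholders standing in for a real detector run against a real annotated dataset.

```python
# Illustrative sketch (assumed): find the smallest sensor subset that still
# detects the target activities acceptably. Numbers are invented placeholders.
from itertools import combinations

SENSORS = ["stove", "fridge", "door", "tv_remote"]

# Placeholder accuracy table standing in for "run the detector with only
# these sensors and score it against the annotated ground truth".
ACCURACY = {
    frozenset(s): acc for s, acc in [
        (("stove",), 0.60), (("fridge",), 0.55), (("door",), 0.40),
        (("tv_remote",), 0.50), (("stove", "fridge"), 0.85),
        (("stove", "tv_remote"), 0.80),
    ]
}

def evaluate(subset):
    # Unlisted (mostly larger) subsets are assumed adequate for this sketch.
    return ACCURACY.get(frozenset(subset), 0.9)

def minimal_sensor_set(threshold=0.8):
    """Smallest sensor subset whose accuracy meets the threshold, else None."""
    for k in range(1, len(SENSORS) + 1):
        for subset in combinations(SENSORS, k):
            if evaluate(subset) >= threshold:
                return set(subset)
    return None

print(minimal_sensor_set())  # -> {'stove', 'fridge'}
```

Exhaustive enumeration like this is only feasible because the shared datasets contain all modalities at once; each candidate subset can be evaluated offline by masking streams rather than by re-instrumenting a home.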


The broader impact of the proposed research will be to accelerate work on the use of information technology to create novel applications in the home, especially those designed to help people stay healthy as they age. The datasets should assist researchers in creating context-sensitive user interfaces that are customized to particular individuals and automatically present information when it is needed. This work will have applications in user-interface design, preventive healthcare, and ubiquitous computing. Researchers studying communication, education, health, work practices, and energy conservation may all be able to learn how to create easy-to-use, meaningful home technologies from analysis of the datasets this work would produce.

Educational Impact

The educational mission of the proposed work is to create unique, freely available, multi-modal datasets of home activity that can be used by graduate and undergraduate students interested in developing advanced ubiquitous computing technology for the home. The datasets will be valuable both for research and as teaching aids.