As discussed in earlier posts, I’ve made many attempts to refine the Watchdog user interface, ranging from ANSI graphical interfaces to touchscreens. At work, I learned how Microsoft had continued to evolve its Speech API in the .NET Framework, and I decided to experiment with it on my own. I was delighted to find that it was rich, powerful and easy to use. I became convinced that this could be the next step in the evolution of the Watchdog interface: two-way voice interaction leveraging the real-time house status information in the Watchdog database and RSS feeds from the Internet.
In an effort to further enhance the usability, I decided to give Watchdog a personality which included a consistent, distinct name and voice. At the time, my wife and I were enjoying a series on the SyFy channel called “Eureka” where the sheriff of a small town lives in an ultra high-tech, AI smarthouse named SARAH. In the TV show, SARAH stands for “Self-Actuated, Residential, Automated Habitat”. I adopted the name for my creation and used the “Microsoft Mary” voice in my application.
Over time, the list of functions SARAH can understand has grown. Looking back, I realized I had built an equivalent of Apple’s “Siri” years before it was released. Here is a sample of her initial vocabulary and functions:
House Information – SARAH provides the current status of doors, windows and locks
- Name a room
- Name a level (House, Upstairs, Main Level, Basement, etc.)
House Temperature – SARAH provides the current temperature for the house, a level or a room
- House Temperature
- Upstairs Temperature
- Main Level Temperature
- Server Room Temperature
- Garage Temperature
Weather – SARAH uses RSS feeds to get real-time information for weather conditions outside the house
- What’s the weather like?
- Weather Today
- What should I wear today? (uses a custom decision tree based on temperature, chance of precipitation and other available data points)
- What should I wear tomorrow?
Other Questions – Other things SARAH knows how to answer
- What time is it?
- What day is it?
- What can I say? (gives a brief summary of the commands she can respond to in case you forget).
- What is the air speed velocity of a laden swallow? (yes, big Monty Python fan).
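To give a feel for the “What should I wear?” decision tree mentioned above, here is a minimal sketch in Python. It is not SARAH's actual code: the `Forecast` type, the `what_to_wear` function, the temperature thresholds and the 50% precipitation cutoff are all assumptions made up for illustration, and it only looks at two of the data points (temperature and chance of precipitation).

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    """Hypothetical container for the data points pulled from a weather feed."""
    high_f: float         # forecast high, degrees Fahrenheit
    precip_chance: float  # chance of precipitation, 0.0 to 1.0


def what_to_wear(forecast: Forecast) -> str:
    """Walk a small decision tree and return a sentence to be spoken.

    All thresholds below are illustrative assumptions, not the original values.
    """
    # First branch on temperature to pick the base outfit.
    if forecast.high_f < 32:
        outfit = "a heavy coat, hat and gloves"
    elif forecast.high_f < 50:
        outfit = "a warm jacket"
    elif forecast.high_f < 68:
        outfit = "a light jacket or sweater"
    else:
        outfit = "short sleeves"

    advice = f"I suggest {outfit}."

    # Then branch on precipitation to add an optional second sentence.
    if forecast.precip_chance >= 0.5:
        advice += " Take an umbrella."
    return advice


if __name__ == "__main__":
    print(what_to_wear(Forecast(high_f=40, precip_chance=0.8)))
```

The same function can answer both “today” and “tomorrow” questions; only the forecast passed in changes.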
I ran SARAH from a server in the basement and dedicated two wires of the existing house wiring to a microphone in the kitchen. The microphone had an on/off switch so we could keep her from responding to everything we said. When she responded, her voice could be heard throughout the house over the intercom system. It was a bit weird to be in another room while someone was using her because you only heard half the conversation.
Later, I moved her to a dedicated PC in my bedroom, with a low-profile microphone and speaker on my wife’s nightstand. The microphone has a mute button, so you just push and talk.
SARAH works well, but has a few quirks:
- The Microsoft SAPI does not hear female voices well, for some reason. My son and I have no problem interacting with SARAH, but my wife and daughter do. Typically, to be understood, they speak in a false baritone voice and she responds. Strange. There’s no configuration in the API to address this.
- Over time, the hosts of the RSS feeds change their format. When that happens, SARAH will simply say “I’m unable to process that request at this time”. When I hear that repeatedly, I know it’s time to dig back into the code and rewrite the interface. I’ve reduced how often this happens by using XSLT stylesheets to extract only the XML tags I want, so I don’t have to traverse the whole structure.
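The idea behind the feed handling can be sketched roughly as follows. This is an illustration in Python rather than the original .NET code, and it swaps the XSLT step for simple path lookups with the standard library's `xml.etree.ElementTree`. The `WANTED` paths, the `summarize_feed` function and the sample feed are assumptions invented for the example; only the fallback sentence comes from SARAH herself.

```python
import xml.etree.ElementTree as ET

# The sentence SARAH speaks when a feed can no longer be understood.
FALLBACK = "I'm unable to process that request at this time"

# Only these elements are read; everything else in the feed is ignored,
# so unrelated changes to the feed do not break the lookup.
# These paths are assumptions based on a generic RSS 2.0 layout.
WANTED = {
    "title": "channel/item/title",
    "description": "channel/item/description",
}


def summarize_feed(xml_text: str) -> str:
    """Return a speakable summary of the first item, or the fallback message."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return FALLBACK  # feed is not well-formed XML

    fields = {}
    for name, path in WANTED.items():
        node = root.find(path)  # first matching element, or None
        if node is None or not (node.text or "").strip():
            return FALLBACK  # the host changed or emptied a tag we rely on
        fields[name] = node.text.strip()
    return f"{fields['title']}. {fields['description']}"


# Made-up sample feed, used here instead of a real network request.
SAMPLE = (
    '<rss version="2.0"><channel><title>Weather</title>'
    "<item><title>Today</title>"
    "<description>Sunny, high of 72</description></item>"
    "</channel></rss>"
)

if __name__ == "__main__":
    print(summarize_feed(SAMPLE))
```

Because only the named tags are consulted, a feed can gain new elements without any code changes; the fallback is triggered only when one of the tags that is actually used disappears or moves.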
Later, I built the SARAH interface into a “Magic Mirror”. See this other post for more details.