Our houses are already listening to us…
There’s a quiet infiltration into our lives underway, and while many people aren’t even aware it is occurring, most will be thrilled with the potential, even if they are a bit apprehensive about it.
Many of us are already experiencing the early stages of this through things like Siri, Cortana or Google Now on our phones and these capabilities are spreading to more devices throughout our homes. At the same time, home automation is beginning to grow and soon our phones, homes, cars, and lives will be controllable with our voices.
Here are some early thoughts on the pieces at work here.
Who’s Listening?
Actually, a bunch of things. Whether it is our phones as noted above, smart TVs and streaming boxes like Roku, Apple TV and Amazon’s Fire TV, the more overt Amazon Echo, or even Windows 10 PCs and the Xbox Kinect, more and more of the devices around us can and do listen to us.
Many of these are currently passive[1], like Roku, Apple TV and Fire TV, requiring the user to press a button on the remote to activate voice commands, but increasingly we are being surrounded by devices that actively listen to us.
Today we can turn on “Hey Siri”, “Hey Cortana” and “OK Google” on our phones, tablets and Windows 10 PCs and laptops, and these devices will actively listen for us to say those trigger words, then respond to the commands we give them. And the Amazon Echo and newly announced Google Home come out of the box ready to respond to “Hey Alexa” and “OK Google”. (Side note: Google needs to change this trigger word, as it’s going to get really, really annoying to keep saying “ok Google”, “Ok Google”, “OK GOOGLE!!!”.)
And when we talk to these devices our commands can run the gamut from “Search for House of Cards” to “Add dishwasher soap to Shopping List” to “Next time I’m at Bellevue Square remind me to pick up my mom’s gift”. And with the evolution of home automation we’ll be able to say anything from “Turn on the living room lights” to “Set the temperature to 68 degrees between 10 pm and 6 am” to “Alert me if the garage door opens and no one’s home”[2].
The challenge is that today most of these voice-activated “assistants” are siloed, with our phones taking care of things like lists, reminders, calendar appointments and emails, and our smart TVs taking care of searching and controlling our media. Impressively, Amazon is already extending the Echo and Fire TV beyond media control, information retrieval and shopping-oriented tasks by adding the ability to control smart home devices.
Do We Really Need “Listeners” in Every Room?
Technically, no. Assuming our phones are voice-enabled, we always have them with us, and they are connected to all the things we want to control or do, our phones could be the only thing we need. But from a convenience standpoint, having multiple devices throughout our lives that can listen and respond to us makes sense.
Realistically we will have a web of devices (phones, PCs, TVs, dedicated listening devices or assistants) that will provide us a seamless connection to everything we want to do. As noted above, Amazon recently announced that both the Echo and its new Fire TV can be used to control smart home devices. This still requires you to press the Voice button on the Fire TV remote, but we can expect that active listening will shortly be built into the Fire TV, just as it is on the Echo.
Today these devices are fragmented: we have to say “Hey Roku” to control our TVs, “Hey Siri” to control our phones and “Hey Alexa” to add something to our shopping list in OneNote or Evernote. And, yes, Siri, Cortana and Google Now can all add things to shopping lists, but it’s a different list than the one Alexa keeps for us, so today we have to ask different devices to do different things.
While this is acceptable in the early days of voice-control of our lives, we will quickly want to pick one digital assistant to do everything, from anywhere in our homes, cars or literally anywhere we happen to be. What we really want is to just be able to say “Hey <insert assistant name of your choice here>, do/make/set/add/search/turn-on/turn-off/program/write/send/etc/etc/etc ” and just have it done.
Which Device Responds When We Talk?
OK, so if all these devices are able to take our voice commands, and we really reach the point where one assistant can respond to all our needs, which device wins when we’re standing in a room with, say, our phone, a PC and a television and we say “Hey Cortana”?
There are two fundamental problems to be solved here:
First, we don’t want all three of them answering at the same time. If multiple “assistant equipped” devices are in range, we want them to quickly decide which one of them will respond to us, and have only that one respond.
Today, with most assistants, including Microsoft’s Cortana, every device in range responds at the same time. For instance, sitting at my desk in my home office, when I say “Hey Cortana” both my Windows Phone and my Windows 10 PC wake up and respond.
Amazon recently announced a partial solution to this for the Echo and Echo Dot they call ESP, or Echo Spatial Perception (isn’t that spatial), which helps the Echo closest to you respond. But this is only part of the solution because it only works between Echo devices and as noted above, we don’t want siloed virtual assistants.
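To make the "only one responds" idea concrete, here is a minimal sketch in Python. It is not how Amazon's ESP or any real assistant works internally; it simply assumes every device that hears the trigger word reports a made-up `loudness` score to some shared place, and that the strongest report wins. The names `Hearing` and `pick_responder` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hearing:
    device: str      # hypothetical device name, e.g. "phone" or "pc"
    loudness: float  # assumed score: how strongly this device heard the trigger word


def pick_responder(hearings):
    """Return the name of the single device that should answer, or None."""
    if not hearings:
        return None
    # Strongest report wins; ties are broken by device name so the
    # choice is deterministic rather than depending on arrival order.
    best = max(hearings, key=lambda h: (h.loudness, h.device))
    return best.device


if __name__ == "__main__":
    heard = [Hearing("phone", 0.4), Hearing("pc", 0.9), Hearing("tv", 0.2)]
    print(pick_responder(heard))
```

The hard part in practice is that devices from different vendors would need to share these reports with each other, which is exactly the silo problem described above.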
The second fundamental problem is the device closest to us isn’t necessarily the one we want to respond, or more importantly, take action. For instance, I might have an Echo in the kitchen for general help but also have a Dot in the family room connected to the home stereo.
If I’m standing in the kitchen and say “Alexa, play Alicia Keys” what I really want is for this to be played on the Dot connected into my home stereo, not on the Echo in the kitchen which is located closest to me.
Of course, what this means is that we want all of our stuff, whether it’s our television, stereo, lamps, thermostat, etc., to be controllable and addressable from our assistant of choice. So what I want to be able to say is “Alexa, play Alicia Keys on the home stereo” or “Alexa, play ESPN on the TV in the bonus room”.
Will All Our Listeners Play Nice?
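As a rough illustration of "addressable" devices, the sketch below separates the device that hears a command from the device that acts on it. Everything here is an assumption for illustration: the function `route_command`, the `targets` table of spoken names, and the device ids are invented, and real assistants use far richer language understanding than matching a trailing "on the ..." phrase.

```python
def route_command(command, heard_by, targets):
    """Return (device_that_should_act, request) for a spoken command.

    targets maps spoken names such as "home stereo" to hypothetical device ids.
    If no known name is mentioned, the device that heard the command acts on it.
    """
    text = command.strip()
    lowered = text.lower()
    # Check longer names first so "tv in the bonus room" wins over a plain "tv".
    for name in sorted(targets, key=len, reverse=True):
        suffix = " on the " + name
        if lowered.endswith(suffix):
            return targets[name], text[: len(text) - len(suffix)]
    return heard_by, text


if __name__ == "__main__":
    targets = {
        "home stereo": "dot-family-room",
        "tv in the bonus room": "tv-bonus-room",
    }
    print(route_command("play Alicia Keys on the home stereo", "echo-kitchen", targets))
    print(route_command("play ESPN on the TV in the bonus room", "echo-kitchen", targets))
```

In this sketch the kitchen Echo only listens; the family room Dot or the bonus room TV does the playing.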
Virtual Assistants, and their related artificial intelligence, will be one of the next big battlefields for technology. As our worlds get more complex, and as we get more and more “smart” devices, we’re going to want simpler, easier ways to interact with them.
And, more importantly, we’re going to want assistants that get smarter and not only understand, but anticipate our needs, and the leading tech companies are all going to battle to be that assistant of choice.
Amazon, Apple, Google, and Microsoft all want their assistants to be the one we use, or use most frequently. And it’s not just these “general” tech companies; it’s also companies like Xfinity, which recognize that control, or the way we interact with our technology, will have a huge influence on whose products and services we use.
As a result, I don’t expect all of the companies will “play nice” and allow us to pick our assistant of choice for spanning all of our technology.
We already see this today on the iPhone and in iOS. Apple is very protective of what it opens up to third parties and what it keeps proprietary, and while Android’s more open approach allows for third-party assistants, Siri is your only option for iPhones, iPads, and Apple TV.
A huge battle is pending:
As noted above, the warm and fuzzy Siri, Cortana, and Alexa, or, more coldly and correctly titled, Artificial Intelligence Assistants, will be the next battle for consumer technology.
This is truly transformative technology which will have universal appeal, and whoever gets there first, or gets the furthest, will be able to heavily influence, and profit from, this next wave of computing.
Amazon has a small, and potentially brief, lead but Google is now out of the blocks and Apple and Microsoft *shouldn’t* be too far behind.
[1] For the purposes of this post I will use the term “passive” to refer to devices that need to be physically activated via keyboard, button, etc. to enable voice command, and I will use the term “active” to refer to devices that actively listen for trigger words to enable voice command.
[2] “No one’s home” can easily be detected by “none of the family’s cell phones are in the house”.
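The "no one's home" rule from the garage door example can be sketched in a few lines of Python. This is only an illustration: how a home would actually know whether a phone is in the house is left out, and the names `nobody_home`, `should_alert` and `phones_at_home` are hypothetical.

```python
def nobody_home(phones_at_home):
    """phones_at_home maps each family member's phone to True if it is in the house."""
    # With no phones registered at all, this treats the house as empty.
    return not any(phones_at_home.values())


def should_alert(garage_door_opened, phones_at_home):
    """Alert only when the garage door opens while every phone is away."""
    return garage_door_opened and nobody_home(phones_at_home)


if __name__ == "__main__":
    phones = {"mom": False, "dad": False, "kid": False}
    print(should_alert(True, phones))
```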