Unless you’ve had your head buried in the sand, you’ve heard of Siri, Apple’s iOS personal assistant. Siri was originally an application available in the App Store, made by a company named, naturally, Siri. In April of 2010, Apple purchased Siri and killed off the company’s plans to port the application to other devices (Siri had officially been working on BlackBerry and Android ports). Interestingly, Siri worked on any iOS device prior to it becoming a feature of iOS 5 on the iPhone 4S. Once the 4S was announced, the Siri application pretty much stopped working, which means that (as of right now) only people with a 4S can use Siri. Even the hacked versions people have made for non-4S devices won’t work. The reason for this gets into how Siri works.
Everyone using Siri is using it on an iPhone 4S (for now, anyway). However, the bulk of the Siri work is actually done on Apple servers located far away from the 4S in your hand. If you think about it, this makes sense. When you say something, a program needs to translate that speech to text, process that text and understand it, then find the answer and deliver it. Finding the answer could involve searching Yelp, Google, Wolfram Alpha, or OpenTable, making a calendar appointment on the device, or more. For a device to do all of this in a timely manner is unrealistic (for now). Furthermore, since this processing is done on the server side, it’s easy for Apple to make improvements without having to push out new versions of iOS.
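To make the server-side division of labor concrete, here’s a minimal sketch of the pipeline described above. Every function name, stage, and example string here is hypothetical — this is just an illustration of the speech → text → intent → answer flow, not Apple’s actual implementation.

```python
# Toy sketch of the client/server split: the phone only records audio and
# displays the result; everything in between runs on the server.

def speech_to_text(audio_bytes):
    """Stand-in for a real speech recognizer (hypothetical)."""
    return "any good pizza nearby?"

def parse_intent(text):
    """Stand-in for natural-language understanding: text -> (intent, argument)."""
    if "pizza" in text:
        return ("restaurant_search", "pizza")
    return ("unknown", text)

def fetch_answer(intent):
    """Stand-in for the answer-lookup step (Yelp, Wolfram Alpha, etc.)."""
    kind, query = intent
    if kind == "restaurant_search":
        return f"Searching Yelp for {query}..."
    return "Sorry, I didn't catch that."

def handle_utterance(audio_bytes):
    """Server-side pipeline: audio in, answer out."""
    text = speech_to_text(audio_bytes)
    intent = parse_intent(text)
    return fetch_answer(intent)

print(handle_utterance(b"..."))  # Searching Yelp for pizza...
```

Because all three stages live on the server, improving the recognizer or adding a new answer source means changing `handle_utterance`’s internals, not shipping an iOS update — which is exactly the advantage described above.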
So now to the “Thoughts” on Siri. First, I think it’s awesome. It certainly isn’t something most people could use all of the time (i.e. in a meeting), but with Siri, you could get something done faster via speech than with typing. I think this is a very natural progression for Apple. To some degree, Apple has looked at natural user interfaces pretty heavily. Think of when you swipe up and down or pinch to zoom. These are the sorts of things you would expect to be able to do to interact with something. Opening a keyboard or holding down several keys to make something happen isn’t as intuitive. Speaking to people and things is something you’ve been doing since not long after you were born, so it would be an incredibly natural way to interact with something. Furthermore, since I’ve been interested in Natural Language Processing for a while now, this sort of software is right up my alley. I wouldn’t be surprised to see voice control come to more things (so the rumors that Siri is coming to Lion aren’t surprising either). Another thing I really like about Siri is that it has a personality. It’s not always just returning cut-and-dried answers. It uses humor. It tells a story.
One thing that I’ve found very interesting about Siri is that Apple released it in beta. To the public. If you look at the recent past, you don’t see a lot of betas coming out of Apple. A lot of people have been complaining about poor quality and misunderstandings when speaking with Siri, and it’s been huge news. I think Apple customers (I think it’s safe to say the majority of people that bought 4Ss were already Apple fans and customers) aren’t used to having something released that isn’t amazing to start with. Google is a company that has, in the past, regularly released programs under the beta tag. This enables them to get their new applications out to people for use without actually saying “we think this is a finished product.” Google obviously continues to improve these beta programs, and I’m certain Apple intends to do the same thing. I do wonder if we’ll start seeing more betas out of Apple.
Now I’d like to talk a little bit about some of the press Siri has gotten. To start, Andy Rubin, Google’s Senior VP of Mobile, was quoted at the Asia D conference saying, “I don’t believe that your phone should be an assistant. Your phone is a tool for communicating. You shouldn’t be communicating with the phone; you should be communicating with somebody on the other side of the phone.” I can only hope that this is just a front and that Google is actually working on something similar. They already have Voice Actions on Android, which you can use to search, text, make calls, play music, pull up maps, etc. Is it that far off from something that could communicate back to you with some sense of personality?
If Andy Rubin’s comments were discouraging, Gary Morgenthaler’s comments were just foolish. Gary Morgenthaler, a onetime director of Siri (when it was its own company), says that Siri gives Apple a two-year advantage over Google. Comparing individual features, such as Siri being able to make a phone call and Voice Actions being able to play a song, wouldn’t make much sense. Any one of these individual features wouldn’t be difficult for the other to duplicate. What Siri does that Voice Actions does not is translate human speech to intent. When you say “Remind me to pick up my dress when I leave work,” Siri translates that into “create a reminder to pick up my dress when the phone’s location leaves work.” When you say “I’m in the mood for Italian food in North Beach,” it translates that to “look up Italian restaurants in North Beach on Yelp and return the results.” While these are impressive feats (it’s definitely not an easy thing to go from normal human speech to specific instructions), it’s not something that Google couldn’t reproduce and top in well under two years. At the end of the day, Siri is still just taking more or less simple statements and translating them to simple computer instructions. It’s true that Siri has nowhere to go but up (hopefully), but I wouldn’t discount Google and I certainly wouldn’t try estimating how long it would take Google to catch up.
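The speech-to-intent step can be illustrated with a toy rule-based translator for the two example utterances above. The patterns and the structured output format here are invented for this sketch — real intent parsing is far more robust than a pair of regular expressions, but the input/output shape is the point.

```python
import re

def translate(utterance):
    """Map a natural-language utterance to a structured intent (toy example)."""
    # "Remind me to X when I leave Y" -> a location-triggered reminder
    m = re.match(r"remind me to (.+) when i leave (\w+)", utterance, re.I)
    if m:
        return {"action": "create_reminder",
                "task": m.group(1),
                "trigger": {"type": "leave_location", "place": m.group(2)}}
    # "I'm in the mood for X food in Y" -> a restaurant search
    m = re.match(r"i'm in the mood for (\w+) food in (.+)", utterance, re.I)
    if m:
        return {"action": "search_restaurants",
                "cuisine": m.group(1),
                "location": m.group(2),
                "source": "yelp"}
    return {"action": "unknown", "text": utterance}

print(translate("Remind me to pick up my dress when I leave work"))
print(translate("I'm in the mood for Italian food in North Beach"))
```

The hard part Siri solves isn’t executing the structured instruction on the right — phones could already do that — it’s getting from the messy, varied phrasing on the left to that structure reliably, which is why hand-written rules like these don’t scale.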