Thursday, February 17, 2011

Jeopardy IBM Challenge

I used to watch Jeopardy every day while folding newspapers before I delivered them (~$2.50/hour was a jackpot job for my thirteen-year-old self; damn labor laws...) but I've never been as psyched about it as I am this week. For the first time, a computer is playing, and its opponents are the two winningest Jeopardy contestants in history.

I told a couple of friends about this, trying to get them as excited as I was, and was thoroughly surprised by their reaction, which was: "Well, of course a big database will win at Jeopardy. What's the big deal?" It's a huge deal! Google knows everything but it can't find out anything; it takes a human user to understand the question, choose the relevant search words, filter the results, and pick out the actual answer amidst all the text. Watson does that (but without being connected to the internet). How awesome is that??

Watch the NOVA special on Watson (the computer's name) here.

The science is entertaining enough, but actually watching the match is very revealing (spoiler alert: don't continue reading if you want to watch the match without knowing who won).

First of all, the timing issue. This is a huge problem. If the metric of intelligence you are interested in is how well a computer can interpret a question and come up with the right answer in a small enough amount of time to qualify as a seamless conversationalist, success should not rest on buzzer quickness. The computer can "read" the question instantaneously, since it is sent as a text file as soon as it is visible to the humans, and buzz in without worrying about human reaction time. That is an enormous advantage, even ignoring the fact that humans take a significant amount of time to dig up facts from the remotest parts of their memories (a flaw which I count as a legitimate win for Watson's CPU speed, even if it doesn't ultimately "know" more). In a quiz where each person answered each question, I have no idea who would win, but it would certainly not be a runaway race with Watson winning by over $25,000. In fact, you can see Ken Jennings adapting to this obstacle by buzzing in, more and more frequently, before he has thought of the answer.

Then there are random amusing peculiarities that wouldn't happen with a human. Yesterday, Ken Jennings buzzed in with "1920s", was wrong, and then Watson buzzed in and also answered "1920s". Alex Trebek gently reminded him that Ken had already tried that answer, which was downright hilarious given that Watson can't hear.

But the most fascinating thing to me was Watson's lack of grammatical understanding. Now, I know that grammar is a very difficult thing to teach a computer. Read The Language Instinct to be convinced of how spectacularly complex our language structure is and how it is uniquely suited to (or rather determined jointly with...*) our human brain genetics, structure, and development. Yet if the task is to interpret a question, I would think that grammar would be of paramount importance. But the strange possible answers that Watson comes up with (his top three guesses are shown on the screen for every question), and his occasional non sequitur answers, suggest that this is not the case.

On second thought, I think most of the strangeness comes from two issues, rather than total grammatical ignorance. According to the NOVA special, grammar was in fact heavily emphasized in Watson's development. But he frequently misses two things: the type of answer that the category itself requires, and phrases in the question that define the type of answer. Hence his response "Toronto" to a question about "U.S. Cities"; "Orson Welles" and "Lyon" as possible answers in the "name the decade" category; "art theft" as his tautological guess in the "art of the steal" category; and "pediment" as his guess in the category "'church' and 'state'" (which seems sort of odd, because in the earlier questions in that category he seemed to know that each answer must have one of those words in it). He somewhat makes up for this impairment by learning what the category calls for from earlier answers: by the fifth and sixth questions in "name the decade", his guesses were all at least years, if not decades...

The other thing he often misses, and it seems important not to miss, is the defining phrase in a clue that designates what class of answer is called for. Hence "listen to the music" was one of his possible answers to a question asking for a "this title gal" of a Beatles song, "1908 summer olympics" a guess for "this city", "caprice" for "this instrument", "porcupine" for "this protein". Despite the NOVA special mentioning that they addressed the gender mismatch issue, his second guess about a "she" was John Lennon. Most of the time he does remarkably well coming up with the right answer despite 2nd or 3rd place guesses that don't make much sense, but I'm surprised that those answers aren't eliminated in the first round of pruning. If a question references "this city", start with a list of only cities and see which ones make sense. That's certainly how my brain works. Sure, maybe human thought processes aren't optimal for everything, but it seems like if language is designed to be interpretable by human minds, then human thought processes must be particularly good at sifting information from language.
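
The "start with a list of only cities" idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not a description of how Watson works: the type table, the function names, and the wording of the example clue are all made up for illustration, and a real system would need a vastly larger knowledge base and a much smarter parser.

```python
import re

# Hypothetical toy type table, invented for this sketch.
KNOWN_TYPES = {
    "city": {"London", "Chicago", "Toronto"},
    "instrument": {"violin", "piano"},
}

def answer_type(clue):
    """Return the word right after 'this' in the clue, or None if absent."""
    match = re.search(r"\bthis\s+([a-z]+)", clue.lower())
    return match.group(1) if match else None

def prune(clue, candidates):
    """Keep only candidates whose type matches the clue's 'this X' phrase."""
    members = KNOWN_TYPES.get(answer_type(clue))
    if members is None:
        # Unknown or missing type: keep everything rather than guess.
        return list(candidates)
    return [c for c in candidates if c in members]

# Made-up clue wording; the candidate list mixes guesses mentioned above.
guesses = ["1908 summer olympics", "London", "caprice"]
print(prune("This city hosted the 1908 Summer Olympics", guesses))
```

Even this toy shows why the real problem is hard: on "this title gal" the pattern grabs "title" rather than "gal", and any type missing from the table means nothing gets pruned at all.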

All in all, though, it's mind-blowing how well Watson did. And obviously these issues aren't just oversights of a team that's thought of nothing else for the past several years; I'm sure there are very good reasons why these systematic issues still arise. I just can't wait to see if they're gone in version 2.0.

*To quote one of my favorite creationist/evolution debate quotes, by Douglas Adams, thinking that the brain is miraculously suited to learn human language "is rather as if you imagine a puddle waking up one morning and thinking, 'This is an interesting world I find myself in — an interesting hole I find myself in — fits me rather neatly, doesn't it? In fact it fits me staggeringly well, must have been made to have me in it!'"


fred said...

Is that really a creationist/evolution debate quote? It sounds like the weak anthropic principle to me... but maybe that's the same thing.

Vera said...

Yeah, it is, I just always hear it in the context of the evolution debate; as in, "just because life seems miraculously improbable doesn't mean it was designed by god". I think that's the context Douglas Adams was speaking in when he said it, but I'm not sure.