Singularity Summit 2011

Commercializing Watson

Speaker: Dan Cerutti

Transcriber(s): Matt Cudmore, Ethan Dickinson

Dan Cerutti: ...we hired outside consultants. This is a tough question, and it remains a tough question. I’ll talk a little bit about our insight. We even asked Watson, and Watson didn’t know. [laughter]

And then one night I had a sobering thought. I actually know exactly what this technology is worth. It’s worth the amount of money you can win on Jeopardy! until they kick you off. That’s sobering, because that’s what the technology does.

We had some data points. Ken [Jennings] and Brad [Rutter] both won over 3 million dollars, so we had a starting point, a high-water mark. Unfortunately Dr. Ferrucci and team spent significantly more than that developing the technology. [laughter]

But we started on the right foot. We won a million dollars playing Jeopardy! Unfortunately they gave it away. [laughter] So Dr. Ferrucci is not responsible for our commercialization efforts, despite the fact that he took my suit coat and thinks he is.

Seriously... since I was at the Singularity Summit I was asked a question. Do I think Watson is smarter than a human being? I have absolutely no idea. I don’t even care. I’m thinking about the practical applications of this technology.

In some ways clearly we demonstrated that in a particular application the technology performs well. In other places, it doesn’t perform well. We’re not trying to replicate human behaviour. We’re trying to solve problems.

When we think about commercializing this technology, and why we’re excited about it, I think about four things.

The first is that we have a technology that is able to understand ambiguous human language. I don’t mean a small question; I mean an inquiry from a human being, and I’ll talk more about that in a second. That’s a big deal; I know lots of people who have trouble with that. That's a big deal.

The second is that computers, and obviously it's an excellent use of computers, are able to read and ingest effectively limitless volumes of information, structured and unstructured, and they tend not to forget. So that’s good.

The third thing about this technology, and it might be the most compelling aspect of all, is that Watson returns confidence levels. It returns what I think of as an opinion, an evidence-based opinion. David calls them answers or responses, but they’re confidence-based, quantitative measures with the evidence of why. That may be the single largest breakthrough in this technology. “Technology” meaning the system, which is actually obviously many technologies. That’s cool.

And the fourth thing, that you know of already, is it learns. And given the right amount of training data the system learns.

Those are four things that get us excited.

Then we start to think about where to take this technology. There are four orthogonal issues that we think about.

The first is time. It’s going to take a long time, independent of domain, to help this technology mature.

The second thing is we need to find high-value problems. This is an expensive system. This is not something that runs on a laptop. Someday maybe, but right now it requires a lot of compute power, a lot of memory, human beings to load text... We write annotators to parse text and expand the information. It is a complex system, therefore we need to find big problems that warrant big solutions.

The third thing is we need to find problems for which the solutions are generalizable in scale, because we don’t want to spend two years ending up over in the corner with a system and application that only works for one or two cases.

The fourth thing is we want to do something that matters. We have an opportunity to change the world, and that means a lot to us. It means a lot to Dave, it means a lot to me, it means a lot to the IBM company. Some of you may have heard we just celebrated our 100th anniversary. It’s a reflective period of time. We did a lot of good stuff for mankind in the last 100 years, and we think that Watson is the kind of technology that can make a difference. And I'll talk about where we're going to focus, and in a couple minutes you'll see that.

So the question is, what kind of problems are we looking for? Where should we apply this? We’ve got limited bandwidth; we have to focus. We’ve had over 300 ideas brought to us, and I’m actually hoping that by talking a little bit today, a few more good ideas will come to us, because as I said this is not easy, it’s not obvious.

The ideas have ranged from... As you might expect, we have a lot of customers and clients around Wall Street, in the financial services, “Tell me what stocks to pick.” I’m not sure I’d sell that technology. [laughter] We are in the business of serving our shareholders. But the reality is, that’s a tough problem, and it's a problem that has been worked on for a long time by some of the greatest analytical minds and the greatest analytical technology we have. It’s structured data in general, and we’re not sure that’s a good use of the Watson technology.

Along the middle, we’ve had recommendations for legal applications that make a lot of sense. A lot of text, a lot of complex analysis, evidence matters, obscure opinions matter... Good problem. Clearly a problem we can make a difference in. The problem is that it’s tough to extract value in the legal economic system. Everything is billed back, time doesn’t matter much... Those sorts of problems are out there.

On the end of problems that seem to make sense to us are problems that have to do with intelligence, I mean defense, terrorism intelligence sorts of things. Customer-service applications. Teaching, education, using systems to make folks learn faster and better, more efficiently. Analyst-sorts of applications. Anyone who is in a position to read and synthesize large bodies of information and make recommendations, maybe in the management of a portfolio, not in the trading and purchase of stocks, but in the overall wealth-management of a portfolio. Those sorts of applications make a lot of sense to us. Why?

If I could summarize our analysis, we’re looking for problems, we’re looking for decisions made by human beings that are important and occur frequently – not every second, but maybe every few minutes, several times a day. Important decisions that are made by human beings where there is a big gap between the information that the person uses to make the decision at the time, and what is available in the world. What they use now, and what’s available if the human being were to be able to read everything known to man that was relevant, and remember it. So we’re looking for that gap.

When a physician makes a decision on a diagnosis, that generally means a lot to a person. It’s virtually impossible for that physician to have digested everything about my patient history, everything that has ever happened to me, everything that's in my labs, that's in my files, everything relevant to this particular condition or disease that I might have. They just can’t keep up, the journals, the clinical studies, the clinical data... Tough problem. That’s an example of a problem that matters, and the information gap is large.

And the other cases I gave you... Teaching is a different kind of thing, it’s a constant-learning-type application. Portfolio management, etc.

So we’re looking for the information gaps, and where that information gap is large, and there’s a preponderance of textual information, that’s a better problem. When real-time response time is less important, that’s a better problem. Sub-second hard; a couple minutes, an hour, better. We’ve got all the dimensions that matter. These are the kinds of problems we’re looking for. We’re going to pick a few.

The area we’ve decided to focus on the most is healthcare. I’ve been here only one day, but it has come up five or six times. The U.S. healthcare system is broken. It’s not going to get fixed easily. Many have said it’s the biggest problem the U.S. faces, I happen to agree with that. It’s a complex multidimensional problem. It has to do with regulations, incentives, antiquated systems, the way people do business... It’s a tough problem. We think we can make a dent in it. We’re not going to solve every problem with a Watson-type solution, but we are going to spend a lot of time building something we call “Watson for healthcare.”

So we’re working on all these other areas that I’ve mentioned, but we’re really focused on healthcare. In fact, I’m spending most of my time, and probably will spend most of my time hopefully for the next three to five years working on this.

The technology needs at least two improvements. Dave referred to this stuff a little bit, and we’re working on them.

The first is the ability to take in what we call an inquiry, not just a question. You saw some examples from Dave about how a question is parsed. In real life, an inquiry is a lot more loaded than a simple question. For instance, when I ask a simple question like “What’s wrong with me? My back hurts. What do you think?” there’s a lot of context that could be provided to the system, and that context has to do with the full articulation of my present condition, probably some of my patient history, maybe some observations that the staff or the doctor made. There’s a body of information. A lot of that information is included in what we call the electronic medical record.

Electronic medical records—EMRs—are in use today in most major care providers. They are repositories of information. They’re phenomenal if you spend some time crawling through them. Everything is going into the EMR. Most of the information in the EMR is not utilized. Why is that? Because if you printed it out, it would be 70 to 100 pages in most cases. Most of it is textual. Some of the most valuable information in EMRs is in what they call the doc’s notes, the nurse’s notes. When you leave, they type up notes. Computers don’t know what to do with that information. Watson knows what to do with that information.

So imagine going from a simple question to an inquiry. “What’s wrong with me, and here’s a bunch of information about me.” So we have expanded the system, we have this working already, to take in a large inquiry. We don’t talk about questions, we talk about inquiries.

The second part of the technology that needs to be improved is what Dave referred to (in fact we had a healthy debate about it at lunch): what we call the interaction model. Watson is fabulous at answering questions in three seconds when the money is on the line. That’s not the real world. The real world is, I give you a bunch of information, whether I’m a physician, or an executive, or a lawyer, or a teacher trying to help you. I give you some information, and then they say “Well what about this? Did you do this? What about...” It’s a conversation, a dialogue.

So the Watson applications that we envision will be dialogues, because I don’t have to dump the information on my doctor and say “You’ve got three seconds. What’s wrong with me?” [laughter] There are some doctors who will say “You’re good!” That's because the pay-for-service system stinks. But anyway, that's another story.

We envision applications where there is an interactive model, and we’re building something we call the interaction model. Watson could take in information, as much as is available at the time, and say to a caregiver, “Based on what I’ve heard, here are some thoughts. Here is a disease that is consistent with those symptoms. But you didn’t tell me anything about the person. Have they been to Africa lately?” You’re in the U.S., this is a disease that is native to Africa, I would never think of suggesting a diagnosis like that, but if you would tell me whether they were in Africa then maybe I can help. “Is this patient pregnant?” That wasn’t volunteered; that matters. And on and on. You could think of countless examples.
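The follow-up behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions, not IBM's interaction model: the `FOLLOW_UP_FACTS` table, the fact names, and the functions `next_questions` and `interact` are all hypothetical.

```python
# Hypothetical table: for each candidate diagnosis, the facts that would
# help confirm it or rule it out. All names are illustrative only.
FOLLOW_UP_FACTS = {
    "disease native to Africa": ["recent_travel_to_africa"],
    "condition affected by pregnancy": ["pregnant"],
}


def next_questions(candidates, known_facts):
    """Return facts that are relevant to some candidate but not yet known."""
    questions = []
    for candidate in candidates:
        for fact in FOLLOW_UP_FACTS.get(candidate, []):
            if fact not in known_facts and fact not in questions:
                questions.append(fact)
    return questions


def interact(candidates, known_facts, ask):
    """Ask about each missing fact and fold the answers back into the inquiry."""
    facts = dict(known_facts)  # copy so the caller's inquiry is not mutated
    for fact in next_questions(candidates, facts):
        facts[fact] = ask(fact)  # `ask` stands in for the caregiver's reply
    return facts
```

Here `ask` is any callable that returns the caregiver's answer, so the same loop could run over several rounds of a dialogue rather than a single three-second exchange.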

Dave is fond of offering real-life examples of interactions with doctors, of ten minutes, twelve minutes, go home, talk to the family, think about things for an hour, and then there’s a whole bunch of insight and information that would be wonderful to tell a doctor, but it didn’t fit in the ten-minute window and I didn’t think of it. So we think of an interaction model, and it’s okay to have four or five interactions. It’s about getting to the best answer.

We’re focused on healthcare, and what we call “Watson for healthcare” is a service. We are not going to take this code, which is massive and not quite ready yet, and throw it over the wall to every care-provider in large hospitals, and say “Here, go develop some applications.” It will never get there.

I’ve met with almost every major care-provider institution in the U.S. over the last 15 months, I’ve met with over 300 doctors, and I’ve met with many of the purveyors of health plans, the other folks involved in this discussion. And they're excited. They’re very excited, and they’ve asked us to build a system where we can create the repository for the industry, because it’s a big investment. We’re licensing content, getting clinical data, licensing journals, licensing textbooks, scouring the web. We’re building a huge repository. It will never be done. That repository is the evidence base against which these applications will run.

Our partners are developing solutions. The bread and butter of these solutions right now look like patient inquiry, work-up, diagnosis, treatment. Patient inquiry: I tell the system about myself. Work-up: the caregiver staff takes that information in a continuous fashion, expands it, brings in the EMR. Diagnosis: interaction with a caregiver staff. Treatment recommendation. That's sort of the core. We have 20 use cases, quite frankly. I’ve just described the four that go together in what I call the breadbasket of medical care.

But this is what we're doing: we’re working with large care institutions, starting with particular disease states, and they volunteer to give us all their data and to work with us to bring their domain expertise. IBM is not a healthcare company, and we'll never be a healthcare company, but we're working with leading healthcare providers to develop applications that will help caregiver teams, not just physicians but caregiver teams. 75 to 80 percent of caregivers are not MDs. Those caregivers, nurses, nurse practitioners, lay caregivers, etc. spend a lot of time with us.

I was having a debate with a doctor about this, and it occurred to me, the doctor of course didn’t need this system that much because he had 15 years of medical experience, and went to medical school, and I finally said “We’re not building it for you! I get it, you’re perfect. [laughter] Not everyone went to medical school, and not everybody has your experience.” And he said, “Yeah. But I want to use it.” [laughter] So I think that’s the reality. That’s my guess at what’s going to happen.

We’re building these applications in concert with large care institutions, some brand names that you know. We'll pilot these solutions next year. It’s going to take time. The payers, the health plans, are crucial in this, and some of the leading ones are on board, because they pay.

Imagine a scenario one day when you go in... here's a scenario and then I'll slow down and we can take some questions... you access a web portal, it knows about you, it’s got some history, personal private information about you, and you present your latest condition and complaint. The system says, “Here are some thoughts. You’re probably okay.” “Here are some thoughts. You’d better schedule an appointment.” “Here are some thoughts. Let me connect you to your doc, to the [inaudible] person for a conversation.”

The staff brings you in. That information is not lost, it’s augmented with your longitudinal patient record, which means they didn’t forget anything about me, I don’t fill out the stupid forms every time I go in. The caregiver team has some interaction with me, and somewhere along the line they form a diagnosis based on recommendations from Watson. The doc gets involved, and thinks about treatments, and the doc has recommendations on what the proper treatments are, evidence-based recommendations on what protocols make sense for me, for my condition.

And the health plan is wired in. What are called health plan policies, which you know as what insurance companies will and will not pay, are married to care consideration guidelines. 90 percent of the time they're right; that's why stuff gets paid automatically. Sometimes they're not. Imagine a single view of evidence-based treatment protocols, such that when your doctor says “Let’s do this,” it’s approved, it's done. This is what the health plans are after: they’re after getting rid of the waste in the healthcare system, they’re after better outcomes. Evidence-based medicine. That’s what we’re doing. That’s where we’re focused.

We’ll spend next year doing some pilots with these companies, in particular disease areas. In 2012 I hope to bring out solutions to the marketplace. At the same time, we’ll probably expand into some of the other areas as the technology matures a little bit. Someday, maybe as the technology gets more efficient, maybe we’ll all have a baby Watson. Maybe we’ll shrink it down and we'll get it into a reasonably-sized box, and we’ll all be able to answer our questions and to apply these technologies to other areas. For now, it’s going to take us a while to get after some of these big problems, to try to help this technology mature.

I’m going to stop. We have nine minutes, if this is correct. I’m going to invite my colleague Dave, with my suit jacket, to come out, and perhaps we can take a few questions. [applause]

[Q&A begins]

Man 1: Several speakers in the past two days have talked about the fact that technology has become less a part of the popular culture and people ask fewer deep questions about the future. I think the Watson moment with Jeopardy! was one of the few times recently where deep future technology was really brought into popular culture in a big way. Do you have any other plans of doing stuff like that, where it’s more about evangelizing the technology instead of commercializing it?

David Ferrucci: There are a couple of things that are of interest to me personally. I can’t dub them as legitimate plans yet, but I can tell you there are two topics that are of great interest to me beyond applying this to healthcare.

One is looking at this technology as a way to do much more efficient, intelligent tutoring. When we think about helping kids in reading comprehension, how to analyze text and make inferences and answer questions, and getting them to think about what they’re reading and the inferences they’re making, I see a really cool application of the technology into that spot. What I mean by “more efficient” is that currently those kinds of systems are highly scripted, and I think the potential of Watson is not to actually have to have them scripted. Watson can look at the text and the questions and actually do this more automatically without having humans have to script it. I think that’s really cool and exciting, and I think it has that similar kind of cachet.

Another one relates to how we teach Watson. If you go with me on this, that Watson, at least with regard to human language technology, is probably one of the most advanced systems on the planet, it still doesn’t know everything, it doesn’t know everything about language, it doesn’t know everything about meaning certainly, and through interactions it can get smarter and smarter. One of the things we’re thinking about is deploying a class of crowdsourcing where if we get lots of people to interact with and teach Watson, we can very rapidly grow its capability. If anything, that’s one of the ideas I find analogous with the singularity concept, is that you might see that exponential acceleration in its ability to... I don’t like to use the word “understand” because I don’t think the computer is really understanding things the way humans do, but that’s a long-winded debate. But to be more effective in producing the meaning equivalence that humans would expect. That’s a more accurate way of putting it.

Dan: To be clear though, we are spending a lot of our time and energy commercializing and trying to find practical applications. The research will continue, but we’ll do both.

[next question]

Man 2: My question is, as far as for teaching and learning applications, specifically special education, people that have learning disabilities or learn in different ways, would it be possible to have Watson specialized just like the diagnoses, except for learning different individual [inaudible]...

Dan: Special ed would be one of those things that would be great. I don’t know. We just haven’t gotten far enough along to look at the kind of partners who want to help us bring solutions to market and what it means. So at this point, I don’t know.

[next question]

Man 3: Hi, my name is Yuri, I’m a healthcare provider. My question is, will your system be limited by the results of evidence-based medicine in the U.S. as far as giving opinions to the doctors and compiling data?

Dan: The answer is absolutely not. Let me see if I can clarify where we’re focused right now. The system... And I can use a word that is not exactly accurate. The system thinks in English, so to speak. So these ambiguities in the English language, we still have some work to do. Now we have medical jargon, which is a hill we have to climb even further. The ingestion of the content... Right now we’re only looking at English-language content. But those are worldwide results. Clinical results in Europe are just as relevant, in fact sometimes more relevant. The beauty of a system like Watson is that it’s able to sort this stuff out and present evidence. A preponderance of evidence and more data is good. Whether we’ll be deploying outside the U.S. I don’t know yet.

Man 3: When you were talking about integrating the payer system at first, that’s really based on evidence-based medicine in the U.S.

Dan: That’s correct. But that’s not how the providers think. Our customers are actually healthcare providers. Complex issues. The payers complicate things, but they have the money, and we all want our care paid for.

[next question]

Man 4: Have you approached Congress to streamline the decision-making process that this country needs? [laughter and applause]

David: I haven’t approached them yet, no. [laughter]

[next question]

Man 5: Is the language parser hand-crafted, and did it trip up the engine when the question about the airport was asked?

David: Great question. The language parser technology unit... One of the really remarkable things is having the opportunity to do this at the IBM research labs. We’ve had investments in many of these areas for a long time, one of them of course being deep natural language parsing, and we had a parser that was in development a long time. But as I said earlier, none of these methods are perfect, and of course it would get tripped up. But one of the approaches we took was to look at multiple interpretations. We had this expectation that none of these algorithms were going to be perfect, and we really deferred commitments, looked at multiple possibilities, and ran through the entire system. Sometimes you’ll have more evidence other than the parse that might support an answer, and if through training we learned to value that evidence, that would overwhelm something that the parse might have confused. This was a very powerful underlying principle that we used, because again none of these technologies are perfect, and the more evidentiary context you have, the better off you are; you don’t rely on one algorithm being perfect. Future implementations may use not just multiple interpretations of the parse but multiple parsers.
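The principle of deferring commitment and letting other evidence outweigh a confused parse can be illustrated with a toy weighted combination. This is a sketch under assumptions, not Watson's scoring method: the evidence names, the weights (standing in for values learned through training), and the feature scores are invented for illustration.

```python
# Hypothetical weights standing in for what training would learn about
# how much each kind of evidence should count.
WEIGHTS = {"parse": 0.3, "passage_support": 0.5, "type_match": 0.2}


def combined_score(features, weights=WEIGHTS):
    """Weighted sum over all evidence sources; missing evidence counts as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def best_answer(candidates, weights=WEIGHTS):
    """Keep every interpretation alive and commit only after scoring all of them."""
    return max(candidates, key=lambda c: combined_score(c["features"], weights))


# Two candidate answers from different interpretations of the same question.
# candidate_a is favoured by the parse alone; candidate_b has more other evidence.
candidates = [
    {"answer": "candidate_a",
     "features": {"parse": 0.9, "passage_support": 0.2, "type_match": 0.0}},
    {"answer": "candidate_b",
     "features": {"parse": 0.4, "passage_support": 0.9, "type_match": 1.0}},
]

winner = best_answer(candidates)
```

With these made-up numbers the parse prefers `candidate_a`, but the combined evidence selects `candidate_b`, which is the effect described: no single algorithm has to be right.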

[next question]

Man 6: Isn’t making the physician more efficient paving the cow path? Why not eliminate the expensive bottleneck, namely the physicians themselves, or replace them I mean? [applause] Or is that too radical a departure from the status quo?

Dan: As a patient, and as a father, I’m not sure how comfortable I would be with that. We’re not after replacing physicians or caregivers. There are things that people do, like sensing my mood, and understanding me, and making deep human judgements that I trust, and I suspect it’s going to be a few years before machines are able to do that. I don’t know the upside of replacing well-trained physicians. I do see the upside in making caregiver teams have more time to spend with me, and more data to help them make more evidence-based judgements. I will tell you, worldwide there are not enough physicians, and the shortage is growing. In the U.S. we’re actually a little bit rich. I just don’t see the upside of going after a problem like that.

[final question]

Man 7: I'm Rick Schwall. If I understand what you said, there is one Watson application package that did both the job of reading mountains of text, and then playing the game. Is there plans to have them done separately, that there's a text-reader database-loader, and then there's a question-answerer?

Dan: I'll let Dave... When we talk about Watson, we talk about technologies, with an “s”.

David: There’s an underlying architecture, and the way to think about the interface to that architecture... Dan used the word “inquiry”, and the inquiry could be a simple question, it could be a semi-structured graph of data, meaning a bunch of cues, a bunch of facts that are built in formal language, or in textual language, and there’s some unbound variable, in other words there's an unknown entity. In the case of Jeopardy! questions, it’s this province, this president, this whatever. In the case of medicine, it’s this disease, or this treatment. What you’re trying to do is bind that unknown entity. This is the basic interface. The architecture goes off, takes that input information, generates many possible bindings, many possible hypotheses, collects the evidence, scores the evidence, comes back and says here are the possibilities, here’s the evidence that supports those possibilities. That’s the basic interface to the architecture. I can now deploy that in many different ways. Something that looks like the question-answering system in Jeopardy!, something that looks like a dialogue with a doctor. I could deploy it in many different ways. It’s the same basic architecture in the back end. It has to be populated with different content and different algorithms, or more advanced and refined algorithms, same architecture.
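The interface described above (an inquiry with an unbound entity, candidate bindings, evidence, scores) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual architecture: the `CONTENT` store, the `Inquiry` and `Hypothesis` types, the `answer` function, and the overlap-based confidence are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Toy content store: candidate -> (entity type, associated cues).
# Entirely made up; a real deployment would populate this per domain.
CONTENT = {
    "influenza": ("disease", {"fever", "cough", "muscle aches"}),
    "malaria": ("disease", {"fever", "chills", "travel to africa"}),
    "common cold": ("disease", {"cough", "runny nose"}),
    "rest and fluids": ("treatment", {"fever", "cough"}),
}


@dataclass
class Inquiry:
    cues: set          # the facts supplied with the inquiry
    unknown_type: str  # the kind of entity to bind, e.g. "disease"


@dataclass
class Hypothesis:
    binding: str       # a possible value for the unknown entity
    evidence: list     # the cues that support this binding
    confidence: float  # toy score: fraction of inquiry cues supported


def answer(inquiry, content=CONTENT):
    """Generate candidate bindings, collect evidence, score, and rank."""
    hypotheses = []
    for candidate, (entity_type, facts) in content.items():
        if entity_type != inquiry.unknown_type:
            continue
        supporting = sorted(inquiry.cues & facts)
        if not supporting:
            continue  # no evidence at all, so do not propose this binding
        confidence = len(supporting) / len(inquiry.cues)
        hypotheses.append(Hypothesis(candidate, supporting, confidence))
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


ranked = answer(Inquiry({"fever", "chills", "travel to africa"}, "disease"))
```

Swapping in a different `CONTENT` store and a different `unknown_type` changes the domain without changing `answer`, which mirrors the point that the same back end can sit behind a quiz-style question or a dialogue with a doctor.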