In this post, I argue that given the current state of computer technology, the appropriate use of artificial intelligence (AI) for medical note taking is to amplify human intelligence, not replace it.
The End Application Matters When We Try to Use AI
In 1987, Prof. George Kondraske developed a “character pattern recognition and communications apparatus,” better known today as predictive texting or autocorrect. We know this technology well because we use it all the time: the computer guesses what you intended to type based on previous text-message examples and offers its best guess as a completion or correction. The more one uses predictive texting over time, the better the program becomes at suggesting words and phrases. This was an early practical example of AI, in particular the branch of AI called machine learning (ML).
The same ML algorithmic concepts behind predictive texting/autocorrecting can in principle be used for more ambitious applications, such as medical note taking. But whereas errors made while dashing off a message to a friend might make us laugh, errors made by ML algorithms in the medical domain can be deadly.
Context is All
Let’s take a look at AI-based medical note taking in more detail. Our goal is to populate electronic health record (EHR) documents just from the spoken interaction between physicians and patients. To make this application a reality, there are three subtasks to be performed:
- Speech recognition -- turning the spoken words into text words. Although not perfect, speech recognition now works pretty well for a single party with clearly enunciated speech and minimal background noise and is coming along for multi-party conversations.
- Named entity & relationship recognition -- recognizing medically meaningful phrases in the stream of text words. This is also done reasonably well with current algorithms.
- Interpretation -- figuring out what the recognized phrases actually mean, so that accurate medical information can be recorded in the EHR document. This is where things can go drastically wrong, because the meaning attached to a phrase can differ significantly depending on context.
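The three subtasks above can be sketched as a toy pipeline. Everything here is hypothetical and hard-coded for illustration -- the transcript, the condition list, and the rules stand in for real ML models -- but it shows exactly where a context-free interpretation step goes wrong:

```python
# Toy sketch of the three subtasks, with hypothetical data and rules.

def speech_recognition(audio):
    # Stand-in: a real system would transcribe audio; here we
    # return a hard-coded transcript for illustration.
    return "my father had diabetes but I have never had it"

def recognize_entities(text):
    # Naive phrase spotting: flag any known condition mentioned.
    conditions = ["diabetes", "hypertension"]
    return [c for c in conditions if c in text]

def interpret(entities):
    # Context-free interpretation: assume every mention applies
    # to the patient -- exactly the failure mode described above.
    return [f"patient has {e}" for e in entities]

facts = interpret(recognize_entities(speech_recognition(None)))
print(facts)  # ['patient has diabetes'] -- wrong: it was the father
```

The first two stages do their jobs correctly; the error enters only at interpretation, because the phrase "my father had" never reaches that stage.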
How can context make a difference? Consider the following predictive texting/autocorrecting example:
The right correction to the underlined word is obviously ‘sweetest’. But now consider this text:
Given the different context, the most likely corrected word is now ‘sweatiest’. Both corrections involve only one letter, but without considering the surrounding verbal context, it is very difficult to choose correctly between the two candidate corrections, even for a human.
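One way a predictive-texting system can use context is to score each candidate correction by how strongly it associates with the surrounding words. This is a minimal sketch; the candidates and co-occurrence counts below are invented purely for illustration, standing in for statistics a real system would learn from prior messages:

```python
# Hypothetical co-occurrence counts: how often each candidate word
# appeared near each context word in previous text messages.
cooccurrence = {
    "sweetest": {"cake": 12, "love": 9, "gym": 0, "workout": 1},
    "sweatiest": {"cake": 0, "love": 0, "gym": 8, "workout": 11},
}

def pick_correction(candidates, context_words):
    """Return the candidate that best matches the surrounding context,
    scored by summed co-occurrence counts with the context words."""
    def score(word):
        counts = cooccurrence.get(word, {})
        return sum(counts.get(c, 0) for c in context_words)
    return max(candidates, key=score)

print(pick_correction(["sweetest", "sweatiest"], ["cake", "love"]))
print(pick_correction(["sweetest", "sweatiest"], ["gym", "workout"]))
```

With "cake" and "love" in view the scorer picks "sweetest"; with "gym" and "workout" it picks "sweatiest" -- the single-letter difference is resolved only by the surrounding words.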
Examples from a Commercial AI System
Now let’s consider the problem of interpreting medical words and phrases without sufficient context. The following example sentences (top) were taken from actual physician-patient conversations, and an existing commercial AI system was used to extract the candidate medical facts (bottom) to be included in the EHR. In the first figure, we have redacted the verbal context surrounding the medical phrases recognized by the AI system to illustrate the challenges posed by missing context.
Figure 1. Medical facts suggested by an AI system, context redacted.
Given the visible phrases, the candidate medical facts seem pretty plausible. (Except for the ‘do not resuscitate’ -- more on that later!) But now let’s look at the same examples, this time with the redacted text visible.
Figure 2. Medical facts suggested by an AI system, context visible.
Once the verbal context is considered, a human can easily see that the suggested medical facts are completely inappropriate. And what about the ‘do not resuscitate’ suggestion? Frankly, I’m at a bit of a loss to explain that, except to note that AI systems will always have occasional brain farts!
AI has its place. It can understand simple imperative utterances, such as “tell me the temperature” or “remind me of my next appointment.” But as we can see above, fully automated AI systems currently fall apart when interpreting more complex sentences that involve significant context.
What to do? Some day AI systems will be capable enough to handle verbal context at human levels, but that day could be a decade or more away. There are just too many ways to say all kinds of contextually relevant things, and ML-based approaches need to be presented with examples of all such utterances before they can work effectively. Until then, I believe the approach to medical note taking should not be AI, but IA -- intelligence amplification.
Intelligence Amplification at Augmedix
Intelligence amplification (IA) refers to the effective use of AI methods to augment human intelligence -- see https://en.wikipedia.org/wiki/Intelligence_amplification. IA has a long history of success. At Augmedix we think that IA in a semi-automated system is the right approach to medical note taking. In our IA system, remote documentation specialists are helped in their note-creation process by AI methods that make suggestions and ask for clarifications, but that leave final interpretation up to the humans. The AI methods are the same as in fully automated systems, but how they are used is crucially different: rather than trying to replace humans, the goal is to assist them. The former approach is too error-prone to work in a medical context; the latter is a practical approach that can provide a cost-effective, accurate documentation service for physicians.
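The suggest-and-review loop can be sketched in a few lines. To be clear, this is not Augmedix’s actual system -- the utterances, suggested facts, and confidence scores below are all hypothetical -- but it shows the essential IA design: the AI proposes, and a human specialist makes the final call.

```python
# Minimal human-in-the-loop sketch. All data here is hypothetical.

def ai_suggest(utterance):
    # Stand-in for an ML model: returns (candidate fact, confidence).
    suggestions = {
        "I take lisinopril every morning": ("medication: lisinopril", 0.95),
        "my father had diabetes": ("condition: diabetes", 0.60),
    }
    return suggestions.get(utterance)

def document(utterance, human_review):
    """Record a fact only if the human reviewer approves it."""
    fact, confidence = ai_suggest(utterance)
    # Every suggestion is surfaced for human sign-off; the AI never
    # writes to the note on its own.
    return fact if human_review(fact, confidence) else None

# The specialist accepts the medication fact but rejects the
# context-free 'diabetes' fact (it was the patient's father):
approved = document("I take lisinopril every morning", lambda f, c: True)
rejected = document("my father had diabetes", lambda f, c: False)
```

The design choice is that the AI’s output is a suggestion queue, not a final record; accuracy then depends on the human reviewer rather than on the model handling every context correctly.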
Augmedix board member Joe Marks is Executive Director of the Center for Machine Learning & Health at Carnegie Mellon University.