The Future of Managed Care

It’s Not Perfect, But It’s Hot: Quick Triage from Dr. Bot

Online or app-based “chatbots” evaluate symptoms 24/7 and could make health care more accessible and effective. But are they just another toy for the tech‑privileged?

Howard Wolinsky

Dx populi
Apps and smartphones have made symptom checkers available to the public.

Source: Ada

Patients who are worried about symptoms in the middle of the night don’t have to run out to an urgent care facility these days or fuss about landing an appointment to see their family doctor in the morning. 

Now help is available from the nearby smartphone, tablet, or laptop just by opening a “chatbot” in an app or online, entering the symptoms, and getting some idea of what may be wrong. Some of these increasingly sophisticated health apps are triaging their users and advising some people to go to the emergency department, while telling others to “worry not” and stay home. An opportunity for real-time consultation with a physician is often a feature. 

It’s all made possible because of whip-smart artificial intelligence capable of searching the medical literature and the kind of software that has made it fast and easy to buy plane tickets online, handle financial transactions via smartphone apps, and put Apple’s Siri and Amazon’s Alexa at our beck and call. 

The convenience of health care chatbots is alluring, and they could make American health care more accessible, more effective, and less expensive. But there’s also a long list of reasons to be wary. For one thing, the accuracy might not be there; for another, the most likely users may be the tech-blessed and worried well, not those who would benefit most from appropriate health care. There may also be some “shiny object” syndrome. Nothing seems to overexcite American health care more than new, largely untested technology that winds up making at best a marginal contribution to improving care and reducing cost. 

Artificial intelligence, not chatbots, was the focus of a Liz Szabo article for Kaiser Health News in December. But there’s a lot of overlap between AI and chatbots, and Szabo wrote that “many doctors and consumer advocates fear that the tech industry, which lives by the mantra ‘fail fast and fix it later,’ is putting patients at risk—and that regulators aren’t doing enough to keep consumers safe.” 

Not a doctor substitute 

As the name suggests, chatbots are pieces of software that create a real-time, chatlike experience for the human user through voice recognition or by typed-in text. It may require some suspension of disbelief, but at some level they are supposed to convince us that we are dealing with another human being, not just some clever bytes of code that can volley back a humanlike response to incoming bits. 
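At its core, the pattern the paragraph describes is simple: text comes in, the software matches it against what it knows, and a humanlike reply goes back out. A minimal sketch in Python, with all symptom keywords and advice strings invented purely for illustration (real systems use far richer medical models):

```python
# Toy sketch of a text-in, text-out chatbot loop.
# The symptom keywords and advice below are invented for illustration
# and are not medical guidance.

RESPONSES = {
    "chest pain": "Chest pain can be serious. Consider emergency care.",
    "headache": "Most headaches resolve at home. Rest and hydrate.",
}

def reply(user_text: str) -> str:
    """Match the user's message against known symptom keywords."""
    text = user_text.lower()
    for symptom, advice in RESPONSES.items():
        if symptom in text:
            return advice
    return "Tell me more about your symptoms."

if __name__ == "__main__":
    print(reply("I have a bad headache tonight"))
```

The “suspension of disbelief” the article mentions comes from layering natural-language understanding and conversational flow on top of this basic loop; the loop itself stays the same.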

Symptom checkers are one of the main health care applications. They are not new, notes Hamish Fraser, a physician and a medical informatics specialist at Brown University; they’ve been around for 20 years, helping physicians and other health care providers make diagnoses. What is new, he says, are web- and smartphone-based symptom checkers designed with the public in mind. “Mobile phone apps have just made these systems much more accessible,” says Fraser.

Since April 2019, Sutter Health patients and anyone else online with access to that health system’s patient portal can conduct a health assessment directly from the Sutter Health site, answering a series of questions regarding their medical history and current symptoms. The chatbot will then present users with an assessment of the most likely symptom causes and appropriate care options. “Trying to scale experiences that are really sustainable, really personalized, really providing the care that patients need is a challenge,” Albert Chan, MD, vice president and chief of the digital patient experience at Sutter, said in an email interview. The problems in health care are similar to “the solution of the last mile” for the telecommunications industry, noted Chan.

Sutter’s chatbot was developed by Ada Health, a Berlin-based company. Jeff Cutler, its chief commercial officer in New York, says that because Ada was built to support clinicians, its AI was developed to replicate the cognitive process of a physician rather than to rely on rigid rules. It works dynamically to take all available data into account, from the age and medical history of the patient to health conditions prevalent in different parts of the world. About three years ago, the company shifted to place a greater emphasis on consumers. Cutler stresses that the technology is not intended to replace a human doctor and that the information provided is not a diagnosis. “Do you know the number one reason that people in America go to a doctor?” Cutler asks. “It is to find out if they need to go see a doctor.” 

Within two months of going live on the Sutter portal, the Ada chatbot had evaluated nearly 10,000 patients. As one might expect, most people (60%) used the chatbot after normal office hours between 6 p.m. and 9 a.m. By the end of October 2019, the Ada chatbot had been used 21,000 times. Chan says, “With AI, we can take something meaningful like answering our patients’ questions in the wee hours of the morning and make that systematic. When you are concerned or sick, we aim to connect you to the care that you need, reducing friction one human interaction at a time.”

Chan says doctors have become accustomed to going through reams of paper generated by their patients from “Dr. Google” web searches. In contrast, Ada is a tool that actually has “been vetted by and built by doctors,” he said.

The GB experience

Chatbots have been gaining a foothold in Great Britain. Probably the one with the highest profile is Babylon Health. The National Health Service has incorporated Babylon Health’s “GP at Hand” system into its offerings in the London area. A spokesperson for Babylon noted that an independent committee for the NHS found that the 24-hour capabilities of GP at Hand helped the NHS reduce visits by about 4% overall, at no cost to the NHS. Babylon has bragged that it has the fifth largest NHS primary practice, offering video visits and in-person visits at six health centers in London and one in Birmingham. But Fraser at Brown says that “Babylon could essentially cherry-pick all of the healthy people who commute into London and therefore are happy to get a quick video chat about their cold or their sprained ankle. And all of the sick people who take all of the work [could] remain with the ordinary practices.” A Babylon spokesperson stresses that Babylon GP at Hand is open to any patients who choose the service, regardless of age, sex, or health.

Fraser and a colleague did head-to-head comparisons of the Ada, Babylon, and YourMD symptom checkers for an interview with Wired UK, the British offshoot of Wired magazine. They tested the symptom checker apps for shingles, urinary tract infection, asthma, and alcoholic liver disease. Fraser says Ada did well, YourMD did OK, but Babylon missed them all. Of course, two years is eons when it comes to app development, and these three have undoubtedly gone through many iterations since this comparison. Even so, Fraser says the problems—and benefits—of chatbots haven’t been rigorously evaluated. He is currently studying how reliable symptom checkers are in an urgent situation that may require a visit to an emergency department. 

Meanwhile, symptom checkers are collecting fans and critics, especially in Great Britain, where Secretary of State for Health and Social Care Matt Hancock, a GP at Hand user, has praised Babylon as a health innovation that could benefit patients and help control costs throughout NHS. One critic, known on Twitter as @DrMurphy11 and claiming to be an NHS consultant, has been called the “chatbot killer.” DrMurphy11 has tweeted about posing as a 66-year-old woman who asked GP at Hand about a painless breast lump. He lambasted the AI system as “absurd” because it asked the fictitious patient whether she was pregnant or breastfeeding. Then, instead of raising the possibility of breast cancer, the chatbot suggested that she had a moderate risk of bone-thinning osteoporosis.

“If you want to really use the system by Ada or Babylon or any of the others, we don’t yet have enough clinical evidence of their performance,” Fraser says.

Francis Fullam, an assistant professor at Rush University College of Health Sciences in Chicago who has tested a dozen chatbots, says judgments about health care chatbots need to go beyond clinical factors. “Do these chatbots provide correct clinical assessments to the patient, and is the patient experience sufficiently positive to make the patient try it again after their first trial?” he asks. “You do not have a full definition of the quality of care unless both the clinical outcomes of care and the patient experience of care are considered.”

Twenty years ago, Jason Maude cofounded Isabel Healthcare in Great Britain to develop a symptom checker to help doctors better diagnose disease and train health professionals. Later it became a tool available to patients. Chatbot symptom checkers use rules-based technology, a type of AI, which cannot, in Maude’s view, deal with the full range of conditions and variety of ways that patients can describe their symptoms. “The technology is not right for this particular job,” he says. “Real life is too messy and complex.” Maude says Isabel uses machine learning, which enables the use of natural language.

Maude adds that some symptom checkers “force the user to pick their ‘most important symptom’” in order to start them on a decision tree and ask them questions. This is potentially quite dangerous, as it misses the patient’s overall clinical presentation. If a patient has fever, diarrhea, vomiting, headache, and sore throat, for example, how can he or she pick the most important one? If you enter the same symptoms in different sequences into many of these symptom checkers, they will come up with different answers—a worrying fact.
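The order-dependence Maude describes is easy to demonstrate. Below is a deliberately naive Python sketch of a rules-based checker that branches only on the first symptom entered; the rules themselves are invented for illustration, not medical advice. The same patient, entering the same symptoms in a different order, gets a different assessment:

```python
# Toy demonstration of why "pick your most important symptom first"
# is fragile: a rules-based checker that branches on the FIRST symptom
# entered ignores the rest of the clinical picture.
# These rules are invented for illustration, not medical guidance.

RULES = {
    "fever": "possible infection - see a doctor within 24 hours",
    "headache": "likely tension headache - rest and fluids",
    "sore throat": "likely viral sore throat - self-care at home",
}

def naive_triage(symptoms: list) -> str:
    """Return advice based only on the first symptom in the list."""
    return RULES.get(symptoms[0], "no match")

# Identical symptom sets, different entry order:
patient_a = ["fever", "headache", "sore throat"]
patient_b = ["headache", "fever", "sore throat"]

print(naive_triage(patient_a))  # keys off "fever"
print(naive_triage(patient_b))  # keys off "headache"
```

A system that considered the full symptom set—fever plus headache plus sore throat together—would reason toward one consistent answer regardless of input order, which is the gap Maude argues machine-learning approaches like Isabel’s are meant to close.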

Still, many experts picture a future when many of us will be comfortable—and happy—getting assistance from health care chatbots.

“There are lots of situations where people don’t understand the significance of some symptoms, and they either overreact or underreact to something important,” says Fraser. “And there’s potential for these types of tools to help patients.”

“But they’ve got to be developed in a very rigorous way,” he continues, “and be evaluated in a series of stages from a lab-based approach just using existing patient data through clinical observational studies to real randomized-controlled trials. And you can’t just rush into trying to diagnose whether the patient has malaria, meningitis, a heart attack, a pulmonary embolism, or a stroke, for example, without ensuring that the system is designed to be usable and rigorous and safe for that population.”

Fullam sees an immediate application for triage of patients and making sure the right patients get the right provider: “In the long run,” he says, “adding medication data to chatbots could significantly improve diagnosis and triage suggestions.”
