Using large language models like ChatGPT in healthcare: Make sure you understand the risks

ChatGPT opens doors to exciting new possibilities in healthcare, but significant risks remain – particularly when it comes to data privacy and security. 

The recent advancements in artificial intelligence (AI) are, quite frankly, awe-inspiring, with large language models (LLMs) setting the new frontier of what is possible. OpenAI’s ChatGPT gained one million users in the first five days after its release in November 2022, and currently receives approximately 1 billion website visitors per month. While some of these users are only testing what is possible with these language models, many others are using ChatGPT for professional tasks ranging from copywriting to job applications to corporate communications.

The potential opportunities in healthcare are tremendous, ranging from automating clinical documentation to summarizing patient histories to identifying candidate patients for clinical trials. As impressive as these capabilities are, it is important to understand the risks of using LLMs in healthcare.

We will focus on two major risks of using LLMs in healthcare: the accuracy of the models and the privacy implications of their use.

Understanding large language models

Before we look at the specific problems and risks of using LLMs and ChatGPT in healthcare, it’s important to consider what an LLM is actually doing when you use it. An LLM is trained on a large corpus of text, such as Wikipedia, with the task of predicting masked words in a sentence. For example, given the sentence “The knight was moved across the [mask],” the LLM is trained to predict the masked word based on the other words in its context. In this case, the mask may be “board” or “castle.” The LLM architecture describes the complexity of the model, which is often expressed in millions or billions of parameters, the layers of the network, and the number of ‘attention heads.’ When you interact with an LLM, it takes your prompt as input and uses its architecture and trained weights to iteratively predict the next word.
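To make the masked-word example concrete, here is a minimal sketch using the open-source Hugging Face transformers library (an illustrative tool choice, not one discussed in this article) to ask a small masked-language model to fill in the blank. Commercial LLMs such as ChatGPT work with the same kind of prediction objective at a vastly larger scale.

```python
# A minimal sketch of masked-word prediction with the open-source
# Hugging Face `transformers` library (illustrative only).
# Requires: pip install transformers torch
from transformers import pipeline

# distilroberta-base is a small masked language model; it uses the
# literal token "<mask>" as the placeholder to be predicted.
fill_mask = pipeline("fill-mask", model="distilroberta-base")

# The model scores candidate words for the mask based on context.
for prediction in fill_mask("The knight was moved across the <mask>."):
    print(f"{prediction['token_str'].strip():>10}  score={prediction['score']:.3f}")
```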

The risks of using ChatGPT in healthcare

Next, let's take a deeper dive into the problems and risks of healthcare organizations using LLMs like ChatGPT.

Privacy and compliance risks

One of the major concerns with ChatGPT is that using any third-party application requires sending data to that third party. When data is managed by a covered entity (such as a hospital) and contains protected health information (PHI), that data is subject to HIPAA. As a result, to use an LLM with PHI, a Business Associate Agreement (BAA) needs to be in place. Unfortunately, this may not yet be possible with some LLM vendors. Moreover, if a covered entity’s business associate (such as a vendor) wants to provide a service built on an LLM, that vendor needs the contractual right to share data with other third parties, which is often not the case under standard hospital contracts.

Moreover, by sending PHI to additional third parties, organizations lose visibility into how that data will be managed. For example, healthcare organizations cannot know exactly where their data are stored, whether the data will be mixed with other organizations’ data and used to train future language models, or what security controls are in place to protect them. Healthcare organizations using LLMs must understand that their data are potentially at greater risk of breach or misuse.

To mitigate the privacy concern, some organizations de-identify data before sending it to LLMs. However, de-identification is challenging and rarely perfect. Moreover, user prompts to LLMs may themselves contain PHI, even if the data being analyzed has been de-identified.
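To illustrate why de-identification is so easy to get wrong, the sketch below shows a deliberately naive, rule-based redaction pass over a clinical note. It is purely illustrative: real de-identification must cover all 18 HIPAA identifier categories and typically relies on far more sophisticated tooling, yet even then edge cases slip through.

```python
import re

# A deliberately naive de-identification sketch (illustrative only).
# It catches a few obvious patterns and misses others -- which is
# exactly the problem described above.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}

def redact(text: str) -> str:
    """Replace matched identifiers with bracketed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Jane Doe seen 3/14/2023, MRN 884212, call 555-867-5309 to follow up."
print(redact(note))
# -> "Jane Doe seen [DATE], [MRN], call [PHONE] to follow up."
# Note that the patient's name passes through untouched: rule-based
# redaction alone is not sufficient de-identification.
```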

An alternative, more privacy-preserving approach is to run an open-source LLM like Dolly, RedPajama, or Falcon within the healthcare organization’s own infrastructure. This way, data are never sent to a third party, but LLMs can still be leveraged. However, these open-source models are not yet as advanced or well-trained as the more popular commercial systems such as ChatGPT. Additionally, the resources and expertise required to deploy and maintain an open-source LLM may not be readily available to many healthcare organizations. That said, the pace of recent open-source improvements has been dramatic, and the gap between closed- and open-source models is likely to shrink.
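As a rough illustration of what running an LLM in your own infrastructure can look like, the sketch below loads an open model locally with the Hugging Face transformers library, so prompts never leave the organization’s environment. The model name is only an example, and the hardware requirements, model choice, and deployment details will vary widely in practice.

```python
# A minimal sketch of running an open-source LLM locally so prompts
# stay inside the organization's own infrastructure. The model name
# is an example; other open models (Dolly, RedPajama, ...) work
# similarly. Requires: pip install transformers torch accelerate
# and a GPU with enough memory for the chosen model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "tiiuae/falcon-7b-instruct"  # example open model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,  # reduce memory footprint
    device_map="auto",           # place weights on available GPU(s)
)

prompt = "Summarize the key points of this discharge note:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation happens entirely on local hardware; nothing is sent
# to a third-party API.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```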

Accuracy and hallucination

Some demonstrations of ChatGPT and other LLMs have been impressive. However, there have also been documented cases in which LLMs get things wrong or behave erratically. In an industry where patient safety reigns supreme, healthcare organizations need to understand the impact of this risk.

When used to diagnose hypothetical patient cases, ChatGPT was accurate at a level close to a third- or fourth-year medical student – impressive, though not at a professional level. But ChatGPT has also hallucinated facts, fabricated sources, made logic errors, and produced answers that were inappropriate or unethical. The same problems occur with GPT-4, the latest underlying model. Using it in a care context could easily lead to dangerous medical errors.

An interesting example of this problem was observed a few years ago by a company investigating whether chatbots built on an early GPT model could reduce the time doctors spend writing notes. While chatting with a fake depressed patient, the program suggested that to increase happiness, the patient should “Take a walk, go see a friend, or recycle your electronics.” Did that last one confuse you, too? When questioned about the merits of recycling electronics to treat depression, the model responded that recycling can get you a $15 tax refund, which makes Americans happy, because humans like money.

In summary, LLMs such as ChatGPT provide impressive experiences, but their results are not currently 100% accurate. This potential for error raises patient safety and legal concerns that the healthcare industry must understand and manage before these systems can be put into practice.

Choosing the right LLM for your organization

Ultimately, whether an organization uses ChatGPT, runs an open-source LLM, or chooses to trust a vendor that builds on ChatGPT, it should have realistic expectations about the tool’s strengths and weaknesses. Organizations should also understand where their data flow and what risks the data face, so they can deploy appropriate safeguards. These privacy and security risks must be managed from both a legal and a technical perspective, drawing on multiple skill sets across the organization. Similarly, model accuracy should be continually monitored, and any use of LLMs in healthcare still requires a human in the loop. The next few months will likely move at a rapid pace, with open-source LLMs improving in quality and offering more secure and controlled options for risk-averse healthcare organizations.