The use of machine learning (ML) for the analysis of electronic health record (EHR) data has become more frequent over the last two decades. Today, two complementary forces all but ensure that this trend will continue into the near future: EHRs are growing in size and complexity as more patients interact with health care systems and as a broader array of screening technologies is deployed in clinical practice, while at the same time public and private investment is spurring the development of new ML methods that are capable of handling ever more diverse data types. As ML becomes more integrated into decision-making using EHR data, it is important that practitioners and policymakers understand how ML is being applied to EHRs, and how the use of ML may both improve and complicate health care.

ML, sometimes used synonymously with artificial intelligence (AI), is an evocative term that may mean different things to different audiences. For the purposes of this commentary, we define ML as a class of methods for deriving decision rules by using a combination of data, mathematical or statistical principles, and computer software. Health care decisions based on ML have the potential to be faster, more precise, less expensive, and less biased than those attainable purely through clinical judgement or case review. Hence, in the last two decades there has been intense interest in applying ML to the analysis of EHR data, where sample sizes tend to be large and where decision rules may inform public health assessments, clinical trial design, optimal treatment regimes, or clinical decision support (CDS).

In many respects, current uses of ML for EHR data represent innovations on themes that emerged in the 1990s as EHRs became more widely adopted by hospital systems. Even at that time, automating aspects of record-keeping and CDS was seen as a key potential benefit of moving from paper records to EHRs.1 Now, various ML classification and regression models, which take in the demographic information, test results, and biometric or genomic measurements in structured EHR data, are supplementing or replacing clinical decision rules based on common-knowledge health care guidelines.

These models may be integrated directly into EHR systems to identify patient subpopulations of interest and to assign diagnostic labels to patients with any number of rare or common disorders in real time.2–7 The paradigm of reinforcement learning (RL), which constitutes a subset of ML, is also being used to inform health care decisions that must be made sequentially in response to a course of patient outcomes. For instance, RL has been applied to design treatment regimens for patients in intensive care8 and for those enrolled in clinical trials.9 In addition to patient-centric uses, physician-centric uses of ML have recently been studied and piloted in clinics. Most commonly, experimental software platforms have been developed that use ML to display only the most relevant information on clinicians’ EHR interfaces, with the aim of reducing the stress associated with interacting with EHRs.10,11 ML is also increasingly being used to shape decisions for hospital resource management, such as scheduling hospital admissions from the emergency department (ED) or anticipating 30-day readmission to the hospital.12,13

While much of the activity in ML research for EHR data is a continuation of what came before, there are at least two factors that distinguish the current state of affairs from that of the previous decade. First, ML is transitioning from being a novelty to being a commonplace tool, particularly for large health care systems with mature EHRs. This means that ML has been around long enough to develop a track record in clinical practice, and that track record has been mixed.

Certain studies have revealed sources of bias, imprecision, and even increased time-load on physicians interacting with ML recommendation systems that have been deployed in EHR environments.14,15 As applications of ML to EHR data become less hypothetical, current and future research efforts in this area will likewise need to become more practical, focusing on the challenges that inevitably arise as ML is integrated into real-time CDS.10,16–18

A second defining characteristic of the current era of ML for EHR data is a heavy emphasis on the analysis of so-called unstructured EHR data, which consist of physician notes or other text entered by medical scribes. While ML methods for unstructured EHR data are by no means new, their effectiveness appears to have increased recently with the construction of large language models (LLMs), which derive decision rules for language generation using internet-scale sources of text data.19 Refinement of general-purpose LLMs using biomedical text databases has given rise to several open-source LLMs designed for biomedical language synthesis.20–23 Even more recent work has yielded LLMs specifically designed for question-answer and instruction-response style interaction with EHR data.24 The remarkable flexibility of LLMs in processing free-text instructions makes them a potentially valuable tool for clinicians who spend hours navigating EHR systems to generate documentation or to search for relevant medical notes, hours which might otherwise be devoted to patient care.25 However, while other forms of ML have reached the implementation stage in clinical settings, evaluating LLMs in realistic EHR environments remains an open challenge.

State-of-the-art ML models have grown so large in recent years that the amount of data and computing power required to produce effective decision rules from them exceeds the resources of many health care systems. Hence, the next several years are likely to see increased demand to use EHR data among third parties that administer EHR systems, and increased demand for access to EHR data among third parties that have extensive computing resources. The supply of ML tools will also increase, as companies that previously only administered EHR systems begin to develop proprietary ML models, and companies that previously focused on ML for other applications turn their attention to EHR data. To combat the influx of ad hoc ML tools deployed at the department level, large hospitals with the requisite resources will increasingly turn to “command center” models of hospital administration, which leverage a centralized set of predictive ML tools to monitor and coordinate patient care using live updates of the EHR.

If recent history is any guide, the frenetic activity around ML will create a sense that progress in health care delivery is both rapid and inevitable, but this will not match what is observed in practice. For instance, in a retrospective population-based study of patients who visited the Bradford Royal Infirmary Hospital in the UK, the use of a hospital command center equipped with ML-powered coordination software was not found to have a positive impact on patient flow or data quality.26 A widely used ML model developed by Epic Systems Corporation (ESC) for predicting the onset of sepsis was found to have significantly poorer discrimination and calibration than had been previously reported by ESC when validated by researchers at Michigan Medicine, leading to substantial alert fatigue among clinicians using the model to inform their treatment decisions.27 Even if properly calibrated to naturally occurring EHR data, ML tools can output faulty decisions if their input data have been subtly altered, leaving them vulnerable to cyberattack.28 Even without explicit malicious intent, corporate agents operating in the health care domain can cause security issues when handling EHR data. For example, in 2019, an aggressive acquisition of EHR data from 50 million Ascension customers by Google gave its employees access to non-anonymous health data, raising concerns that patient confidentiality had been sacrificed for the purpose of creating a proprietary ML model.29

Cautionary tales like these cast doubt on the prospect that ML can be a panacea for the ills of the health care system, yet they should not necessarily discourage practitioners from considering ML solutions to their problems. Many applications of ML to EHR data do promise a better quality of life, both for the patients who visit hospitals and for the health care professionals who work in them. However, making good on the promise of ML for EHR data will require clearheaded thinking from, and coordination among, the clinicians, scientists, and policymakers who use, design, and regulate ML tools. As several studies have demonstrated, the performance of ML methods as measured by retrospective EHR analyses tends to exceed that observed in clinical practice.15,27,30 Therefore, when conducting a cost-benefit analysis for the adoption of any ML tool, it is important to keep in mind that the actual performance of the tool may not match its reported performance. Designing prospective evaluation strategies that mimic realistic EHR deployment environments can be a crucial first step toward obtaining realistic estimates of the near-term and long-term performance of ML methods.13,17 Defining standards for the coding of medical terminology and the storage of medical data has been a key challenge for the development of EHRs over the last 30 years.1 Looking forward, defining standards and best practices for ML as applied to EHR data and encouraging their widespread adoption pose new challenges for stakeholders in health care systems, and how those challenges are met will ultimately determine whether ML improves or merely complicates how EHRs are used to provide care.


The authors declare that they have no known conflicts of interest related to the writing of this article or any products or institutions mentioned.