DPR/250631 - Evaluation of Large Language Models for Automated HAI Case Definition Matching for Surveillance
The objective of this procedure is to engage in a negotiated process to explore and evaluate the potential of Large Language Models (LLMs) for the automated extraction of clinical data from unstructured health records, with the ultimate goal of matching patient data against the ECDC healthcare-associated infection (HAI) case definitions.
The project will focus in particular on the identification of bloodstream infections (BSIs) and the discrimination of their origin. Determining BSI subtype and origin is difficult and relies heavily on clinical data that are typically found in unstructured text rather than in structured electronic health records. The initiative will assess both the technical and economic feasibility of using LLMs to extract pertinent clinical information and will explore how the outcomes can be integrated into ECDC surveillance systems across member states. This initiative is consistent with ECDC's new mandate, which prioritizes digitalization and automation as key strategic objectives. Automating surveillance activities will reduce the workload on healthcare professionals across member states while enhancing the timeliness of ECDC's surveillance operations. LLM-based artificial intelligence (AI) systems are showing increasing performance in understanding and interacting with clinical text and data, making them a promising solution to further automate HAI surveillance.
The project will address potential limitations related to data privacy and cost. Currently, the most powerful LLMs are provided by private companies over the cloud, raising concerns about hospitals' ability to share raw data with third parties due to legal restrictions. Conversely, deploying LLMs locally entails significant technical challenges, including the need for substantial computing power and memory resources. As part of this evaluation, the tenderer will be required to examine the range of options available, balancing cost-effectiveness with stringent data protection requirements.
This project is expected to last six to twelve months and will include a series of experiments and evaluations aimed at determining the best pathway to integrate LLM-driven automation into ECDC's HAI surveillance framework. Potential contractors must demonstrate access to relevant clinical data, proven expertise in data science and machine learning, and, possibly, experience implementing automated HAI surveillance systems in clinical settings.