Managing heterogeneity when pooling data from different surveillance systems

Guidance Surveillance and monitoring

Executive summary

This report addresses the heterogeneity that arises from pooling data from different surveillance systems and provides statistical and procedural approaches to minimise or remove its impact. The aim is to support public health specialists and researchers in answering key research and policy questions using available European data to the fullest possible extent. The guidance will also help investigators assess whether it is valid to pool data from different surveillance systems to derive point estimates.

Data from different surveillance systems have been pooled extensively to inform public health action and to obtain estimates of health outcomes. However, pooling poses a number of analytical and procedural problems that arise from the heterogeneity of these systems.

In the guidance, the heterogeneity that arises when pooling data from different surveillance systems is classified into three groups: heterogeneity of surveillance systems, heterogeneity in disease determinants and heterogeneity of data quality. These sources of heterogeneity are then assessed with regard to three surveillance objectives: trend analysis, risk factor analysis and burden of disease estimation.

For major sources of heterogeneity, this report provides case studies that describe in greater detail how the source of heterogeneity operates, how to assess its impact, and which statistical and procedural methods can minimise it. Examples include case studies on the impact of missing time periods on trend analysis and of missing covariate data on risk factor analysis.
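
As an illustration of what assessing the validity of a pooled point estimate can look like in practice, the sketch below computes an inverse-variance pooled estimate together with Cochran's Q and the I² statistic, two widely used measures of between-source heterogeneity. This is not taken from the report: the function name `pool_estimates`, the choice of a fixed-effect model and the example figures are all assumptions made for illustration only.

```python
def pool_estimates(estimates, variances):
    """Inverse-variance (fixed-effect) pooling with heterogeneity statistics.

    estimates: one point estimate per surveillance system (e.g. log rates)
    variances: the sampling variance of each estimate
    Returns (pooled estimate, Cochran's Q, I^2 as a proportion in [0, 1]).
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching variances")

    # Each system is weighted by the inverse of its variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical log incidence rates from three surveillance systems.
pooled, q, i_squared = pool_estimates([-2.1, -1.6, -2.4], [0.04, 0.09, 0.02])
print(f"pooled={pooled:.3f}, Q={q:.2f}, I2={i_squared:.0%}")
```

A high I² would suggest that a single pooled point estimate hides substantial differences between systems, in which case the sources of heterogeneity described above deserve closer examination before the pooled figure is used.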