
Harmonizing and Pooling Datasets for Health Research in R | by Rodrigo M Carrillo Larco, MD, PhD | Jan, 2025


R code to extract data from unique datasets and combine them in one harmonized dataset ready for seamless analysis


Most of my academic research involves identifying datasets for health research, harmonizing them, and combining (pooling) the individual datasets so they can be analyzed together. This means combining datasets across populations, study sites, or countries. It also means combining variables so that they can be analyzed together. In other words, I work in data pooling, and I have done so full time since 2017.

In this story, I outline the methodology I follow to extract data from individual datasets and combine them into one pooled dataset ready for analysis. It is based on over seven years of experience working in academic environments globally, and it includes code in R.
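To make the idea concrete, here is a minimal base R sketch of the extract-then-pool pattern. It is an illustration only and not necessarily the workflow described later in this story: the two toy datasets, their variable names, the sex coding (1 = male, 2 = female), and the pound-to-kilogram conversion are all assumptions made up for this example.

```r
# Two hypothetical study datasets that record the same information
# under different variable names, codings, and units.
study_a <- data.frame(
  id = 1:3,
  sex = c("M", "F", "F"),
  weight_kg = c(70.5, 58.2, 64.0),
  stringsAsFactors = FALSE
)
study_b <- data.frame(
  pid = c(101, 102),
  gender = c(1, 2),        # assumed coding: 1 = male, 2 = female
  weight_lb = c(180, 140), # assumed unit: pounds
  stringsAsFactors = FALSE
)

# Extract: map each dataset onto one shared set of names, codes, and units.
harmonized_a <- data.frame(
  study = "A",
  id = as.character(study_a$id),
  sex = ifelse(study_a$sex == "M", "male", "female"),
  weight_kg = study_a$weight_kg,
  stringsAsFactors = FALSE
)
harmonized_b <- data.frame(
  study = "B",
  id = as.character(study_b$pid),
  sex = ifelse(study_b$gender == 1, "male", "female"),
  weight_kg = study_b$weight_lb * 0.45359237, # pounds to kilograms
  stringsAsFactors = FALSE
)

# Pool: stack the harmonized datasets row-wise into one dataset.
pooled <- rbind(harmonized_a, harmonized_b)
print(pooled)
```

Keeping a `study` column in the pooled dataset preserves each row's origin, which makes it possible to analyze or check the sources separately later on.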

Data pooling — what is it?

In most settings, we either collect new data (primary data collection) or work with a single dataset that is already available for analysis. This dataset can come from one hospital, a specific population (e.g., an epidemiological study conducted in a community), or a health survey conducted throughout a country (i.e., nationally representative health survey…

