Harnessing Expert Feedback to Generate Representative Synthetic Clinical Data for Underserved Populations
DHEPlab, UNC Gillings School of Global Public Health
June 27, 2025
EHR Adoption Disparities:
AI Training Data Bias:
“Clinical AI models risk perpetuating healthcare inequities precisely where they could have the most impact”
What is Synthetic Clinical Data?
Market For Synthetic Data:
Traditionally, producing synthetic data requires a real dataset to train a generative model (for example, a Generative Adversarial Network)
The Fundamental Challenge
How do we generate representative data when none exists for these populations?
Hypothesis: We can use local expert knowledge (clinicians and other types of providers) to guide generation and validate quality

RLHF (reinforcement learning from human feedback) and RLEF (reinforcement learning from expert feedback) are related but not the same. RLHF is about learning what humans prefer; RLEF is about learning to do what experts do. RLHF handles subjective alignment, while RLEF handles competence transfer.
Traditional Limitations:
Potential RLEF Solutions:
\[\text{Maximize: } Q(n,e) = f(n \times I(e) \times E(g,s,a)) - CL(e)\]
\[\text{Subject to: } n \times c(e) \leq B\]
Where:
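As a rough illustration of the budget-constrained objective, the sketch below assumes one possible reading of the symbols, none of which is confirmed by the source: n as the number of feedback items requested, e as an expert type, I(e) as information gained per item, E(g,s,a) collapsed into a single engagement score, CL(e) as a cognitive-load penalty, c(e) as cost per item, and B as the total budget. The concave choice of f (`log1p`) and every name and number here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertType:
    """Hypothetical container for the per-expert terms in Q(n, e)."""
    name: str
    info_per_item: float   # I(e): assumed information gained per feedback item
    cost_per_item: float   # c(e): assumed cost of one feedback item
    cognitive_load: float  # CL(e): assumed penalty for burden on the expert


def quality(n, expert, engagement):
    """Q(n, e) = f(n * I(e) * E) - CL(e), with f = log1p as an assumed concave f."""
    return math.log1p(n * expert.info_per_item * engagement) - expert.cognitive_load


def best_plan(experts, budget, engagement):
    """Search every (n, e) pair satisfying n * c(e) <= B and return the best one.

    Returns a tuple (Q, n, expert name), or None if nothing fits the budget.
    """
    best = None
    for expert in experts:
        max_n = int(budget // expert.cost_per_item)  # largest n within budget
        for n in range(1, max_n + 1):
            q = quality(n, expert, engagement)
            if best is None or q > best[0]:
                best = (q, n, expert.name)
    return best
```

A call such as `best_plan([ExpertType("clinician", 2.0, 10.0, 0.5), ExpertType("community_health_worker", 1.0, 2.0, 0.2)], budget=100, engagement=0.8)` returns the feasible pair with the highest Q. Because CL(e) does not depend on n and the assumed f is increasing, the budget constraint always binds under this reading; a load term that grew with n would change that.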
🔧 Technology
AI algorithms, devices, platforms
👥 Practitioners
Doctors, nurses, administrators
🏥 Patients
Individuals, families, communities
📋 Policies
Regulations, protocols, guidelines
💰 Incentives
Financial, professional, social
🔄 Dynamic Interactions
All elements continuously influence each other
Optimizing technical components alone can lead to system-wide failure
Key Addition: Feedback Loops. Models change the world they operate in.
Technical Components:
Social Components:
“Healthcare AI systems exist within complex human and organizational contexts”
Cognitive Burden Factors:
Engagement Boosters:
Low Effort
Medium Effort
High Effort
Here: Clear binary choice
Why Kibera?
Partner: CFK Africa
Technical Innovation:
Social Innovation:
“Creating AI that learns from and serves those who need it most”
Contact:
Partners:
Funders:
Key Sources: