+ AI in Health: Foundations, Use Cases, and Pitfalls (half-day, 9:30 AM to 12:30 PM, August 19, 2026)
+ An Overview of Machine Learning Methods for Survival Data (half-day, 1:30 PM to 5:00 PM, August 19, 2026)
Huimin Cheng, PhD, is an Assistant Professor in the Department of Biostatistics at Boston University School of Public Health and a Rafik B. Hariri Junior Faculty Fellow. Dr. Cheng received her PhD in Statistics from the University of Georgia. Her research lies at the intersection of statistics, machine learning, and network science, with a focus on developing principled methodologies for complex data, including graphs, biomedical knowledge systems, and large-scale AI models. Dr. Cheng’s work spans statistical network analysis, graph representation learning, causal inference, and AI-driven methods for biomedical discovery. More recently, her research has focused on evaluating and improving large language models and generative AI systems, particularly in high-stakes biomedical settings. Her work emphasizes reliability, interpretability, and the integration of statistical principles with modern AI.
This short course provides a rigorous and practical introduction to artificial intelligence in healthcare, with a particular focus on generative AI and large language models (LLMs). As AI rapidly transforms biomedical research and clinical practice, statisticians and data scientists need to understand both its foundational principles and its limitations. We begin by establishing a clear conceptual framework that distinguishes artificial intelligence, machine learning, deep learning, generative AI, and emerging agentic AI systems. This foundation is essential for understanding how modern AI systems differ from classical statistical approaches and where they offer genuine advantages. The course then turns to generative AI and LLMs, covering their core mechanisms (e.g., pretraining, fine-tuning, prompting) and their growing role in healthcare. We will examine real-world applications, including clinical decision support, biomedical literature synthesis, drug discovery, and patient-facing systems. A central emphasis of the course is practical challenges and pitfalls, including hallucination, bias, poor calibration, data leakage, evaluation difficulties, and regulatory considerations. We will discuss emerging strategies for improving reliability, such as grounding with knowledge graphs, structured prompting, and hybrid statistical–AI frameworks. Throughout the course, we connect modern AI methods with classical statistical thinking, highlighting opportunities for integration rather than replacement. The goal is to equip participants to critically assess and responsibly deploy generative AI in healthcare settings.
Haolin (Leo) Li, PhD, is an Assistant Professor of Biostatistics at the Boston University School of Public Health and an Adjunct Assistant Professor of Biostatistics at UNC Gillings School of Global Public Health. Dr. Li develops statistical methodology for the design and analysis of complex clinical trials and real-world data. His research focuses on (1) machine learning–based prediction and robust inference for studies under complex sampling designs (e.g., case-cohort and survey-based studies), and (2) the design and analysis of precision medicine clinical trials in the presence of treatment effect heterogeneity. His work is application-driven and industry-focused, with close collaborations with clinical investigators, pharmaceutical scientists, and public health researchers.
Machine learning methods have become increasingly important tools for analyzing survival data, offering flexible alternatives to traditional statistical models. This short course provides a practical and accessible overview of modern approaches, with an emphasis on big-picture understanding rather than technical detail. Topics include evaluation metrics such as the integrated Brier score (IBS) and the concordance index (C-index), along with methods ranging from penalized regression and survival trees to ensemble approaches such as random survival forests, gradient boosting machines, and super learning. We will also introduce deep learning methods for survival outcomes. Real-world case studies with accompanying code will illustrate these applications. The course is intended for participants with a working knowledge of survival analysis (e.g., Kaplan–Meier estimation, the log-rank test, and the Cox proportional hazards model) who want a broad overview of machine learning methods in this area.
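The C-index listed above measures how well a model's predicted risks rank the observed survival times. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. What follows is a minimal Python sketch of Harrell's version. It assumes right-censored data where a higher risk score means shorter predicted survival. The function name `harrell_c_index` and the toy data are hypothetical and illustrative, not taken from the course materials. The sketch skips pairs with tied follow-up times, which is only one of several tie-handling conventions in use.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (C-index) for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- predicted risk; assumed: higher score = shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only subjects with an observed event can anchor a comparable pair.
        if events[i] != 1:
            continue
        for j in range(n):
            # Pair (i, j) is comparable if j was still under follow-up after i's event.
            # Assumption: pairs with tied follow-up times are skipped in this sketch.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event received the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: five subjects, two of them censored.
times = [5, 8, 3, 10, 6]
events = [1, 0, 1, 1, 0]
risk_scores = [0.7, 0.3, 0.9, 0.1, 0.8]
print(round(harrell_c_index(times, events, risk_scores), 3))  # 0.857
```

Here 6 of the 7 comparable pairs are correctly ordered. The subject with an event at time 5 was scored below the censored subject still at risk at time 6. The double loop is quadratic in the sample size, which is fine for teaching but slow for large cohorts.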