PQHS 501: Adapting Tabular Foundation Models for Robust Survival Analysis in Small and Complex Cohorts

Event Date:
January 16th 9:30 AM - 10:30 AM

Yan Zou

Epidemiology and Biostatistics PhD student Yan Zhou presenting. If unable to attend in person in Biomedical Research Building room 105, you may join remotely:
Meeting ID: 958 2937 2435 and Passcode: 087450

Overview:
Effective clinical decision-making depends heavily on understanding the timing of
disease progression and intervention success, which requires the ability to accurately
predict time-to-event outcomes. However, a significant methodological gap persists
when analyzing small, complex cohorts in clinical settings. This scenario is frequently
encountered in rare disease epidemiology, precision medicine subgroups, and pilot
intervention studies.

While survival analysis is important in fields ranging from medicine to finance, traditional
methods like the Cox proportional hazards model often struggle in these contexts. They
provide statistical stability but rely on strong assumptions (such as proportional hazards)
and cannot capture high-dimensional feature interactions, especially when the number of
features approaches or exceeds the number of patients. Conversely, modern machine
learning alternatives in survival analysis are more flexible but typically require large
sample sizes to avoid overfitting. This makes them unreliable for the limited sample sizes
found in many critical clinical problems, a difficulty further compounded by missing values
and competing risks.

To address this gap, we present a novel protocol that repurposes Tabular
Foundation Models, specifically Prior-Data Fitted Networks (PFNs), for survival analysis.
Unlike standard deep learning algorithms that must be trained from scratch for each
new dataset, our model uses a transformer architecture pre-trained on millions of
synthetic datasets to approximate Bayesian inference via in-context learning. We adapt
this capability to the survival domain and transform continuous time-to-event data into a
discrete-time framework.
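
The abstract does not say how the discretization is carried out. As a minimal sketch of one common approach (a person-period expansion, assumed here for illustration and not necessarily the presenter's method), each subject contributes one row per time interval in which they were at risk, with a binary label marking whether the event occurred in that interval. A tabular classifier can then estimate the per-interval hazard, and a survival curve follows as the running product of one minus the hazards. The function names, the bin edges, and the choice to keep the censoring interval as a label-0 row are all assumptions.

```python
import numpy as np


def to_person_period(times, events, bin_edges):
    """Expand (time, event) pairs into one row per subject per interval at risk.

    bin_edges is an increasing array [t_0, t_1, ..., t_K]; interval k is
    (t_k, t_{k+1}]. Assumes the edges cover all observed times.
    Returns three aligned arrays: subject index, interval index, and a 0/1
    label that is 1 only in the interval where the event was observed.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    edges = np.asarray(bin_edges, dtype=float)
    n_bins = len(edges) - 1

    # Index of the interval containing each observed time.
    last = np.searchsorted(edges, times, side="left") - 1
    last = np.clip(last, 0, n_bins - 1)

    subj, interval, label = [], [], []
    for i, (k_last, d) in enumerate(zip(last, events)):
        for k in range(k_last + 1):
            subj.append(i)
            interval.append(k)
            # Censored subjects (d == 0) get label 0 in every interval.
            label.append(int(d == 1 and k == k_last))
    return np.array(subj), np.array(interval), np.array(label)


def survival_from_hazards(hazards):
    """Convert per-interval hazards h_k into S(t_k) = prod_{j<=k} (1 - h_j)."""
    return np.cumprod(1.0 - np.asarray(hazards, dtype=float))
```

In such a setup, the expanded rows (covariates plus the interval index) would be passed to the classifier as training examples, and the predicted class-1 probabilities for a new patient across intervals would be fed to `survival_from_hazards`.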

We validate this framework using simulation studies and real-world high-dimensional
datasets to demonstrate its predictive utility, especially in data-scarce regimes. By
successfully adapting foundation models to small-data challenges in survival analysis,
this research offers epidemiologists a reliable tool to uncover subtle risk factors and
heterogeneity in complex populations.