Systems for diagnostic classifier development
The host response, as measured by the expression of targeted mRNA biomarkers in blood, can vary from patient to patient. Previous approaches have produced host response signatures and classifiers that tend to generalize poorly, because biomarker selection, classifier training, and validation were all performed on a limited, unrepresentative set of patients from a single study or hospital. Pooling data across multiple studies has proven effective in producing more generalizable host response signatures and classifiers, but it introduces other methodological challenges.
First, before it can be used in standard classifier development, our multi-cohort data must be co-normalized to minimize spurious variation unrelated to our classification tasks, such as systematic differences between studies. Second, classifier selection must move beyond random cross-validation to validation schemes that reflect the structure and heterogeneity of our patient population and therefore yield more realistic estimates of generalization performance. Third, our training data were profiled on multiple assay platforms, none of which is fast enough to deliver clinically actionable turnaround times for indications such as sepsis.
Our biomarker signatures and classifiers must therefore generalize to measurements obtained on faster, more deployment-ready platforms, possibly with little or no data from those platforms available at training time. We addressed these challenges in a proof-of-concept study (Mayhew et al., 2020a) that helped systematize our process of diagnostic classifier development.
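The first two challenges, removing study-level variation and validating across studies rather than across randomly shuffled patients, can be illustrated with a minimal sketch. It assumes each sample carries a study identifier. The helper names `standardize_within_study` and `leave_one_study_out` are hypothetical. Per-study standardization is a deliberately simple stand-in for co-normalization and is not the method used in the cited study. Leave-one-study-out splitting is one possible non-random validation scheme, not necessarily the one we adopted.

```python
from statistics import mean, pstdev
from typing import Dict, Iterator, List, Sequence, Tuple


def standardize_within_study(
    values: Sequence[float], study_ids: Sequence[str]
) -> List[float]:
    """Center and scale one gene's expression values separately within each study.

    Hypothetical stand-in for co-normalization: it removes study-level shifts
    and scale differences, but it ignores differences in class composition
    between studies, so it is NOT the co-normalization used in the cited work.
    """
    # Group sample indices by study.
    by_study: Dict[str, List[int]] = {}
    for i, study in enumerate(study_ids):
        by_study.setdefault(study, []).append(i)

    out = [0.0] * len(values)
    for indices in by_study.values():
        vals = [values[i] for i in indices]
        mu = mean(vals)
        sd = pstdev(vals) or 1.0  # guard against zero spread (e.g. one sample)
        for i in indices:
            out[i] = (values[i] - mu) / sd
    return out


def leave_one_study_out(
    study_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_study, train_indices, test_indices), one fold per study.

    Every sample from the held-out study is in the test set, so the estimate
    reflects performance on a study never seen during training.
    """
    for held_out in sorted(set(study_ids)):
        train = [i for i, s in enumerate(study_ids) if s != held_out]
        test = [i for i, s in enumerate(study_ids) if s == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    studies = ["A", "A", "B", "B", "C"]
    expression = [1.0, 3.0, 10.0, 14.0, 5.0]
    print(standardize_within_study(expression, studies))
    for study, train_idx, test_idx in leave_one_study_out(studies):
        print(study, train_idx, test_idx)
```

Holding out whole studies, rather than random patients, prevents samples from the same cohort from appearing on both sides of a split, which is one reason random cross-validation tends to overestimate generalization performance on pooled multi-study data.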