Many steps in biological research pipelines use machine learning models, which have become standard tools for many basic problems. Elaborations on basic machine learning models ("ensembles" of machine learning models) can improve accuracy over standard usage on various biological questions. However, the design of these ensembles has been fairly ad hoc, and their use can be computationally intensive, which reduces their appeal in practice. This project will advance this technology by developing statistically rigorous techniques for building ensembles of machine learning models, with the goal of improving accuracy. The project will also develop methods that use these ensembles for new biological problems, including protein structure and function prediction. Broader impacts include software schools, engagement with under-represented groups, and open-source software.<br/> <br/>Profile hidden Markov models (profile HMMs) are probabilistic graphical models that are widely used in bioinformatics. Research over the last decade has shown that ensembles of profile HMMs (e-HMMs) can provide greater accuracy than a single profile HMM for many applications in bioinformatics, including phylogenetic placement, multiple sequence alignment, and taxonomic identification of metagenomic reads. This project will advance the use of e-HMMs by developing statistically rigorous techniques for building them, with the goals of improving accuracy and deepening understanding of e-HMMs, and will also develop methods that use e-HMMs for protein structure and function prediction. Broader impacts include software schools, engagement with under-represented groups, and open-source software. 
Project software and papers are available at http://tandy.cs.illinois.edu/eHMMproject.html.<br/><br/>This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.