Sequence Space

An evolving protein must traverse sequence space. What determines which pathways are accessible?


We found that...

Current projects:

  • Can we find an observable signature of ensemble epistasis?
  • Can we use observed epistasis to dissect protein biophysics?
  • How does connectivity in a map scale with dimensionality?

Evolution of innate immune proteins

How do complicated, multi-component, multifunctional systems evolve?

We are studying the co-evolution of five proteins that play intersecting roles in vertebrate innate immunity: Toll-like receptor 4, MD-2, CD14, S100A8, and S100A9. Our goal is to better understand both how these molecules work and how they evolved their functions. To achieve this goal, we are using phylogenetics, ancestral sequence reconstruction, ex vivo functional assays, biophysical measurements, simulations, and high-throughput experiments.


We found that...

Current projects:

  • How does S100A9 activate Toll-like receptor 4?
  • How do evolutionary processes such as arms races shape the evolution and biochemistry of Toll-like receptor 4 and MD-2?

Evolution of specificity

S100 proteins are small calcium-signaling molecules that bind a wide variety of peptide targets. We are using them as a model to understand how proteins acquire new specificity over evolutionary time.


We found that...

Current projects:

  • What evolutionary forces shape the changes in binding set?
  • To what extent are changes in binding set coupled to changes in function?

Open source software

Our goal is to develop useful, high-quality, intuitive, and well-documented scientific software. To see all Harms lab software, check out our GitHub page.


pytc allows complicated global fits to Isothermal Titration Calorimetry data. Features include an intuitive API, Bayesian and maximum-likelihood analyses, and the ability to easily define new thermodynamic models.


Duvvuri et al. (2018) Biochemistry 57(18):2578-2583


HOPS (Hunches from Oregon about Peptide Specificity) takes high-throughput peptide scores (for example, enrichment values from a phage display experiment) and uses machine learning to train predictive models from the calculated physicochemical properties of each peptide.

Wheeler et al. (2020) bioRxiv


Software for studying statistical, high-order epistasis in genotype-phenotype maps. You can use this library to:

  1. Decompose genotype-phenotype maps into high-order epistatic interactions
  2. Find nonlinear scales in the genotype-phenotype map
  3. Calculate the contributions of different epistatic orders
  4. Estimate the uncertainty in the epistatic coefficients

Sailer, Z. R., & Harms, M. J. (2017). Genetics, 205(3), 1079-1088.
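
As a rough illustration of items 1 and 3 above, the sketch below decomposes a small, complete, binary genotype-phenotype map into epistatic coefficients by inclusion-exclusion, then sums the absolute coefficients at each order. It is a standard-library toy, not the library's API: the function names (`subsets`, `decompose`, `order_magnitudes`) and the example phenotypes are made up for this example, and it does not cover nonlinear scales or uncertainty estimates.

```python
from itertools import combinations


def subsets(sites):
    """Yield every subset of `sites` as a sorted tuple."""
    for k in range(len(sites) + 1):
        yield from combinations(sites, k)


def decompose(phenotypes):
    """Return {mutated_sites: coefficient} for a complete binary map.

    `phenotypes` maps each genotype (a tuple of 0/1) to a number.
    Coefficients are defined relative to the all-zeros genotype.
    """
    n_sites = len(next(iter(phenotypes)))
    coefficients = {}
    for sites in subsets(tuple(range(n_sites))):
        total = 0.0
        # Inclusion-exclusion over every background contained in `sites`.
        for background in subsets(sites):
            genotype = tuple(1 if i in background else 0 for i in range(n_sites))
            sign = (-1) ** (len(sites) - len(background))
            total += sign * phenotypes[genotype]
        coefficients[sites] = total
    return coefficients


def order_magnitudes(coefficients):
    """Sum the absolute coefficients at each epistatic order (1, 2, ...)."""
    totals = {}
    for sites, value in coefficients.items():
        if sites:  # skip the reference (order-0) term
            totals[len(sites)] = totals.get(len(sites), 0.0) + abs(value)
    return totals


# Hypothetical two-site map: the double mutant exceeds the additive expectation.
example_map = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 2.0, (1, 1): 5.0}
coefficients = decompose(example_map)
magnitudes = order_magnitudes(coefficients)
```

Here the pairwise coefficient is the amount by which the double mutant differs from the sum of the two single-mutant effects.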


GPSeer uses a simple, straightforward approach to infer the missing phenotypes from an incomplete genotype-phenotype map, with well-characterized uncertainty in its predictions. These predictions allow robust and statistically informed analyses of features of the map, such as its possible evolutionary trajectories.

Sailer, Z.R. et al. (2020). PLOS Computational Biology 16(9):e1008243.
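
To make the idea of inferring missing phenotypes concrete, here is a minimal sketch that fits an additive model (an intercept plus one effect per site) to the observed genotypes by least squares and uses it to predict the unobserved ones. This is an assumption-laden toy, not GPSeer's actual model or API: the helper names (`solve`, `fit_additive`, `predict`) and the data are hypothetical, the fit assumes every site is observed in both states, and no uncertainty is estimated.

```python
def solve(matrix, vector):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    rows = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_additive(observed):
    """Least-squares fit of an intercept plus one effect per site."""
    design = [[1.0] + [float(x) for x in genotype] for genotype in observed]
    values = list(observed.values())
    n_params = len(design[0])
    # Normal equations: (X^T X) params = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(row[i] * y for row, y in zip(design, values))
           for i in range(n_params)]
    return solve(xtx, xty)


def predict(params, genotype):
    """Predicted phenotype: intercept plus the effects of the mutated sites."""
    return params[0] + sum(p * x for p, x in zip(params[1:], genotype))


# Hypothetical three-site map with two of the eight genotypes unmeasured.
observed = {
    (0, 0, 0): 1.0, (1, 0, 0): 1.5, (0, 1, 0): 0.75,
    (0, 0, 1): 3.0, (1, 1, 0): 1.25, (1, 0, 1): 3.5,
}
params = fit_additive(observed)
missing = {g: predict(params, g) for g in [(0, 1, 1), (1, 1, 1)]}
```

Because the example data are exactly additive, the predictions recover the additive expectation; real maps would leave residuals that a more careful treatment would turn into prediction uncertainty.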

Genotype-phenotype map support libraries

Our studies of genotype-phenotype maps rely on a core set of libraries that we have released as individual packages.

  • gpmap: A Python API for managing genotype-phenotype map data. GPMap defines a flexible object for managing genotype-phenotype (GP) map data. At its core, it stores all data in Pandas DataFrames and thus interacts seamlessly with the PyData ecosystem.
  • gpgraph: Genotype-phenotype maps in NetworkX. GPGraph follows NetworkX syntax. Initialize a graph, add the genotype-phenotype map object, and draw the graph.
  • gpvolve: A Python API for simulating and analyzing evolution in genotype-phenotype space. This can be used to build a Markov State Model from a genotype-phenotype map, find clusters of genotypes that represent metastable states of the system, compute fluxes through the map using Transition Path Theory, and visualize the results.
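
To illustrate what building a Markov model from a genotype-phenotype map can look like, the sketch below constructs a row-stochastic transition matrix over a small binary map and propagates a population that starts on one genotype. It uses only the standard library and is not the gpvolve API. The assumptions are ours: phenotype is treated as fitness, each single-site mutation is equally likely, fixation follows the Moran-process formula, and the names (`moran_fixation`, `transition_matrix`, `propagate`) and fitness values are hypothetical.

```python
def moran_fixation(f_current, f_mutant, pop_size):
    """Fixation probability of a single mutant in a Moran process."""
    r = f_mutant / f_current
    if r == 1.0:
        return 1.0 / pop_size  # neutral mutation
    return (1.0 - 1.0 / r) / (1.0 - 1.0 / r ** pop_size)


def transition_matrix(fitness, pop_size=10):
    """Build a row-stochastic matrix over genotypes one mutation apart."""
    genotypes = sorted(fitness)
    index = {g: i for i, g in enumerate(genotypes)}
    n = len(genotypes)
    matrix = [[0.0] * n for _ in range(n)]
    for g in genotypes:
        i = index[g]
        # Neighbors differ from g at exactly one site.
        neighbors = [g[:k] + (1 - g[k],) + g[k + 1:] for k in range(len(g))]
        for nb in neighbors:
            fix = moran_fixation(fitness[g], fitness[nb], pop_size)
            matrix[i][index[nb]] = fix / len(neighbors)
        # Remaining probability: no mutation fixes in this step.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return genotypes, matrix


def propagate(distribution, matrix, steps):
    """Advance a probability distribution over genotypes by `steps` steps."""
    n = len(matrix)
    for _ in range(steps):
        distribution = [sum(distribution[i] * matrix[i][j] for i in range(n))
                        for j in range(n)]
    return distribution


# Hypothetical two-site map with a single fitness peak at (1, 1).
fitness = {(0, 0): 1.0, (1, 0): 1.2, (0, 1): 1.1, (1, 1): 1.5}
genotypes, matrix = transition_matrix(fitness, pop_size=10)
start = [1.0 if g == (0, 0) else 0.0 for g in genotypes]
final = propagate(start, matrix, steps=500)
most_likely = genotypes[final.index(max(final))]
```

After enough steps most of the probability sits on the fitness peak. Clustering into metastable states and Transition Path Theory fluxes would be further analyses on the same matrix and are not shown here.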