Bayesian inference of reassortment networks reveals fitness benefits of reassortment in human influenza viruses
Reassortment is an important source of genetic diversity in segmented viruses and is the main source of novel pathogenic influenza viruses. Despite this, studying the reassortment process has been constrained by the lack of a coherent, model-based inference framework. Here, we introduce a coalescent-based model that allows us to explicitly model the joint coalescent and reassortment process. In order to perform inference under this model, we present an efficient Markov chain Monte Carlo algorithm to sample rooted networks and the embedding of phylogenetic trees within networks. This algorithm provides the means to jointly infer coalescent and reassortment rates with the reassortment network and the embedding of segments in that network from full-genome sequence data. Studying reassortment patterns of different human influenza datasets, we find large differences in reassortment rates across different human influenza viruses. Additionally, we find that reassortment events predominantly occur on selectively fitter parts of reassortment networks, showing that on a population level, reassortment positively contributes to the fitness of human influenza viruses.
phylogenetics | phylodynamics | infectious diseases | BEAST | MCMC
Through rapid evolution, human influenza viruses are able to evade host immunity in populations around the globe. In addition to mutation, reassortment of the different physically unlinked segments of influenza genomes provides an important source of viral diversity (1). If a cell is infected by more than one virus, progeny viruses can carry segments from more than one parent (2). With the exception of accidental release of antigenically lagged human influenza viruses (3), reassortment remains the sole documented mechanism for generating pandemic influenza strains (e.g., refs. 4-6).
To characterize reassortment events, tanglegrams, comparison between tree heights (7, 8), or ancestral state reconstructions (9) are typically deployed. These approaches identify discordance between different segment tree topologies or differences in pairwise distances between isolates across segment trees. Tanglegrams in particular require a substantial amount of subjectivity and have been described as potentially misleading (10).
While the reassortment process has been intensively studied (e.g., refs. 7-9 and 11), there is currently no explicit model-based inference approach available. We address this by introducing a coalescent-based model for the reassortment of viral lineages. In this phylogenetic network model, ancestral lineages carry genome segments, of which only a subset may be ancestral to sampled viral genomes. As in a normal coalescent process, network lineages coalesce (merge) with each other backward in time at a rate inversely proportional to the effective population size. We model reassortment (splitting) events as a result of a constant-rate Poisson process on network lineages. At such a splitting event, the ancestry of segments on the original lineage diverges, with a random subset following each new lineage. We thus explicitly model reassortment networks and the embedding of segment trees within these, allowing us to infer these parameters from available sequence data.
17104-17111 | PNAS | July 21, 2020 | vol. 117 | no. 29
The reassortment process modeled in this way differs from other recombination processes in that it is known where on the genome recombination of genetic material occurs and in that there is no ordering of the segments. The lack of linkage between segments means that at a reassortment event, any subset of segments can originate from either parent.
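The backward-in-time process described above can be illustrated with a minimal simulation sketch. It is not the implementation used in this work: the function name, the event-tuple format, and the parameter names `ne` (effective population size) and `rho` (per-lineage reassortment rate) are hypothetical. The sketch further assumes that all samples are taken at the same time, that each segment follows either parent with probability 0.5, and that the process stops once a single lineage remains.

```python
import random


def simulate_reassortment_network(n_samples, n_segments, ne, rho, seed=None):
    """Simulate coalescent and reassortment events backward in time.

    Returns a list of (time, kind, payload) tuples. Each lineage is stored as
    the frozenset of segment indices it carries that are ancestral to the sample.
    """
    rng = random.Random(seed)
    lineages = [frozenset(range(n_segments)) for _ in range(n_samples)]
    time = 0.0
    events = []
    # Assumption: stop as soon as one lineage carries all ancestral material.
    while len(lineages) > 1:
        k = len(lineages)
        coal_rate = k * (k - 1) / (2.0 * ne)  # pairwise rate 1/ne
        reas_rate = rho * k                   # constant rate per lineage
        total_rate = coal_rate + reas_rate
        time += rng.expovariate(total_rate)   # waiting time to the next event
        if rng.random() < coal_rate / total_rate:
            # Coalescence: two lineages merge and pool their segments.
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] | lineages[j]
            lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
            events.append((time, "coalescence", merged))
        else:
            # Reassortment: each segment independently picks one of two parents.
            i = rng.randrange(k)
            left = frozenset(s for s in lineages[i] if rng.random() < 0.5)
            right = lineages[i] - left
            if not left or not right:
                # All segments follow one parent: the event leaves no trace.
                continue
            lineages.pop(i)
            lineages.extend([left, right])
            events.append((time, "reassortment", (left, right)))
    return events


events = simulate_reassortment_network(n_samples=10, n_segments=8, ne=5.0, rho=0.2, seed=42)
n_coalescences = sum(1 for _, kind, _ in events if kind == "coalescence")
n_reassortments = sum(1 for _, kind, _ in events if kind == "reassortment")
```

Because segments are unordered and unlinked, a lineage is fully described by the set of segments it carries, so no breakpoint positions need to be tracked as they would be in an ancestral recombination graph.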
In order to perform inference under such a model, the reassortment network and the embedding of each segment tree within that network must be jointly inferred. This is similar to the well-known and challenging problem of inferring ancestral recombination graphs (ARGs). While many approaches to inferring ARGs exist, some are restricted to tree-based networks (12, 13), meaning that the networks consist of a base tree where recombination edges always attach to edges on the base tree. Other approaches (e.g., ref. 14) rely on approximations (15) and are not applicable to the reassortment model due to its aforementioned lack of segment ordering. Completely general inference methods exist (16), but these are again not directly applicable to modeling reassortment and furthermore tend to be highly computationally demanding.
Significance
Genetic recombination processes, such as reassortment, make it complex or impossible to use standard phylogenetic and phylodynamic methods. This is due to the fact that the shared evolutionary history of individuals has to be represented by a phylogenetic network instead of a tree. We therefore require novel approaches that allow us to coherently model these processes and that allow us to perform inference in the presence of such processes. Here, we introduce an approach to infer reassortment networks of segmented viruses using a Markov chain Monte Carlo approach. Our approach allows us to study different aspects of the reassortment process and allows us to show fitness benefits of reassortment events in seasonal human influenza viruses.
www.pnas.org/cgi/doi/10.1073/pnas.1918304117
Here, we introduce a Markov chain Monte Carlo (MCMC) approach specifically designed to jointly sample reassortment networks and the embedding of segment trees within those networks under the coalescent model, without any additional approximations. This approach allows us to jointly infer the reassortment network, the phylogenetic trees of each segment, the reassortment and coalescent rates, as well as evolutionary rates.
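To give a sense of what sampling a rate by MCMC involves, the following toy sketch applies a Metropolis-Hastings sampler to a single reassortment rate. It is a deliberate simplification and not the network sampler introduced here: it assumes the network is already known, so that the data reduce to a count of observed reassortment events (`n_events`) and the total lineage length over which they could occur (`total_length`). These names, the exponential prior, and the scale proposal are all illustrative assumptions.

```python
import math
import random


def log_posterior(rho, n_events, total_length, prior_mean):
    """Unnormalized log posterior of a Poisson-process rate (exponential prior)."""
    if rho <= 0.0:
        return -math.inf
    log_likelihood = n_events * math.log(rho) - rho * total_length
    log_prior = -rho / prior_mean
    return log_likelihood + log_prior


def sample_rate(n_events, total_length, prior_mean=1.0, n_iter=20000, step=0.5, seed=1):
    """Metropolis-Hastings sampling of the rate with a multiplicative proposal."""
    rng = random.Random(seed)
    rho = prior_mean
    current = log_posterior(rho, n_events, total_length, prior_mean)
    samples = []
    for _ in range(n_iter):
        # Scale move: multiply the rate by exp(u), u uniform and symmetric about 0.
        proposal = rho * math.exp(step * (rng.random() - 0.5))
        candidate = log_posterior(proposal, n_events, total_length, prior_mean)
        # The Hastings ratio of a scale move is proposal / rho.
        log_accept = candidate - current + math.log(proposal / rho)
        if rng.random() < math.exp(min(0.0, log_accept)):
            rho, current = proposal, candidate
        samples.append(rho)
    return samples


samples = sample_rate(n_events=20, total_length=100.0)
kept = samples[2000:]  # discard an initial burn-in
posterior_mean = sum(kept) / len(kept)
```

In this toy setting the posterior is a gamma distribution with known mean, which makes the sampler easy to check. The joint problem is harder because the network topology, the segment embedding, and the rates must all be proposed and accepted together.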
We first show that this approach is able to retrieve reassortment rates, effective population sizes, and reassortment events from simulated data. Secondly, we discuss how a lack of genetic information influences the inference of these parameters. Thirdly, we show how using the coalescent with reassortment can influence the inference of effective population sizes, as well as evolutionary rates. We then apply this approach to quantify reassortment across the five seasonal human influenza subtypes, as listed in SI Appendix, Table S1. Finally, we study how reassortment rates differ on edges with high and low fitness of these reassortment networks.