DOI: 10.1101/2022.12.27.521992

Protein modularity in phages is extensive and associated with functions linked to core replication machinery and host tropism determinants

B. Smug, K. Szczepaniak, E. P. C. Rocha, S. Dunin-Horkawicz, R. J. Mostowy
Phages are known for their genetic modularity: their diversity can be viewed through the existence of evolutionarily independent functional modules that occur and recombine in different combinations. Multiple studies have demonstrated how such mosaicism emerges in natural populations and facilitates adaptation to their bacterial hosts. However, less is known about the extent of within-protein modularity and its impact on phage evolution. To fill this knowledge gap, we quantified such modularity by detecting instances of 'protein mosaicism', defined as homology between two otherwise unrelated proteins. We used highly sensitive homology detection to quantify protein mosaicism between pairs of 133,624 representative phage proteins, and to understand its relationship with genetic and functional diversity in phage genomes. We found that diverse functional classes often shared homologous fragments and domains, with multiple instances of such mosaicism having emerged relatively recently. We detected the strongest signal for protein mosaicism in receptor-binding proteins, endolysins, and the core replication machinery, with DNA polymerases as mosaic outliers. We argue that the extensive protein modularity linked to those functional classes reflects co-evolutionary interactions with bacterial hosts, but with important differences in the underlying nature of diversifying selection.