Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics.

Viral zoonotic disease has had a major impact on human health over the past century, with notable examples including the 1918 Spanish influenza, AIDS, SARS, Ebola and COVID-19. There are an estimated 3 × 10^5 mammalian virus species from which infectious diseases in humans may arise [2], of which only a fraction are known at present. Global surveillance of virus diversity is required for improved prediction and prevention of future epidemics, and is the focus of international consortia and hundreds of research laboratories [3, 4].

Pioneering works expanding the virome of the Earth have each uncovered thousands of novel viruses, with the rate of virus discovery increasing exponentially, driven largely by the increased availability of high-throughput sequencing [5-11]. Sequence analysis remains computationally expensive, in particular the assembly of short reads into contigs, which limits the breadth of samples analysed. Here we propose an alternative alignment-based strategy that is considerably cheaper than assembly and enables processing of massive datasets.

Petabases (1 × 10^15 bases) of sequencing data are freely available in public databases such as the Sequence Read Archive (SRA) [1], in which viral nucleic acids are often captured incidental to the goals of the original studies [12]. To catalyse global virus discovery, we developed the Serratus cloud computing infrastructure for ultra-high-throughput sequence alignment, screening 5.7 million ecologically diverse sequencing libraries, or 10.2 petabases of data. Identification of Earth’s virome is a fundamental step in preparing for the next pandemic.