Many of you have probably published a genome sequence for a new phage you’ve isolated. But huge numbers of genome sequences are now being generated for viruses that haven’t yet been cultured. This is thanks to viral metagenomics, in which the DNA of all the viruses in an environmental sample is sequenced and analyzed at once.
Last week, we announced the publication of a new set of reporting standards for uncultivated viral genome (UViG) sequences. They’re called the MIUViG standards (Minimum Information about an Uncultivated Virus Genome).
This week, we’re shining a light on what these standards are, how they came to be, and why we should all take note.
What’s an uncultivated viral genome?
A genome sequence of a virus that’s never been propagated in the lab. Thanks to metagenomics and metatranscriptomics, more and more of these sequences are coming in.
How many uncultivated viral genome sequences have been collected?
Roughly 750,000 in the last two years alone, about five times as many as the genomes from viruses that HAVE been cultivated. Uncultivated viruses now account for 95% of the viral genome sequence diversity in public databases.
What can these sequences tell us?
Understanding the diversity of viruses on our planet (most of which are probably phages) can tell us a lot about ecosystems and the roles of microbes within them.
Why do uncultivated viral genomes need (their own set of) standards?
If everyone used the same reporting standards, viruses found in different places, with different techniques, and by different people, could be compared more easily. This would allow much more information to be extracted from the massive amount of data being collected around the world.
Existing metadata reporting standards for other kinds of genomes, maintained by the Genomic Standards Consortium (GSC), don’t quite work for uncultivated viruses.
Uncultivated viruses are far more diverse than other organisms, and identifying them depends on computational methods. As a result, the completeness and quality of their genomes, their taxonomy, and their ecology each need to be evaluated in ways that account for these unique features.
Who made the new set of standards?
This was a joint effort by government and academic researchers in 15 countries (US, UK, Netherlands, Canada, France, Belgium, South Africa, Egypt, Germany, Australia, Japan, Spain, Austria, Colombia and Switzerland). The effort was led by the US Department of Energy Joint Genome Institute.
What needs to be reported?
Mandatory information* includes:
- Source (type of dataset)
- Genome assembly software
- Virus identification software (tool you used to determine that the sequence was viral)
- Predicted genome type and structure
- Detection type
- Number of contigs
- Assembly quality (see below)
*The reasons why this info is so important for uncultivated viruses, as well as instructions on how to do these analyses, are clearly detailed in the MIUViG standards paper.
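To make the checklist concrete, here is a minimal Python sketch of a metadata record that holds the mandatory items, plus a helper that reports which ones are still blank before you submit. The field names and the `missing_mandatory_fields` helper are hypothetical and were chosen for readability. The official MIUViG term names and their allowed values are defined in the standards paper, so map them accordingly.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


# Hypothetical field names for illustration only; the official MIUViG
# term names and allowed values are defined in the standards paper.
@dataclass
class UViGRecord:
    source: Optional[str] = None                      # type of dataset
    assembly_software: Optional[str] = None
    virus_identification_software: Optional[str] = None
    predicted_genome_type: Optional[str] = None
    predicted_genome_structure: Optional[str] = None
    detection_type: Optional[str] = None
    number_of_contigs: Optional[int] = None
    assembly_quality: Optional[str] = None


def missing_mandatory_fields(record: UViGRecord) -> List[str]:
    """Return the names of mandatory fields that are unset or blank strings."""
    missing = []
    for f in fields(record):
        value = getattr(record, f.name)
        # Treat None and whitespace-only strings as "not reported".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


if __name__ == "__main__":
    draft = UViGRecord(source="metagenome", number_of_contigs=1)
    print(missing_mandatory_fields(draft))
```

For example, a record with only `source` and `number_of_contigs` filled in would be flagged for the other six fields.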
Three new classifications for genome quality
- Genome fragment(s): <90% complete, no estimated genome size, minimal annotation
- High-quality draft genome: >90% of the complete expected genome sequence, gaps span mostly repetitive regions
- Finished genome: complete genome, single contiguous sequence, no gaps, extensive annotation
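The three tiers above can be read as a simple decision rule. The sketch below assumes you already have an estimated completeness percentage from some external tool, plus a contig count and whether any gaps remain. The function and enum names are hypothetical. The sketch deliberately ignores annotation depth and where the gaps fall, which the real tiers also consider. Because the text leaves exactly 90% unassigned, the sketch counts that value as a draft.

```python
from enum import Enum


class GenomeQuality(Enum):
    GENOME_FRAGMENT = "Genome fragment(s)"
    HIGH_QUALITY_DRAFT = "High-quality draft genome"
    FINISHED = "Finished genome"


def classify_quality(completeness: float, n_contigs: int, has_gaps: bool) -> GenomeQuality:
    """Assign a quality tier from an estimated completeness (0-100 %).

    Simplified sketch: annotation depth and the location of gaps
    (e.g. whether they span repetitive regions) are not checked here.
    """
    if not 0 <= completeness <= 100:
        raise ValueError("completeness must be a percentage between 0 and 100")
    # Finished: complete, a single contiguous sequence, and no gaps.
    if completeness >= 100 and n_contigs == 1 and not has_gaps:
        return GenomeQuality.FINISHED
    # Draft: at least 90% of the expected genome (exactly 90% counted here).
    if completeness >= 90:
        return GenomeQuality.HIGH_QUALITY_DRAFT
    # Everything below 90% is a genome fragment.
    return GenomeQuality.GENOME_FRAGMENT


if __name__ == "__main__":
    print(classify_quality(95.0, n_contigs=3, has_gaps=True).value)
```

A complete sequence that is still split across two contigs therefore lands in the draft tier, not the finished tier.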
Are you avoiding these pitfalls?
- Misidentification of a cellular sequence as viral
- Partial genomes assembled as circular contigs
- Errors in gene prediction
- Inaccurate functional annotation
- Clustering of partial genomes
- Inaccurate taxonomic classification
- Read mapping from nonquantitative datasets
Where should you submit your data?
Submit your sequence and metadata to any member database of the International Nucleotide Sequence Database Collaboration (INSDC): GenBank (NCBI), the European Nucleotide Archive (ENA), or the DNA Data Bank of Japan (DDBJ).
Other interesting tidbits
If you’d like to learn more, please read the paper (open access) by Roux et al., which was our main source for this issue.
Thanks for reading!
– Jessica