At this year’s Evergreen Phage Conference (here’s our recap), there were three bioinformatics workshops. This week, we’re covering one of them: Phage Genomics with Galaxy and Apollo. The workshop was taught by Dr. Jason Gill, a professor at the Center for Phage Technology at Texas A&M. Before the workshop, we had a primer on phage genome annotation from Dr. Ramy Aziz of Cairo University. The goal of this week’s article is to give a brief overview of what we learned about phage genome annotation from both.
Annotation for beginners
Before the workshop began, a show of hands revealed that very few participants had annotated a genome before (myself included!).
What is annotation?
We hear the word “annotation” often, and most of us know it’s required, but what does it actually entail? Essentially, once you have a fresh genome sequence, you need to “call genes”; this means figuring out where each gene (likely) starts and stops. After that, each putative gene is checked against a database of known genes to assign it a likely function. All of this data (where genes start and stop, and what they likely encode) is then ideally stored along with the genome sequence, and this “annotated genome” is typically submitted to a database (like GenBank) so others can see it and compare it to their own genomes.
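To make “calling genes” a little more concrete, here is a toy sketch in Python. It is not what Galaxy or any real gene caller does; it simply scans the three forward reading frames for stretches that begin with ATG and end at a stop codon. The function name `find_orfs` and the `min_codons` cutoff are made up for illustration, and real tools also consider the reverse strand, alternative start codons, and statistical signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for each ATG-to-stop stretch on the forward strand.

    Coordinates are 0-based and end-exclusive, and the end includes the stop codon.
    Toy assumptions: ATG is the only start codon, and only the first ATG
    after the previous stop in each frame is used.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the current candidate start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the candidate only if it is long enough (stop codon not counted)
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# A tiny made-up sequence with one short candidate gene
example = "CCATGAAATTTGGGTAACC"
print(find_orfs(example, min_codons=3))
```

Each candidate found this way is only a guess at where a gene starts and stops; the function-assignment step (comparing against a database) comes afterwards.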
Annotation tools
There are a lot of computational tools available for genome annotation, but there’s no one tool to do it all. Rather, to get a reasonably accurate annotation for your new genome sequence, a good approach is to use several different tools (some that look for the starts of genes, some that look for stop codons, some that look for tRNA genes, some that look for terminators, some that are more stringent, others more relaxed, and so on). During and after the process, it’s also key to keep a discerning eye out for anything the computer suggests that doesn’t make sense.
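One way to picture the “several tools plus a discerning eye” approach is sketched below in Python. It assumes each tool’s output has already been reduced to simple (start, stop, strand) tuples, groups calls that end at the same stop position, and flags the ones where the tools disagree on the start or where a tool made no call at all. The tool names, coordinates, and function names are all hypothetical; this is not how Galaxy or any particular pipeline merges results.

```python
from collections import defaultdict

def group_calls_by_stop(calls_by_tool):
    """Group gene calls that share a strand and stop coordinate.

    calls_by_tool maps a tool name to a list of (start, stop, strand) tuples.
    Returns {(stop, strand): {tool_name: start, ...}}.
    """
    grouped = defaultdict(dict)
    for tool, calls in calls_by_tool.items():
        for start, stop, strand in calls:
            grouped[(stop, strand)][tool] = start
    return dict(grouped)

def needs_review(grouped, n_tools):
    """Return sorted keys of calls where tools disagree or a tool made no call."""
    flagged = []
    for key, starts in grouped.items():
        missing_tool = len(starts) < n_tools
        different_starts = len(set(starts.values())) > 1
        if missing_tool or different_starts:
            flagged.append(key)
    return sorted(flagged)

# Hypothetical output from two made-up gene callers
calls = {
    "tool_a": [(100, 400, "+"), (500, 900, "+")],
    "tool_b": [(130, 400, "+"), (500, 900, "+"), (1000, 1200, "-")],
}
grouped = group_calls_by_stop(calls)
print(needs_review(grouped, n_tools=len(calls)))
```

The flagged entries are the ones a human would want to look at by eye; the calls every tool agrees on can be accepted with more confidence.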
What is Galaxy?
Galaxy is a web-based bioinformatics platform that can be used for genome annotation, among other things (you can also use it to do genome assembly, comparative genomics, mutational analysis, and more). It lets you use many different bioinformatics tools, like BLAST and SPAdes, without needing to use the command line, write code, or manually track what you’ve done (it tracks it for you!). It has a lot of file converters built into it, meaning you don’t need to worry about converting your sequence data into different file formats according to the requirements of each tool.
What is WebApollo?
Galaxy has an interactive genome visualizer built into it called WebApollo. It even allows collaborative genome annotation (like Google Docs, but for genomes). WebApollo is key to why Galaxy is easy to use.
The Evergreen Galaxy Workshop
In the first half of the workshop, we spent a couple of hours learning about Galaxy and how the Center for Phage Technology (CPT) at Texas A&M uses it (it has developed its own instance of Galaxy that’s tailored to phages). Fortunately, this was a beginner-friendly workshop (I am most definitely a beginner, and I didn’t feel lost!).
The second half of the workshop involved us all with our laptops out, working through annotating a sample genome. This was incredibly informative, and made me realize that contrary to how I felt during grad school, phage annotation needn’t be fear-inducing.
Bonus advice: overcall genes!
Jason’s advice on calling genes: overcall them! At least you’ll have some data, which you can analyze further using tools like BLAST, and you can always delete called genes later.
Comparing Galaxy with PATRIC for genome annotation
Another Evergreen workshop covered using PATRIC for genome annotation. Unlike with PATRIC, annotation via Galaxy is a lot more manual, which has both pros and cons. For instance, you call each gene based on a few predicted possibilities (you choose the one that looks most likely to be correct). This lets you catch errors and apply a more discerning eye; on the other hand, it takes longer, and subjectivity inevitably comes into play.
A hands-on teaching tool
Jason highlighted that Galaxy can be much better as a teaching tool than something more automated like PATRIC, since Galaxy annotation is more hands-on. Students can see and learn all the steps involved (it’s less of a “black box”).
Accessing Galaxy
Anyone can use the CPT’s instance of Galaxy; you can get a free account and access it from anywhere.
Where can I learn more?
The CPT has a great website with all the tools and tutorials you could want related to Galaxy.
Constantly improving
The CPT writes a lot of bioinformatics tools in-house, and they’re in constant talks with the Apollo team about how to improve the software.
Other Evergreen bioinformatics workshops
There were also two other bioinformatics workshops at Evergreen this year: one on annotating genomes with PATRIC (taught by Maulik Shukla and Rebecca Wattam) and one on viromics (taught by Evelien Adriaenssens and Alejandro Reyes). We plan to cover these topics in the future.
More bioinformatics in Capsid & Tail
We’re aiming to start covering bioinformatics more frequently in our second year (which is already around the corner!). If you’ve got something to teach the phage community on any aspect of phage bioinformatics, whether it’s a paper to cover, a tool to explain, or something else, we’d love to have you as a guest writer! Email [email protected] if interested, or if you’d like more info.