How to name a phage

Issue 168 | March 18, 2022
Capsid and Tail

Photo by Pierre Châtel-Innocenti on Unsplash

What’s in a phage name, and how can we make it count? Should we give phages pet names or serial numbers? This week Jan Zheng continues his informal blog series on phage data, with a new post on how he’s thinking about naming phages.

Urgent March 18, 2022

Urgent need for Mycobacterium avium phages for a patient in Belgium

Phage Therapy

We are urgently seeking Mycobacterium avium phages for a patient in Belgium

Ways to help at this stage:

  • By sending your phages for testing on the patient’s strain
  • By receiving the strain and testing your phages
  • By receiving the strain and using it to search for new phages against the organism
  • By offering to prepare phages supplied by others to clinical grade
  • By helping spread the word about this request
  • By providing us with names/email addresses of labs you think we should contact

Please email [email protected] if you can help in any way, or if you would like further details/clarification.

Let’s make a difference,
Phage Directory

Urgent March 18, 2022

Urgent need for Morganella phages for a patient in the USA

Phage Therapy

We are urgently seeking Morganella phages for a patient in the USA

Ways to help at this stage:

  • By sending your phages for testing on the patient’s strain
  • By receiving the strain and testing your phages
  • By receiving the strain and using it to search for new phages against the organism
  • By offering to prepare phages supplied by others to clinical grade
  • By helping spread the word about this request
  • By providing us with names/email addresses of labs you think we should contact

Please email [email protected] if you can help in any way, or if you would like further details/clarification.

Let’s make a difference,
Phage Directory

What’s New

Protocol: Simple two-step, high yield protocol for isolation and amplification of bacteriophages against methicillin-resistant Staphylococcus aureus (MRSA)

Bacteriophage isolation | Protocol

Review: Recent progress toward the implementation of phage therapy in Western medicine

Phage Therapy | Review

Snapshot: Bacterial immunity in response to phage infection

Phage-immune interactions | Research paper

Pre-print: Phage Paride hijacks bacterial stress responses to kill dormant, antibiotic-tolerant cells

Antibiotic resistance | Phage biology | Preprint

Paper: Genome-driven elucidation of phage-host interplay and impact of phage resistance evolution on bacterial fitness

Phage resistance | Phage-host interactions | Research paper

Latest Jobs

Post Doc
Postdoctoral Position at National University of Singapore: The Research Fellow’s role will be to develop a rapid bacteriophage isolation methodology. We have developed a bacteriophage platform and assembled a collection of bacteriophages. We have also developed the basic principles for rapid isolation of bacteriophages using flow cytometry and cell sorting, in combination with multiplexed cultures of target bacteria. One of the aims is to develop the method, validate it, and deploy it on microbiome biobanks to mine and reveal the global phageome.
Senior scientist
Janssen Pharmaceuticals is seeking a senior scientist with phage expertise to work at its facility in San Francisco, CA. Email Dr. Mayukh Das (former CPT scientist) at [email protected] for more information.
Anaerobes | Phages | Research Assistant
Research assistant: Join the Yong Loo Lin School of Medicine to work with anaerobes and phages.

Community Board

Anyone can post a message to the phage community — and it could be anything from collaboration requests, post-doc searches, sequencing help — just ask!

Bacteriophages in Science and Clinical Use: First strategic meeting of the German Center for Infection Research to promote translational research in the field of bacteriophages.

Date: 11–12 July 2022 in Frankfurt, Germany.

The symposium aims at connecting German researchers and physicians. Together we want to define a common roadmap for future translational phage research.

Questions? Ask [email protected]

Symposium

Hi all! I am a student at a Data Analysis school in France (“OpenClassrooms with ENSAE-ENSAI”), and we have the possibility to do a free project (it has to make use of statistics/data methods). I am interested in doing something related to phages. I am not familiar with open-source biology databases and not confident defining a problem to attempt to solve. I was wondering if someone might be able to help me with that? I’ll quickly mention that I recently passed the “molecular biology: transcription” EdX course, and that my Python skills are much stronger than my R skills. Contact me at tomsadigh [at] protonmail [dot] com.
Thomas Sadigh Rezvani

Seeking mentorship | Seeking project

How to name a phage

Jan Zheng
Product designer and co-founder of Phage Directory
Phage Directory, Atlanta, GA, United States
Skills: Bioinformatics, Data Science, UX Design, Full-stack Engineering

Jan Zheng is a co-founder of Phage Directory and has a Master of Human-Computer Interaction degree from Carnegie Mellon University, and a computer science and psychology background from UMBC.

For Phage Directory, he takes care of the product design, full-stack engineering, and business / operations aspects.

Once we’ve thought about what we need to know about our phages, we can start giving them identities. How we identify our phages tells us how we should name them. Do we affectionately name them after our pets (“Rue”), or do we give them a serial number (“NCTC 12673”)? Or do we use both?

In this second post in the informal and conversational “Phage Data series”, I’ll step through how I think about naming our phages for Phage Australia.

When we think about the identity of a person, place, or thing, the first thing we think of is its name. Names are powerful: merely having a name for a thing helps us talk and think about it. When everyone agrees on what to call the thing, it gains an identity and a place in the world.

For our phages, names help us think and talk about them, both within and beyond our lab. Names help us compare them, tweet about them, publish on them, and accession them into GenBank. Names help us accession our phages into our collective culture.

Names are powerful, and memorable names are doubly so. How will others cite and mention our phages? How will the world talk about them when they’re used to prevent the amputation of a 7-year-old girl’s leg?

Access(ion)ing our phages

We need a name to find our phage in our database, in our freezers, and around the world. When designing a phage accessioning system, I use a library or museum as an analogy for Phage Australia. We’re a library actively curating multiple collections of phages. How do we think about accessioning (recording new phages) into our own collection?

Phages can come from the wild (isolated in the lab) or from another lab or biobank. Phages can be evolved or engineered. They can be shared with other labs, who then evolve or engineer their own versions. Multiple, slightly different versions will exist around the world. How do we think about these phages, and how would we know how they’re related? How do we refer to our own copy of a phage, versus some other biobank’s exact copy of that phage? Are they related from a genomics or from a provenance standpoint?

How could we name our phages in a way that embeds some of this hidden knowledge?

If we (in the near future) create some inter-library loan system for phages, how should we be able to search for, request, deposit, share, and access these phages?

What’s in a name?

“That which we call a T4 by any other name would plaque just the same” — Shakespeare (if he was a phage biologist)

The primary goal of a name is communication. Whatever you choose to call your phage, make sure that everyone agrees on its name. To aid in this goal, make it memorable. “Rue” is much easier to remember than “NCTC 12673”. SEA-PHAGES lets students name the phages they find, adding a degree of inclusivity and ownership.

A phage by the name of “Kangaroo” could go viral on Twitter (or get cited in articles) much more easily, merely by being memorable. To increase citations and mentions, try to give phages names that help people trace them back to your lab.

At Phage Australia, we call these “common names”. Common names are easier to pronounce, easier to remember, and harder to misspell or get wrong (was it 12673 or 12637?). Easy-to-remember phages are also easy to find: if we need any data on one, we just search for “phage Kangaroo”.

Starting with common names

Naming our phages can be as simple or as complicated as we want. Naming a few dozen phages should be straightforward. ICTV suggests combining the isolation host’s genus, the word “phage”, and a common name. Examples include “Escherichia phage T4” or “Mycobacterium phage Muddy”.

[Figure: basic naming]

Using a common name like “Muddy” makes the phage memorable. It also acts as an “accession ID”. For example, here’s Muddy’s data on PhagesDB: https://phagesdb.org/phages/Muddy/
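The ICTV-style pattern is simple enough to generate programmatically. Below is a minimal Python sketch; the function names are our own, and the PhagesDB URL pattern is only generalized from the Muddy link above, so it resolves only for phages actually listed on PhagesDB.

```python
def ictv_style_name(host_genus: str, common_name: str) -> str:
    """Combine the isolation host's genus, the word "phage", and a common name."""
    return f"{host_genus.strip().capitalize()} phage {common_name.strip()}"


def phagesdb_url(common_name: str) -> str:
    """Build a PhagesDB page URL, assuming the /phages/<Name>/ pattern seen for Muddy."""
    return f"https://phagesdb.org/phages/{common_name.strip()}/"


print(ictv_style_name("Mycobacterium", "Muddy"))  # Mycobacterium phage Muddy
print(phagesdb_url("Muddy"))  # https://phagesdb.org/phages/Muddy/
```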

Simple common names are memorable, and great for daily use. However, if we plan on isolating/organizing hundreds or thousands of phages, giving each phage a distinct, memorable name will quickly become a chore.

When we start dealing with hundreds or thousands of phages, we will want a naming system. A system helps us add context to our phages’ common names. For example, there are many people named “Jessica”; “Jessica Sacher” and “Jessica Simpson” are set apart by their surnames. Think of “given name” and “surname” as a naming system.

[Figure: surnaming]

Surnames are additional names we give to people to help us set them apart. To set our own phages apart, we could also use “surnames”. Surnames can help us identify the context or heredity of our phage. For example, was the phage isolated from an aubergine, or from the bottom of a riverbed?

Our lab will soon collect and manage lots of phages, and not every phage will be noteworthy. Some of our isolated phages we might save for future characterization. For these phages, we want a system that lets us easily recall and identify them, without needing common names. We want a flexible system that lets us use isolation strain and series names, on top of an optional common name, to add more context.

Adding complexity to phage names

For larger phage collections, a more contextual naming system will help us organize the phages in both our spreadsheets and freezer boxes.

We’re thinking of a more complex “surname” system that adds more context to the name.

For example, phages named “Pa1”, “Pa2”, … “Pa1000” indicate a set or series of phages isolated on P. aeruginosa. Alternatively, “WasteWater1”, “WasteWater2”, … indicate the source, i.e. phages isolated from wastewater.

Combining the two concepts, we get “WasteWater Pa1”, … “WasteWater Pa100”, “WasteWater Ec1”, … “WasteWater Ec100”. This identifies the wastewater series of phages isolated on both E. coli and P. aeruginosa.

We can still use common names! We could give “WasteWater Pa5” a memorable and Australian name like “Kangaroo”; this is the name we call it in the lab, in publications, and on social media. The full database name (“accession ID”) is “WasteWater Pa5 Kangaroo”.

[Figure: naming system]

In this example, “ChickMale1” or “WasteWater” is the source or series name, “Ec1” or “Pa5” is the surname (the isolation strain plus a number, e.g. “Pa5” for the fifth phage found), and “Kangaroo” is the common name.

Not all phages would need a common name — some could just be called “WasteWater Pa5”.
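The series/surname/common-name scheme described above can be sketched as a small data structure. This is a minimal illustration under our own assumptions (the class and function names are hypothetical, and numbers are assigned per series and host), not Phage Australia’s actual database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhageName:
    series: str                        # source or series name, e.g. "WasteWater"
    host_code: str                     # isolation strain abbreviation, e.g. "Pa" or "Ec"
    number: int                        # running number within this series and host
    common_name: Optional[str] = None  # optional memorable name, e.g. "Kangaroo"

    @property
    def lab_name(self) -> str:
        """Short day-to-day name, e.g. "WasteWater Pa5"."""
        return f"{self.series} {self.host_code}{self.number}"

    @property
    def accession_id(self) -> str:
        """Full database name: the lab name plus the common name, if there is one."""
        if self.common_name:
            return f"{self.lab_name} {self.common_name}"
        return self.lab_name


def next_in_series(existing: list[PhageName], series: str, host_code: str) -> PhageName:
    """Return an unnamed phage with the next free number for this series and host."""
    used = [p.number for p in existing if p.series == series and p.host_code == host_code]
    return PhageName(series, host_code, max(used, default=0) + 1)


kangaroo = PhageName("WasteWater", "Pa", 5, "Kangaroo")
print(kangaroo.accession_id)  # WasteWater Pa5 Kangaroo
print(next_in_series([kangaroo], "WasteWater", "Pa").lab_name)  # WasteWater Pa6
```

Keeping the common name optional mirrors the point above: a phage without one is simply known by its lab name.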

If we choose to publish and accession a common name for a phage, we should make sure it is memorable and unique. We should also Google the name and check that it doesn’t already exist in PhagesDB (phagesdb.org) or GenBank.
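Part of that uniqueness check can be automated once you have a list of existing names. A minimal sketch, assuming the list has already been obtained (for example exported from PhagesDB or GenBank; fetching it is not shown). Ignoring case and punctuation is our own choice, meant to catch near-duplicates such as “Kangaroo” and “Kanga-roo”.

```python
def normalize_name(name: str) -> str:
    """Case-fold and keep only letters and digits, so near-duplicates compare equal."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def is_name_available(candidate: str, existing_names: list[str]) -> bool:
    """True if no existing name matches the candidate after normalization."""
    taken = {normalize_name(n) for n in existing_names}
    return normalize_name(candidate) not in taken


existing = ["Muddy", "Kangaroo", "FlyingFox"]  # hypothetical list of names already in use
print(is_name_available("Kanga-roo", existing))  # False
print(is_name_available("Wombat", existing))  # True
```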

SEA-PHAGES has a handy guide for naming phages here: https://phagesdb.org/namerules/

Choosing a more pedantic naming system

If we need other labs to interact with our bank (e.g. request, borrow, donate/deposit, stockpile/cache, or even invest in phages), we would need to choose a more flexible accession system.

We would need to accommodate other biobanks’ naming strategies. Phage names also need to record provenance, using other labs’ identifiers as “surnames”. We might also take in non-phage material, so we should indicate that a sample is a phage (“vB”, for “virus of bacteria”), as opposed to a lysin, a synthetic phage, etc.

[Figure: complex naming system]

Example names could be “vB_Eco_NSW_ChickMale1_Ec1_CluckVader” and “vB_Pae_NSW_JIPa1_FlyingFox”. In these examples, “NSW” stands for where the phage was isolated (“New South Wales”) and “JIPa1” stands for “Jon Iredell lab, P. aeruginosa phage number 1”, the first P. aeruginosa phage isolated in the lab.

Though more flexible, this naming system is complicated, pedantic, and obviously impossible to pronounce. In exchange, the extra fields carry much more context and make the names machine readable, which in turn makes it easier to write code that can search, filter, and display phages in interfaces, apps, and reports.
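To illustrate the machine-readability point, here is a minimal Python sketch that parses names in the underscore-delimited format above and filters on one field. It assumes fields never contain underscores themselves, that the series name is the only optional field, and that a common name is always present. If the common name were optional too, a five-field name would be ambiguous, so a real scheme would need a fixed field count or an explicit placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessionName:
    prefix: str            # "vB", virus of bacteria
    host: str              # host abbreviation, e.g. "Eco", "Pae"
    location: str          # where the phage was isolated, e.g. "NSW"
    series: Optional[str]  # optional source/series name, e.g. "ChickMale1"
    lab_id: str            # lab phage identifier, e.g. "Ec1" or "JIPa1"
    common_name: str       # e.g. "CluckVader"


def parse_accession(name: str) -> AccessionName:
    """Split an underscore-delimited name into fields (assumed layout, see above)."""
    parts = name.split("_")
    if len(parts) == 6:
        prefix, host, location, series, lab_id, common = parts
    elif len(parts) == 5:
        prefix, host, location, lab_id, common = parts
        series = None
    else:
        raise ValueError(f"unexpected number of fields in {name!r}")
    return AccessionName(prefix, host, location, series, lab_id, common)


names = ["vB_Eco_NSW_ChickMale1_Ec1_CluckVader", "vB_Pae_NSW_JIPa1_FlyingFox"]
parsed = [parse_accession(n) for n in names]
pae_phages = [p.common_name for p in parsed if p.host == "Pae"]
print(pae_phages)  # ['FlyingFox']
```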

When we add dozens or hundreds of phages from other collections, this extra flexibility allows us to accession others’ phages into our own collection’s identification system — without confusing where they came from. The “lab phage identifier” section should accommodate any other labs’ naming systems.

A more pedantic accessioning/naming system that records provenance makes it easier to trace where a phage has come from. It helps us distinguish phages in our own biobank from those in other biobanks. If genetic drift or other issues occur, this naming system helps us indicate that the phage has changed in our biobank. This complicated naming system is primarily useful within the database and freezer system. Whether authors publish using the accessioned database ID or the common name is up to them.

Internally, we’d still refer to our phages by their lab identifier (Pa1) or common name (FlyingFox)!

Different phages of the same name

With any kind of arbitrary naming system, sooner or later you will end up with problems like name collisions (the same name points to two different phages). This is ok, as long as you’re able to “resolve” the name of the phage to an actual box in the freezer, or row in a spreadsheet or database (e.g. if you know where the phage came from). This is why it’s very important to add a “surname” or a “group identifier” to a phage name, to help us trace the provenance. We don’t want too many “John Doe” phages floating around biobanks and publications.
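Resolving a name to a freezer location can be sketched as a lookup that refuses to guess when a name is ambiguous. The group identifiers and freezer positions below are hypothetical placeholders; the point is that a bare name is enough only when it is unique, and otherwise the “surname” (here a group identifier) has to be supplied.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical registry rows: (group identifier, phage name, freezer position)
RECORDS = [
    ("PhageAustralia", "Kangaroo", "Box 3, A1"),
    ("OtherBiobank", "Kangaroo", "Box 9, C4"),
    ("PhageAustralia", "FlyingFox", "Box 3, A2"),
]

by_name: dict[str, list[tuple[str, str]]] = defaultdict(list)
for row_group, row_name, row_position in RECORDS:
    by_name[row_name].append((row_group, row_position))


def resolve(name: str, group: Optional[str] = None) -> str:
    """Return the freezer position for a name; require a group if the name collides."""
    candidates = by_name.get(name, [])
    if group is not None:
        candidates = [c for c in candidates if c[0] == group]
    if not candidates:
        raise KeyError(f"no phage named {name!r}")
    if len(candidates) > 1:
        groups = ", ".join(g for g, _ in candidates)
        raise LookupError(f"{name!r} is ambiguous; specify a group: {groups}")
    return candidates[0][1]


print(resolve("FlyingFox"))  # Box 3, A2
print(resolve("Kangaroo", "OtherBiobank"))  # Box 9, C4
```

Calling `resolve("Kangaroo")` without a group raises a `LookupError`, which is the “John Doe” situation the paragraph above warns about.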

At Phage Australia (and as Phage Directory) we’re working on a simple phage name registration/identification system, similar to the DOI system for papers or clinicaltrials.gov for clinical trials.

Naming rules can get complicated very quickly. Evelien Adriaenssens from ICTV actively discourages adding too many special characters, like dashes and underscores, to names. Ideally, she says, names should be as simple as possible.

Sequence-accurate identifiers

If we need much more precision in naming a phage, we could define it by its genetic and phenotypic characteristics (e.g. host range or plaque morphology), its ability to lyse particular hosts, or its sequence. If a phage is engineered, mutated, or experiences genetic drift, at what point does it become a different phage? At what point does it matter that we identify it as a different phage? For example, do we consider a mutant “different” once it plaques on different hosts? At that point, do we give it a new name?

We might identify a phage purely by its genome sequence. This allows us to fully decouple its identity from its characteristics and abilities. Full sequences are impractical as names, though; this is where hashing algorithms come in.

For example, let’s take the T4 phage genome. We can hash the contents of its FASTA file using an algorithm like SHA-256, which turns the entire file into a much shorter, fixed-length string of numbers and letters (64 hexadecimal characters for SHA-256). Here’s what a hashed ID of a T4 FASTA file looks like: “a4792ca503e4b70ff194fb981606a137dd73b43589329ca7612d36c02f08f9eb”.

Even though the hash is long and only practical for machines to read, it fits into a spreadsheet or database entry far better than an entire FASTA file.
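A minimal sketch of sequence-based IDs with Python’s `hashlib`. Note one deliberate difference from the description above: instead of hashing the raw FASTA file, this hashes only the sequence, upper-cased and with whitespace removed, so that renaming the header or re-wrapping lines does not change the ID. As a consequence it would not reproduce the file hash quoted above. It also assumes a single-record FASTA.

```python
import hashlib


def fasta_sequence(fasta_text: str) -> str:
    """Join the sequence lines of a single-record FASTA, dropping the '>' header."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def sequence_id(sequence: str) -> str:
    """SHA-256 of the upper-cased, whitespace-free sequence, as 64 hex characters."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


a = sequence_id(fasta_sequence(">T4 isolate A\nACGTACGT\nACGT\n"))
b = sequence_id(fasta_sequence(">renamed\nacgtacgtacgt\n"))
print(a == b, len(a))  # True 64
```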

If we want to go further, we could use a web3 tool like IPFS (https://ipfs.io/) to turn T4’s genome into a URL. This URL acts both as an ID for the genome and as a link to the FASTA file itself. This would be really useful for building something like GenBank:

https://dweb.link/ipfs/bafkreifepewkka7ew4h7dfh3talanijx3vz3inmjgkokoyjng3ac6chz5m
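The content identifier (CID) in that URL is not arbitrary: decoding it gives exactly the SHA-256 digest quoted above for the T4 FASTA file, wrapped in a CIDv1 header (version 1, “raw” codec, sha2-256 multihash) and encoded as lower-case base32 with the multibase prefix “b”. The sketch below rebuilds that string with the Python standard library. It assumes the file is stored by IPFS as a single raw block, as appears to be the case here; larger files are split into chunks and end up with a different CID, so in practice you would let an IPFS client compute it.

```python
import base64
import hashlib

CID_VERSION = 0x01    # CIDv1
RAW_CODEC = 0x55      # multicodec code for "raw" bytes
SHA2_256 = 0x12       # multihash code for sha2-256
DIGEST_LENGTH = 0x20  # 32-byte digest


def cid_v1_raw(sha256_digest: bytes) -> str:
    """Build a CIDv1 (raw codec, sha2-256) string in base32 with multibase prefix "b"."""
    if len(sha256_digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    cid = bytes([CID_VERSION, RAW_CODEC, SHA2_256, DIGEST_LENGTH]) + sha256_digest
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


def cid_for_small_file(data: bytes) -> str:
    """CID of a file, valid only under the single-raw-block assumption above."""
    return cid_v1_raw(hashlib.sha256(data).digest())


t4_digest = bytes.fromhex("a4792ca503e4b70ff194fb981606a137dd73b43589329ca7612d36c02f08f9eb")
print("https://dweb.link/ipfs/" + cid_v1_raw(t4_digest))
```

Running this prints the same dweb.link URL shown above, which is a handy way to confirm that the URL and the hash refer to the same file.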

Sometimes we might have a collection of phages that are all labelled as T4, but whose genome sequences differ slightly from each other (e.g. because of drift, mutations, or sequencing errors). When hashed, these sequences would all have different values, because their sequences differ. Though their sequences differ, we might not want to classify them as distinct phages.

We can then compare the sequences with each other using Hamming distance (which only applies to sequences of equal length), Levenshtein distance, or ANI (average nucleotide identity). Each gives a number that tells us how different two genomes are. Using this number, we can say something like “every phage sequence with less than 3% sequence difference should be classified as the same phage”; we can set our own arbitrary cutoffs depending on our needs.
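The cutoff idea can be sketched with Levenshtein distance, expressed as a percentage of the longer sequence, and a greedy grouping pass. This is a toy illustration only: the dynamic-programming distance takes time proportional to the product of the two lengths, which is far too slow in pure Python for whole genomes of T4’s size, where dedicated alignment or ANI tools would be used instead. The greedy grouping (each sequence joins the first group whose founding member is within the cutoff) is our own choice and depends on input order.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, or free match
            ))
        previous = current
    return previous[-1]


def percent_difference(a: str, b: str) -> float:
    """Edit distance as a percentage of the longer sequence."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 100.0 * levenshtein(a, b) / longest


def group_by_cutoff(sequences: dict[str, str], cutoff_percent: float = 3.0) -> list[list[str]]:
    """Greedily group sequences whose difference from a group's first member is below the cutoff."""
    groups: list[list[str]] = []
    for name, seq in sequences.items():
        for group in groups:
            if percent_difference(sequences[group[0]], seq) < cutoff_percent:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


seqs = {"T4-a": "A" * 100, "T4-b": "A" * 99 + "C", "other": "G" * 100}
print(group_by_cutoff(seqs))  # [['T4-a', 'T4-b'], ['other']]
```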

If we later find out that one of those mutants exhibits different characteristics (e.g. different host range) from the others, we might give it a new identity, as it’s now functionally different from its close relatives. We might even give it a new common name!

Combining hashing with Hamming/Levenshtein/ANI gives us a new way to uniquely identify phages and describe their relationships to each other. As our phages undergo mutations, we end up with phages that are closer to or farther from each other (genomically or otherwise). We could represent these relationships as graphs, similar to how we represent family trees (e.g. as weighted, directed graphs).
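A minimal sketch of such a graph, assuming each phage variant has exactly one parent it was derived from and that there are no cycles. Nodes are sequence IDs (e.g. hashes; short hypothetical labels are used here), edges point from parent to child, and each edge is weighted by the percent difference between the two genomes. The example weights are made up.

```python
from collections import defaultdict

# Parent -> {child: percent difference}: a weighted, directed graph of variants.
edges: dict[str, dict[str, float]] = defaultdict(dict)
# Child -> parent, assuming every variant was derived from exactly one parent.
parent_of: dict[str, str] = {}


def add_derivation(parent_id: str, child_id: str, percent_diff: float) -> None:
    """Record that child_id was derived from parent_id (e.g. by drift or engineering)."""
    edges[parent_id][child_id] = percent_diff
    parent_of[child_id] = parent_id


def lineage(node_id: str) -> list[str]:
    """Follow parent links back to the root: [node, parent, grandparent, ...]."""
    path = [node_id]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


# Hypothetical example: a reference T4 stock and two successive variants.
add_derivation("T4-reference", "T4-variant-1", 0.4)
add_derivation("T4-variant-1", "T4-variant-2", 1.1)
print(lineage("T4-variant-2"))  # ['T4-variant-2', 'T4-variant-1', 'T4-reference']
```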

I am known by many names…

A name is only useful when the majority agrees to use it. Some phages that have moved between labs or collections will appear under different names in publications.

Make sure to keep a spreadsheet that connects each phage’s common name, internal database ID, GenBank accession ID, and any names it has previously been known by. Keep it up to date as new phages are added!
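Such a name table can also double as a lookup index, so that any previous name resolves to the same record. A minimal sketch with Python’s `csv` module; the column names and rows are hypothetical, GenBank accessions are left blank rather than invented, and an alias that points at two different phages raises an error instead of being silently overwritten.

```python
import csv
import io

# Hypothetical name table, e.g. the lab spreadsheet exported as CSV.
# "aliases" holds previous names separated by semicolons.
NAME_TABLE = """common_name,internal_id,genbank_accession,aliases
Kangaroo,WasteWater Pa5,,WW-Pa5;Pa5
FlyingFox,vB_Pae_NSW_JIPa1_FlyingFox,,JIPa1
"""


def build_alias_index(csv_text: str) -> dict[str, str]:
    """Map every known name (case-insensitive) to the row's internal ID."""
    index: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        internal_id = row["internal_id"]
        names = [row["common_name"], internal_id, row["genbank_accession"]]
        names += row["aliases"].split(";")
        for name in names:
            key = name.strip().casefold()
            if not key:
                continue  # skip empty cells, e.g. a missing GenBank accession
            if index.get(key, internal_id) != internal_id:
                raise ValueError(f"{name!r} refers to more than one phage")
            index[key] = internal_id
    return index


index = build_alias_index(NAME_TABLE)
print(index["pa5"])  # WasteWater Pa5
```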

Here’s a handy naming guide from SEA-PHAGES: https://phagesdb.org/namerules/. And have fun naming your phages!

Looking ahead

Up to now we’ve explored identity and naming systems for our phages. Next, we can think of our phages like items on a shelf, books in a library, items in a museum, or inventory in a store. How do we think about and manage our phage inventory? How do we keep track of “provenance” and “governance”, and properly set up logistics?

We’ll talk through these somewhat dry, yet absolutely vital aspects and biobank requirements in the next issue.

Special thanks to Jessica Sacher, Evelien Adriaenssens, and the Phage Australia team (Ruby Lin, Nouri Ben Zakour, Stephanie Lynch, Jon Iredell) for helping me hash some of these ideas out.

More special thanks to various phage labs and biobanks we’ve spoken to over the years about data management. Some of these labs include: Queen Astrid Military Hospital, Sciensano, the Félix d’Hérelle Reference Center for Bacterial Viruses, DSMZ, ATCC, NCTC, TAILOR, Israeli Phage Bank, The Bacteriophage Bank of Korea, Fagenbank, Citizen Phage Library, Japan Phage Bank, and many more, throughout the years. Thanks so much for putting up with my incessant questioning!


Capsid & Tail

Follow Capsid & Tail, the periodical that reports the latest news from the phage therapy and research community.

We send Phage Alerts to the community when doctors require phages to treat their patient’s infections. If you need phages, please email us.

In collaboration with

Mary Ann Liebert PHAGE

Supported by

Leona M. and Harry B. Helmsley Charitable Trust
