How to design a phage data system

Issue 196 | October 7, 2022
12 min read
Capsid and Tail

DALL·E 2’s interpretation of a phage data system

This week Jan Zheng explores a few design decisions and considerations behind building a system for collecting data from every part of the phage therapy pipeline.

This article is part of the ‘Phage Australia Series’ and the C&T Data Series

Check out our previous posts: How to keep track of phages, How to name a phage, How to organize your biobank data, Spot or Not, and MetaPhage

Returning in-person for the first time since 2019, Phage Futures: USA is back — taking place November 16-17 in Boston, we’re bringing together 170+ leaders in the phage community to accelerate the translation of bacteriophage research into safe & effective clinically, regulatory & commercially viable therapeutics. Find out more about the event by taking a look at our full agenda here.

What’s New

The National Institute of Allergy and Infectious Diseases (NIAID) has begun a clinical trial for phage therapy in adults with cystic fibrosis, under the leadership of IPATH. The trial plans to enroll 72 participants across 16 cystic fibrosis centers throughout the United States. NIH press | IPATH’s FAQ on the trial

Clinical Trial, Cystic fibrosis, Phage Therapy

Phagos, a French phage manufacturing company, announced the closing of a €2.4 million Seed investment round co-led by Demeter (Paris) and Hoxton Ventures (London).

Biotech news, Animal health, Funding news

Adaptive Phage Therapeutics (APT) has announced that the Defense Health Agency (DHA) has awarded an additional $5 million to support clinical development of APT’s adaptive bacteriophage therapy in treatment of diabetic foot osteomyelitis (DFO).

Biotech news, Funding news

SpyPhage: A Cell-Free TXTL Platform for Rapid Engineering of Targeted Phage Therapies, by Sahan B. W. Liyanagedera and colleagues, proposes the SpyPhage system where a “scaffold” bacteriophage is engineered to incorporate a SpyTag moiety on its capsid head to enable rapid postsynthetic modification of their surfaces with SpyCatcher-fused therapeutic proteins.

Synthetic biology, Research paper

In this Nature Communications paper, Brieuc Van Nieuwenhuyse (Université Catholique de Louvain) and colleagues report the story of a toddler who suffered from extensively drug-resistant Pseudomonas aeruginosa sepsis after liver transplantation. He was treated by a bacteriophage-antibiotic intravenous combination therapy for 86 days, which was associated with objective clinical and microbiological improvement.

Case study, Phage Therapy, Research paper

Latest Jobs

Phage defense systems, Scientist
The Molecular Microbiology lab at Washington University is hiring a staff scientist to characterize new phage defense mechanisms.
PhD project, Phage-antibiotic synergy
The University of Exeter’s Living Systems Institute is inviting applications for a PhD studentship fully-funded by DSTL and the University of Exeter. This project will provide novel understanding on the biological mechanisms underlying the synergy between antibiotics and bacteriophages in eradicating bacterial pathogens.
Biotech, Research Scientist
Phico Therapeutics, a biotechnology company developing novel drugs for the treatment of bacterial infections, is recruiting a Microbiology Research Scientist to join their research team in Bourn, near Cambridge, UK.
The Department of Microbiology and Immunology at McGill University is seeking applications for a Course Lecturer to teach MIMM 391: SEA-PHAGES: Genome Annotation in Winter 2023.
Ecology and evolution, Faculty
The University of Maine is looking for an Associate Professor of Practice in Bacteriophage Ecology and Evolution. This is a full-time, 12-month, non-tenure track appointment with potential for renewal. Review of applications will begin February 1, 2023.
Post Doc, Structural Biology
The Central European Institute of Technology at Masaryk University is seeking a post-doctoral scientist to join the Laboratory of Structural Virology. The successful candidate will use cryo-electron microscopy and tomography to determine three-dimensional structures of viruses and bacteriophages and characterize their infection of cells.
Postdoc, Phage engineering
Iowa State University is hiring a Postdoctoral Research Associate to work on an NIH-funded project focused on bacteriophage engineering for therapeutic applications.
Faculty, Phage-host interactions
Washington University in St. Louis is hiring a Staff Scientist to join the lab of Dr. Marcelo Jacobs-Lorena, which studies the interactions of bacterial pathogens with their viruses, bacteriophage.
Mobile genetic elements, Phage resistance, Post Doc
The Department of Plant & Microbial Biology at the University of California, Berkeley is hiring a Postdoctoral Scholar in the area of phage-bacterial interactions. The Postdoctoral Scholar will conduct laboratory research as part of Dr. Kim Seed’s NIH-funded grant to explore how mobile genetic elements contribute to phage resistance in the diarrheal pathogen Vibrio cholerae.
Microbiome, Post Doc
The Host and Parasite group, Environmental Microbial Genomics group, and the Microbiology and Fermentation group at the University of Copenhagen are offering a 2-year postdoctoral position in the field of comparative metagenomics of the gut (fecal) microbiomes of animal models (minipigs and cockroaches) used as preclinical models for human gut dysbiosis.
Senior Scientist, Engineered phages
Felix Biotechnology (South San Francisco, California) is hiring a Senior Scientist to work on engineering bacteriophage genomes using novel and cutting edge technologies.

Community Board

Anyone can post a message to the phage community — and it could be anything from collaboration requests, post-doc searches, sequencing help — just ask!

Dear Phage Family,

I have trouble isolating Listeria phages from environmental samples. I used the method from the article named ‘Silage Collected from Dairy Farms Harbors an Abundance of Listeriaphages with Considerable Host Range and Genome Size Diversity’ by Vongkamjan et al.

Do you have any tips for isolating Listeria phages?
I would be very glad if you send me tips.
e-mail: [email protected]


Phage isolation, Seeking suggestions

The National Institutes of Health’s National Center for Advancing Translational Sciences (NCATS) and the National Institute of Allergy and Infectious Diseases (NIAID) are hosting a joint workshop on bacteriophage therapy on October 25, 2022.

Workshop, Event, Phage Therapy

How to design a phage data system

Product designer and co-founder of Phage Directory
Co-founder, Product Designer
Iredell Lab, Phage Directory, The Westmead Institute for Medical Research, Sydney, Australia, Phage Australia
Twitter @yawnxyz

Bioinformatics, Data Science, UX Design, Full-stack Engineering

I am a co-founder of Phage Directory, and have a Master of Human-Computer Interaction degree from Carnegie Mellon University and a computer science and psychology background from UMBC.

For Phage Directory, I take care of the product design, full-stack engineering, and business / operations aspects.

As of Feb 2022, I’ve recently joined Jon Iredell’s group in Sydney, Australia to build informatics systems for Phage Australia. I’m helping get Phage Australia’s phage therapy system up and running here, working to streamline workflows for phage sourcing, biobanking and collection of phage/bacteria/patient matching and monitoring data, and integrating it all with Phage Directory’s phage exchange, phage alerts and phage atlas systems.

For phage therapy to be efficient, cost-effective, and reliable at scale, every stage of the phage therapy pipeline — phage hunting, matching, sequencing, manufacturing, administration, and therapeutic monitoring — needs to be measured.

Data collected from each stage tells us whether a phage remains safe and viable, and helps us track down issues quickly. By looking at patterns in that data, we can replace existing methods with safer and more efficient alternatives.

To scale up phage treatment, we need to scale up data collection. This means that generating and adding data, and drawing useful insights and patterns from that data, both need to be very easy.

At Phage Australia, we’re building a data system that tracks and connects our phage and bacterial data with outputs from the rest of the therapy pipeline. We use this data to generate safety and viability reports, and we eventually aim to use emerging patterns to make our phage therapy process cheaper, faster, and safer.

This post is a continuation of the informal “Data Series” posts and builds on some of the previous concepts like how to keep track of phages, how to name a phage, and how to organize biobank data.

Considerations of a phage data system

Previously, we discussed how to organize a collection and how to accession items into it. We covered core identity data (a phage's name), descriptive data (where it came from), and instance data (which fridge the sample is in). We also covered derived properties, like how the phage behaves against a range of hosts.

A phage data system collects these properties for phages and strains at scale. Most of the data it collects about its phages and strains will be derived — meaning this data will come from experiments, device measurements, plaque assay images, CSVs and spreadsheets, and bioinformatics file outputs.
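As a rough illustration of these four kinds of data, here is a minimal Python sketch. The class names, field names, and the `host_range` helper are hypothetical and chosen only for this example; they are not Phage Australia's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phage:
    name: str              # core identity data
    isolation_source: str  # descriptive data: where the phage came from


@dataclass(frozen=True)
class Sample:
    phage_name: str  # which phage this physical sample contains
    location: str    # instance data: e.g. which fridge the sample is in


@dataclass(frozen=True)
class DerivedResult:
    phage_name: str
    host_strain: str
    kind: str                          # e.g. "plaque_assay" (assumed label)
    value: float                       # measurement taken from the experiment
    source_file: Optional[str] = None  # CSV, image, or pipeline output it came from


def host_range(results, phage_name, threshold=0.0):
    """Return the sorted hosts on which the phage scored above the threshold."""
    return sorted({
        r.host_strain
        for r in results
        if r.phage_name == phage_name
        and r.kind == "plaque_assay"
        and r.value > threshold
    })
```

Keeping derived results as separate records that point back to a phage by name means new experiments can be added without touching the identity or instance records.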

This is fundamentally a hard problem. The system should aggregate and make sense of both well-structured relational data and semi- and unstructured data. And because we’re discussing phage therapy at scale, it needs to support a diversity of labs, researchers, methods, protocols, and tools.

If we’re designing such a system for both data capture and regulatory accountability, we need to start with a few core assertions about the system’s design. Namely, the system needs to be auditable, replicable, and accessible.

1. The System needs to be Auditable

To make the system auditable, we can borrow concepts from finance, such as: append-only logs (no edits or deletes are allowed; any corrections are added as new entries), event-driven architecture (e.g. the set of books available in a library is a consequence of the books previously checked in and out), and double-entry bookkeeping (every transaction is recorded as both a debit and a credit event, and the two need to match).
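The three finance concepts above can be combined in a small sketch. The example below tracks vials of a phage batch; the `Ledger` class, the account names, and the `recorded_by` field are assumptions made for illustration, not a description of an existing system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    batch: str
    account: str      # e.g. "production", "fridge_A", "lab_B" (hypothetical)
    delta: int        # positive = vials credited, negative = vials debited
    recorded_by: str  # who recorded the event
    note: str = ""


class Ledger:
    """Append-only log: entries can be added but never edited or deleted."""

    def __init__(self):
        self._entries = []

    def transfer(self, batch, src, dst, vials, recorded_by, note=""):
        # Double entry: every transfer writes a matching debit and credit.
        if vials <= 0:
            raise ValueError("vials must be positive")
        self._entries.append(Entry(batch, src, -vials, recorded_by, note))
        self._entries.append(Entry(batch, dst, vials, recorded_by, note))

    def entries(self):
        return tuple(self._entries)  # read-only view of the log

    def balances(self, batch):
        # Event-driven: current stock is derived by replaying every entry.
        totals = defaultdict(int)
        for e in self._entries:
            if e.batch == batch:
                totals[e.account] += e.delta
        return dict(totals)

    def is_balanced(self, batch):
        # Debits and credits for a batch must cancel out exactly.
        return sum(self.balances(batch).values()) == 0


ledger = Ledger()
ledger.transfer("batch-1", "production", "fridge_A", 10, "alice")
ledger.transfer("batch-1", "fridge_A", "lab_B", 3, "bob")
# A correction is a new entry, not an edit: only 2 vials were actually sent.
ledger.transfer("batch-1", "lab_B", "fridge_A", 1, "bob", note="correction")
print(ledger.balances("batch-1"))
```

Because the balances are always recomputed from the log, the full history of how a number came to be is never lost.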

The system should also be accountable, meaning each piece of data should be attributed to a lab member and/or the machine that produced the result. Any “mysterious, unaccounted-for data” needs to be invalidated. This can be implemented through account management and lab agreements.

Finally, the system should be tamper-resistant. If there are any signs that the data has been changed, either through editing or through data corruption (e.g. a faulty hard drive), the data should be invalidated. “Tamper-evident seals” can be created with cryptographic techniques like checksums and Merkle trees.
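A minimal sketch of such a seal, using SHA-256 checksums from Python's standard `hashlib` module combined into a Merkle root. The function names are made up for this example, and duplicating the last hash on odd-sized levels is a simplifying convention rather than a recommendation for production use.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records) -> str:
    """Combine per-record checksums into a single root hash (hex string)."""
    if not records:
        return sha256(b"").hex()
    level = [sha256(r) for r in records]  # leaf checksums
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # simplification: duplicate the last hash
        level = [
            sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def is_untampered(records, sealed_root: str) -> bool:
    """Recompute the root and compare it with the one stored at sealing time."""
    return merkle_root(records) == sealed_root


records = [b"phiA,host1,9.0", b"phiA,host2,8.7", b"phiA,host3,0.0"]
seal = merkle_root(records)

tampered = list(records)
tampered[1] = b"phiA,host2,9.7"  # a single changed digit
print(is_untampered(records, seal), is_untampered(tampered, seal))
```

Storing only the root alongside each dataset is enough to detect any later change to any record, whether deliberate or caused by corruption.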

An auditable system creates accountability and trust in the output of each step of the phage production pipeline. Traceable outputs also help us more easily track and fix errors, generate reports for compliance and regulations, cite and reward contributors who provide data, and improve inefficient methods.

Over time, the data we collect and the reports we generate will build a track record for each phage, which should increase our confidence in the phage’s efficacy and safety profile.

This track record also helps us show authenticity and provenance. A phage provided with an auditable track record should inspire more confidence in its safety and efficacy compared to a phage without any data and questionable provenance.

And with provenance and authenticity systems in place, we can start doing neat things like sharing phages with others and collaboratively generating data.

2. The System needs to support Replicability

An auditable system is useless if the data isn’t accurate.

Our data is only as good as our protocols, methods, and machines. If triplicates are necessary for valid results, then replicates from additional labs should further validate those results.

The system needs to support adding replicated data from multiple labs working on phages from the same production batch with the same protocols. Pooling the data allows us to get better averages and detect anomalies. Such a “Proof of Data” mechanism would allow multiple labs to separately confirm and validate a phage’s profile in a decentralized way.
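Here is one simple way pooling and anomaly detection could look, assuming each lab reports replicate titers as log10 PFU/mL for the same batch. The `pool_titers` function and the 0.5 log tolerance are illustrative assumptions, not a validated statistical method.

```python
import statistics


def pool_titers(titers_by_lab, tolerance=0.5):
    """Pool per-lab replicate titers and flag labs that disagree.

    titers_by_lab maps a lab name to its replicate measurements.
    A lab is flagged when its mean differs from the median of all
    lab means by more than `tolerance` (an assumed cut-off).
    Returns (pooled_mean or None, sorted list of flagged labs).
    """
    lab_means = {
        lab: statistics.mean(values)
        for lab, values in titers_by_lab.items()
    }
    center = statistics.median(lab_means.values())
    anomalies = sorted(
        lab for lab, mean in lab_means.items()
        if abs(mean - center) > tolerance
    )
    accepted = [
        mean for lab, mean in lab_means.items()
        if lab not in anomalies
    ]
    pooled = statistics.mean(accepted) if accepted else None
    return pooled, anomalies


data = {
    "lab_a": [9.0, 9.1, 8.9],
    "lab_b": [9.2, 9.0, 9.1],
    "lab_c": [7.0, 7.1, 6.9],  # disagrees with the other two labs
}
pooled, flagged = pool_titers(data)
print(pooled, flagged)
```

A flagged lab is not necessarily wrong; the flag is a prompt to check protocols, equipment, or the sample itself before the data is accepted.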

To support replicability, the system needs to be “federated”. This means that labs will generate and collect their own data, but then elect to consolidate their data into a central registry. As long as the lab follows the protocols and collects data in a way that meets requirements, their data (and even their phages and hosts) can be made available to the wider network.
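The federation idea can be sketched as a central registry that only accepts records meeting its requirements. The required fields, the protocol identifier, and the `Registry` class below are all hypothetical and stand in for whatever schema a real network would agree on.

```python
# Hypothetical minimum schema that every submitted record must satisfy.
REQUIRED_FIELDS = {
    "phage_name": str,
    "batch": str,
    "protocol_id": str,
    "lab": str,
    "value": float,
}


def validate(record):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"{name} should be {expected_type.__name__}")
    return errors


class Registry:
    """Central registry that labs can elect to submit their own data to."""

    def __init__(self, accepted_protocols):
        self.accepted_protocols = set(accepted_protocols)
        self.records = []

    def submit(self, record):
        errors = validate(record)
        if not errors and record["protocol_id"] not in self.accepted_protocols:
            errors.append(f"unknown protocol: {record['protocol_id']}")
        if errors:
            return errors  # rejected: nothing is stored
        self.records.append(dict(record))  # store a copy of the lab's record
        return []


registry = Registry(accepted_protocols={"plaque-assay-v1"})
result = registry.submit({
    "phage_name": "PhiA",
    "batch": "batch-1",
    "protocol_id": "plaque-assay-v1",
    "lab": "lab_a",
    "value": 9.0,
})
print(result, len(registry.records))
```

Each lab keeps ownership of its data and decides what to submit, while the registry enforces the shared rules at the point of entry.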

With data federation and replicability, multiple labs across the network could create their own specialized phage collections, send those phages to other labs for replication, and trust that if they followed all the protocols, the federated system could generate reports on their phages and make them generally available.

3. The System needs to be Accessible

For a system like this to work, it needs to be openly available to all. All protocols, processes, requirements, schemas, and tools need to be available and easy to use.

This way, any lab can follow the right protocols and generate the correct kind of data. The data also needs to be openly available for auditing, reporting and further research. This can all be achieved through a combination of web interfaces, reporting tools and APIs.

Lastly, the system needs to be openly available (through the website and through open source), so others can contribute to the growth of the system.

New tools and techniques emerge constantly, and the system needs to account for the rapidly changing landscape.

Only with everyone working together can we hope to create a system that benefits us all.

Looking forward: “bio analytics”

The eventual goal of the system is to turn the data into actionable insights and strategic decisions about how to make phage therapy cheaper, faster, and safer.

In the business world, data scientists inspect, clean, transform, and manage business data for data analysts, who build models on top of the data to inform conclusions and support strategic decision-making. Bioinformatics is the discipline of applying data science to biology, mainly with a focus on discovering new biological insights.

The phage field is missing the equivalent of “data analysts”: people who build models on top of bioinformatics and wet lab data to inform conclusions and support strategic decision-making for the phage therapy process itself.

In the next issue, we’ll dig deeper into data analytics, and explore what a data pipeline for phage therapy might look like.

Capsid & Tail

Follow Capsid & Tail, the periodical that reports the latest news from the phage therapy and research community.

We send Phage Alerts to the community when doctors require phages to treat their patients’ infections. If you need phages, please email us.

In collaboration with

Mary Ann Liebert PHAGE

Supported by

Leona M. and Harry B. Helmsley Charitable Trust
