Discovering coding (& building an AI tool) as a biologist, thanks to ChatGPT

Issue 220 | April 21, 2023
35 min read
Capsid and Tail

This week, Jessica Sacher shows you how, as a non-coder, she got ChatGPT to help her make a chatbot that knows everything we’ve published across 220 Capsid & Tail blog posts (almost every week for the last 4.5 years!), how you can play with it, and how you can build your own!

A version of this post was published earlier this week on Jessica’s TechBio Adventures.

Sponsor
MAL Logo

A message from Martha Clokie, Editor in Chief of PHAGE Journal:

Attendees of the 6th World Congress of Phage Therapy in Paris are welcome to submit papers for a Special Issue in PHAGE Journal. Details to follow!

What’s New

Fixed Phage, a Glasgow based biotech company, has secured £1.5m investment to further scale development and commercialisation of its phage technology in the fight against antimicrobial resistance.

Add a Comment

Biotech funding news

Federico Prokopczuk et al. engineered a superinfective phage (eSI-Pf) based on the Pf4 phage, which is able to infect Pseudomonas aeruginosa. They tested it as a phage therapy tool for the treatment of life-threatening burn wound infections and found that while it was not able to maintain a sustained reduction in bacterial burden, it did prevent dissemination of P. aeruginosa from the burn site.

Add a Comment

Engineered phageResearch paper

Talia Backman and colleagues published a new preprint on how a weaponized phage suppresses competitors in historical and modern metapopulations of pathogenic bacteria. The authors discovered a viral cluster shared amongst pathogenic Pseudomonas isolates, which encodes a tailocin — a phage-derived interbacterial warfare element.

Add a Comment

Phage-host interactionsResearch paperTailocin

Capsules and their traits affect phage susceptibility and plasmid conjugation efficiency, as per a preprint by Matthieu Haudiquet et al. They show that capsule swaps invert phage susceptibility and affect conjugation efficiency, with implications for population dynamics and horizontal gene transfer.

Add a Comment

Capsular polysaccharidesPhage resistanceResearch paper

Michael Schelling et al. published a new paper in PLOS Biology on how CRISPR-Cas effector specificity and cleavage site determine phage escape outcomes. They find that most pre-existing crRNA mismatches lead to phage escape, regardless of whether the mismatches ablate Cas12a cleavage in vitro.

Add a Comment

CRISPRPhage defenseResearch paper

Latest Jobs

Sponsored Ad Phages in agricultureMicrobiologist
Research Scientist at A&P Inphatec in Palo Alto, CA

A&P Inphatec, LLC has commercialized the first product to treat Pierce’s Disease in grapevines. Our unique solution utilizes bacteriophages to target and kill the bacteria at the source of the problem – giving wine growers an organic and sustainable solution to a rising threat in California.
A&P Inphatec, LLC is seeking an experienced microbiologist applicant to join our research and development team as a full-time Research Scientist.

Research Technician
Sudhakar Bhandare (University of Nottingham) is seeking a Senior Research Technician to provide technical support to a BBRSC 2022 International Partnership project, plann and oversee day-to-day running of the phage laboratory, and provide technical supervision/training in phage biology equipment and techniques.
ModelingPhage banksPost Doc
The Ohio State University College of Pharmacy is seeking a Post Doctoral Scholar to assist in coordinating datasets and laboratory activities for both microbial and phage banks. The candidate will perform research projects at the P3 Research Laboratory, using cutting-edge aspects of microbiology, infectious diseases, pharmacokinetic and pharmacodynamic modeling, and in vitro assays.
Post DocBioinformatics
Thomas Sicheritz-Pontén is on the lookout for a postdoc candidate experienced with knowledge graphs, TypeDB, or Neo4j. Join the PhageCompass team after summer and help us tackle marine ecosystems and phage genomes!

Community Board

Anyone can post a message to the phage community — and it could be anything from collaboration requests, post-doc searches, sequencing help — just ask!

From Timofey Skvortsov via Twitter: Hi all, where are you sequencing your new phage/bacterial isolates in the UK/Ireland these days? I’d appreciate any suggestions or insights!

Add a Comment

Seeking suggestions

In celebration of International Women’s History Month, this episode of the Phage Files podcast looks back at female phage researchers’ contributions to early phage research. The hosts interview Dr. Elizabeth Kutter (Evergreen State College), who is considered the Mother of Phage Research in the modern scientific era, and Prof. Martha Clokie (University of Leicester), 2020’s recipient of the Outstanding Women in STEM.

Add a Comment

InterviewWomen in STEMPodcast episode

In this episode of Breakpoints podcast, Dr. Saima Aslam and Dr. Daria Van Tyne join Dr. Rachel Britt for a discussion on phage therapy! Tune in for a review of their role in therapy, logistical considerations, the future of phage therapy, and more!

Add a Comment

Phage therapyPodcast

Discovering coding (& building an AI tool) as a biologist, thanks to ChatGPT

Profile Image
Phage microbiologist and co-founder of Phage Directory
Co-founderPostdoctoral Researcher
Iredell Lab, Phage Directory, The Westmead Institute for Medical Research, Sydney, Australia, Phage Australia
Skills

Phage characterization, Phage-host interactions, Phage Therapy, Molecular Biology

I’m a co-founder of Phage Directory and have a Ph.D in Microbiology and Biotechnology from the University of Alberta (I studied Campylobacter phage biology). For Phage Directory, I oversee community building, phage sourcing, communications, science, and our awesome team of volunteers.

As of Feb 2022, I’ve recently joined Jon Iredell’s group in Sydney, Australia as a postdoctoral research scientist for the Phage Australia project. I’m diving back into the lab to help get Phage Australia’s country-wide phage therapy system up and running here, working to streamline workflows for phage sourcing, biobanking and collection of phage/bacteria/patient matching and monitoring data, and integrating it all with Phage Directory’s phage exchange, phage alerts and phage atlas systems. I’m also delving into phage manufacturing and quality control.

A version of this post was published earlier this week on Jessica’s TechBio Adventures.

Omg you guys! I coded! And I made something useful — a chatbot that knows everything we’ve ever published here in Capsid & Tail. Here’s how I did it, how you can play with what I built, and how you can build your own. And lastly, why I think the fact that I could do this at all, as a non-coder, in a single weekend, has massive implications…

You’ve probably all heard about ChatGPT by now. And about large language models in general. It’s the talk of the town! And like anyone with at least one tech-savvy friend, I too have heard about little else for weeks now.

I was a huge fan of it from the beginning — who wouldn’t be; it’s magical. Gradually, ideas started bubbling. Could this thing write emails for me? Help me write blog posts? Help me troubleshoot experiments? I played around with it and found a ton I could do (even science-related stuff, like turn my messy notes about my experiments into clean bullet points and even a flowchart!).

Needless to say, I was hooked. Meanwhile, my partner Jan was working with ChatGPT on another level. As a programmer, he was diving into giving ChatGPT more superpowers than it already had. For example, he found a way to give ChatGPT my entire 300-page PhD thesis on Campylobacter phage glycosylation, then have a chat about it.

This was pretty crazy, given I’d already encountered the word limit on ChatGPT (like many of us, I presume). How could I get these superpowers? How could I give ChatGPT a lot more information, such as access to my lab’s research papers, so I can quickly check which strains or mutants we’ve tested in a particular assay, or quickly grab one of our methods based on a quick question instead of opening a bunch of files and using control+F a bunch of times. Could I give ChatGPT a new paper and get it to explain it like I’m 5? Could it ingest a bunch of papers on cloning and act as my personal cloning wizard friend?

I expressed this to Jan, and he was way ahead of me. “Yes, you need to do this tutorial. Right now.”

Ooh! A tutorial! I opened this, and made this face 😱:

fig 1 pic of tutorial

It seemed cool based on the title of course. I’d love to be able to build something like that. But why did it have to use words like ‘embeddings’ and ‘vector store’? Why would Jan think I would be able to do this tutorial? I can’t code / had only really done Codecademy basics (one of my pandemic hobbies I took up for a while and then set back down)…

Jan said, “Do it. This is the Next Big Thing. Ask ChatGPT if you get stuck.”

I decided to try. I figured this could be like when I learned knitting by forcing myself to start with a weirdly hard pattern because the lady at the yarn store said it was ‘for beginners’, and I believed her. She said ‘use youtube if you get stuck’. (Fun fact, thanks to the million knitting youtube videos, not only was I was able to do it, but the knitting pattern creator lady even asked if she could add my finished scarf to her pattern page on Ravelry.com!! Life highlight).

fig 2 ravelry
Flashback to when I became Ravelry-famous

So I decided to press on and hope for a similar outcome with this… Worst case scenario, I’d learn something about coding, and see if I could actually do it / might like it. Best case, I’d unlock the power to give ChatGPT superpowers, which I could use to help me as a scientist/startup founder.

Struggling through a ‘build a chatbot’ tutorial with ChatGPT’s help

First I tried following the tutorial by Marc Päpper. I saw that the first step was to ‘use the website’s sitemap.xml to query all URLs’. Hmm, what is a ‘sitemap.xml’? I asked ChatGPT, who said it was a list of URLs on a website, and said you could look it up for any website (though it may or may not exist). I tried to get ours for phage.directory as a test… but try as I might, I couldn’t. I wasn’t about to tell Jan, as I didn’t want him to help me yet…

Instead of solving that issue right away, I changed course and decided I’d try to do the tutorial with the author’s site map, since he seemed to have one. I just wanted to see if I could replicate what he’d done. This turned out to be a good plan.

Before one can code, one needs an ‘environment’ to code in…

The next step in the tutorial was ‘Download the HTML of each URL and extract the text only’ and the picture showed a bunch of code. 😱

Deep breath. Ok cool, the code is there, I can copy it and use it. Now I just need a place to type the code into. Back in 2020, during a pandemic-induced intro coding course, I had been exposed to Codecademy’s interface, and a bit of Jupyter notebooks… I’d also taken an ‘intro to bioinformatics’ course once (circa 2016) which had resulted in some scary times working in the Terminal. I remember liking how Codecademy and Jupyter notebooks had friendly-looking text boxes which let you type code and hit ‘run’, and then you could see if the code worked. I wasn’t sure if I could use them to start something of my own.

I then remembered that Packy McCormick, my fave internet writer, had said that he had recently tried building an app as a beginner using something called Replit.

I decided to try that — maybe that would be a beginner friendly way I could start to code?

I went to repl.it and got an account. I then opened a new Replit project and tried putting in some of what the tutorial said. I typed in the first two lines of code in the tutorial (even though I hadn’t the faintest clue what they meant) and pressed run.

Of course, an error. The error meant nothing to me, so I asked ChatGPT about it. It said something about how the code I had pasted was supposed to be run in the Terminal. Ahh! Not the Terminal…

So I opened the Terminal application on my mac and typed it in there. Error! The Terminal did not like those lines either (surprise surprise). (Something about needing to have ‘packages installed’.)

So Replit couldn’t run things the Terminal was supposed to run. And Terminal couldn’t run those things either. I then found a place in Replit called ‘Shell’. According to ChatGPT, Shell is the Terminal for Replit! Thanks, pal! But alas, no, trying to run those lines didn’t work in the Shell either. I still got an error. And ChatGPT’s help was not helping.

Getting help from a human

Around this time, Jan saw me ‘coding’ (lol) and made me tell him what I was stuck on. He told me about coding ‘environments’ and how the Terminal relates to coding… I caught some of it before my brain starting turning to mush. What I took away was essentially, that the Terminal can be used to installs things that can run code. Once you’ve installed those things, then you can use the Terminal to run the code. Code, meanwhile, lives in files, waiting to be run. You don’t generally want to use the Terminal to write the code (it doesn’t really save things). Instead you should write and save the code in files. And then you should use Terminal to run that code.

My brain went to a biology metaphor, and so far it has not failed me: think of Terminal as a ribosome, and code as mRNA.

fig 2 ribosome

Anyway, Jan then gave me a golden piece of advice, which I now pass on to you; he said that to get started, I should use Google Colab. He said this is like a coding ‘environment’ that has Terminal built in — I write the code, and there is a button I can press to ‘Run’. Yay! This is what I wanted.

Getting Google Colab up and running

Creating a new Google Colab notebook is as easy as creating a new Google Doc.

fig 3 new google doc

Here’s what a Google Colab notebook looks like. A place to write code, next to a ‘play’ button that you click to run it.

fig 4colab notebook

Nuggets of wisdom revealed by constantly pestering ChatGPT

Switching to Colab didn’t solve my problems right away. But as I continued to paste in my tutorial code, line by line, and continued to paste in each error message I would get, I was able to get myself unstuck. I think this is in part because ChatGPT seems to know a lot about Colab specifically, and could offer special tips and tricks.

For example, ChatGPT enlightened me on what I’m calling the ‘exclamation mark rule’. Anytime you get an error saying ‘you should be running this in terminal’, you put an ‘!’ in front of the code and rerun it, and it (should) fix the problem — it now acts like you’re typing in Terminal. Amazing! So I could finally run those Terminal commands I wanted to run from the beginning.

fig 5 exclam
ChatGPT giving me sweet tips. Thanks, friend!

Ok, so we’re back in business — ready to continue attacking this tutorial. We’ve made it past the first couple lines without getting errors. From here, I basically just repeated this process:

Type tutorial code into Colab → get error → paste error message to ChatGPT → get possible solution → fix code, or paste new code into Colab → repeat until it works (no error)

fig 6 getting help
Example of how I would ask ChatGPT for help with error messages. First, I’d just paste in an error. No context. Then I’d ask followup questions until something made sense.

fig 6b

Importantly, I was asking pretty rudimentary things. Things like ‘How do I do what you just said?’, ‘I’m confused’, ‘what does missing schema mean in the error i pasted in earlier?’ ‘hmm that’s weird, i did feed in the full URLs. is there any other interpretation of the error?’

I eventually wasn’t bothering to phrase questions, I was just pasting in error messages with no context. But by asking ‘why’ continuously, like a 5 year old, ChatGPT would eventually say something that I could latch onto, and sometimes it would reveal some really useful tips! And the best part was, it never laughed at me or said ‘You are asking such stupid questions, it is clear you are wasting my time and should go learn python before attempting to make a chatbot’.

Sometimes I’d get extremely useful nuggets I could keep in my toolbox for the future. For example, at one point, ChatGPT casually mentioned that I should ‘clone the Github’ of the tutorial I was following. This was gamechanging! Why? Because before that I had been constantly copying and pasting from the tutorial. But Marc Päpper (tutorial guy) had also put all his code on Github. Instead of having to make the files he was telling me to make, I could just zap them all into my Colab at once by cloning his Github. (No wonder everyone raves about Github! If only we had these kind of reproducibility tools in biology…)

Together, me + ChatGPT made a chatbot

Finally, hours later (seriously though… not days, not weeks, not months… hours), we did it! I successfully used ChatGPT to help me make a chatbot. I followed the tutorial and recreated what Marc Päpper had done, and got the same answer out of my chatbot as he did. It was a WILD feeling!!

I wasted zero time; I now needed to make my own chatbot, based on my own content, not Marc’s website. I didn’t have a sitemap like he did, but I did have a list of the URLs from all 216 issues of our Capsid & Tail phage community blog that we’d published each week for the last few years. I felt ready to make this happen.

How I made my Capsid & Tail chatbot

If you’d rather see the finished product first, Jan just put it online! If you want to follow along, here’s my Google Colab notebook for the project.

Note: for most of this, I didn’t have to change the code; it came straight from the tutorial! There is essentially just one part I changed. But I’m including the overarching steps you you can see the entire process.

Also note, this is what I did, not ‘best practices’ — as I hope I’ve made clear, I have no idea what the best practices are!

Step 1: Cloning the tutorial project Github

As mentioned above, the tutorial included a Github link. To clone it, you go to the project page in Github, click the green ‘Code’ button (top right of below screenshot), and it gives you the ‘Clone’ URL.

fig 7 clone

Now paste that into your code (see below) and all the files from the project will be magically available to you in your Colab notebook. No copying and pasting!

!git clone https://github.com/mpaepper/content-chatbot.git

Step 2: Installing the ‘requirements’

I next had to install ‘requirements’. (I’m told requirements are ‘dependencies’, ie. stuff you need for the code to run).

Luckily, part of what was cloned with the Github project was a file called ‘requirements.txt’ (and apparently it’s pretty standard to save your requirements for any project in a file called requirements.txt’). I found ‘files’ in my Colab notebook (left panel), and could indeed see the requirements file.

fig 8a reqs 1

fig8b

So I went ahead and installed it like this!

pip install -r requirements.txt

Step 3: Giving it my OpenAI API key

Note: At this point I switched to following the ‘readme file’ within the Github project instead of the blog post, since the readme seemed more straightforward for a beginner (it seems the blog post is for explaining to programmers HOW the code worked; whereas for me, I just wanted to MAKE it work).

I found this sentence in the readme:

“Once you have the $api_key, you can run export OPENAI_API_KEY=’$api_key’ in your terminal.”

Hmm, so I needed to have an OpenAI API key. I had seen on another tutorial Jan had made me watch (one of James Briggs’ Langchain tutorials) that getting one of these was free — you just go to your OpenAI account, go to Settings, and click to generate one.

Ok, did it. Now back to Colab to type it in. Error. No problem — I whined to ChatGPT about it, it gave me new code, I took its word for it and pasted it in, and voila! It worked.

Step 4: Running the first ‘script’

At this point it seemed the ‘setup’ phase was done. The next step in the tutorial involved running a series of ‘scripts’. (As I now understand, a script is just a bunch of code, living inside a file).

The first script to run was the ‘create_embeddings.py’ script. From the readme:

create_embeddings.py: this is the main script which loops your website’s sitemap.xml to create embeddings (vectors representing the semantics of your data) of your content

I could see the file called ‘create_embeddings.py’ in my Colab. Double clicking on it, I could see there was a bunch of code inside it.

fig 9

Now this led me to another structural question… should I copy the code within that script file into the main middle section of my Colab notebook? The readme said to ‘simply run’ this:

python create_embeddings.py --sitemap https://path/to/your/sitemap.xml --filter https://path/to/your/blog/posts

Which made it sound like I didn’t have to copy and paste the code… I could just ‘call’ the file by name. So maybe I’d just create a copy of ‘create_embeddings.py’ and use it straight from there.

Note: I did this at first, but when I left and came back to my Colab notebook, I realized that Colab does NOT save your files once you close it! So if you make changes to code in a script, it will not save unless you save a copy of that file in your Google drive or onto your computer somehow. Fortunately Colab DOES save what you put in the main middle section of the notebook, so I found it was nicer, for any code you want to modify, to copy the code from the script into the middle section of the notebook, and modify it there.

Step 5: Changing the script so it would accept my Capsid & Tail blog post URLs instead of a ‘sitemap’

At this point I realized the hard work needed to begin. I needed to understand the code in that script enough to know what part of it to change, so it would take my Capsid & Tail blog posts instead of Marc Päpper’s sitemap.

I didn’t have a site map (I had asked Jan and he said we didn’t have one). What I did have was a list of links to our blog posts (in a spreadsheet). I figured I could copy and paste those into a text file. But then what? I needed to somehow, in the code, swap out ‘site map’ for my file full of links. I figured this would be doable, since ChatGPT had told me a site map is just a list of links after all.

I did my best to phrase what I wanted to ChatGPT:

fig 10 phrase help

Excitingly, it did not say ‘what you want is impossible’ — it got straight into tactics. Excellent. And even better, it had an answer I sort of understood! (I had learned about arrays/lists in my early Codecademy days…)

Cool! So ChatGPT would be able to help me write the new code. Now to excise the code I didn’t want from Marc Päpper’s script. To do this, I needed to understand what the different chunks did. I started copying and pasting the chunks into ChatGPT and asked it to explain each. I eventually located the right area.

pages = []
    for info in raw['urlset']['url']:
        # info example: {'loc': '<https://www.paepper.com/>...', 'lastmod': '2021-12-28'}
        url = info['loc']
        if args.filter in url:
            pages.append({'text': extract_text_from(url), 'source': url})

This is the part of the code that creates a new ‘dictionary’ called ‘pages’, loops through a sitemap to collect all the text from all those URLs, and dumps all that text (along with URLs for each) into the ‘pages’ dictionary via the ‘append’ function.

With the (immense, repeated) help of ChatGPT, I replaced that section with my own:

with open('links.txt', 'r') as f:
  links = f.read()

links_split = links.split('\\n')

print(links_split)

What this code does is opens my file called ‘links.txt’ (this is where I pasted in my Capsid & Tail links). It then ‘splits’ the links into a list, and calls that list ‘links_split’.

fig 11
This is what my ‘links.txt’ file looks like.

Of note, this code also ‘prints’ my split list. I clicked run, and voila! At the bottom of the screenshot below, you can see the output: a bunch of links in a list! And they look like Capsid & Tail blog posts! Winning.

fig 12 print

(FYI, Jan told me ‘print’ statements like this are great to sprinkle in anywhere you want, if you want to check if stuff is working as you think it should).

After this, I got to the ‘pages’ section. Instead of Marc Päpper’s version above, I told it to loop through my ‘links_split’ list, get the text from each, and ‘append’ each to my ‘pages’ dictionary.

pages = []
    for url in links_split:
            pages.append({'text': extract_text_from(url), 'source': url})

At this point I was back in line with Marc’s code; I had what he had; ‘pages’ = a dictionary full of text (and corresponding URLs for each text) that was extracted from a bunch of links. From here on out, I didn’t really have to change anything! I just ran his same code.

Step 6: ‘Splitting’ my text into little digestible chunks

The next part of the create_embeddings.py script involved some pretty cool stuff (for this, Marc’s blog post is good for explaining what’s happening, if you really want to understand). My understanding is, all the text I’d just fed into ‘pages’ was being chopped up into little chunks, using a function (piece of code that does some piece of work) called ‘Character Text Splitter’ (which is something the Langchain team created).

Using the handy dandy ‘print’ function, I could see that it had done this to my Capsid articles! Look at them, each split into some number of chunks:

fig 13 split chunks

Why small chunks? Because as we know, ChatGPT and other models like it only let you feed in a bit of text at once. This lets you give it a lot of info at once — by first splitting it into bitesized pieces.

After this, the ‘create_embeddings.py’ script does the thing it was named for: it ‘creates embeddings’ from all the chunks of text. This part is another innovation on Langchain’s part, as I understand it (and another thing to read about in Marc’s blog post). In my understanding, creating embeddings means turning these little chunks of text into numbers (‘vectors’), ie. sets of coordinates. Then those coordinates get stored in a database somewhere, and then those get fed into a language model like GPT-3 or ChatGPT. Then when you ask questions, you get responses that include all this new information you just ‘loaded’ into the model!

Step 7: Using the ‘ask questions’ script

This was starting to get really exciting. It was working!!!

I next did the thing that made me squeak in delight! I called upon the the ‘ask_question.py’ script that Marc wrote, and replaced his example question with a phage question. The first one I asked: Who treated a sea turtle with phages? And. I. Got. The. Answer. In a second. !!!

This was a moment like any of the most exciting of my biology career. I made the thing! The thing listened to me! I coded something! I then proceeded to ask it all the questions I could think of. It kept answering them well, and even giving its sources!

fig 14a answer1

fig 14b

Step 8: Discovering the last part: the chat bot interface

Finally, through copying the contents of Marc’s ‘start_chat_app.py’ file into Colab, I was able to chat with the bot! I gave it a ‘context’ (replaced what I saw in Marc’s script that related to machine learning-related stuff to phage-related stuff) and it could talk to me. Amazing!

fig 15

Reflecting on how incredible it is I built ANYTHING let alone something useful

This whole experience has left me with some major realizations.

First, in the words of my parents when I was a toddler, ‘Very Good, Jessica!’ I had coded! I felt amazing. I have wanted to be able to code forever, but have always been told (mostly by professors) that ‘13 year olds will run circles around you before you can learn’. So I never really put in the time.

Second, I learned how to use ChatGPT to ask questions about code. I would not have been able to do this without a resource like this. (I mean, if I had a human over my shoulder, sure. But humans who can code have better things to do than patiently answer my dumb questions for hours… ChatGPT is free, and never gets impatient or disappointed!)

Third, the actual thing I built showed me the massive potential of adding your own information/documents into large language models like ChatGPT! This superpowers the model so that it knows a lot of your really specific, relevant information. By figuring out how to ‘load’ information into the model, you aren’t limited by a few paragraphs in ChatGPT… you can upload massive datasets, and have conversations with a robot that knows it all at the click of a button…

Fourth, doing this exercise made me realize that non-coders can build VERY powerful things. We couldn’t really do this a month ago, and now we can. Others are doing this too, for example, take BabyAGI, a powerful tool that gets ChatGPT to create its own task list, execute on the tasks, and reprioritize them, which was made by a non-coder. Altogether this means a whole new level of possibility… it’s not just the coders who are going to be able to build useful tools. Us ‘non-coders’ are going to be able to use our domain knowledge and build stuff too.

All in all, I think the fact that I could do this has huge implications, on multiple levels, and I am officially hooked on figuring out how I can use these new powers. I will continue to share what I figure out here, and would love to hear from anyone re: suggestions on problems they’d like solved (ideally in the science/biology realm, because why not! I’m thinking journal article chatbots, grantwriting tools, protocol optimizers, ‘wizards’ who know a bunch about a given topic, like cloning or PCR…). Alternatively if anyone has tutorials/Github projects they find that they’d like to see applied to their own question/problem, send them my way! I’d be happy to give them a try, and report back on my struggles and successes!

How to use the Capsid & Tail chatbot for yourself

Just as I was putting the finishing touches on this blog post, Jan took my code and built it into a web app (!!), at our new project, labspace.ai. Try Capsid Chat here:

Or if you want to dive into the code & reproduce this all yourself, here’s my Google Colab notebook for this project: https://colab.research.google.com/drive/12swjyhWmdA-nAfk3U7Uq4UhpZWC1oxHx?usp=sharing (*updated this link 4/16/23)

Here’s my Capsid & Tail links file: ‘links.txt’, which you’ll need to run it: https://drive.google.com/file/d/1YB5LuhNsr_v-WjloYiPoxZSsIUel_TW0/view?usp=share_link

  1. Make a copy of each for yourself, save to your Google drive
  2. Click Run for each code block; if there’s an error, paste it to ChatGPT and see if you can fix it (don’t move on to the next block until you fix it)
  3. Add your OpenAI key (make sure you get one first at platform.openai.com)
  4. Try to reproduce what I did with the Capsid & Tail links first
    • Make sure you put the links.txt file in the ‘content-chatbot’ folder of the cloned Github project
  5. Ask it a question and see if it gives an answer!

How to create your own chatbot

Try it with your own list of links:

  1. Change the contents of the ‘links.txt’ file
    • Try with a list of URLs you care about (even just one or two)
  2. See if it works!
  3. Note that some websites are not ‘scrapeable’ so they won’t work. Some journal articles seem to be in this category.

If you’re really excited about this, look at Langchain’s documentation, especially the part on ‘Document Loaders’ to find code snippets you can use to get text from pdfs, powerpoints, and other sources, not just URLs like I’ve shown here. See if you can swap out my code for new code you take from Langchain’s library, and see if you can make it run. It’s fun!

Making an interface so others can use what we build

In a future post (maybe soon) I’ll show how I’m learning to make an interface around code, so I can host it on a website, so people (me! or you!) can use the things we build without opening up the code. (Thanks in advance to Jan for helping me with this!).

Resources

Capsid & Tail

Follow Capsid & Tail, the periodical that reports the latest news from the phage therapy and research community.

We send Phage Alerts to the community when doctors require phages to treat their patient’s infections. If you need phages, please email us.

Sign up for Phage Alerts

In collaboration with

Mary Ann Liebert PHAGE

Supported by

Leona M. and Harry B. Helmsley Charitable Trust

Crossref Member Badge