What's going on here...

Guess I should actually post something here now that I might have some content worth sharing.

Oregon Is Cool I Guess

Been trying to do some traveling when I have free time. I went on a solo trip to Portland almost as soon as I moved here. Most of my trip was spent in Washington Park, just taking the trails and exploring. The International Rose Test Garden there was just as beautiful as I imagined it would be. Got to eat some good food too: Fried Egg I’m in Love definitely lives up to the hype. Another good place is The Sports Bra, which has a super awesome atmosphere and good bar food. Finally, I got to go to a super good tea place called The Tao of Tea. The city itself is probably the grungiest place I’ve ever been, but I love just walking around and looking at things.
I’ve also gone on a couple of excellent trips to the coast, something that was obviously impossible for me back home. It’s so beautiful and takes my breath away every time. My dog loves to drink salt water. I’ve also explored Salem and Newport and done a little bit of hiking. Future trips will probably be to Tillamook, Eugene, and Bend.

Isolated chloroplast genes + Lewisia phylogeny

Alright, let’s get to the boring stuff. My samples for my grad research came back on Halloween, so I ran them through the chloroplast DNA workflow. It was pretty simple to get my head around, but I’m hoping that explaining it will further cement my knowledge of the subject.

The chloroplast genome is very similar across plant life. It is prokaryotic in origin and is usually inherited maternally. Chloroplast genomes tend to be around 150,000 bp (in fact, the ones I looked at were all basically 151,411 bp) and look like this. Anyways, they’re pretty easy to ID in the context of a full genome since they have a very recognizable pattern, especially with the two IRs (inverted repeats).
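Quick aside on what "inverted repeat" actually means: the two IR copies carry the same sequence, but one is the reverse complement of the other. Here's a tiny Python sketch of that idea. The sequences are made up for illustration and the function names are just mine, nothing here comes from my real data.

```python
# Toy illustration of an inverted repeat (IR): one copy is the reverse
# complement of the other. The sequences are invented, not real plastome data.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(copy_a: str, copy_b: str) -> bool:
    """True if copy_b is the reverse complement of copy_a (case-insensitive)."""
    return reverse_complement(copy_a).upper() == copy_b.upper()


ir_a = "ATGGCC"                   # hypothetical stretch of one IR copy
ir_b = reverse_complement(ir_a)   # what the other copy would look like
print(ir_a, ir_b, is_inverted_repeat(ir_a, ir_b))
```

Real IRs are tens of thousands of bp long, so assemblers find them with proper alignment rather than an exact string check like this.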

So! I take my raw sequence data and run it through BBDuk (a tool in the BBTools toolbox), which trims adapters. That gives me a bunch of .fq files, which I can feed to GetOrganelle to ID the chloroplast genome in the sequences. After this, it’s simply a matter of aligning the sequences with minimap2 and then visualizing the relationships by constructing a phylogenetic tree (which my PI did for me :3c). Now that the chloroplast genome has been ID’d, the project is handed over to me for TE (transposable element) identification. I’m only a little terrified.
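To make the steps concrete, here's a rough sketch of what the three commands could look like, written as a little Python script that only builds and prints them (it doesn't run anything). These are not the exact commands from this project: every file name is a placeholder, and the parameter values are assumptions taken from the usual examples in each tool's documentation, so check them against your own data.

```python
import shlex

# Hypothetical file names; swap in your own. Nothing here is executed.
RAW_R1, RAW_R2 = "sample_R1.fq.gz", "sample_R2.fq.gz"
TRIM_R1, TRIM_R2 = "trimmed_R1.fq", "trimmed_R2.fq"


def bbduk_cmd(in1, in2, out1, out2, adapters="adapters.fa"):
    """Adapter trimming with BBDuk (values are common documented settings)."""
    return ["bbduk.sh", f"in1={in1}", f"in2={in2}", f"out1={out1}", f"out2={out2}",
            f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo"]


def getorganelle_cmd(r1, r2, outdir="plastome_out"):
    """Pull the plastome out of trimmed reads (embplant_pt = plant chloroplast)."""
    return ["get_organelle_from_reads.py", "-1", r1, "-2", r2, "-o", outdir,
            "-F", "embplant_pt", "-R", "15", "-k", "21,45,65,85,105"]


def minimap2_cmd(reference, query):
    """Align one assembled plastome against another; SAM is written to stdout."""
    return ["minimap2", "-ax", "asm5", reference, query]


steps = [
    bbduk_cmd(RAW_R1, RAW_R2, TRIM_R1, TRIM_R2),
    getorganelle_cmd(TRIM_R1, TRIM_R2),
    minimap2_cmd("plastome_A.fasta", "plastome_B.fasta"),  # placeholder names
]
for cmd in steps:
    print(shlex.join(cmd))
```

Building the commands as lists means they could be handed straight to subprocess.run later without any shell quoting headaches.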

My other weekend project was super cool: I’m co-authoring the section on Lewisia for OregonFlora, a plant ID book that’s about to release its third volume. My PI suggested I download some GenBank sequences and create a phylogenetic tree. Easy, right?

A very crucial element is as follows: I am a dumbass. Anyways.

So, I start looking around at tools to download sequences from GenBank. NCBI has a great set of database download tools, but I could NOT figure out how to use them. So, I take the easy way and use E-Utils. My command is below (after installing E-Utils, obvs).
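For anyone curious what talking to GenBank looks like under the hood, here's a rough Python sketch of the two E-utilities requests involved: a search, then a FASTA fetch. This is not my exact command, just an illustration. The search term, the nuccore database choice, and the accession placeholders are all assumptions, and the script only builds the URLs without hitting the network.

```python
from urllib.parse import urlencode

# Public base URL of NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 100) -> str:
    """URL for an ESearch query that returns matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(ids: list[str], db: str = "nuccore") -> str:
    """URL for an EFetch request that returns the given records as FASTA."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Assumed search term; the IDs are placeholders, not real accessions.
print(esearch_url("Lewisia[Organism]"))
print(efetch_fasta_url(["ACCESSION_1", "ACCESSION_2"]))
```

If you go this route for real, NCBI asks that you keep the request rate low and recommends an API key for heavier use.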

Okay, cool, there are some sequences in there and they’re separated by accession. No idea what’s going on but I decide to follow the documentation for CD-HIT and cluster these sequences.

Yep, that was a lie. I forgot to cluster my sequences the first time, ran through the ENTIRE workflow, and ended up with this absolute beast.

Well, we could talk about my failures all day but let’s continue. My PI sets me straight: I cluster the sequences, then align them with MAFFT (super cool tool!). IQ-TREE constructs the phylogenetic tree, and I throw the aligned sequences into Jalview to see what’s going on.
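Since clustering was the step I skipped, here's a toy sketch of the general idea behind it: greedy incremental clustering, where the longest sequence becomes a cluster representative and each later sequence either joins the first representative it resembles or starts its own cluster. To be clear, this is not CD-HIT's actual code. I'm swapping in Python's difflib ratio as a stand-in similarity measure, and the sequences and the 0.9 threshold are made up.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for a real identity measure."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Map each representative's name to the names of its cluster members."""
    clusters: dict[str, list[str]] = {}
    # Longest first, so every cluster is anchored on its longest member.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if similarity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # close enough: join this cluster
                break
        else:
            clusters[name] = [name]  # nothing similar: new representative
    return clusters


# Invented toy sequences: seq_b is seq_a minus one base, seq_c is unrelated.
toy = {
    "seq_a": "ACGTACGTACGTACGTACGT",
    "seq_b": "ACGTACGTACGTACGTACG",
    "seq_c": "TTTTGGGGCCCCAAAATTTT",
}
print(greedy_cluster(toy))
```

Keeping one representative per cluster is what stops a pile of near-duplicate GenBank records from bloating the alignment and the tree.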

I think I still have it a little messed up; there seem to be some sequences clustered where they shouldn’t be. Oh well. It’s a learning process.

I’m currently in the process of setting up a Raspberry Pi as a home server, which will probably be one of the subjects of my next post. I’m really struggling with this, as my computer is REFUSING to connect to the Pi over SSH.


I also drank a lot of beers. And some other stuff. Yeah the formatting is messed up I don’t know how to fix it.

Written on November 7, 2023