Benchling and in silico gel art

(Image: virtual digest of the Lambda sequence, virtual_digest_sequence_Lambda.png)

I made an elephant 🐘!

Chosen protein:

Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), specifically the small subunit.

GenBank: AED94313.1

Protein sequence:

        1 massmlssaa vvtspaqatm vapftglkss aafpvtrktn kditsiasng grvscmkvwp
       61 pigkkkfetl sylpdlsdve lakevdyllr nkwipcvefe levintkhgf vyrehgntpg
      121 yydgrywtmw klplfgctds aqvlkeveec kkeypgafir iigfdntrqv qcisfiaykp
      181 psftea

Why this protein?

Because it is something of a holy grail for protein engineering. Not only is Rubisco the most abundant protein in the world, found in photosynthetic organisms of all kinds, but as the enzyme that fixes CO2 it largely determines the rate of carbon fixation. The small subunit does not carry the catalytic sites, yet it plays a key role in regulating both the amount and the efficiency of Rubisco in plants (https://doi.org/10.1093/jxb/erac309).

On top of that, the small subunit is encoded by a family of several genes, which are themselves expressed in a tissue-specific way (https://doi.org/10.1093/nar/14.8.3325, https://doi.org/10.1104/pp.112.201459). This makes it interesting from a transcriptional-regulation point of view as well.

Being able to control such an essential process would open up a huge range of possibilities: it could revolutionise crop yields and food security, and potentially improve quality of life globally.

Reverse translation:

Because the genetic code is degenerate (most amino acids are encoded by more than one codon), the original DNA sequence cannot be recovered unambiguously from the protein sequence alone.
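To get a feel for how ambiguous this is: the number of DNA sequences that encode a given protein is the product of the number of synonymous codons for each residue. The sketch below assumes the standard genetic code (the usual 64-letter table string) and ignores the stop codon; `count_encodings` is just an illustrative name.

```python
import math
from collections import Counter

# Standard genetic code as a 64-letter string: one amino acid (or '*' for stop)
# per codon, with codons listed in t/c/a/g order at each of the three positions.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# How many codons encode each amino acid, e.g. W -> 1, I -> 3, L/S/R -> 6.
SYNONYMS = Counter(AMINO)


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences (stop codon excluded) that translate to `protein`."""
    return math.prod(SYNONYMS[aa] for aa in protein.upper())


# Rubisco small subunit, AED94313.1 (the protein sequence given above).
rbcs = (
    "massmlssaavvtspaqatmvapftglkssaafpvtrktnkditsiasnggrvscmkvwp"
    "pigkkkfetlsylpdlsdvelakevdyllrnkwipcvefelevintkhgfvyrehgntpg"
    "yydgrywtmwklplfgctdsaqvlkeveeckkeypgafiriigfdntrqvqcisfiaykp"
    "psftea"
)

print(count_encodings("MASS"))  # 1 * 4 * 6 * 6 = 144
print(f"roughly 10^{math.log10(count_encodings(rbcs)):.0f} possible coding sequences")
```

Even a four-residue peptide has 144 possible coding sequences; for the full 186-residue subunit the number is astronomically large.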

For example, we can use the most likely (most frequently used) codon for each amino acid:

atggcgagcagcatgctgagcagcgcggcggtggtgaccagcccggcgcaggcgaccatg
gtggcgccgtttaccggcctgaaaagcagcgcggcgtttccggtgacccgcaaaaccaac
aaagatattaccagcattgcgagcaacggcggccgcgtgagctgcatgaaagtgtggccg
ccgattggcaaaaaaaaatttgaaaccctgagctatctgccggatctgagcgatgtggaa
ctggcgaaagaagtggattatctgctgcgcaacaaatggattccgtgcgtggaatttgaa
ctggaagtgattaacaccaaacatggctttgtgtatcgcgaacatggcaacaccccgggc
tattatgatggccgctattggaccatgtggaaactgccgctgtttggctgcaccgatagc
gcgcaggtgctgaaagaagtggaagaatgcaaaaaagaatatccgggcgcgtttattcgc
attattggctttgataacacccgccaggtgcagtgcattagctttattgcgtataaaccg
ccgagctttaccgaagcg
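Once one codon is chosen per amino acid, reverse translation becomes a simple lookup. This sketch hard-codes, as an assumption, the codon choices that appear in the sequence above (e.g. `gcg` for Ala, `ctg` for Leu, `agc` for Ser); a real tool would take them from a codon usage table for the target organism. The helper names are hypothetical, and the result is checked by translating it back with the standard genetic code.

```python
# Codon used for each amino acid, read off the "most likely" sequence above.
PREFERRED = {
    "A": "gcg", "R": "cgc", "N": "aac", "D": "gat", "C": "tgc",
    "Q": "cag", "E": "gaa", "G": "ggc", "H": "cat", "I": "att",
    "L": "ctg", "K": "aaa", "M": "atg", "F": "ttt", "P": "ccg",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tat", "V": "gtg",
}

# Standard genetic code, built as a codon -> amino acid dictionary.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def back_translate(protein: str) -> str:
    """Replace every residue with its single preferred codon."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (length must be a multiple of 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)).lower()


rbcs = (
    "massmlssaavvtspaqatmvapftglkssaafpvtrktnkditsiasnggrvscmkvwp"
    "pigkkkfetlsylpdlsdvelakevdyllrnkwipcvefelevintkhgfvyrehgntpg"
    "yydgrywtmwklplfgctdsaqvlkeveeckkeypgafiriigfdntrqvqcisfiaykp"
    "psftea"
)

dna = back_translate(rbcs)
print(dna[:60])  # atggcgagcagcatgctgagcagcgcggcggtggtgaccagcccggcgcaggcgaccatg
print(len(dna))  # 558
assert translate(dna) == rbcs  # round trip recovers the protein
```

The round trip always works in this direction (DNA to protein is unambiguous); it is only protein to DNA that needs the extra codon choice.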

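Or we can write one consensus codon per amino acid using IUPAC ambiguity codes (n = any base, r = a/g, y = c/t, w = a/t, s = c/g, m = a/c, h = a/c/t), so that a single degenerate codon covers every codon for that amino acid: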

atggcnwsnwsnatgytnwsnwsngcngcngtngtnacnwsnccngcncargcnacnatg
gtngcnccnttyacnggnytnaarwsnwsngcngcnttyccngtnacnmgnaaracnaay
aargayathacnwsnathgcnwsnaayggnggnmgngtnwsntgyatgaargtntggccn
ccnathggnaaraaraarttygaracnytnwsntayytnccngayytnwsngaygtngar
ytngcnaargargtngaytayytnytnmgnaayaartggathccntgygtngarttygar
ytngargtnathaayacnaarcayggnttygtntaymgngarcayggnaayacnccnggn
taytaygayggnmgntaytggacnatgtggaarytnccnytnttyggntgyacngaywsn
gcncargtnytnaargargtngargartgyaaraargartayccnggngcnttyathmgn
athathggnttygayaayacnmgncargtncartgyathwsnttyathgcntayaarccn
ccnwsnttyacngargcn
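A consensus codon can be computed by collecting, at each of the three positions, every base used by the synonymous codons and replacing that set with its IUPAC code. A minimal sketch, assuming the standard genetic code; the function names are illustrative only.

```python
from itertools import product

# Standard genetic code: AMINO[i] is the amino acid for the i-th codon of
# product("tcag", repeat=3), i.e. ttt, ttc, tta, ttg, tct, ...
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("a"): "a", frozenset("c"): "c", frozenset("g"): "g", frozenset("t"): "t",
    frozenset("ag"): "r", frozenset("ct"): "y", frozenset("cg"): "s", frozenset("at"): "w",
    frozenset("gt"): "k", frozenset("ac"): "m", frozenset("cgt"): "b", frozenset("agt"): "d",
    frozenset("act"): "h", frozenset("acg"): "v", frozenset("acgt"): "n",
}


def synonymous_codons(aa: str) -> list[str]:
    """All codons that encode amino acid `aa` in the standard code."""
    return ["".join(codon) for codon, amino in zip(product(BASES, repeat=3), AMINO)
            if amino == aa.upper()]


def consensus_codon(aa: str) -> str:
    """Merge the synonymous codons position by position into one IUPAC codon."""
    codons = synonymous_codons(aa)
    return "".join(IUPAC[frozenset(c[pos] for c in codons)] for pos in range(3))


def consensus_back_translate(protein: str) -> str:
    return "".join(consensus_codon(aa) for aa in protein)


print(consensus_codon("S"), consensus_codon("L"), consensus_codon("R"))  # wsn ytn mgn
print(consensus_back_translate("MASS"))  # atggcnwsnwsn
```

One caveat shows up for the six-codon amino acids (Ser, Leu, Arg): their codons are split into two groups, so the merged codon is broader than the real set. `wsn`, for instance, also matches codons for Thr, Arg, Cys and Trp, plus the stop codon `tga`.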