
I made an Elephant 🐘!
Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), specifically the small subunit.
GenBank: AED94313.1
Protein sequence:
1 massmlssaa vvtspaqatm vapftglkss aafpvtrktn kditsiasng grvscmkvwp
61 pigkkkfetl sylpdlsdve lakevdyllr nkwipcvefe levintkhgf vyrehgntpg
121 yydgrywtmw klplfgctds aqvlkeveec kkeypgafir iigfdntrqv qcisfiaykp
181 psftea
Why this protein?
Because it is essentially a Holy Grail for protein engineering. Not only is it the most abundant protein in the world, present in all forms of photosynthetic organisms, but it also largely determines the rate of CO2 fixation. And the small subunit, despite lacking the catalytic sites, is the one that regulates both the amount and the efficiency of Rubisco in plants (https://doi.org/10.1093/jxb/erac309).
On top of that, the small subunit is encoded by a family of several genes, which are themselves expressed in a tissue-specific way (https://doi.org/10.1093/nar/14.8.3325, https://doi.org/10.1104/pp.112.201459), which makes it interesting from a transcriptional point of view.
Being able to control such an essential process would open up an astounding number of possibilities that could revolutionise crop yield, strengthen food security, and potentially improve quality of life globally.
Reverse translation:
Since the genetic code is degenerate, meaning that a single amino acid can be encoded by several different codons, recovering the original gene sequence from the protein sequence is not straightforward.
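To get a feel for how large that ambiguity is, here is a minimal Python sketch that counts how many different DNA sequences would encode the same protein. It assumes the standard genetic code (NCBI translation table 1); the names `count_encodings` and `PROTEIN` are illustrative only.

```python
from collections import Counter
from math import log10, prod

# Standard genetic code (assumed: NCBI translation table 1),
# with codons enumerated in t, c, a, g order at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of synonymous codons per amino acid, e.g. 6 for L, 1 for M and W.
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein):
    """Number of distinct codon sequences that translate to `protein`.

    Unknown residue letters have a count of 0, so the product becomes 0.
    """
    return prod(DEGENERACY[aa] for aa in protein.upper())


# The small subunit sequence from above, without position numbers or spaces.
PROTEIN = (
    "massmlssaavvtspaqatmvapftglkssaafpvtrktnkditsiasnggrvscmkvwp"
    "pigkkkfetlsylpdlsdvelakevdyllrnkwipcvefelevintkhgfvyrehgntpg"
    "yydgrywtmwklplfgctdsaqvlkeveeckkeypgafiriigfdntrqvqcisfiaykp"
    "psftea"
)

n = count_encodings(PROTEIN)
print(f"{len(PROTEIN)} residues, roughly 10^{log10(n):.0f} possible coding sequences")
```

The count is simply the product of the per-residue codon counts, which is why even a short protein has far too many candidate coding sequences to enumerate.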
For example, we can get the most likely codon sequence:
atggcgagcagcatgctgagcagcgcggcggtggtgaccagcccggcgcaggcgaccatg gtggcgccgtttaccggcctgaaaagcagcgcggcgtttccggtgacccgcaaaaccaac aaagatattaccagcattgcgagcaacggcggccgcgtgagctgcatgaaagtgtggccg ccgattggcaaaaaaaaatttgaaaccctgagctatctgccggatctgagcgatgtggaa ctggcgaaagaagtggattatctgctgcgcaacaaatggattccgtgcgtggaatttgaa ctggaagtgattaacaccaaacatggctttgtgtatcgcgaacatggcaacaccccgggc tattatgatggccgctattggaccatgtggaaactgccgctgtttggctgcaccgatagc gcgcaggtgctgaaagaagtggaagaatgcaaaaaagaatatccgggcgcgtttattcgc attattggctttgataacacccgccaggtgcagtgcattagctttattgcgtataaaccg ccgagctttaccgaagcg
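A minimal Python sketch of this "one preferred codon per amino acid" approach follows. The codon choices in `PREFERRED_CODON` are simply read off the sequence above; which organism's codon usage they reflect is not stated here, so treat the table as an assumption, and the function name `reverse_translate` as illustrative.

```python
# One codon per amino acid, copied from the reverse-translated sequence above.
# Assumption: these are the "most likely" codons of whatever codon usage
# table produced that sequence; swap in a different table for another host.
PREFERRED_CODON = {
    "a": "gcg", "c": "tgc", "d": "gat", "e": "gaa", "f": "ttt",
    "g": "ggc", "h": "cat", "i": "att", "k": "aaa", "l": "ctg",
    "m": "atg", "n": "aac", "p": "ccg", "q": "cag", "r": "cgc",
    "s": "agc", "t": "acc", "v": "gtg", "w": "tgg", "y": "tat",
}


def reverse_translate(protein):
    """Replace every residue with its single preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.lower())


# First ten residues of the small subunit.
print(reverse_translate("massmlssaa"))
```

This gives one plausible coding sequence, not necessarily the one found in the plant genome.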
Or we can get the sequence of consensus codons, where IUPAC ambiguity codes (n, r, y, w, s, m, h) stand for the positions that can vary:
atggcnwsnwsnatgytnwsnwsngcngcngtngtnacnwsnccngcncargcnacnatg gtngcnccnttyacnggnytnaarwsnwsngcngcnttyccngtnacnmgnaaracnaay aargayathacnwsnathgcnwsnaayggnggnmgngtnwsntgyatgaargtntggccn ccnathggnaaraaraarttygaracnytnwsntayytnccngayytnwsngaygtngar ytngcnaargargtngaytayytnytnmgnaayaartggathccntgygtngarttygar ytngargtnathaayacnaarcayggnttygtntaymgngarcayggnaayacnccnggn taytaygayggnmgntaytggacnatgtggaarytnccnytnttyggntgyacngaywsn gcncargtnytnaargargtngargartgyaaraargartayccnggngcnttyathmgn athathggnttygayaayacnmgncargtncartgyathwsnttyathgcntayaarccn ccnwsnttyacngargcn
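One way to arrive at consensus codons like these is to collect every codon for an amino acid and, at each of the three positions, write the IUPAC code for the set of bases seen there. The sketch below does that in Python, assuming the standard genetic code; `consensus_codon` and `consensus_sequence` are illustrative names, not the tool that produced the sequence above.

```python
# Standard genetic code (assumed: NCBI translation table 1).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# IUPAC nucleotide codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "ag": "r", "ct": "y", "cg": "s", "at": "w", "gt": "k", "ac": "m",
    "cgt": "b", "agt": "d", "act": "h", "acg": "v", "acgt": "n",
}


def consensus_codon(aa):
    """Position-by-position IUPAC consensus of all codons for one residue."""
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa.upper()]
    # zip(*codons) yields the bases seen at position 1, then 2, then 3.
    return "".join(IUPAC["".join(sorted(set(column)))] for column in zip(*codons))


def consensus_sequence(protein):
    """Concatenate the consensus codons for every residue in the protein."""
    return "".join(consensus_codon(aa) for aa in protein)


# First ten residues of the small subunit.
print(consensus_sequence("massmlssaa"))
```

Note that a per-position consensus can be broader than the amino acid it came from: `wsn` covers all six serine codons, but it also matches codons such as `acn` (threonine) and `tgy` (cysteine), so the consensus string describes the ambiguity rather than a sequence that could be synthesised directly.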