PART A)
Following the detailed instructions on the homework page, we first create an access token on Hugging Face and log in to the Chatterjee Lab server. Then we can start the work.
The protein we work on is human superoxide dismutase [Cu-Zn] (SOD1), whose UniProt entry name is SODC_HUMAN. Its normal role is to eliminate superoxide radicals within the cell. Under certain conditions, however, it can aggregate and become cytotoxic. We are interested in it because mutations in SOD1 are a known cause of familial Amyotrophic Lateral Sclerosis (ALS).
A single amino acid substitution, A4V (alanine to valine at position 4), causes one of the most aggressive forms of the disease, and this is the change we introduce into our sequence. Position 4 follows the conventional SOD1 numbering, which does not count the initiator methionine. The substituted alanine is therefore the fifth residue of the sequence below.
Starting from the original (wild-type) sequence:
<aside> 💡
MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
</aside>
We convert it to the A4V mutant:
<aside> 💡
MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ
</aside>
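The substitution can be sketched in a few lines of Python. This sketch assumes the conventional SOD1 numbering described above, in which the initiator methionine is not counted, so legacy position 4 is the fifth character of the string. The function name `apply_legacy_substitution` is our own illustration and does not come from any library. Before substituting, it checks that the expected wild-type residue is actually present.

```python
# Wild-type SOD1 sequence, copied from the block above.
WILD_TYPE = "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"


def apply_legacy_substitution(seq: str, mutation: str) -> str:
    """Apply a substitution written like 'A4V' in legacy SOD1 numbering.

    Legacy numbering skips the initiator Met, so legacy position p is the
    0-based index p in a string that still starts with 'M'.
    """
    wt_residue, position, new_residue = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[position] != wt_residue:
        raise ValueError(
            f"expected {wt_residue} at legacy position {position}, found {seq[position]}"
        )
    return seq[:position] + new_residue + seq[position + 1:]


MUTANT = apply_legacy_substitution(WILD_TYPE, "A4V")
print(MUTANT[:10])  # MATKVVCVLK
```

The residue check guards against off-by-one errors between the legacy numbering and the HGVS numbering, which counts the methionine.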
We then enter the mutated sequence into the API and set the length of the peptides to generate (12 aa).

Some of the resulting sequences are:
| # | Binder | Pseudo-perplexity |
|---|---|---|
| 0 | WHSGAVALRHKK | 14.165596 |
| 1 | WRYYAAGVRLKX | 11.318179 |
| 2 | WLYGVVAVEHWX | 13.552779 |
| 3 | WHYPAVGARWKK | 15.630948 |

| # | Binder | Pseudo-perplexity |
|---|---|---|
| 0 | WLYYVTAAELWX | 19.933115 |
| 1 | WRYYVVAVELKX | 17.575723 |
| 2 | KLSPATALRHGX | 10.905200 |
| 3 | WRYYAVAVRHKK | 19.026527 |
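For masked language models, pseudo-perplexity is commonly computed as follows. Mask each position in turn, take the log-probability the model assigns to the true residue at that position, average the negative log-probabilities, and exponentiate. We assume the tool used here follows this standard definition. The Python sketch below only shows the arithmetic, and its log-probabilities are made-up illustrative values rather than model output.

```python
import math


def pseudo_perplexity(masked_log_probs: list[float]) -> float:
    """Pseudo-perplexity from per-position masked log-probabilities.

    masked_log_probs[i] is the natural-log probability the model assigns to
    the true residue at position i when that position is masked.
    """
    if not masked_log_probs:
        raise ValueError("need at least one position")
    mean_nll = -sum(masked_log_probs) / len(masked_log_probs)
    return math.exp(mean_nll)


# Illustrative values only: a 12-residue peptide where the model gives the
# true residue probability 0.1 at every masked position.
example = [math.log(0.1)] * 12
print(round(pseudo_perplexity(example), 6))  # 10.0
```

The result is the inverse of the geometric mean of the per-position probabilities, so lower values mean the model finds the sequence more plausible. For reference, assigning probability 1/20 to every residue would give exactly 20.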
We also include, from the literature, the sequence FLYRWLPSRRGG.
With these candidates, we move to AlphaFold-Multimer to predict the structure of each SODC-peptide complex.
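A multimer query is a single string in which the chains are joined by colons. This is the convention used, for example, by the ColabFold AlphaFold2 notebook. The Python sketch below builds one query per peptide. The helper name `complex_query` is hypothetical, and the peptide list contains the binders used in the five complex predictions that follow.

```python
# Mutated SODC (A4V) sequence from the block above.
MUTANT_SOD1 = "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ"

# Four generated binders plus the peptide taken from the literature.
PEPTIDES = [
    "KLSPATALRHGX",
    "WRYYAAGVRLKX",
    "WLYGVVAVEHWX",
    "WLYYVTAAELWX",
    "FLYRWLPSRRGG",
]


def complex_query(*chains: str) -> str:
    """Join chain sequences with ':' so each is treated as a separate chain."""
    cleaned = [chain.strip().upper() for chain in chains]
    if any(not chain or ":" in chain for chain in cleaned):
        raise ValueError("each chain must be a non-empty sequence without ':'")
    return ":".join(cleaned)


queries = [complex_query(MUTANT_SOD1, peptide) for peptide in PEPTIDES]
print(queries[0].split(":")[1])  # KLSPATALRHGX
```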
The first combination uses the peptide with the lowest pseudo-perplexity. A lower pseudo-perplexity means the language model finds the sequence more likely given the data it was trained on, which we take as a sign of a more plausible binder.
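To find the lowest pseudo-perplexity, we only need to sort the values from the two tables. In the Python sketch below, the scores are copied from those tables. The sketch also flags any residue outside the 20 standard amino acids, such as the `X` at the end of several generated peptides. `X` is the usual one-letter code for an unknown or unspecified residue.

```python
# Binders and pseudo-perplexities copied from the two tables above.
candidates = {
    "WHSGAVALRHKK": 14.165596,
    "WRYYAAGVRLKX": 11.318179,
    "WLYGVVAVEHWX": 13.552779,
    "WHYPAVGARWKK": 15.630948,
    "WLYYVTAAELWX": 19.933115,
    "WRYYVVAVELKX": 17.575723,
    "KLSPATALRHGX": 10.905200,
    "WRYYAVAVRHKK": 19.026527,
}

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Ascending order: the most plausible sequence according to the model comes first.
ranked = sorted(candidates.items(), key=lambda item: item[1])

for peptide, pppl in ranked:
    nonstandard = "".join(aa for aa in peptide if aa not in STANDARD_AA)
    note = f"  non-standard: {nonstandard}" if nonstandard else ""
    print(f"{peptide}  {pppl:.2f}{note}")
```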
We input the mutated SODC sequence and the peptide sequence, separated by a colon (:), and the results were as follows:

For the 1st prediction we use KLSPATALRHGX

For the 2nd prediction we use WRYYAAGVRLKX

For the 3rd prediction we use WLYGVVAVEHWX

For the 4th prediction we use WLYYVTAAELWX

For the 5th prediction we use FLYRWLPSRRGG (from the literature)

Finally, the mutated protein alone (no peptide), for comparison:
Each predicted 3D structure comes with a per-residue confidence score from the model, the pLDDT (predicted Local Distance Difference Test). It ranges from 0 to 100 and estimates how accurately the local structure around each residue has been predicted. Higher values mean higher confidence.
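AlphaFold stores the per-residue pLDDT in the B-factor column (columns 61-66) of its PDB output, so we can summarise the scores directly from a predicted file. The Python sketch below reads the value from each residue's CA atom and averages it per chain. The bands follow those shown in the AlphaFold Protein Structure Database. We assume the protein is chain A and the peptide is chain B. The PDB lines in the example are synthetic and are built only to exercise the parser; they are not our results.

```python
from collections import defaultdict
from statistics import mean


def plddt_per_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) to pLDDT, read from the CA atom's B-factor."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def mean_plddt_by_chain(scores: dict[tuple[str, int], float]) -> dict[str, float]:
    by_chain = defaultdict(list)
    for (chain, _), value in scores.items():
        by_chain[chain].append(value)
    return {chain: mean(values) for chain, values in by_chain.items()}


def confidence_band(plddt: float) -> str:
    # Bands as displayed by the AlphaFold Protein Structure Database.
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def make_atom_line(serial, name, resname, chain, resnum, bfactor):
    # Hypothetical helper: fixed-width ATOM record with zero coordinates.
    return (f"ATOM  {serial:5d} {name:<4} {resname:3} {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


synthetic_pdb = "\n".join([
    make_atom_line(1, "N", "MET", "A", 1, 85.0),
    make_atom_line(2, "CA", "MET", "A", 1, 85.0),
    make_atom_line(3, "CA", "ALA", "A", 2, 95.0),
    make_atom_line(4, "CA", "LYS", "B", 1, 45.0),
])
scores = plddt_per_residue(synthetic_pdb)
for chain, value in mean_plddt_by_chain(scores).items():
    print(chain, round(value, 1), confidence_band(value))
```

For a protein-peptide complex, looking at the peptide chain's scores separately helps, because a high overall mean can hide a poorly placed peptide.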