PART A)

Following the detailed instructions on the homework page, we created an access token on Hugging Face, logged in to the Chatterjee Lab server, and got on with the work.

The protein we work on is human superoxide dismutase [Cu-Zn], or SODC_HUMAN. This protein, normally tasked with eliminating superoxide radicals within the cell, can aggregate under certain conditions and cause cytotoxic effects. We are interested in it because of its role in the onset of amyotrophic lateral sclerosis (ALS).

A single amino acid change can cause the most aggressive form of the disease, and we introduce that particular change (A4V) into our sequence.

From this original sequence

<aside> 💡

MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

</aside>

We convert to

<aside> 💡

MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ

</aside>
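The substitution can also be applied programmatically, which avoids copy-and-paste slips. The following is a minimal sketch, not the procedure used in the homework: the function name `apply_point_mutation` and the `has_initiator_met` flag are hypothetical. It assumes that A4V is numbered without the initiator methionine, which matches the two sequences above (the alanine that changes is the fifth character of the full sequence).

```python
WILD_TYPE = (
    "MATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGG"
    "PKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACG"
    "VIGIAQ"
)


def apply_point_mutation(sequence, mutation, has_initiator_met=True):
    """Apply a substitution written like 'A4V' and return the new sequence.

    Assumption: the mutation is numbered on the mature protein (initiator Met
    removed). If the sequence still starts with that Met, the 0-based index of
    the target residue equals the mutation number.
    """
    old, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position if has_initiator_met else position - 1
    if sequence[index] != old:
        raise ValueError(
            f"Expected {old} at index {index}, found {sequence[index]}"
        )
    return sequence[:index] + new + sequence[index + 1:]


MUTANT = apply_point_mutation(WILD_TYPE, "A4V")
print(MUTANT[:10])
```

The check on the expected wild-type residue is the useful part: it fails loudly if the numbering convention does not match the sequence being edited.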

We then submit the protein sequence to the API and select the length of the peptides we want to generate (12 aa).

Screenshot 2025-03-07 at 13.29.50.png

Some of the resulting sequences are:

   Binder        Pseudo Perplexity
0  WHSGAVALRHKK  14.165596
1  WRYYAAGVRLKX  11.318179
2  WLYGVVAVEHWX  13.552779
3  WHYPAVGARWKK  15.630948

   Binder        Pseudo Perplexity
0  WLYYVTAAELWX  19.933115
1  WRYYVVAVELKX  17.575723
2  KLSPATALRHGX  10.905200
3  WRYYAVAVRHKK  19.026527
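To compare the two batches at a glance, the candidates can be pooled and sorted by score. This is a small sketch using only the values listed above; the variable names are illustrative.

```python
# (binder, pseudo-perplexity) pairs copied from the two result tables above
candidates = [
    ("WHSGAVALRHKK", 14.165596),
    ("WRYYAAGVRLKX", 11.318179),
    ("WLYGVVAVEHWX", 13.552779),
    ("WHYPAVGARWKK", 15.630948),
    ("WLYYVTAAELWX", 19.933115),
    ("WRYYVVAVELKX", 17.575723),
    ("KLSPATALRHGX", 10.905200),
    ("WRYYAVAVRHKK", 19.026527),
]

# Lower pseudo-perplexity first
ranked = sorted(candidates, key=lambda pair: pair[1])
best_binder, best_score = ranked[0]

for binder, score in ranked:
    print(f"{binder}\t{score:.6f}")
```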

We also include, from the literature, the sequence FLYRWLPSRRGG.

With these candidates, we are ready to go to AlphaFold-Multimer and predict whether each peptide forms a complex with the protein.

The first combination uses the peptide with the lowest pseudo-perplexity, which should make it the candidate the model considers most plausible (based on the data the model was trained on).
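For reference, and assuming the reported score follows the usual pseudo-perplexity definition for masked language models (the homework output does not spell it out), the score for a peptide $x$ of length $L$ is obtained by masking one residue at a time and averaging the log-probabilities the model assigns to the true residues:

```latex
\mathrm{PPL}(x) = \exp\!\left( -\frac{1}{L} \sum_{i=1}^{L} \log p\left(x_i \mid x_{\setminus i}\right) \right)
```

Here $x_{\setminus i}$ denotes the sequence with position $i$ masked. A lower value means the model finds the residues more predictable in context, which is why the lowest-scoring peptide is tried first.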

We input the mutated SODC sequence and the peptide sequence, separated by a colon (:), and obtained the following results:
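The colon-separated input can be assembled with a small helper, sketched below. The function name `build_multimer_query` is hypothetical, and the only behaviour assumed of the prediction tool is the one described above: chains joined by `:` in a single string.

```python
MUTANT_SODC = (
    "MATKVVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGG"
    "PKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACG"
    "VIGIAQ"
)


def build_multimer_query(*chains):
    """Join chain sequences with ':' after stripping whitespace and upper-casing."""
    cleaned = [chain.strip().upper() for chain in chains]
    if any(not chain for chain in cleaned):
        raise ValueError("Every chain must contain at least one residue")
    return ":".join(cleaned)


query = build_multimer_query(MUTANT_SODC, "KLSPATALRHGX")
print(query)
```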

Screenshot 2025-03-07 at 14.33.28.png

For the 1st prediction we use KLSPATALRHGX

Screenshot 2025-03-10 at 12.29.02.png

For the 2nd prediction we use WRYYAAGVRLKX

Screenshot 2025-03-07 at 16.25.20.png

3rd: WLYGVVAVEHWX

Screenshot 2025-03-09 at 19.39.13.png

4th: WLYYVTAAELWX

Screenshot 2025-03-09 at 19.53.40.png

5th: FLYRWLPSRRGG

Screenshot 2025-03-09 at 19.57.06.png

Mutated protein alone

Each 3D structure comes with a confidence level assigned by the model, in the form of the pLDDT (predicted lDDT) metric. This is a per-residue score from 0 to 100 that estimates how reliably the local structure around each residue is predicted, with higher values indicating higher confidence.
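AlphaFold writes the per-residue pLDDT into the B-factor column of the PDB files it produces, so the scores can be read back with a few lines of parsing. The sketch below relies on the fixed-width PDB `ATOM` record layout. The helper `make_atom_line` and the two pLDDT values in the demo are synthetic and only exercise the parser; they are not taken from the predictions shown above.

```python
def per_residue_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT}, read from the CA atom of each residue."""
    scores = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 22 the chain, 23-26 the residue
        # number and 61-66 the B-factor (pLDDT in AlphaFold output).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


def make_atom_line(serial, name, resname, chain, resseq, bfactor):
    """Format one synthetic fixed-width ATOM record (coordinates set to zero)."""
    return (
        f"ATOM  {serial:5d} {name:^4s} {resname:>3s} {chain}{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


# Synthetic demo records with made-up scores
demo_pdb = "\n".join([
    make_atom_line(1, "N", "MET", "A", 1, 91.5),
    make_atom_line(2, "CA", "MET", "A", 1, 91.5),
    make_atom_line(3, "CA", "ALA", "A", 2, 62.25),
])

scores = per_residue_plddt(demo_pdb)
mean_plddt = sum(scores.values()) / len(scores)
print(scores, mean_plddt)
```

Averaging the scores over the peptide chain only, rather than over the whole complex, is one simple way to compare how confidently each binder is placed.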