Hacker news

The Unreasonable Redundancy of Nature's Protein Folds (https://research.ligo.bio)

158 points by ray__ 4 days ago | 59 comments

jyounker 3 days ago |

None of this seems particularly surprising to someone who has an undergraduate-level knowledge of biochemistry. Thirty years ago the professor in my Proteins class made a few relevant points in his lectures:

1) Only a handful of amino acids in an enzyme's structure were highly conserved. (Out of hundreds, generally fewer than ten.)

2) Those were generally in the reaction center.

3) Almost all single-residue replacements in the sequence had no measurable effect on protein structure and function.

4) Across species the "same" protein can diverge in sequence by up to 40%, while keeping the same structure. Sometimes this goes as far as 80%.

Given these basic facts, the findings in the paper aren't really surprising to anyone who studies proteins.

[Note: As with everything in biology, you can find counter examples. The histone proteins involved in DNA packing have an incredibly conserved sequence.]

resiros 3 days ago |

Evolution discovered a bunch of structural patterns at different layers (fragments, folds..) that are energetically favorable, versatile, easily foldable, robust to mutations and then kept reusing them. As a result it sampled more and more in these parts of the space. That's why the fold space is uneven.

Are there any folds and patterns that evolution has not discovered that are also useful? I think the Baker Group created a bunch of new folds. I'm not sure if they are as useful as the ones discovered by evolution. After all, evolution had more compute power than us.

hirenj 4 days ago |

This approach is pretty much like the TED approach from a few years back. As far as I remember there wasn’t a ridiculous amount of fold diversity there either. It turns out evolution isn’t averse to a bit of liberal protein plagiarism.

https://www.science.org/doi/10.1126/science.adq4946

photochemsyn 3 days ago |

This does reveal the weakness of AlphaFold approaches for answering questions like “what is possible in the protein folding space if you use the 20 canonical amino acids” since the data used to train AlphaFold is limited to existing experimentally determined protein structures.

We don’t even know if this is like body plans (four legs for mammals, why not six?), i.e. whether this is about physical limitations of the folding space. Did evolution explore most of the space and hold onto the most useful folds, or is the common set of folds one of those accident-of-history results? Then there’s the issue that folding takes place as the protein chain exits the ribosomal tunnel, so that’s a whole other constraint on what kinds of folds might be selected. For that matter, why not other genetically determined complex amino acids instead of just the canonical set?

Also, a common evolutionary process in eukaryotes is duplication of protein sequences and shuffling of code blocks which might represent folding domains, which might tend to lock in the existing collection of folds rather than generating novel folds. That’s not so clear.

This weakness of AlphaFold has some practical relevance today, since non-canonical amino acids and modified proteins are increasingly used medically, and their structures mostly seem to be determined using direct experimental methods, e.g.:

https://pmc.ncbi.nlm.nih.gov/articles/PMC10296201/

“Non-Canonical Amino Acids as Building Blocks for Peptidomimetics: Structure, Function, and Applications” (2023)

h_a_n_k 4 days ago |

cool post! it's funny how many things in this world are naturally graphs. i think it's neat how, especially in biology, a lot of high-dimensional objects, like protein sequences, converge onto lower-dimensional representations, like protein structures.

i did neuroscience for grad school, and i was always amazed by how often complex neural activity could be well represented by lower dimensional representations--clean manifolds, attractor dynamics, etc. i think, in general, biology (evolution) doesn't penalize against redundancy too hard (hence things like genetic drift, neutral theory of evolution, etc.).

anyway, super cool stuff. agree with you that probs more useful to explore the search space via 'less natural' structures, given how forgiving evolution is to redundancy. probs where the most information can be found

dekhn 3 days ago |

Proteins are truly amazing. I've studied them for decades and they still manage to surprise; for example, I worked on protein structure prediction for years and assumed that structure was necessary for function, but some proteins remain mostly unfolded and still carry out complex mechanistic tasks.

flobosg 3 days ago |

My PhD thesis addressed a similar question. I did a survey of sub-domain sized fragments shared between different protein folds. It turns out that there are plenty, even among folds considered evolutionarily distant.

dekhn 3 days ago |

I worked with a foodie who was also a protein scientist (https://scienceandfooducla.wordpress.com/2016/02/23/kent-kir...) and he once pointed out: nearly everything you need to know about protein folding, you can learn from an egg.

ifh-hn 3 days ago |

No real clue what this stuff is about, way over my head, but kudos on an article where it's all there on the page instead of needing scripts to pull text and images from different places!

throwaway81523 4 days ago |

This crashed my browser. Use reader mode.

novia 3 days ago |

gosh the scrolling on that site was so jumpy!

Schlagbohrer 3 days ago |

Can we please retire the headline trend of "The Unreasonable ___ of ____"?

spwa4 3 days ago |

This is just repeating the fact that the proteins life actually uses are a very small part of the total possible ones. First, there's no real length limit, but all life's proteins are limited to a few thousand amino acids. Most barely get past a hundred.

(note: there are bigger proteins, including ones so big you can see them with the naked eye (e.g. a hair), but they consist of multiple repeats of the same small building block. There are many such building blocks. And the very few exceptions to that are "not really" part of eukaryotic cells, but of cell organelles that have their own DNA.)

But even if you just take the first 4 amino acids, there are 160,000 possible combinations (20^4). Life uses fewer than 1000 of those.
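For anyone who wants to check the combinatorics, here is a minimal sketch of the count of possible sequences of a given length, assuming only the 20 canonical amino acids and ignoring any biological constraint. The function name `sequence_space_size` is made up for illustration.

```python
# Count the distinct sequences of a given length over the 20 canonical
# amino acids. This is plain alphabet_size ** length arithmetic; it says
# nothing about which sequences fold or which ones life actually uses.
CANONICAL_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard one-letter codes


def sequence_space_size(length: int,
                        alphabet_size: int = len(CANONICAL_AMINO_ACIDS)) -> int:
    """Return the number of possible sequences of `length` residues."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (4, 5, 100):
        size = sequence_space_size(n)
        # Print the digit count as well, since the numbers grow very quickly.
        print(f"length {n}: {size} ({len(str(size))} digits)")
```

Python integers are arbitrary precision, so the length-100 case is computed exactly rather than as a float approximation.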

In other words: DNA and evolution, even with billions of years to think about it, is really a bit of a beginner when it comes to protein design. Or at least, it is pretty obvious that it's possible to do A LOT better than natural selection.