A key goal in biology is to decipher the function of all proteins. By studying the relationship between protein sequence and function, we can understand which parts of a protein’s sequence give it certain functions. However, because more than 250 million proteins are known to date (uniprot.org), it is not practical to test every one of them experimentally. Over the years, scientists have developed methods (alignment algorithms) that compare similar protein sequences and arrange them into groups based on their conserved regions, the most significant portions of proteins. Proteins belonging to the same group are thought to have similar functions. Nevertheless, many protein parts, such as intrinsically disordered regions, are difficult to compare since they don't fall within these groups. Intrinsically disordered regions are structurally flexible protein segments that play regulatory roles, such as helping form biomolecular condensates. They tend to accumulate many sequence changes (mutations) over evolutionary time. The more the sequences differ, the harder they are to align and compare, and the harder it is to determine their function.
Scientists from the research group of Agnes Toth-Petroczy at the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) in Dresden, Germany, and the Center for Systems Biology Dresden (CSBD) have now developed a new algorithm that can compare these intrinsically disordered regions. The new algorithm, SHARK (Similarity/Homology Assessment by Relating K-mers), has been incorporated into SHARK-dive, a machine learning tool that outperforms traditional alignment methods in finding evolutionary similarities in sequences that can't be aligned. “Intrinsically disordered regions are involved in many functions of an organism, and they evolve faster than the structured parts of proteins, making it hard to find similarities between them with current methods. It has been difficult to study their functions and their evolution, even though they make up about 21% of all proteins,” explains Chi Fung Willis Chow, doctoral student in the Toth-Petroczy group and first author of the study. He adds, “With SHARK-dive, we now have a tool that can identify intrinsically disordered regions that are different in their sequence but similar in function, something that current alignment methods struggle with.”
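To give a feel for what comparing sequences by their k-mers (short overlapping subsequences of length k) can look like without an alignment, here is a minimal Python sketch. It is not the SHARK scoring method, which the text above does not describe in detail. It swaps in a plain Jaccard overlap of k-mer sets, and the function names `kmers` and `kmer_jaccard` as well as the toy sequences are hypothetical, chosen only for illustration.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all overlapping substrings of length k."""
    # An empty set results when the sequence is shorter than k.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_jaccard(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two k-mer sets (illustrative stand-in, not SHARK)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0  # no k-mers on either side: treat as no detectable similarity
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Invented toy sequences: the same short motifs appear in a different order,
    # so a position-by-position comparison would see little in common.
    region_1 = "SRGSRGSYG"
    region_2 = "GGSYGSRGS"
    unrelated = "AAAAAAAAA"

    print(kmer_jaccard(region_1, region_2))   # shares 5 of 7 distinct 3-mers
    print(kmer_jaccard(region_1, unrelated))  # shares none
```

Because the comparison looks only at which short motifs occur and not at where they occur, two regions with reshuffled or repeated motifs can still score as similar. That is the general idea behind alignment-free comparison, shown here in its simplest form.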
“SHARK-dive not only identifies intrinsically disordered regions with similar functions, but it also uncovers hidden sequence patterns that explain distant similarities and functional connections. This helps generate hypotheses about the factors that drive these relationships, making SHARK-dive a valuable tool for studying and understanding disordered proteins,” explains Agnes Toth-Petroczy, who oversaw this study. “We hope that SHARK-dive will help create a collection of functions for intrinsically disordered regions. Researchers would be better able to look into the relationship between the functions and sequences in these disordered, difficult-to-align areas.”
Chi Fung Willis Chow, Soumyadeep Ghosh, Anna Hadarovich, and Agnes Toth-Petroczy: SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences, PNAS, October 9, 2024, 121 (42) e2401622121, doi.org/10.1073/pnas.2401622121