Analyzing massive amounts of data, a multi-disciplinary team of University of Missouri researchers used a cutting-edge computer algorithm to find identical DNA sequences in different plant and animal species.
The research lays the groundwork for future basic studies of why plants and animals developed different genetic mechanisms and functions. The information also creates a foundation for discoveries that may improve human life. By making it easier to decipher DNA related to disease, the code-analyzing computer program could help in developing new medicines or breeding better crops.
“Our algorithm found identical sequences of DNA located at completely different places on multiple plant genomes,” said Dmitry Korkin, lead author and assistant professor of computer science. “No one has ever been able to do that before on such a scale.”
“Our discovery helps solve some of the mysteries of plant evolution,” said Gavin Conant, co-author and assistant professor of animal sciences in the College of Agriculture, Food and Natural Resources (CAFNR). “Basic research on the plant genome provides raw materials and improves techniques for creating medicines and crops.”
Previous studies found long strings of identical code in the DNA of different animal species. But before this new MU research, which was published in the Proceedings of the National Academy of Sciences, computer programs had never been powerful enough to find identical sequences in plant DNA, because the identical sections do not sit at the same points in each genome.
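To make the idea of identical sequences at different positions concrete, here is a minimal sketch in Python. It is not the researchers' algorithm, which the article does not describe. It uses a generic seed-and-extend approach: index every short substring of one sequence, look up matching seeds from the other, and extend each match as far as the bases stay identical. The function names, the toy sequences and the seed length `k` are all assumptions made for illustration.

```python
from collections import defaultdict


def index_kmers(seq, k):
    """Map each length-k substring of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_identical_runs(seq_a, seq_b, k):
    """Return (pos_a, pos_b, length) for identical runs of at least k bases.

    Positions may differ between the two sequences; only the content
    of the run has to match exactly.
    """
    index_b = index_kmers(seq_b, k)
    runs = []
    for i in range(len(seq_a) - k + 1):
        for j in index_b.get(seq_a[i:i + k], []):
            # Skip seeds that only continue a run starting one base earlier.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend the seed to the right while the bases stay identical.
            length = k
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            runs.append((i, j, length))
    return runs


# Toy sequences invented for illustration; both contain "ACGTTGCA".
genome_a = "GGACGTTGCAGG"
genome_b = "TTTTTACGTTGCAT"
print(shared_identical_runs(genome_a, genome_b, k=6))
```

In this toy case the shared stretch starts at position 2 in one sequence and position 5 in the other, which is the kind of offset that a position-by-position comparison would miss. Real genomes run to hundreds of millions or billions of bases, so a practical tool would need far more memory-efficient indexing than this sketch.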
Six Animals, Six Plants
The genomes of six animals (dog, chicken, human, mouse, macaque and rat) were compared to each other. Likewise, the genomes of six plant species (Arabidopsis, soybean, rice, cottonwood, sorghum and grape) were compared to each other. Comparing all the genetic sequences took four weeks, with 48 computer processors each doing about one million searches per hour, for a grand total of approximately 32 billion searches.
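The search figures can be checked with quick arithmetic. The short Python sketch below assumes the one-million-per-hour rate applies to each processor and that the machines ran around the clock for the full four weeks. Both are assumptions rather than details stated by the researchers, but they are the reading under which the numbers add up to roughly 32 billion.

```python
HOURS_PER_WEEK = 7 * 24

weeks = 4
processors = 48
# Assumption: the quoted rate is per processor, not for the whole cluster.
searches_per_processor_per_hour = 1_000_000

total_hours = weeks * HOURS_PER_WEEK
total_searches = total_hours * processors * searches_per_processor_per_hour

print(f"{total_hours} hours, {total_searches:,} searches")
# prints "672 hours, 32,256,000,000 searches"
```

If the rate instead described all 48 processors combined, the total would be only about 672 million searches, well short of the reported figure.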
“You would expect to see convergent evolution, but we don’t,” Conant said. “Plants and animals are both complex multi-cellular organisms that have to deal with many of the same environmental conditions, like taking in air and water and dealing with weather variations, but their genomes code for solutions to these challenges in different ways.”
“The same algorithm can be used to find identical sequential patterns in an organism’s entire set of proteins,” said Korkin. “That could potentially lead to finding new targets for existing drugs or studying these drugs’ side effects.”
The project, part of the Big Data Research and Development Initiative, is a national priority of the White House Office of Science and Technology Policy. The PNAS paper, titled “Long Identical Multispecies Elements in Plant and Animal Genomes,” involved collaboration between the Universities of Missouri, California and Arizona.