Principal component analysis (PCA) is nothing new. In fact, the math for PCA processes, in which a large amount of data can be categorized or compared by discovering distinct patterns in such fields as spectra and microscopy, has been around since Steven Van Doren, professor of biochemistry, was an undergraduate at Oklahoma State University double majoring in biochemistry and computer science in the early ’80s.
Van Doren recalled that at the time he had to write many lines of code to perform a PCA operation that Jia Xu, a postdoctoral fellow in his laboratory, can now run with a single line of Python, a popular open-source programming language widely used by the scientific community because it runs on many platforms.
Xu, with Van Doren’s assistance, developed a new Python-based software program in the fall of 2015 that has the potential to analyze a series of extremely complicated images – such as a digital video – with unprecedented accuracy and speed.
“It’s a routine technique, but it’s never been used in this way before. PCA is the engine, but he’s built a fine multi-purpose SUV and it goes anywhere,” Van Doren said of Xu, who graduated with his doctoral degree in biochemistry in December 2015 after arriving at the University of Missouri in 2010.
The official name of that “SUV” is TREND, an acronym for “TRacking Equilibrium and Nonequilibrium shifts in Data.” Although its roots trace back to their field of biomolecular nuclear magnetic resonance (NMR) spectroscopy, the two quickly realized that TREND could read almost any series of measurements or images, including MRI images, video and computed tomography (CT) scans. Their findings were published in Biophysical Journal Jan. 24.
“We realize it has much wider applications beyond what we study in our lab,” Van Doren said. “We are really dedicating this software to help people monitor changes. That’s what scientists do a lot is watch how things change and then learn from that.”
Finding a solution
Van Doren and Xu conduct NMR research relating to a wide variety of health-related topics including innate immunity, cancer progression, lung diseases and cardiovascular diseases. In August 2015, they were doing research on the phosphomannomutase (PMM) enzyme, which is connected to the bacteria that help promote infections in both people and plants.
Van Doren commissioned Xu to figure out how tightly phosphorylated sugars were bound to the PMM enzyme, which is very complex in nature. To do so, Xu wanted to find a quicker and more efficient way to analyze and monitor the dephosphorylation (the removal of a phosphate group from an organic compound) of the spectra of the PMM enzyme. This change involves several peaks that are changing and moving throughout the spectra due to the chemical reaction in the enzyme. Previous methods to analyze such data could take weeks or even months.
Xu spoke with Yan Fulcher, a former postdoctoral fellow in Van Doren’s lab who had done work using PCA. After reading the most recent journal articles on the topic, Xu devised a PCA-based algorithm that would become the core of TREND. Although he never had any formal training in computer programming, he had begun to teach himself coding languages as a hobby while growing up in China.
“Jia was very tuned into the literature and he was very adventurous in trying PCA in new ways,” Van Doren said. “He’s very computationally adept. He’s kind of like a fish in water in that respect.”
Xu implemented the algorithm directly on the original spectra that had been captured using the department’s 800-megahertz NMR spectrometer.
Within minutes, he had results showing how tightly the molecules bind to each other. Then he discovered he could get the same results even by applying the algorithm to raw data. These were unprecedented feats.
“We didn’t understand at the beginning, but we just tried many different kinds of data and it worked,” Xu said.
Essentially, TREND (and PCA) works by tracking the progression of certain principal components. In the first video tested with the software, a video from the Max Planck Institute for Biophysical Chemistry showing a chest cavity through MRIs, the first principal component was breathing; the second, the heart beating. What makes the software even more remarkable is its ability to isolate these principal components and even remove them from the video. Van Doren shows two different videos from a demonstration that he recently gave at a conference. One shows only the breathing; the other, only the heartbeat.
“It’s taking processes that are independent of each other and resolving them,” Van Doren said. “You can do video editing of sorts that you never could before.”
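The idea of resolving independent processes can be illustrated with a minimal sketch. This is not TREND's code. It assumes NumPy and uses a small synthetic "video" in which a slow, strong signal (standing in for breathing) and a faster, weaker one (standing in for a heartbeat) act on separate regions of each frame. The function names `pca_frames` and `remove_component`, the frame size and the frequencies are all hypothetical, chosen only for illustration.

```python
import numpy as np

def pca_frames(frames, n_components=2):
    """Run PCA over a stack of frames shaped (n_frames, height, width)."""
    n_frames = frames.shape[0]
    X = frames.reshape(n_frames, -1).astype(float)   # one row per frame
    mean = X.mean(axis=0)
    # SVD of the mean-centered data yields the principal components.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # trajectory of each component over time
    return mean, scores, Vt[:n_components]

def remove_component(frames, scores, components, index):
    """Subtract one principal component's contribution from every frame."""
    contribution = np.outer(scores[:, index], components[index])
    X = frames.reshape(frames.shape[0], -1) - contribution
    return X.reshape(frames.shape)

# Synthetic data (an assumption for illustration): 200 frames at 20 frames per second.
t = np.arange(200) / 20.0
breathing = np.sin(2 * np.pi * 0.3 * t)   # slow process
heartbeat = np.sin(2 * np.pi * 1.2 * t)   # faster process

pattern_a = np.zeros((8, 8))
pattern_a[:4, :] = 1.0                     # region driven by "breathing"
pattern_b = np.zeros((8, 8))
pattern_b[4:, 4:] = 1.0                    # region driven by "heartbeat"

frames = (3.0 * breathing[:, None, None] * pattern_a
          + 1.0 * heartbeat[:, None, None] * pattern_b)

mean, scores, components = pca_frames(frames, n_components=2)

# Removing the first component leaves a video containing only the second process.
heart_only = remove_component(frames, scores, components, index=0)
```

In this idealized case the first column of `scores` follows the breathing signal and the second follows the heartbeat, up to sign and scale, so each dot in a plot of `scores` corresponds to one frame. Real recordings contain noise and overlapping motion, so the separation is generally less clean than in this sketch.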
With TREND, each dot on this graph corresponds to the exact position of the setting sun in the video above, as determined by principal component analysis. Materials courtesy of Steven Van Doren.
The next step
In November 2015, Xu earned the honor of top student speaker for his presentation on the topic at the Great Plains Regional Annual Symposium on Protein and Biomolecular NMR.
“People recognized the power of it and got excited about it at that time,” said Van Doren, who has been at Mizzou for 22 years.
In February 2016, a prototype of TREND was ready for testing. A corresponding article about the applications of the software in relation to molecular recognition and the expanded applicability of PCA to the NMR field came out in the journal Analytical Chemistry in July 2016.
With the help of the Office of Technology Management and Industry Relations (OTMIR), Van Doren is in the process of turning TREND into a licensed product in the U.S. and in other countries, one that could better the imaging analysis capabilities of several industries, including (but not limited to) pharmaceuticals, medical scanning, geospatial imaging and surveillance.
“You can just let your imagination be your guide as to how many kinds of imaging this can be applied to,” Van Doren said.
TREND runs on Windows, Mac OS X and four types of Linux operating systems. It can read a variety of images, text lists and video formats.
Although Van Doren is interested in TREND’s commercial viability, he is equally excited for other members of the scientific community to try the program out for themselves to see how it can assist in furthering scientific research. The software is available to download for free through an application process on its website.
Anyone with an academic e-mail address will automatically be approved for download. Van Doren encourages anyone from CAFNR – and the broader MU community – to download it and see what new uses they can come up with for the technology, whether that be loading a complex spectrum or just tinkering with a personal video shot on a cell phone.
“There are other people in the College who use imaging data who potentially could benefit from using this to trap the changes of the imaging data and to simplify the imaging data,” Van Doren said.