Data-efficient Machine Learning of Biomolecules
Scientists from KIT and Forschungszentrum Jülich (FZJ) in the Helmholtz Research Field Information and from Helmholtz AI, in collaboration with the German Aerospace Center (DLR), have published a study in Communications Biology that shows how modern and classical deep machine learning methods can be combined efficiently. (Source: Steinbuch Centre for Computing – News)
Life is determined at the cellular level by various biomolecules. They constitute the machinery of living organisms and play a crucial role in the functioning of each cell. Machine learning is increasingly being used to study their function and related structure. Members of the Multiscale Biomolecular Simulation research group and the Helmholtz AI team, in cooperation with Forschungszentrum Jülich and the German Aerospace Center (DLR), have now proposed a method to combine modern and classical deep machine learning methods to build models even in data-poor scenarios.
The researchers use a deep learning approach to predict which RNA building blocks (called nucleotides) are spatial neighbors. As in a LEGO model, when individual bricks are replaced in one location, the neighboring bricks must adjust so that the entire structure still fits together. The BARNACLE model proposed in the study applies this idea to RNA: nucleotides that are spatially close together in an RNA molecule are also more likely to mutate together during evolution, and it is precisely these emergent mutation patterns that the model looks for. Training combines self-supervised pre-training on large amounts of sequence data with efficient use of the scarce structural data. With this approach, BARNACLE showed a significant improvement over both established classical statistical approaches and other neural networks. The study also shows that the method is transferable to related tasks with similar data constraints.
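The co-mutation idea can be illustrated with a very small sketch. The code below is not BARNACLE and is not taken from the study. It is a toy version of a classical statistical covariation score (mutual information between alignment columns), of the general kind that deep learning models are compared against. The function names and the five-nucleotide alignment are hypothetical and were made up for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length RNA sequences (hypothetical input format).
    """
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi


def rank_column_pairs(alignment):
    """Return all column pairs sorted by decreasing mutual information."""
    length = len(alignment[0])
    scores = {
        (i, j): column_mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment (invented): positions 0 and 4 always change together
# (G-C, C-G, A-U, U-A), while positions 1 to 3 never change.
toy_alignment = [
    "GACUC",
    "CACUG",
    "AACUU",
    "UACUA",
]

if __name__ == "__main__":
    for pair, score in rank_column_pairs(toy_alignment)[:3]:
        print(pair, round(score, 3))
```

In this toy example, the pair of positions that mutate together receives the highest score, which is the signal a contact predictor would interpret as a likely spatial neighborhood. Real alignments are far noisier, which is one reason more elaborate statistical models and neural networks are used in practice.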
The results of this study were published in the paper “RNA Contact Prediction by Data Efficient Deep Learning” in the journal Communications Biology.
KIT/A. Grindler, 16.10.2023
The original press release can be found at:
Data-efficient Machine Learning of Biomolecules
The original publication can be found at (Open Access):
Oskar Taubert, Fabrice von der Lehr, Alina Bazarova, Christian Faber, Philipp Knechtges, Marie Weiel, Charlotte Debus, Daniel Coquelin, Achim Basermann, Achim Streit, Stefan Kesselheim, Markus Götz & Alexander Schug, RNA contact prediction by data efficient deep learning. Communications Biology. 2023, 6:913. DOI: 10.5445/IR/1000162205
Localization in the Helmholtz Research Field Information:
Helmholtz Research Field Information, Program 1: Engineering Digital Futures, Topic 1: Enabling Computational- & Data-Intensive Science and Engineering
Contact:
Dr. Markus Götz
Department Data Analytics, Access and Applications (D3A)
Steinbuch Centre for Computing (SCC)
Karlsruhe Institute of Technology (KIT)
Phone: +49 721 608-29178
E-Mail: markus.goetz@kit.edu
Prof. Dr. Alexander Schug
Institute for Advanced Simulation (IAS)
Jülich Supercomputing Centre (JSC)
Forschungszentrum Jülich
Phone: +49 2461 61-9095
E-Mail: al.schug@fz-juelich.de
Contact for this press release:
Achim Grindler
Public Relations, Service Management
Karlsruhe Institute of Technology (KIT)
Phone: +49 721 608-24506
E-Mail: achim.grindler@kit.edu



