News

LCSB researcher wins prize in international machine learning challenge

  • 28 September 2020
  • Topic
    Life sciences & medicine

Assistant Prof. Enrico Glaab, head of the Biomedical Data Science group at the LCSB, recently participated in the Metagenomics Diagnosis for IBD Challenge. The aim of this international competition was to find the best classification algorithm for diagnosing Inflammatory Bowel Disease. The challenge gathered 50 participants in 26 teams from all over the world, and Enrico Glaab was ranked among the three best performers for the model he developed. In addition to helping establish standards in computational metagenomics data analysis, he was awarded a 2000 USD prize and will join the organisers in writing scientific articles describing the outcome of the challenge.

sbv IMPROVER challenge to better diagnose Inflammatory Bowel Disease

The Metagenomics Diagnosis for IBD Challenge (MEDIC) aimed to investigate the diagnostic potential of metagenomics data to classify patients with Inflammatory Bowel Disease (IBD). This umbrella term includes a spectrum of chronic inflammatory disorders such as Ulcerative colitis and Crohn’s disease. While their respective clinical and pathological features are well-known, establishing a diagnosis is still a challenge and identifying novel biomarkers could lead to more effective, patient-tailored treatments.

Potential risk and protective factors, such as diet and microbiome composition, seem to play a key role in these diseases, and links have been established in animal models and human studies. Characterising the gut microbiome by sequencing metagenomes from faecal or intestinal biopsy samples has previously revealed differences in bacterial species composition and abundance between IBD patients and healthy controls. The analysis of metagenomics data could therefore provide new biomarkers and a less invasive diagnostic method.

“sbv IMPROVER (Systems Biology Verification - Industrial Methodology for Process Verification in Research) is an international project that uses crowd-sourcing and verification in order to develop improved methods to analyse complex biomedical data. By participating, we can extend the available tools and knowledge in Systems Biology,” explains Enrico Glaab. “When I heard about MEDIC, I was interested in particular because metagenomics is still a relatively new data type that represents a challenge for machine learning, with complex interrelations between a large number of features measured for a limited number of biosamples.”

Metagenomics data and machine learning

During the challenge, the participants had to address a basic question: Can you predict IBD status using metagenomics data? The idea was to verify if the provided data were sufficient to allow a classification algorithm to distinguish between IBD patients and healthy controls, and between Ulcerative colitis and Crohn’s disease.

Using next-generation sequencing technologies, biologists can determine the relative abundances of many microbial species in samples from different patients. It is also possible to estimate their functional role within the microbiome and the metabolic pathways they might influence. The organisers provided this type of data to the participants at the beginning of the challenge. It served as the starting point for processing the data and developing a machine learning model predictive of IBD status.

Reducing dimensionality and innovating with meta-learning

“The data offered many opportunities to apply new bioinformatics analysis strategies,” details Prof. Glaab. “For example, apart from using the original features from the data for model building, we also investigated versions of the data in which species that affect the same metabolic pathways were grouped together to investigate pathway alterations and reduce the dimensionality of the data. By comparing these different approaches, we could learn which of the original or derived features from the data are most informative for the prediction task.”
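The pathway grouping described in the quote can be illustrated with a minimal sketch. It assumes that each sample is a dictionary of species-level relative abundances and that a species-to-pathway mapping is available; the species names, pathway names, and the `group_by_pathway` helper are hypothetical and do not come from the challenge data or from Prof. Glaab's code.

```python
from collections import defaultdict

# Hypothetical species-to-pathway mapping, for illustration only.
SPECIES_TO_PATHWAYS = {
    "species_a": ["butyrate_synthesis"],
    "species_b": ["butyrate_synthesis", "bile_acid_metabolism"],
    "species_c": ["bile_acid_metabolism"],
    "species_d": ["sulfate_reduction"],
}


def group_by_pathway(abundances, species_to_pathways):
    """Sum species-level relative abundances into pathway-level features.

    A species mapped to several pathways contributes its full abundance
    to each of them; species without a mapping are ignored.
    """
    pathway_totals = defaultdict(float)
    for species, abundance in abundances.items():
        for pathway in species_to_pathways.get(species, []):
            pathway_totals[pathway] += abundance
    return dict(pathway_totals)


# One toy sample with four species-level features (made-up values).
sample = {"species_a": 0.40, "species_b": 0.30, "species_c": 0.20, "species_d": 0.10}

# Four species collapse into three pathway-level features.
pathway_features = group_by_pathway(sample, SPECIES_TO_PATHWAYS)
print(pathway_features)
```

In real metagenomics data the reduction is far larger than in this toy case, since many species typically map onto a much smaller set of pathways. Summing is only one possible way to aggregate; other choices, such as averaging, would work with the same structure.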

The LCSB researcher also used the data to explore a new methodology called meta-learning. Instead of applying a single machine learning approach directly, he added a higher-level heuristic search procedure: an algorithm that selects the most suitable machine learning approach for a specific dataset from multiple possible choices. He clarifies: “The main idea behind the used method is to learn from available training data not only how to optimise the parameters for a specific machine learning model, but also how to choose the best modelling approach.”
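The article does not describe the search procedure in detail. As a rough illustration of the general idea of choosing a modelling approach from the training data, the sketch below scores a few simple candidate classifiers by leave-one-out cross-validation and keeps the best one. This is a deliberately simplified stand-in, not the method used in the challenge; the three candidate classifiers, the toy data, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter


def majority_classifier(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid_classifier(train_X, train_y):
    """Predict the class whose mean feature vector is closest."""
    sums, counts = {}, Counter(train_y)
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))


def one_nn_classifier(train_X, train_y):
    """Predict the label of the single closest training sample."""
    def predict(x):
        idx = min(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
        return train_y[idx]
    return predict


def loo_accuracy(build, X, y):
    """Leave-one-out accuracy of the classifier produced by `build`."""
    hits = 0
    for i in range(len(X)):
        predict = build(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(X[i]) == y[i]
    return hits / len(X)


def select_approach(candidates, X, y):
    """Score every candidate approach and return the best name and all scores."""
    scores = {name: loo_accuracy(build, X, y) for name, build in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy data: two made-up abundance features per sample.
X = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
     [0.80, 0.20], [0.90, 0.10], [0.85, 0.15]]
y = ["control", "control", "control", "IBD", "IBD", "IBD"]

candidates = {
    "majority": majority_classifier,
    "nearest_centroid": nearest_centroid_classifier,
    "one_nn": one_nn_classifier,
}
best, scores = select_approach(candidates, X, y)
print(best, scores)
```

A fuller meta-learning setup would typically search over a much wider range of model families and settings, and would evaluate the selected approach on data held out from the selection step to avoid an optimistic estimate.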

Testing this new methodology paid off: the code and predictions submitted by Prof. Glaab ranked among the top three, showcasing his data science skills and earning him a 2000 USD prize. He now plans to collaborate with other competitors and the organisers to publish the results in scientific journals and further investigate the outcomes of the challenge.

Visit the webpage of the Biomedical Data Science group.