In a groundbreaking advancement at the intersection of biophotonics and computational science, researchers from the Universitat Oberta de Catalunya (UOC) and the Institute of Photonic Sciences (ICFO) have unveiled an open-access Raman spectral database dedicated to biomolecules. This pioneering repository holds spectral signatures of 140 vital biomolecules across key classes including nucleic acids, proteins, lipids, and carbohydrates, offering a robust foundation for molecular identification through Raman spectroscopy. The technique builds on the inelastic scattering of light that Nobel laureate Chandrasekhara Venkata Raman discovered in 1928. It uses that scattering to reveal details of a molecule’s chemical composition and structure, making it an indispensable tool in analytical and medical sciences.
Raman spectroscopy thrives on detecting the shift in wavelength of light scattered by molecules when illuminated by a laser source. This shift serves as a molecular fingerprint, correlating to the vibrational modes intrinsic to specific chemical bonds within biomolecules. The challenge historically has been the fragmentation and restricted accessibility of comprehensive Raman spectral data for diverse biomolecules, which constrains its potential, especially in biomedical diagnostics. Addressing this bottleneck, the multidisciplinary research team led by Marcelo Terán at UOC’s Artificial Intelligence for Human Well-being (AIWELL) group embarked on creating a standardized, searchable spectral library that democratizes data access and fosters accelerated biomedical innovation.
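As a rough illustration of the shift described above, the Raman shift is conventionally reported in wavenumbers (cm^-1) and computed from the laser wavelength and the scattered wavelength. The sketch below is not taken from the published library; the function name and the 785 nm and 850 nm values are assumptions chosen only for demonstration.

```python
def raman_shift_cm1(excitation_nm: float, scattered_nm: float) -> float:
    """Return the Raman shift in wavenumbers (cm^-1).

    shift = 1/lambda_excitation - 1/lambda_scattered, with both
    wavelengths converted from nanometres to centimetres (1 cm = 1e7 nm).
    """
    if excitation_nm <= 0 or scattered_nm <= 0:
        raise ValueError("wavelengths must be positive")
    return 1e7 / excitation_nm - 1e7 / scattered_nm


# Illustrative values only: a 785 nm laser and light scattered at 850 nm.
shift = raman_shift_cm1(785.0, 850.0)
print(f"Raman shift: {shift:.1f} cm^-1")
```

A positive result corresponds to scattered light of longer wavelength than the laser (Stokes scattering), which is the case most spectral libraries record.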
This open spectral library integrates data gathered from numerous leading academic publications and original experimental validations to mitigate the prevalent scarcity of open-access Raman spectra. After meticulous collation, the team deployed algorithms grounded in classical computer vision to automate the extraction and indexing of spectral features from diverse sources, ensuring data integrity and reproducibility. The algorithms are reported to place the correct molecule among the top ten candidates for a given spectrum in every case, as exemplified by high-confidence detection of collagen, and to categorize spectra by molecular type, such as distinguishing proteins from lipids.
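The top-ten figure is a "top-k accuracy": a query counts as a success when the true molecule appears anywhere in the first k ranked candidates. The following is a minimal sketch of how such a score can be computed. The function name and the toy rankings are hypothetical and do not reproduce the authors' evaluation code or data.

```python
def top_k_accuracy(ranked_candidates, true_labels, k=10):
    """Fraction of queries whose true label is among the first k candidates.

    ranked_candidates: one list per query, best match first.
    true_labels: the correct molecule name for each query.
    """
    if len(ranked_candidates) != len(true_labels):
        raise ValueError("need exactly one true label per ranked list")
    if not true_labels:
        raise ValueError("at least one query is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy rankings, invented for illustration (not results from the paper).
rankings = [
    ["collagen", "elastin", "keratin"],
    ["cholesterol", "collagen", "glucose"],
    ["glucose", "keratin", "elastin"],
]
truths = ["collagen", "collagen", "cholesterol"]

top1 = top_k_accuracy(rankings, truths, k=1)
top2 = top_k_accuracy(rankings, truths, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

A larger k is more forgiving, so top-ten accuracy should be read as a shortlist guarantee rather than as exact identification of a single molecule.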
The implications of this development are profound in medical sciences, where non-invasive, rapid molecular analysis is paramount. Raman spectroscopy, empowered by this extensive database, can precisely decode the biochemical milieu of biological samples, enabling real-time monitoring of physiological changes at the molecular level. This is particularly salient for diseases like cancer, where the molecular landscape undergoes significant transformation during disease progression. As Terán emphasizes, the availability of an open, high-quality spectral repository is vital for training artificial intelligence models that can interpret complex spectral data with unprecedented objectivity and speed, circumventing the traditional reliance on human visual peak analysis, which is inherently subjective and labor-intensive.
This spectral database project exemplifies the synergy between data science and photonic analysis. By leveraging AI-driven methodologies, the researchers have laid the groundwork for a scalable, collaborative platform that invites contributions from the global scientific community. The envisaged expansion aims to evolve this repository into a preeminent, communal resource, establishing new benchmarks for spectral data sharing—a practice still emerging in the Raman spectroscopy domain. Open science principles underpin this initiative, acknowledging that the acceleration of biomedical research hinges on transparent data exchange and replicable experiments.
The technical cornerstone of Raman spectral measurement involves precise excitation of samples by monochromatic light sources and capturing their scattered light with high spectral resolution spectrometers. Every biomolecule imparts a unique Raman shift pattern corresponding to molecular vibrations such as stretching, bending, or twisting motions of bonds. By compiling these spectral fingerprints into a standardized, searchable format, the new library transcends prior limitations, enabling algorithmic spectral matching with high fidelity and consistency.
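One common way to perform this kind of algorithmic spectral matching is to resample every spectrum onto a shared Raman-shift grid and rank library entries by cosine similarity to the query. The sketch below assumes that approach purely for illustration. It is not the search method described in the paper, and the names `resample`, `cosine_similarity`, `rank_matches`, `molecule_A`, and `molecule_B`, as well as all the spectra, are made up.

```python
import bisect
import math


def resample(shifts, intensities, grid):
    """Linearly interpolate a spectrum onto a common grid (cm^-1).

    shifts must be strictly ascending with at least two points;
    grid positions outside the measured range get intensity 0.0.
    """
    if len(shifts) < 2 or len(shifts) != len(intensities):
        raise ValueError("need matching arrays with at least two points")
    out = []
    for x in grid:
        if x < shifts[0] or x > shifts[-1]:
            out.append(0.0)
            continue
        # Index of the segment [shifts[i], shifts[i + 1]] containing x.
        i = min(bisect.bisect_right(shifts, x) - 1, len(shifts) - 2)
        t = (x - shifts[i]) / (shifts[i + 1] - shifts[i])
        out.append(intensities[i] + t * (intensities[i + 1] - intensities[i]))
    return out


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_matches(query, library, grid, top_k=10):
    """Return up to top_k (name, score) pairs, best match first."""
    q = resample(query[0], query[1], grid)
    scores = [
        (name, cosine_similarity(q, resample(s, i, grid)))
        for name, (s, i) in library.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical two-entry library: one peak near 1000 cm^-1, one near 1600.
library = {
    "molecule_A": ([400, 800, 1000, 1200, 1800], [0.0, 0.0, 1.0, 0.0, 0.0]),
    "molecule_B": ([400, 1400, 1600, 1800], [0.0, 0.0, 1.0, 0.0]),
}
# Query with a narrower, stronger peak at 1000 cm^-1.
query = ([400, 900, 1000, 1100, 1800], [0.0, 0.0, 2.0, 0.0, 0.0])
grid = list(range(400, 1801, 50))

results = rank_matches(query, library, grid)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Cosine similarity ignores overall intensity scale, which is why the query still matches `molecule_A` despite being twice as intense. Real spectra would normally also need baseline correction and noise handling before a comparison like this is meaningful.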
Moreover, this resource promotes the training and validation of computational models that integrate Raman spectral information with machine learning, thus bridging the gap between raw spectral data and actionable biomedical insights. The anticipated broad adoption of this database is poised to catalyze breakthroughs in diagnostics, facilitating early detection and patient-tailored monitoring of multifaceted diseases. As large-scale AI models require vast, diverse, and reliably annotated datasets, this spectral library directly addresses that critical need, propelling clinical translation of spectroscopic data.
Beyond its immediate applications, the initiative aligns with broader scientific values at UOC, which champions open science and ethical, human-centered technology. By positioning this database as an open-access resource, UOC and ICFO reinforce the global movement toward more inclusive, transparent research ecosystems. This also dovetails with sustainable development goals related to good health and well-being, underscoring the societal impact of accessible biomedical data.
The peer-reviewed paper describing this work appeared in the journal Chemometrics and Intelligent Laboratory Systems, ensuring rigorous academic scrutiny and visibility. The research, conducted collaboratively by data engineers, photonics experts, and biochemists, exemplifies the interdisciplinary approach required to tackle complex scientific challenges. The open provision of the associated algorithms and data through platforms like GitHub further demonstrates the commitment to reproducibility and community engagement.
Future pathways for this project include continuous database enrichment through community contributions, enhanced algorithmic refinement for more nuanced spectral discrimination, and deeper integration with emerging biomedical AI platforms. Such enhancements will ensure the library’s relevance as new biomolecular spectra become characterized, reinforcing its utility across research disciplines.
Ultimately, this open Raman spectral library is not only a technical triumph but a paradigm shift toward open, data-driven biochemical research. It promises to streamline molecular identification workflows, reduce analytical variability, and foster a collaborative spirit among scientists tackling some of the most pressing health challenges. Through this initiative, UOC and ICFO solidify their role at the forefront of molecular spectroscopy research, empowering a data-rich future for precision medicine and biomedical discovery.
Subject of Research: Not applicable
Article Title: Open Raman spectral library for biomolecule identification
News Publication Date: 15-Sep-2025
Web References:
Universitat Oberta de Catalunya: https://www.uoc.edu/en
Institute of Photonic Sciences: https://www.icfo.eu/
Raman spectral database: https://github.com/mteranm/ramanbiolib
C. V. Raman (Wikipedia): https://en.wikipedia.org/wiki/C._V._Raman
AIWELL research group: https://recerca.uoc.edu/grupos/37378/detalle?lang=en
eHealth Centre: https://www.uoc.edu/en/research/centres/ehealth
Open Science Office: https://biblioteca.uoc.edu/en/page/Open-Science-Office/
O2 institutional repository: https://openaccess.uoc.edu/home?locale=ca
References:
Terán, M., Ruiz, J. J., Loza-Alvarez, P., Masip, D., & Merino, D. (2025). Open Raman spectral library for biomolecule identification. Chemometrics and Intelligent Laboratory Systems, 264, 105476. https://doi.org/10.1016/j.chemolab.2025.105476
Keywords: Molecular biology, Computer science
Tags: accessible biomolecular data, AI and biophotonics integration, analytical and medical sciences, biomedical diagnostics innovation, biomolecule identification, molecular identification techniques, multidisciplinary research in biophotonics, open-access Raman spectral database, Raman spectroscopy advancements, spectral signatures of biomolecules, UOC AIWELL group initiatives, UOC research project



