4th August 2017: We have changed the way the system handles thresholds for links between spectra and motifs. In all experiments, you can now set a threshold on both probability and overlap_score. If you do not wish to threshold on one of them, set the corresponding threshold to zero. All experiments have been migrated so that the new settings give output identical to the old ones. We think this added flexibility will make the system more user-friendly.
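As a minimal sketch of the thresholding rule described above, the Python below keeps a spectrum-motif link only if it passes both the probability and the overlap_score thresholds. The `Link` class, the `filter_links` function and the example values are hypothetical and not taken from the ms2lda.org code base. The only behaviour borrowed from the news item is that a threshold of zero switches that particular check off.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical spectrum-to-Mass2Motif link (field names are illustrative)."""
    spectrum: str
    motif: str
    probability: float
    overlap_score: float


def filter_links(links, prob_threshold=0.0, overlap_threshold=0.0):
    """Keep links that meet both thresholds.

    Probabilities and overlap scores are never negative, so a threshold of 0
    lets every link through on that criterion, i.e. it disables the check.
    """
    return [
        link for link in links
        if link.probability >= prob_threshold
        and link.overlap_score >= overlap_threshold
    ]


links = [
    Link("spec_1", "motif_A", 0.40, 0.65),
    Link("spec_1", "motif_B", 0.08, 0.90),
    Link("spec_2", "motif_A", 0.25, 0.10),
]

# Threshold on both criteria.
print([(l.spectrum, l.motif) for l in filter_links(links, 0.1, 0.3)])
# [('spec_1', 'motif_A')]

# Threshold on probability only (overlap threshold set to zero).
print(len(filter_links(links, 0.1, 0.0)))  # 2
```

Requiring both conditions with a logical AND is what makes zero a convenient "off" value: the disabled criterion is always satisfied, so only the other one decides.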
4th August 2017: Our new paper on the use of MS2LDA to investigate the variability in substructure prevalence across large experiments is now published in Analytical Chemistry: Unsupervised Discovery and Comparison of Structural Families Across Multiple Samples in Untargeted Metabolomics.
26th September 2017: Our application note describing this Web application is now published in Bioinformatics: Ms2lda.org: web-based topic modelling for substructure discovery in mass spectrometry.
Metabolomics is the large-scale, untargeted study of the small molecules (metabolites) involved in essential life-sustaining chemical processes. Untargeted metabolomics has provided insights into a wide array of fields, such as medical diagnostics, drug discovery and personalised medicine, among many others. Measurements in metabolomics studies are routinely performed using liquid chromatography mass spectrometry (LC-MS) instruments. Using tandem mass spectrometry, fragmentation peaks characteristic of a compound can be obtained and used to help establish its identity.
Fragmentation spectra, which provide characteristic fingerprints of compounds, also contain structural information: a subset of fragment peaks may correspond to a chemical substructure shared by a class of compounds. The aim of this site is to provide an online platform that allows users to perform unsupervised substructure discovery in fragmentation experiments, decompose fragmentation experiments into characterised substructures (Mass2Motifs) found in MS/MS spectra of reference compounds, and integrate fragmentation analysis with comparative metabolomics experiments.
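To illustrate what decomposing a spectrum into already-characterised Mass2Motifs involves, here is a minimal sketch in Python. It assumes each Mass2Motif is a fixed probability distribution over discrete fragment and loss "words", and it estimates only the per-spectrum mixing proportions using plain expectation-maximisation (EM) for mixture weights. This is a swapped-in simplification, not the inference procedure used by ms2lda.org; the `decompose` function, the word labels and the motif contents are all hypothetical.

```python
def decompose(spectrum, motifs, n_iter=100, tol=1e-8):
    """Estimate how much of a spectrum each fixed Mass2Motif explains.

    spectrum: {word: weight}; motifs: {motif_name: {word: probability}}.
    Motif distributions are held fixed; only the mixing proportions
    (one per motif, summing to 1) are learned, via EM for mixture weights.
    """
    names = list(motifs)
    # Words that no motif can generate are ignored (left 'unexplained').
    usable = {w: n for w, n in spectrum.items()
              if any(motifs[k].get(w, 0.0) > 0 for k in names)}
    total = sum(usable.values())
    if total == 0:
        return {k: 0.0 for k in names}

    theta = {k: 1.0 / len(names) for k in names}  # start from uniform weights
    for _ in range(n_iter):
        expected = {k: 0.0 for k in names}
        for w, n in usable.items():
            # E-step: responsibility of each motif for word w.
            scores = {k: theta[k] * motifs[k].get(w, 0.0) for k in names}
            norm = sum(scores.values())
            for k in names:
                expected[k] += n * scores[k] / norm
        # M-step: new proportions are the normalised expected weights.
        new_theta = {k: expected[k] / total for k in names}
        change = max(abs(new_theta[k] - theta[k]) for k in names)
        theta = new_theta
        if change < tol:
            break
    return theta


# Hypothetical motifs and spectrum (word labels are illustrative only).
motifs = {
    "adenine": {"fragment_136.06": 0.6, "fragment_119.03": 0.4},
    "hexose": {"loss_162.05": 0.7, "loss_180.06": 0.3},
}
spectrum = {"fragment_136.06": 1.0, "fragment_119.03": 0.5,
            "loss_162.05": 0.5, "fragment_85.03": 0.2}
print(decompose(spectrum, motifs))  # {'adenine': 0.75, 'hexose': 0.25}
```

Because the motifs are not re-learned, decomposition of this kind is cheap compared with running LDA from scratch, and the resulting proportions can be compared directly across spectra that share the same motif set.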
How does it work? In our proposed method (MS2LDA), discrete fragment and neutral loss features are extracted from fragmentation spectra. Related features that tend to co-occur are detected using the Latent Dirichlet Allocation (LDA) model. The figure below shows the analogy between LDA for text and MS2LDA for fragment and neutral loss features. LDA finds topics that can be interpreted as ‘football-related’, ‘business-related’ and ‘environment-related’. MS2LDA finds sets of co-occurring mass fragments or losses (Mass2Motifs) that can be interpreted as ‘Asparagine-related’, ‘Hexose-related’ and ‘Adenine-related’.
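The feature-extraction step can be sketched as follows: each spectrum becomes a "document" whose "words" are binned fragment m/z values and binned neutral losses (precursor m/z minus fragment m/z), weighted by relative intensity. These bags of words are what an LDA model would then consume. The bin width of 0.01 Da, the 1% noise cut-off, the word naming scheme and the `extract_features` function are assumptions chosen for illustration, not the settings MS2LDA actually uses.

```python
import math
from collections import Counter


def extract_features(precursor_mz, peaks, bin_width=0.01, min_rel_intensity=0.01):
    """Turn one MS/MS spectrum into a bag of discrete fragment and loss 'words'.

    peaks is a list of (mz, intensity) pairs. Intensities are scaled relative
    to the base peak, and both fragment m/z values and neutral losses
    (precursor m/z minus fragment m/z) are snapped to fixed-width bins.
    The two-decimal word labels assume bin_width=0.01.
    """
    base = max(intensity for _, intensity in peaks)
    words = Counter()
    for mz, intensity in peaks:
        rel = intensity / base
        if rel < min_rel_intensity:
            continue  # drop noise-level peaks
        frag_bin = math.floor(mz / bin_width)
        words[f"fragment_{frag_bin * bin_width:.2f}"] += rel
        loss = precursor_mz - mz
        if loss > 0:
            loss_bin = math.floor(loss / bin_width)
            words[f"loss_{loss_bin * bin_width:.2f}"] += rel
    return words


# Adenosine-like [M+H]+ precursor with adenine-type fragments (values illustrative).
words = extract_features(268.1040, [(136.0618, 1000.0), (119.0352, 150.0), (94.0400, 5.0)])
print(dict(words))
# {'fragment_136.06': 1.0, 'loss_132.04': 1.0, 'fragment_119.03': 0.15, 'loss_149.06': 0.15}
```

Encoding losses as well as fragments matters because a substructure lost as a neutral (for example a sugar) leaves no fragment peak of its own; in this example the 132.04 loss corresponds to a pentose.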
The tool currently accepts fragmentation experiments in various formats (mzML, MSP, MGF). Optionally, an MS1 peak list can be added; the MS1 peaks found in the fragmentation experiment are then matched to this list before running LDA or Decomposition.
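A minimal sketch of that matching step: for each fragmentation precursor, find the MS1 peak in the user-supplied list that lies within a mass tolerance (in ppm) and a retention-time window, preferring the smallest mass error. The 5 ppm and 30 s tolerances, the `Peak` class and the `match_ms1` function are hypothetical; the site's actual matching rules and defaults are not described here.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical peak record: m/z, retention time (seconds), intensity."""
    mz: float
    rt: float
    intensity: float = 0.0


def ppm_error(observed, reference):
    """Absolute mass error in parts per million."""
    return abs(observed - reference) / reference * 1e6


def match_ms1(precursors, peak_list, ppm_tol=5.0, rt_tol=30.0):
    """For each precursor, return the index of the closest MS1 peak (by ppm
    error) within ppm_tol and rt_tol, or None if no peak qualifies."""
    matches = []
    for prec in precursors:
        best, best_err = None, None
        for i, peak in enumerate(peak_list):
            err = ppm_error(prec.mz, peak.mz)
            if err <= ppm_tol and abs(prec.rt - peak.rt) <= rt_tol:
                if best_err is None or err < best_err:
                    best, best_err = i, err
        matches.append(best)
    return matches


precursors = [Peak(268.1040, 300.0), Peak(181.0707, 120.0), Peak(500.0000, 50.0)]
peak_list = [Peak(268.1045, 310.0, 2e6), Peak(181.0710, 400.0, 1e5),
             Peak(181.0712, 125.0, 5e5)]
print(match_ms1(precursors, peak_list))  # [0, 2, None]
```

The second precursor is closer in mass to peak 1 but elutes far from it, so the retention-time window steers the match to peak 2; widening `rt_tol` would change the answer. A real implementation over large peak lists would sort by m/z and use binary search rather than this quadratic scan.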
Ms2lda.org provides access to the LDA and Decomposition models and includes the following visualisation features:
In addition, the following features are provided to facilitate integration with metabolomics experiments by:
The data and code for the paper can be found at http://dx.doi.org/10.5525/gla.researchdata.313. A new version of MS2LDA that allows topics (i.e. Mass2Motifs) to be inferred across multiple document collections (i.e. fragmentation files) at once can be found at http://github.com/sdrogers/lda. The rest of the pipeline code for processing and loading fragmentation data can also be found there. The code for this website, alongside various visualisation modules, can be found at http://github.com/sdrogers/ms2ldaviz.