The SIRIUS commandline tool can be called via the “binary/startscript” by simply running the command in your commandline:

sirius --help

You can always use the --help option to get a documentation about the available commands and options.

Since version 4.4.0 the SIRIUS commandline program is designed as a toolbox that provides different tools (subcommands) for metabolite identification. This tools can be concatenated to toolchains to compute multiple analysis steps at once. We distinguish subcommands of the following categories:

  • CONFIGURATION: The config tool can be executed before every toolchain or standalone tool to set all configurations available in SIRIUS from the command line.

  • STANDALONE: Tools that run Standalone and cannot be concatenated with other subtools. These are usually tools for configuration purposes.

  • PREPROCESSING: Tools that prepare input data to be compatible with SIRIUS.

  • COMPOUND TOOL: Tools that analyze each compound (instance) of the dataset individually and can be concatenated with other tools.

  • DATASET TOOL: Tools that analyze all compounds (instances) of the dataset simultaneously and can be concatenated with other tools.

Each subtool can also be called with the --help option to get a documentation about the available options and possible follow up commands in a toolchain. For the formula tool the command would be:

sirius formula --help

SIRIUS: Identifying Molecular Formulas

One main purpose of SIRIUS is identifying the molecular formula of a measured ion. For this task SIRIUS provides the formula tool. The most basic way to use the formula tool is with the generic text/CSV input:

sirius [OPTIONS] -1 <MS FILE> -2 <MS/MS FILE> -z <PARENTMASS> --adduct <adduct> formula

Where MS FILE and MS/MS FILE are either CSV or MGF files. If MGF files are used, you might omit the -z option. If you omit the --adduct option, [M+?]+ is used as default. It is also possible to give a list of MS/MS files if you have several measurements of the same compound with different collision energies. SIRIUS will merge these MS/MS spectra into one spectrum.

The more common and recommended way is using input files in .ms or .mgf format (with MSLEVEL and PEPMASS meta information). Such files contain all spectra for a compound together with their meta data. They can also contain multiple compounds per file. Further SIRIUS is able to crawl an input directory for supported files:

sirius [OPTIONS] --input demo-data/ms formula [OPTIONS]

SIRIUS will pick the meta information (parentmass, ionization etc.) from the .ms files in the given directory. This allows SIRIUS to run in batch mode (analyzing multiple compounds without starting a new jvm process every time).

Besides the raw results like the fragmentation trees in json format, SIRIUS will output a summary containing the rank, molecularFormula, adduct, precursorFormula, rankingScore, SiriusScore, TreeScore, IsotopeScore, numExplainedPeaks, explainedIntensity, medianMassErrorFragmentPeaks(ppm), medianAbsoluteMassErrorFragmentPeaks(ppm), massErrorPrecursor(ppm) on compound level and a summary containing the top hits for all compounds on project level.

The SiriusScore is the sum of the TreeScore and the IsotopeScore. The tool uses the SiriusScore for ranking. If the IsotopeScore is negative, it is set to zero. If at least one IsotopeScore is greater than 10, the isotope pattern is considered to have good quality and only the candidates with best isotope pattern scores are selected for further fragmentation pattern analysis.

Computing fragmentation trees

If you already know the correct molecular formula and just want to compute a fragmentation tree, you can specify a single molecular formula with the option. SIRIUS will then only compute a tree for this molecular formula. If your input data is in format, the molecular formula might be already specified within the file. If a molecular formula is specified, the parent mass can be omitted. However, you still have to specify the ionization (except for default value [M+H]+):

sirius -f C20H19NO5 -2 demo-data/txt/chelidonine/_msms1.txt demo-data/txt/chelidonine_msms2.txt formula

Analysis Profiles

If you want to analyze spectra measured with Orbitrap or FT-ICR, you should specify the appropriate analysis profile. A profile is a set of configuration options and scoring functions SIRIUS will use for its analysis. For example, the and profiles having tighter constraints for the allowed mass deviation but do not rely so much on the intensity of isotope peaks. You can set the profile with the -p <name> option. By default, qtof is used.

See the following examples for running the formula sub-tool of the SIRIUS commandline tool:

ZODIAC: Improve Molecular Formula Identifications

If your input data is derived from a biological sample or any other set of derivatives, similarities between different compounds can be leveraged to improve molecular formula annotation of the individual compounds. ZODIAC builds a similarity network between molecular formula candidates of all compounds that where computed via the formula tool and re-ranks these candidates using Bayesian statistics (Gibbs Sampling). This decreases error rates (of top 1 candidates) by approximately 2 fold — on challenging datasets that contain many large compounds, improvements can be much more dramatic.

The zodiac tool can be executed after the formula tool without the need of many parameters:

sirius -i <input> -o <output> formula -c 50 zodiac

When using ZODIAC, it is reasonable to increase the maximum number of formula candidates (-c) that are stored after running . These candidates are input to ZODIAC. If the correct candidate is missing, ZODIAC cannot recover it. In order to reduce memory consumption and running time, ZODIAC uses a dynamic number of candidates per compound based on the m/z — the idea is, for low-mass compounds the correct molecular formula is much more likely to be in the, say, top 10. By default, ZODIAC uses 10 candidates for compounds with m/z lower equal to 300 (--considered-candidates-at-300 10) and 50 candidates for compounds with m/z greater equal to 800 (--considered-candidates-at-800 50).

The density of the ZODIAC network mainly depends on two parameters: --edge-threshold (default:0.95) and --minLocalConnections (default:10). The edge threshold defines the ratio of all possible edges between candidates that are discarded. Because most formula candidates are incorrect (there is only one correct candidate per compound) we assume most edges are spurious and we throw away the 95% with lowest score. However, to prevent compounds being disconnected completely from the rest of the network, we discard edges in such a way that one candidate per compound is connected to at least --minLocalConnections other compounds. This introduces an individual edge score threshold for each compound. However, when using --minLocalConnections, ZODIAC first has to create the complete network and filter edges afterwards. Thus, ZODIAC may consume a large amount of system memory.

For very large datasets, the ZODIAC network may not fit in 1TB system memory and more. Please, perform a feature alignment between your LC-MS/MS runs to reduce the number of compounds and thus reduce the size of the ZODIAC network. If this is still not sufficient, memory consumption can be dramatically decreased by setting --minLocalConnections=0. This will allow ZODIAC to filter low weight edges on the fly when creating the network. Use this setting with care, since it can result in a badly connected network that may decrease performance:

sirius -i <input> -o <output> formula -c 50 zodiac --minLocalConnections 0 --edge-threshold 0.99

CSI:FingerID: Identifying Molecular Structures

With the structure tool you can search for molecular structures with CSI:FingerID. To run CSI:FingerID you need to execute the formula tool first. You might also want to run the zodiac tool for improved formula ranking if your data is derived from a biological sample or any other set of derivatives.

With --databases you can specify the database SIRIUS should search in. Available are, among other pubchem and bio.

The structure tool will generate a structure_candidates.csv for each compound containing an ordered candidate list of structures with the CSI:FingerID score. Furthermore, a compound_identification.csv file will be generated containing the top candidates from all compounds ordered by their confidence.

sirius -i demo-data/ms/Bicuculline.ms -o <output>formula -c 10 structure --database pubchem

When running structure together with zodiac the command could look like this:

sirius -i <input> -o <output> formula -c 50 zodiac structure --database bio

CANOPUS: Predicting Compound Classes without Identification

The canopus tool allows you the predict compound classes from the probabilistic molecular fingerprint predicted by CSI:FingerID. So canopus can even provide compound class information for unidentified compound with no hit in a structure database:

sirius -i <input> -o <output> formula -c 10 structure --database pubchem canopus

PASSATUTTO: Decoy Spectra from Fragmentation Trees

The passattuto tool allows you to compute high quality decoy spectra from fragmentation trees provided by the formula tool. Assume your are using a spectral library as input you can easily create a decoy database based on this spectra:

sirius -i <spectral-lib> -o <output> formula passatutto

If no molecular formulas are annotated to the input spectra the best scoring candidate will be used for decoy computation instead.

LCMS-align: Feature detection and feature alignment

The lcms-align tool allows you to import mzML/mzXML files into SIRIUS. It performs feature detection and feature alignment based on the MS/MS spectra and creates a SIRIUS project-space which is then used to execute followup analysis steps:

sirius -i <mzml(s)> -o <output> lcms-run formula