The SIRIUS commandline tool can be called via the “binary/startscript” by running the sirius command in the command line.
NOTE: It is usually a good idea to use the --help option to obtain an overview of the available commands and options.
This guarantees that the description of the commands directly matches your SIRIUS version.
The SIRIUS commandline program is designed as a toolbox that provides different tools (subcommands) for metabolite identification. These tools can be concatenated to toolchains to compute multiple analysis steps at once. We distinguish the following types of subcommands:
CONFIGURATION: The config tool can be executed before every toolchain or standalone tool to set all configurations available in SIRIUS from the command line.
STANDALONE: Tools that run Standalone and cannot be concatenated with other subtools. These are usually tools for configuration or data management purposes. E.g. modifying
PREPROCESSING: Tools that prepare input data to be compatible with SIRIUS. E.g. lcms-align for feature detection and alignment.
COMPOUND TOOL: Tools that analyze each compound (instance) of the dataset individually and can be concatenated with other tools. E.g. structure database search or canopus compound class prediction.
DATASET TOOL: Tools that analyze all compounds (instances) of the dataset simultaneously and can be concatenated with other tools. E.g. dataset-wide molecular formula annotation.
Each subtool can also be called with the --help option to get documentation
about the available options and possible follow-up commands in a
toolchain. For the formula tool the command would be:
sirius formula --help
The SIRIUS CLI toolbox can also be considered a rudimentary workflow engine (toolchains) that works according to the following principles:
- It only executes subtools that are specified in the command.
- It skips computation of a compound if a mandatory input from a previous step (subtool) is missing.
- Without an additional parameter it does not override existing results. Compounds for which the result to be computed already exists will be skipped.
- If --recompute is specified, existing results will be replaced with new ones for all subtools that are specified in the command.
- If a subtool is recomputed (--recompute), the results of subtools that depend on the results of the recomputed subtool will be lost too. This is necessary because they would no longer be consistent with the newly computed results.
- Compute missing results without recompute: Assume we have computed (structure) results for compounds <600Da. We can now run the workflow (canopus) on the same project-space without restricting the precursor mass to <600Da. Since results of the structure tools already exist for the <600Da compounds, they will be skipped and not recomputed. However, canopus will be executed for all compounds.
- Recompute all results: If we do the same with --recompute, all results will be recomputed.
- Recompute only one subtool: Assume we have a project-space with complete results for canopus and want to recompute the structure results because we have used the wrong parameters. We can now execute sirius -i <project-space> --recompute structure -db mydb. This will recompute all structure results without recomputing the formula results. Even the canopus results will not be lost, since they depend only on the fingerprint results and not on the structure tool results. Note: If we did the same with the fingerprint tool, both the structure and the canopus results would be lost.
- Resume interrupted computations: Let’s say we have an interrupted computation. We can just rerun the same command to resume it. SIRIUS will skip computation for existing results and compute only the missing ones.
- Special case zodiac: Since the zodiac tool is not calculated on a per-compound basis but on the whole dataset, it will always be recomputed completely as soon as not all compounds contain
- Visualize results in the GUI: Results of all subtools that produce a project-space as output (--output option) can also be opened in the GUI for visualization, modification or even computation.
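The dependency rule from the list above can be sketched as a small shell function. The dependency table is only read off this page (fingerprint needs formula, zodiac needs formula, structure and canopus need fingerprint); it is a simplified assumption for illustration, not SIRIUS's internal model, and the function names are hypothetical.

```shell
#!/bin/sh
# Illustration only: which results are lost when one subtool is recomputed.
# The table below is an assumption derived from the text of this page.
dependents() {
  case "$1" in
    formula)     echo "zodiac fingerprint" ;;
    fingerprint) echo "structure canopus" ;;
    *)           echo "" ;;   # zodiac, structure, canopus: nothing downstream here
  esac
}

# Print every subtool downstream of the given one, one per line.
invalidated() {
  for tool in $(dependents "$1"); do
    echo "$tool"
    invalidated "$tool"
  done
}

invalidated structure     # prints nothing: canopus does not depend on structure
invalidated fingerprint   # prints structure and canopus
```

Under these assumptions, recomputing structure leaves the canopus results untouched, while recomputing fingerprint invalidates both structure and canopus, which matches the examples in the list.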
In the following, the most important (sub)commands and options are described briefly.
LCMS-align: Feature detection and feature alignment [Preprocessing]
The lcms-align tool allows us to import multiple mzML/mzXML files into SIRIUS. It performs
feature detection and feature alignment based on the MS/MS spectra and
creates a SIRIUS project-space which is then used to execute follow-up analysis steps:
sirius -i <mzml(s)> -o <projectspace> lcms-run formula
SIRIUS: Identifying Molecular Formulas [Compound Tool]
One main purpose of SIRIUS is identifying the molecular formula of a
measured ion. For this task SIRIUS provides the
formula tool. The most basic way
to use the
formula tool is with the generic text/CSV input:
sirius [OPTIONS] -1 <MS FILE> -2 <MS/MS FILES comma separated> -z <PARENTMASS> --adduct <adduct> --output <projectspace> formula
Where MS FILE and MS/MS FILE are either CSV or MGF files. If MGF
files are used, you might omit the -z option. If you omit the adduct,
[M+?]+ is used as default. It is also
possible to give a list of MS/MS files if you have several measurements
of the same compound with different collision energies. SIRIUS will
merge these MS/MS spectra into one spectrum.
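As a minimal sketch, the comma-separated MS/MS list for -2 can be assembled in the shell before calling SIRIUS. The file names, the parent mass and the adduct below are hypothetical placeholders, join_by_comma is a helper introduced only for this example, and the final command is printed rather than executed.

```shell
#!/bin/sh
# Join all arguments with commas (no trailing comma).
join_by_comma() {
  old_ifs=$IFS
  IFS=,
  printf '%s\n' "$*"   # "$*" joins arguments with the first character of IFS
  IFS=$old_ifs
}

# Hypothetical MS/MS files of one compound at three collision energies.
msms=$(join_by_comma ce10.csv ce20.csv ce40.csv)

# Dry run: print the command instead of executing it.
echo sirius -1 ms1.csv -2 "$msms" -z 354.1 --adduct "[M+H]+" --output project formula
```

Remove the leading echo to actually run the command once the placeholders have been replaced with real values.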
The more common and recommended way is using input files in
format (with MSLEVEL and PEPMASS meta information). Such files contain
all spectra for a compound together with their meta data. They can also
contain multiple compounds per file. Furthermore, SIRIUS is able to crawl an
input directory for supported files:
sirius [OPTIONS] --input <inputFile> --output <projectspace> formula [OPTIONS]
SIRIUS will pick the meta information (parent mass, ionization etc.) from
.ms files in the given directory. This allows SIRIUS to run in batch
mode (analyzing multiple compounds without starting a new JVM process
for each of them).
Results such as scored molecular formula candidates and corresponding fragmentation trees in
.json format are written to the project-space given via --output.
If the command write-summaries is used, SIRIUS
will output a summary file on compound level containing, among others, the ‘rank’, ‘molecularFormula’, ‘adduct’,
‘precursorFormula’, ‘rankingScore’, ‘SiriusScore’, ‘TreeScore’,
‘IsotopeScore’, ‘numExplainedPeaks’, ‘explainedIntensity’, ‘medianMassErrorFragmentPeaks(ppm)’,
‘medianAbsoluteMassErrorFragmentPeaks(ppm)’ and ‘massErrorPrecursor(ppm)’, as well as a summary containing the top hits for all compounds on project level.
The SiriusScore is the sum of the TreeScore and the IsotopeScore. The tool uses the
SiriusScore for ranking. If the IsotopeScore is negative, it is set to zero. If at least one
IsotopeScore is greater than 10, the isotope pattern is considered
to have good quality and only the candidates with the best isotope pattern
scores are selected for further fragmentation pattern analysis.
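The scoring rule above can be written out as a short calculation. This is only a sketch of the stated sum with the negative isotope score clamped to zero; the score values are made up for illustration, sirius_score is a hypothetical helper name, and the candidate pre-selection step is not modelled.

```shell
#!/bin/sh
# sirius_score TREE_SCORE ISOTOPE_SCORE
# Prints TreeScore + IsotopeScore, treating a negative IsotopeScore as zero.
sirius_score() {
  awk -v tree="$1" -v iso="$2" 'BEGIN {
    iso = iso + 0                 # force numeric interpretation
    if (iso < 0) iso = 0          # negative isotope scores are set to zero
    printf "%.2f\n", tree + iso
  }'
}

sirius_score 25.5 4.25    # prints 29.75
sirius_score 25.5 -3.0    # prints 25.50
```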
Computing fragmentation trees
If you already know the correct molecular formula and just want to
compute a fragmentation tree, you can specify the formula using
--formulas. SIRIUS will then only compute a tree for this molecular
formula. If your input data is in
.ms format, the molecular formula might be already specified within the file. Note: --formulas
can also be used to specify a comma-separated list of candidate molecular formulas.
sirius -i <input> --output <projectspace> formula --formulas <formula>
Datasets have different mass errors, level of noise and accuracy of isotope pattern intensities, depending, among others, on instrument type and setup.
By default, SIRIUS uses a profile for
Q-TOF data with 10 ppm mass deviation. This should not be interpreted as a Q-TOF-only profile, but is often a good default profile even for data from other instruments.
However, if you are certain that your data has mass errors much below 10 ppm - because it was measured on Orbitrap or FT-ICR - you should probably specify more stringent parameters.
Adjustments are also necessary if the data is expected to have even higher mass errors.
Both can be accomplished by specifying a different profile and mass deviations.
You may be familiar with the profile option from the GUI. Using the CLI, you can specify
-p <name> to select either qtof (default) or orbitrap.
The orbitrap profile mainly uses a different mass deviation of 5 ppm and slightly different settings for isotope scoring.
For FT-ICR data, we recommend using the
orbitrap profile and additionally specifying a lower mass deviation, as explained in the following.
You can specify the maximum allowed mass deviations for MS1 and MS2 separately:
sirius -i <input> --output <projectspace> formula -p orbitrap --ppm-max 2 --ppm-max-ms2 5
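To get a feeling for what a ppm setting means in absolute terms, the following sketch converts a relative deviation into an m/z window using the standard definition of parts per million. It is only an illustration of the unit; how SIRIUS applies the deviation internally is not covered here, and ppm_window is a hypothetical helper name.

```shell
#!/bin/sh
# ppm_window MZ PPM
# Prints the lower and upper m/z bound for a relative deviation given in ppm.
ppm_window() {
  awk -v mz="$1" -v ppm="$2" 'BEGIN {
    d = mz * ppm / 1e6            # absolute deviation in m/z units
    printf "%.4f %.4f\n", mz - d, mz + d
  }'
}

ppm_window 500 10   # prints 499.9950 500.0050
ppm_window 500 2    # prints 499.9990 500.0010
```

At m/z 500, going from 10 ppm to 2 ppm narrows the accepted window from 0.01 to 0.002 m/z units.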
ZODIAC: Improve Molecular Formula Identifications [Dataset Tool]
If your input data is derived from a biological sample or any other set
of derivatives, similarities between different compounds can be
leveraged to improve molecular formula annotation of the individual
compounds. ZODIAC builds a similarity network between molecular formula
candidates of all compounds that were computed via the
formula tool and
re-ranks these candidates using Bayesian statistics (Gibbs Sampling).
This decreases error rates (of top 1 candidates) approximately 2-fold;
on challenging datasets that contain many large compounds,
improvements can be much more dramatic.
The zodiac tool can be executed after the
formula tool without the need for many additional parameters:
sirius -i <input> -o <projectspace> formula -c 50 zodiac
When using ZODIAC, it is reasonable to increase the maximum number of
formula candidates (
-c) that are stored after running
formula. These candidates
are input to ZODIAC. If the correct candidate is missing, ZODIAC cannot
recover it. In order to reduce memory consumption and running time,
ZODIAC uses a dynamic number of candidates per compound based on the m/z
— the idea is, for low-mass compounds the correct molecular formula is
much more likely to be in the, say, top 10. By default, ZODIAC uses 10
candidates for compounds with m/z less than or equal to 300 (
--considered-candidates-at-300 10) and 50
candidates for compounds with m/z greater than or equal to 800.
In between these thresholds the number of candidates is calculated by interpolation.
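The m/z-dependent candidate count can be illustrated with a small calculation. The text only says "interpolation", so the linear interpolation and the rounding to the nearest integer used below are assumptions, and zodiac_candidates is a hypothetical helper name, not a SIRIUS command.

```shell
#!/bin/sh
# zodiac_candidates MZ [AT_300] [AT_800]
# Number of candidates considered for a compound, assuming LINEAR
# interpolation between the two thresholds (an assumption of this sketch).
zodiac_candidates() {
  awk -v mz="$1" -v lo="${2:-10}" -v hi="${3:-50}" 'BEGIN {
    mz = mz + 0; lo = lo + 0; hi = hi + 0   # force numeric interpretation
    if (mz <= 300)      n = lo
    else if (mz >= 800) n = hi
    else                n = lo + (hi - lo) * (mz - 300) / 500
    printf "%d\n", n + 0.5                  # round to the nearest integer
  }'
}

zodiac_candidates 250   # prints 10
zodiac_candidates 550   # prints 30
zodiac_candidates 900   # prints 50
```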
The density of the ZODIAC network mainly depends on two parameters: --edge-threshold and
--minLocalConnections (default: 10). The edge threshold defines the ratio of
all possible edges between candidates that are discarded. Because most
formula candidates are incorrect (there is only one correct candidate
per compound) we assume most edges are spurious and we throw away the
95% of edges with the lowest score. However, to prevent compounds from being
disconnected completely from the rest of the network, we discard edges
in such a way that one candidate per compound is connected to at least
--minLocalConnections other compounds. This introduces an individual edge score threshold for
each compound. However, when using
--minLocalConnections, ZODIAC first has to create the
complete network and filter edges afterwards. Thus, ZODIAC may consume a
large amount of system memory.
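The effect of a global edge threshold can be sketched on a toy list of edge scores. This is a simplified illustration: it discards the given fraction of lowest-scoring edges and deliberately ignores the per-compound guarantee of --minLocalConnections. The scores are made up and keep_top_edges is a hypothetical helper name.

```shell
#!/bin/sh
# keep_top_edges THRESHOLD
# Reads one edge score per line on stdin and prints the scores that survive
# after discarding the THRESHOLD fraction of lowest-scoring edges.
keep_top_edges() {
  LC_ALL=C sort -n | awk -v t="$1" '
    { s[NR] = $0 }
    END {
      skip = int(NR * t + 1e-9)   # number of edges to discard
      for (i = skip + 1; i <= NR; i++) print s[i]
    }'
}

# Ten toy scores with threshold 0.8: only the two best edges remain.
printf '%s\n' 0.1 0.9 0.4 0.7 0.2 0.3 0.8 0.5 0.6 1.0 | keep_top_edges 0.8
# prints 0.9 and 1.0
```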
For very large datasets, the ZODIAC network may not fit even into 1 TB of system
memory or more. In that case, please perform a feature alignment between your
LC-MS/MS runs to reduce the number of compounds and thus the size
of the ZODIAC network. If this is still not sufficient, memory
consumption can be dramatically decreased by setting --minLocalConnections 0.
This will allow ZODIAC to filter low-weight edges on the fly when creating the network.
Use this setting with care, since it can result in a badly connected
network that may decrease performance:
sirius -i <input> -o <projectspace> formula -c 50 zodiac --minLocalConnections 0 --edge-threshold 0.99
CSI:FingerID: Predicting molecular fingerprints [Compound Tool]
Molecular fingerprints can be predicted via the
fingerprint command after molecular formula candidates
have been calculated by running
formula. Fingerprint prediction is part of CSI:FingerID.
A fingerprint is predicted based on a specific molecular formula candidate
(with corresponding fragmentation tree). By default, fingerprints are predicted for multiple good scoring formula candidates by applying a soft score-threshold on the SIRIUS score.
sirius -i <input> -o <projectspace> formula fingerprint
CSI:FingerID: Identifying Molecular Structures [Compound Tool]
With the structure tool we can search with CSI:FingerID for molecular structures in a structure database.
To run structure database search, molecular fingerprints need to be predicted in advance by running the
fingerprint tool first. You might also
want to run the
zodiac tool for improved formula ranking if your data is derived
from a biological sample or any other set of derivatives.
With --database we can specify the database CSI:FingerID should search in. Available are, among others, pubchem and bio.
Once structure search has been performed, the
write-summaries tool will generate a
structure_candidates.csv for each compound containing an ordered
candidate list of structures with the CSI:FingerID score. Furthermore, a project-space-wide
file will be generated containing the top candidate structure for each compound,
ordered by confidence score.
sirius -i <input> -o <projectspace> formula fingerprint structure --database pubchem
When running structure together with
zodiac, the command could look like this:
sirius -i <input> -o <projectspace> formula -c 50 zodiac fingerprint structure --database bio
CANOPUS: Database-free Compound Classes Prediction [Compound Tool]
The canopus tool allows us to directly predict compound classes based on the probabilistic
molecular fingerprint that was predicted by CSI:FingerID (
fingerprint command). Notably,
canopus can even provide
compound class information for unidentified compounds with no hit in a structure database:
sirius -i <input> -o <projectspace> formula fingerprint canopus
PASSATUTTO: Decoy Spectra from Fragmentation Trees [Compound Tool]
The passatutto tool allows us to compute high-quality decoy spectra from
fragmentation trees provided by the
formula tool. Assume we are using a
spectral library as input; we can then easily create a decoy database based
on these spectra:
sirius -i <spectral-lib> -o <projectspace> formula passatutto
If no molecular formulas are annotated to the input spectra, the best-scoring candidate will be used for decoy computation instead.