Command Line Options

At the command line:

BamQuery --help
usage: BamQuery.py [-h] [--mode MODE] [--th_out TH_OUT] [--dbSNP DBSNP] [--c]
                                [--strandedness] [--light] [--sc] [--var] [--maxmm]
                                [--overlap] [--plots] [--m] [--dev] [--t T]
                                path_to_input_folder name_exp genome_version

======== BamQuery ========

positional arguments:
        path_to_input_folder  Path to the input folder where to find BAM_directories.tsv and peptides.tsv
        name_exp              BamQuery search Id
        genome_version        Genome human releases : v26_88 / v33_99 / v38_104; Genome mouse releases : M24 / M30

optional arguments:
                                -h, --help            show this help message and exit
                                --mode MODE           BamQuery search mode : normal / translation
                                --th_out TH_OUT       Threshold to assess expression comparation with other tissues
                                --dbSNP DBSNP         Human dbSNP : 149 / 151 / 155
                                --c                   Take into account the only common SNPs from the dbSNP database chosen
                                --strandedness        Take into account strandedness of the samples
                                --light               Display only the count and norm count for peptides and regions
                                --sc                  Query Single Cell Bam Files
                                --umi                 Count UMIs in Single Cell Bam Files
                                --var                 Keep Variants Alignments
                                --maxmm               Enable STAR to generate a larger number of alignments
                                --overlap             Count overlapping reads
                                --plots               Plot biotype pie-charts
                                --m                   Mouse genome
                                --dev                 Save all temps files
                                --t T                 Specify the number of processing threads to run BamQuery. The default is 4

Positional Arguments

path_to_input_folder

To run BamQuery, create an input folder that includes two files: BAM_directories.tsv and peptides.tsv.

├── Input
        ├── BAM_directories.tsv
        └── peptides.tsv

The BAM_directories.tsv file must contain the BAM/CRAM files in which the peptides are queried.

The peptides.tsv file must contain the peptides for which BamQuery performs the RNA expression search in the BAM/CRAM files. (See: Format Input Files).

name_exp

Name of the search as a means of identification.

genome_version

This option allows to choose between three GRCh38 human genome annotation publication versions: v26_88 / v33_99 / v38_104.

This option together with --m (which allows BamQuery to search in the mouse genome) supports two mouse genome annotation versions of GRCm38 and GRCm39, respectively: M24 / M30.

You need to donwload any of the human or mouse supported versions in BamQuery.
Genome human releases : v26_88 / v33_99 / v38_104
Genome mouse releases : M24 / M30

Please see Installation to follow the instructions for the installation of the genome versions.

Note

Gencode Human Releases supported with BamQuery

Gencode_human/release_26 : v26_88
Evidence-based annotation of the human genome (GRCh38), version 26 (Ensembl 88) https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_26/

Gencode_human/release_33 : v33_99
Evidence-based annotation of the human genome (GRCh38), version 33 (Ensembl 99) https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/

Gencode_human/release_38 : v38_104
Evidence-based annotation of the human genome (GRCh38), version 38 (Ensembl 104) https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/

Gencode Mouse Releases supported with BamQuery

Gencode_mouse/release_M24 : M24
Evidence-based annotation of the mouse genome (GRCm38), version M24 (Ensembl 99) https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M24/

Gencode_mouse/release_M30 : M30
Evidence-based annotation of the mouse genome (GRCm39), version M30 (Ensembl 107) https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M30/


Optional Arguments

--mode

BamQuery has two modes of search : normal or translation.
By default, BamQuery runs in normal mode.

Normal mode: In normal mode, BamQuery expects to find in BAM_directories.tsv the BAM/CRAM files corresponding to the RNA-seq datasets. For more information, see normal mode example.

Translation mode: Instead of BAM_directories.tsv BamQuery expects a BAM_ribo_directories.tsv corresponding to the Ribo-seq datasets. In this mode, BamQuery can be used as a means to verify the presence of ribosomal profiling reads for the peptides of interest. For more information, see translation mode example.

--th_out

The th_out option changes the threshold that is considered to highlight in black the heatmap boxes representing RNA expression in Bam files where a peptide has an average rphm>th_out.
By default, this threshold is 8.55 rphm (reads per hundred million).

--dbSNP

This option allows you to choose between three versions of dbSNPs: 149 / 151 / 155.
By default, dbSNP 0.

--c

This option allows only to choose the most COMMON SNPs from the dbSNP release that you choose with the argument above.

--strandedness

When using this option, BamQuery takes into account the strand on which the peptide is located in the genomic location to count the overlapping reads.

For each Bam file, BamQuery automatically detects the library (stranded/non-stranded, pair-end, single-end, forward or reverse direction).
By defatul, all bam files will be treated according to the pair-end, single-end library but in unstranded mode.

--light

In this mode, BamQuery only displays peptide counting and normalization. Therefore, no biotyping analysis will be performed for peptides.
For more information, see light mode example.

--sc

BamQuery expects to find in BAM_directories.tsv the BAM/CRAM files corresponding to the single cell RNA-seq datasets.
BamQuery reports the RNA-seq read count for each peptide in cell populations and generates specific output.
For more information, see single cell example.

--umi

BamQuery expects to find in BAM_directories.tsv the BAM/CRAM files corresponding to the single cell RNA-seq datasets.
BamQuery reports instead of RNA-seq read count, it reports the unique molecular identifier (UMI) count for each peptide in cell populations and generates specific output.
For more information, see single cell example.

--var

A variant alignment refers to an alignment where the mapped MCS may deviate from the reference genome sequence by a maximum of 3 nucleotides (1 amino acid).
In these cases, single nucleotide variants are taken into account even though they are not included in the selected dbSNP.

Note

Allowing BamQuery to maintain variant alignments could facilitate the evaluation of the expression of mutated MAPs. However, using this option generates a large number of alignments that would impact execution time.

--maxmm

This option changes some of the STAR parameters (in the MCS alignment process, see 2. Collect the locations of the MAP Coding Sequences (MCS) in the genome.) to allow STAR to generate a larger number of alignments.
The new values for the modified STAR parameters are:

--winAnchorMultimapNmax 20000
--outFilterMultimapNmax 100000
--outFilterMultimapScoreRange 4
--alignTranscriptsPerReadNmax 100000
--seedPerWindowNmax 1500
--seedNoneLociPerWindow 1500
--alignWindowsPerReadNmax 20000
--alignTranscriptsPerWindowNmax 1500

Warning

With this option the STAR aligner will take longer to align the MCS with the genome.

--overlap

BamQuery counts an RNA-seq read if the read completely spans the MCS, however, with this option BamQuery also counts RNA-seq reads that overlap at least 60% of the MCS.

--plots

This option sets BamQuery to produce pie charts in the biotype analysis step.

--m

This option sets BamQuery to search for peptides in the mouse genome.
Along with the --genome_version option BamQuery can be parameterized to run the search on either of the two supported GRCm38 and GRCm39 mouse genome annotation versions: M24 / M30. If only the --m option is passed as an argument, BamQuery takes the default M24 mouse genome annotation version.
By default, the mouse genome annotation versions: M24 / M30, are used with the EVA database of genomic variation for the GRCm38 and GRCm39, respectively.

--dev

This option allows you to save all intermediate files.

Warning

Intermediate files can take up a lot of space.

--t

Specify the number of processing threads to run BamQuery. By default, BamQuery runs in 4 threads.