.. _configuration:

##############
Configuration
##############

Bam_files_info.dic
*******************

BamQuery stores in a Python dictionary information about each BAM/CRAM file queried. 
The key for each bam/cram file is obtained from the path where the file is located. 
Therefore, take precautions to store bam/cram files under informative folder names that serve to differentiate them. 
For example, for all GTEx healthy tissue cram files, the file organization should be as follows: 

.. code::

        GTEx
        ├── adipose_subcutaneous
        │   ├── SRR1333352
        │   ├── SRR1338301
        │   ├── SRR1338627
        │   ├── SRR1339740
        │   ├


With this file structure, a cram path file for GTEx should look like this:  ``/home/GTEX/brain_amygdala/SRR1333352/SRR1333352.cram``.

In this example, BamQuery creates the key ``brain_amygdala_SRR1333352`` to save information related to this sample.

This information is organized in a list as follows: |br| 
0: ``/home/GTEX/brain_amygdala/SRR1333352/SRR1333352.cram`` --> Whole path to bam/cram file |br| 
1: ``80302110`` --> Total Primary Read count in the bam/cram file |br| 
2: ``brain_amygdala`` --> Tissue |br| 
3: ``Brain`` --> Tissue type  |br| 
4: ``no`` --> Shortlist |br| 
5: ``NA`` --> Sequencing  |br| 
6: ``NA`` --> Library |br| 
7: ``User_1`` --> The user that includes the bam/cram file information (first user quering a given bam file) |br| 


The Tissue, Tissue type and Shortlist fields must be provided by the first user who queries the given bam/cram file. This is done only once (see instructions below). |br| 

The sequencing and library fields are guessed directly by BamQuery from the bam/cram file. This is also done once when a user configures BamQuery to query the file taking into account its stradedness. 


Provide details to each Bam file
********************************

Every time a BAM file is queried for the first time, you need to provided some information about the origin of the file. |br| 
This is why the following exception will appear when running BamQuery:

.. py:exception:: fill in the `bam_files_tissues.csv` file with the requested information:

    Before to continue you must provide the tissue type for the bam files annotated in the file : .../output/res/AUX_files/bam_files_tissues.csv. Please enter for each sample : tissue, tissue_type, shortlist.

To resolve this, you must fill in the :code:`bam_files_tissues.csv` file with the requested information. |br| 
BamQuery stores the information, so this is a one-time operation for each BAM file. |br| 

Columns in :code:`bam_files_tissues.csv` : |br| 

For each BAM file, you must provide tissue, tissue_type, shortlist. |br| 
This classification is used by BamQuery for the elaboration of the heatmaps. See :ref:`heat maps folder`

**tissue:**
Refers to the tissue of the sample. For example: prostate

**tissue_type:**
It refers to a specific feauture of the tissue. For example: prostate tissue, can be classified as a type of SexSpecific tissue

**shortlist:**
Yes or No. This sets the BAM file as part of a selected group of samples within a tissue type to calculate the average level of transcript expression.


Once the file :code:`bam_files_tissues.csv` has been filled, you can relaunch BamQuery.

.. |br| raw:: html

      <br>