CONsensus Regulatory ELements




CONREL is a genome browser for exploring consensus regulatory elements at different levels of abstraction. The total binding affinity of transcription factors over whole consensus region sequences is used to characterize and annotate the functional properties of regulatory elements. CONREL can be used to explore genomic loci, genes or genomic regions of interest across different cell lines and tissues.




Datasets sources

CONREL v1 provides an extensive collection of consensus promoters, enhancers and active enhancers for 198 cell-lines across 38 tissue types, which are also combined to provide global consensuses. The consensuses are defined by combining ChIP-seq peak data for the H3K4me1, H3K4me3 and H3K27ac histone markers from the Encyclopedia of DNA Elements (ENCODE) and the NIH Roadmap Epigenomics Program (Roadmap) (data available as of September 2018 at www.encodeproject.org). These ChIP-seq datasets were mapped to the GRCh37/hg19 human assembly. 1,000 Genomes Project genotype data (release 20130502, 2,504 individuals) and the total binding affinity of thousands of transcription factor binding motifs at genomic regulatory regions are combined to characterize and annotate the functional properties of the collection.

CONREL v1.1 extends the collection of regulatory elements by adding consensus regions mapped to the GRCh38/hg38 genome assembly. UCSC LiftOver was used to convert the coordinates of consensus regions between genome assemblies. 1,000 Genomes Project genotype data (release 20170504) was used.

CONREL v2 provides a new collection of consensus promoters, enhancers and active enhancers for XX cell-lines across X tissue types for Mus musculus. The ChIP-seq data has been mapped to GRCm38/mm10 mouse assembly.

Workflow

ChIP-seq data

ChIP-seq data from ENCODE (based on the hg19 assembly) were downloaded for all cell lines with H3K4me1, H3K4me3 or H3K27ac histone marker peak data available. A total of 1,398 ChIP-seq datasets were available, considering both narrowPeak and broadPeak datasets.
For each marker, peak regions were merged first across sample replicates and then across different experiments of the same cell line. Consensus regions for each cell line were finally computed based on the markers, providing a collection of global and specific CREs for 38 tissue types and 198 different cell lines.
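The merging step can be pictured with a small sketch. It assumes that "merged" means taking the union of overlapping intervals, which this page does not state explicitly; the function name and the toy peaks are hypothetical.

```python
from collections import defaultdict


def merge_regions(regions):
    """Return the union of overlapping (chrom, start, end) intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the running interval: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the running interval and start a new one.
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Hypothetical peaks from two replicates of the same marker and cell line.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600)]
replicate_2 = [("chr1", 150, 250), ("chr2", 10, 50)]
merged_peaks = merge_regions(replicate_1 + replicate_2)
print(merged_peaks)
```

Under the same assumption, the function could be reused on the per-cell-line consensus regions of one tissue, or of all cell lines, to mimic the tissue and global levels.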

Cis-regulatory elements

We characterized cell line, tissue and global CREs for three types of transcriptional regulatory elements: promoters, enhancers and active enhancers. Tissue-specific consensuses were computed by merging consensus regions across cell lines that originated from the same tissue, while global consensuses were computed by merging consensus regions across all cell lines.
The number of global CREs and the percentage of the genome spanned by the identified consensus regions are summarized below:

 

                    Promoters            Enhancers            Active enhancers
                    # regions      %     # regions      %     # regions      %
Global narrowPeak      25,512   0.80       716,249  30.63       290,424  15.92
Global broadPeak       28,307   0.96       303,125  42.10       115,720  22.62
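As a rough illustration of how a "percentage of genome spanned" figure can be obtained, the sketch below divides the total length of a set of non-overlapping regions by the genome length. The half-open coordinates, the toy regions and the toy genome length are assumptions made for the example and are not CONREL values.

```python
def percent_genome_spanned(regions, genome_length):
    """Percentage of the genome covered by non-overlapping (start, end) regions.

    Coordinates are assumed to be half-open, as in BED files.
    """
    covered = sum(end - start for start, end in regions)
    return 100.0 * covered / genome_length


# Toy numbers, chosen only to make the arithmetic easy to follow.
toy_regions = [(0, 1_000), (5_000, 7_000)]
toy_genome_length = 100_000
print(percent_genome_spanned(toy_regions, toy_genome_length))  # 3.0
```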

TF and TFBS motifs

A total of 5,424 unique TF DNA-binding site (TFBS) motifs were collected in the form of position frequency matrices (PFMs).

The motifs were mined from the following databases:

  • JASPAR[1] - JASPAR_v2014, JASPAR_v2016, JASPAR_CORE
  • hPDI[2]
  • SwissRegulon[3]
  • HOCOMOCO[4] - HOCOMOCO v10
  • TRANSFAC Professional[5] - TRANSFAC v2017.1

The annotation file for all the TFBS motifs is available for download on the Download page.

[1] Khan, A., Fornes, O., Stigliani, A., et al. (2018) JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res., 46, D1284.
[2] Xie, Z., Hu, S., Blackshaw, S., et al. (2010) hPDI: a database of experimental human protein-DNA interactions. Bioinformatics, 26, 287–289.
[3] Pachkov, M., Balwierz, P. J., Arnold, P., et al. (2013) SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res., 41, D214-220.
[4] Kulakovskiy, I. V., Vorontsov, I. E., Yevshin, I. S., et al. (2018) HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res., 46, D252–D259.
[5] Matys, V., Kel-Margoulis, O. V., Fricke, E., et al. (2006) TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res., 34, D108-110.

Comparison resources

The comparison is performed between our global annotations and regulatory region annotations provided by other available resources.

Resources used for comparison:

  • SCREEN[1] - v2019-10
  • Ensembl[2] - release 100
  • GeneHancer[3]
  • EnhancerAtlas[4] - v2
  • DENdb[5]
The genome browser reports, for each consensus region, which of these resources contain an overlapping regulatory element.

[1] ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
[2] Zerbino, D. R., Wilder, S. P., Johnson, N., et al. (2015) The ensembl regulatory build. Genome Biol., 16, 56.
[3] Fishilevich, S., Nudel, R., Rappaport, N., et al. (2017) GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database, 2017.
[4] Gao, T., He, B., Liu, S., et al. (2016) EnhancerAtlas: a resource for enhancer annotation and analysis in 105 human cell/tissue types. Bioinformatics, btw495.
[5] Ashoor, H., Kleftogiannis, D., Radovanovic, A., et al. (2015) DENdb: database of integrated human enhancers. Database (Oxford), 2015.

How to cite CONREL

Dalfovo, D., Valentini, S., Romanel, A. Exploring functionally annotated transcriptional consensus regulatory elements with CONREL. Database (2020) Vol. 2020: article ID baaa071; doi:10.1093/database/baaa071

What is CONREL

This help page explains how to use the CONREL website.



CONREL is a web-based genome browser that enables the exploration of consensus regulatory elements (CREs) across the human genome and provides an exhaustive annotation of transcription factors (TFs) with enriched total binding affinity (TBA) across all annotated CREs. While CREs provide a better characterization of regulatory elements that are conserved among different experiments, cell-lines and tissues, the TBA approach measures the overall one-to-one relationship between a TF and a CRE.


Starting point

Use the input form on the CONREL search page to specify your inputs.

Search page



Fig 1 - Search tab

Setting page

In the settings page you can change the parameters that influence the output. You can choose which tracks you want to visualize and add optional tracks.
NOTE that some parameters are required and some are optional.

Under Visualization you can select the level of detail for the genome browser. You can choose between:

  • Gene displays regions that include both protein-coding genes and non-coding RNA genes
  • Transcript displays transcripts and their structure (exons, coding sequence) for both protein-coding genes and non-coding RNA genes

Required parameters

Under peaks call format you can select the consensus regulatory elements generated using the Narrow peaks dataset and/or the Broad peaks dataset, as defined by the ENCODE consortium.


In the Consensus regulatory element setting you can select which type of CRE (promoter, enhancer or active enhancer) you want to visualize.


The Consensus region tracks setting lets you visualize the consensus regulatory elements at different levels of abstraction. You can choose between:

  • Global displays the CREs generated combining all cell lines. It provides an overall landscape
  • Tissue displays the CREs generated combining the cell lines originated from the same tissue
  • Cell line displays the CREs generated combining replicates and experiments for the same cell line

NOTE that the lists of tissues and the tree of cell-lines are built based on the peak format selected. Different tissues and cell lines are available for the Narrow and Broad peak formats, so selecting Narrow peak, Broad peak or both provides a different list of tissue and cell line consensus regions. As a consequence, changing the peak format resets both the tissue and the cell line selections; select the peak format before selecting any tissue or cell line.

Fig 2.1 - Setting page, required parameters


Cell-line Tree selection

When the Cell line consensus regions track is selected, a cell line selection tree appears on the right of the webpage (Fig 2.2). The 198 cell lines are grouped by the 38 tissue types. Checking the box of a tissue selects all the cell lines of that tissue; alternatively, expand the tissue and check the boxes of the cell lines of interest.

Fig 2.2 - Setting page, cell line selection


Optional parameters

The SNP track displays all the single-nucleotide variants from dbSNP v151 with a minor allele frequency (MAF) greater than or equal to 0.01.


The TSS track displays the Transcription Start Sites with a confidence score greater than or equal to 10. TSSs were determined by SwitchGear Genomics.


The TBA information displays the Total Binding Affinity of a regulatory region for a specific TF.
NOTE that the TBA is available only for global and tissue consensus regions.
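This page does not spell out how the TBA is computed. A formulation commonly used in the literature sums, over every window of the region, the ratio between the motif likelihood and a background likelihood, keeping the better of the two strands, and reports the logarithm of that sum. The sketch below follows that formulation as an assumption; the two-column motif, the uniform background and all function names are hypothetical and not taken from CONREL.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def window_ratio(window, pwm, background):
    """Likelihood ratio of one window under the motif versus the background."""
    ratio = 1.0
    for base, column in zip(window, pwm):
        ratio *= column[base] / background[base]
    return ratio


def total_binding_affinity(sequence, pwm, background):
    """Log of the summed per-window ratios, best strand per window."""
    width = len(pwm)
    total = 0.0
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        total += max(
            window_ratio(window, pwm, background),
            window_ratio(reverse_complement(window), pwm, background),
        )
    return math.log(total)


# Hypothetical 2-bp motif preferring "AG", with a uniform background.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(total_binding_affinity("AGAG", toy_pwm, uniform))
print(total_binding_affinity("TTTT", toy_pwm, uniform))
```

Because every window contributes, the score summarizes the affinity of the whole region for the TF rather than the strength of a single best site.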


The TBA p-value reflects the significance of the normalized TBA score against a PFM-specific reference distribution of normalized TBA scores. The default cutoff in CONREL, 1e-05, allows for stringent multiple-hypothesis correction at a specific CRE; more relaxed p-value cutoffs can be set for exploratory analyses.
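One simple way to score a value against a reference distribution is an empirical p-value: the share of reference scores that are at least as large as the observed one. The sketch below uses that estimator with a +1 correction; it is an assumption for illustration, since this page does not describe how CONREL derives its p-values, and the reference scores are made up.

```python
from bisect import bisect_left


def empirical_p_value(score, reference_scores):
    """Share of reference scores >= score, with a +1 correction."""
    ref = sorted(reference_scores)
    n_greater_or_equal = len(ref) - bisect_left(ref, score)
    return (n_greater_or_equal + 1) / (len(ref) + 1)


# Hypothetical reference distribution of normalized TBA scores for one PFM.
toy_reference = [0.1, 0.2, 0.3, 0.4]
print(empirical_p_value(0.35, toy_reference))  # 0.4
```

With this estimator the smallest attainable p-value is 1 / (n + 1), so a cutoff such as 1e-05 is only meaningful when the reference distribution holds on the order of 100,000 scores or more.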


The Motifs count cutoff sets a threshold on the number of aligned sequences used to build a TF motif. This is an option to limit the number of TF motifs in your output.
A motif is usually derived from a set of aligned sequences; a lower count means that fewer sequences support the motif. This option excludes all the motifs whose count (number of aligned sequences) is below the threshold.


The Fraction motifs cutoff puts a threshold on the fraction of regions with a significant TBA score for a specific motif. This is an option to limit the number of TF motifs in your output.
This option excludes all the motifs that are enriched in TBA score in more regions (expressed as fraction over all the regions of the corresponding consensus) than the threshold provided.
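The two cutoffs act as independent filters on the list of motifs. A minimal sketch of their combined effect follows, assuming each motif is a record with a `count` (aligned sequences) and a `fraction` (share of regions with a significant TBA); these field names and values are hypothetical.

```python
def filter_motifs(motifs, min_count, max_fraction):
    """Keep motifs with enough supporting sequences that are not
    enriched in more than max_fraction of the regions."""
    return [
        motif
        for motif in motifs
        if motif["count"] >= min_count and motif["fraction"] <= max_fraction
    ]


toy_motifs = [
    {"name": "motifA", "count": 250, "fraction": 0.02},
    {"name": "motifB", "count": 8, "fraction": 0.01},    # too few sequences
    {"name": "motifC", "count": 500, "fraction": 0.60},  # enriched almost everywhere
]
kept = filter_motifs(toy_motifs, min_count=10, max_fraction=0.5)
print([motif["name"] for motif in kept])  # ['motifA']
```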



Fig 2.3 - Setting page, optional parameters

Browser page

Due to computational constraints, the genome browser is loaded within a window of ±1 Mb around the selected chromosome coordinates or gene.
The genome browser window can be navigated. Clicking on a region, variant, gene or transcript shows detailed information about it in a table that can be copied or downloaded in different file formats.
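The ±1 Mb window can be written down as a small calculation. The sketch assumes 1-based coordinates and clamps the window to the chromosome ends; both choices, and the function name, are illustrative rather than taken from the CONREL source.

```python
WINDOW_BP = 1_000_000  # 1 Mb on each side of the query


def browser_window(start, end, chrom_length=None):
    """Return the (left, right) bounds of the window loaded around a query."""
    left = max(1, start - WINDOW_BP)
    right = end + WINDOW_BP
    if chrom_length is not None:
        right = min(right, chrom_length)
    return left, right


print(browser_window(500_000, 600_000))  # (1, 1600000)
```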

Fig 3 - Browser page examples



Files header

Gene/Transcript elements

The headers of the downloaded files are automatically generated from the Ensembl-based annotation package:


For more information, please refer to the official documentation on the Ensembl portal: Gene annotation in Ensembl


Consensus regions

 seqnames  The name of the chromosome
 start  The starting position of the feature in the chromosome
 end  The ending position of the feature in the chromosome
 width  The size of the region in base-pair (bp)
 strand  Defines the strand. Either "*" (=no strand) or "+" or "-"
 overlap.referenceDB  The list of resources that provide an overlap between the consensus region and a regulatory element in the reference resources

TBA info

 p.value  The p-value selected for the TBA
 TF_Symbol_and_Code  For the specific motif, the gene symbol(s) of the Transcription Factor represented by the motif and in parentheses the ID(s) of the motif
 TBA_fraction  The fraction of common alleles among 1,000 Genomes Project individuals that support the TF's TBA enrichment
 GeneCard  A link from the gene symbol to the GeneCards portal



Variants info

For more information, please refer to the official dbSNP documentation: https://www.ncbi.nlm.nih.gov/snp/

 RS  dbSNP ID (i.e. rs number)
 RSPOS  Chr position reported in dbSNP
 RV  RS orientation is reversed
 VP  Variation Property. Documentation is at ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf
 GENEINFO  Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)
 dbSNPBuildID  First dbSNP Build for RS
 SAO  Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both
 SSR  Variant Suspect Reason Codes (may be more than one value added together) 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other
 WGT  Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2, 3 - weight 3 or more
 VC  Variation Class
 PM  Variant is Precious(Clinical,Pubmed Cited)
 TPA  Provisional Third Party Annotation(TPA) (currently rs from PHARMGKB who will give phenotype data)
 PMC  Links exist to PubMed Central article
 S3D  Has 3D structure - SNP3D table
 SLO  Has SubmitterLinkOut - From SNP->SubSNP->Batch.link_out
 NSF  Has non-synonymous frameshift A coding region variation where one allele in the set changes all downstream amino acids. FxnClass = 44
 NSM  Has non-synonymous missense A coding region variation where one allele in the set changes protein peptide. FxnClass = 42
 NSN  Has non-synonymous nonsense A coding region variation where one allele in the set changes to STOP codon (TER). FxnClass = 41
 REF  Has reference A coding region variation where one allele in the set is identical to the reference sequence. FxnCode = 8
 SYN  Has synonymous A coding region variation where one allele in the set does not change the encoded amino acid. FxnCode = 3
 U3  In 3' UTR Location is in an untranslated region (UTR). FxnCode = 53
 U5  In 5' UTR Location is in an untranslated region (UTR). FxnCode = 55
 ASS  In acceptor splice site FxnCode = 73
 DSS  In donor splice-site FxnCode = 75
 INT  In Intron FxnCode = 6
 R3  In 3' gene region FxnCode = 13
 R5  In 5' gene region FxnCode = 15
 OTH  Has other variant with exactly the same set of mapped positions on NCBI reference assembly.
 CFL  Has Assembly conflict. This is for weight 1 and 2 variant that maps to different chromosomes on different assemblies.
 ASP  Is Assembly specific. This is set if the variant only maps to one assembly
 MUT  Is mutation (journal citation, explicit fact): a low frequency variation that is cited in journal and other reputable sources
 VLD  Is Validated. This bit is set if the variant has 2+ minor allele count based on frequency or genotype data.
 G5A  >5% minor allele frequency in each and all populations
 G5  >5% minor allele frequency in 1+ populations
 HD  Marker is on high density genotyping kit (50K density or greater). The variant may have phenotype associations present in dbGaP.
 GNO  Genotypes available. The variant has individual genotype (in SubInd table).
 KGPhase1  1000 Genome phase 1 (incl. June Interim phase 1)
 KGPhase3  1000 Genome phase 3
 CDA  Variation is interrogated in a clinical diagnostic assay
 LSD  Submitted from a locus-specific database
 MTP  Microattribution/third-party annotation(TPA:GWAS,PAGE)
 OM  Has OMIM/OMIA
 NOC  Contig allele not present in variant allele list. The reference sequence allele at the mapped position is not present in the variant allele list, adjusted for orientation.
 WTD  Is Withdrawn by submitter If one member ss is withdrawn by submitter, then this bit is set. If all member ss' are withdrawn, then the rs is deleted to SNPHistory
 NOV  Rs cluster has non-overlapping allele sets. True when rs set has more than 2 alleles from different submissions and these sets share no alleles in common.
 NC  Inconsistent Genotype Submission For At Least One Sample
 CAF  An ordered, comma delimited list of allele frequencies based on 1000Genomes, starting with the reference allele followed by alternate alleles as ordered in the ALT column. Where a 1000Genomes alternate allele is not in the dbSNPs alternate allele set, the allele is added to the ALT column. The minor allele is the second largest value in the list, and was previously reported in VCF as the GMAF. This is the GMAF reported on the RefSNP and EntrezSNP pages and VariationReporter
 COMMON  RS is a common SNP. A common SNP is one that has at least one 1000Genomes population with a minor allele of frequency >= 1% and for which 2 or more founders contribute to that minor allele frequency.
 TOPMED  An ordered, comma delimited list of allele frequencies based on TOPMed, starting with the reference allele followed by alternate alleles as ordered in the ALT column. The TOPMed minor allele is the second largest value in the list.
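Three of the fields above follow small encoding rules that are easy to get wrong by eye: GENEINFO (pairs separated by "|", symbol and id separated by ":"), SSR (reason codes added together) and CAF (the minor allele frequency is the second largest value). The helpers below are a sketch of how they might be decoded; the function names and example values are hypothetical, and skipping "." entries in CAF is an assumption about how missing frequencies are written.

```python
SSR_FLAGS = {
    1: "Paralog",
    2: "byEST",
    4: "oldAlign",
    8: "Para_EST",
    16: "1kg_failed",
    1024: "other",
}


def parse_geneinfo(value):
    """Split 'SYMBOL:ID|SYMBOL:ID' into a list of (symbol, gene_id) pairs."""
    pairs = []
    for item in value.split("|"):
        symbol, gene_id = item.rsplit(":", 1)
        pairs.append((symbol, gene_id))
    return pairs


def decode_ssr(code):
    """Expand a summed SSR code into its individual reason names."""
    if code == 0:
        return ["unspecified"]
    return [name for bit, name in SSR_FLAGS.items() if code & bit]


def minor_allele_frequency(caf):
    """Second largest frequency in a CAF list, or None if unavailable."""
    freqs = sorted((float(x) for x in caf.split(",") if x != "."), reverse=True)
    return freqs[1] if len(freqs) > 1 else None


print(parse_geneinfo("GENE1:101|GENE2:202"))
print(decode_ssr(18))
print(minor_allele_frequency("0.9,0.08,0.02"))
```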

Download singularity image and data

Download resources


Link to Resources download data

Folder Contains
 Consensus_Regulatory_Elements_BED Contains all the BED format files of the consensus regions (global, tissues and cell-lines) for both the GRCh37 and GRCh38 human genome assemblies
 Motifs_Annotations Contains the annotation for 5,424 TF DNA-binding site (TFBS) motifs in a tab-separated file. The columns are defined as:
  Code A unique code that identifies the motif
  Repository The list of repositories from which the motif has been mined
  GeneName Gene name of the TF corresponding to that motif
  GeneSymbol Gene symbol of the TF corresponding to that motif
  MotifLength Length in bp of the motif
  ID Numeric ID matching the <ID> in the reference distribution filename motif<ID>.RData
 Motifs_TBA_Reference_Distributions Contains, for each motif, the reference distribution as an RData file. The files are named motif<ID>.RData, where <ID> corresponds to the ID column in the Motifs_Annotations file available in the download folder.
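The link between the annotation file and the reference distribution files can be sketched as follows. The example assumes that the tab-separated file has a header row with the column names listed above; the sample rows are invented for illustration.

```python
import csv
import io

# Invented sample content mimicking the documented columns.
SAMPLE_TSV = (
    "Code\tRepository\tGeneName\tGeneSymbol\tMotifLength\tID\n"
    "M0001\tJASPAR\texample factor 1\tTF1\t12\t1\n"
    "M0002\tHOCOMOCO\texample factor 2\tTF2\t9\t2\n"
)


def reference_distribution_files(handle):
    """Map each motif Code to its motif<ID>.RData filename."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Code"]: f"motif{row['ID']}.RData" for row in reader}


mapping = reference_distribution_files(io.StringIO(SAMPLE_TSV))
print(mapping)
```

For a real download, the `io.StringIO` object would be replaced by the opened annotation file.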


Download singularity app

Download a singularity image to run this shiny app on your local server.
With version 2 of CONREL, it is now possible to run a single singularity image with different species and/or genome assemblies.


Link to CONREL download data

How to install and use it

1. Unpack TAR archive

tar -xvf CONREL_vX.tar

2. Prepare shiny-server configuration file

You will first need to generate a custom configuration for your user, and it will give you instructions for usage:

$ /bin/bash prepare_conf.sh


Steps:
  ----------------------------------------------------------------------
  1. Use this script to prepare your shiny-server.conf (configuration)

  /bin/bash prepare_template.sh

  ----------------------------------------------------------------------
  2. If needed, you can provide the following arguments

  Commands:
    help: show help and exit
    start: the generation of your config

  Options:
    --port:  the port for the application (e.g., shiny default is 3737)
    --user:  the user for the run_as directive in the shiny configuration
    --base: base folder with applications
    --logs: temporary folder with write for logs (not required)
    --disable-index: disable directory indexing

  ----------------------------------------------------------------------
  3. Make sure Singularity is loaded, and run the container using
    the commands shown by the template.

Adding 'start' triggers the generation. Here no arguments are supplied, so the values are generated randomly.

$ /bin/bash prepare_template.sh start


Generating shiny configuration...
port: 9870
logs: /tmp/shiny-server.gG1X2Z
base: /srv/shiny-server/shiny_genomeBrowser
Server logging will be in /tmp/shiny-server.gG1X2Z

To run your server:

  module load singularity/2.4.6
  singularity run --bind /tmp/shiny-server.gG1X2Z/logs:/var/log/shiny \
  --bind /tmp/shiny-server.gG1X2Z/lib:/var/lib/shiny-server \
  --bind shiny-server.conf:/etc/shiny-server/shiny-server.conf shiny.simg
  ---------------------------------------------------------------------------

For custom applications, also add --bind /srv/shiny-server:/srv/shiny-server
  To see your applications, open your browser to http://127.0.0.1:9870 or
  open a ssh connection from your computer to your cluster.

The configuration is generated in your present working directory:

$ cat shiny-server.conf


run_as vanessa;
server {
  listen 9098;

  # Define a location at the base URL
  location / {

    # Host the directory of Shiny Apps stored in this directory
    site_dir /srv/shiny-server;

    # Log all Shiny output to files in this directory
    log_dir /tmp/shiny-server.PtVRXE;

    # When a user visits the base URL rather than a particular application,
    # an index of the applications available in this directory will be shown.
    directory_index on;
  }
}

You can also choose to disable the indexing, meaning that someone who navigates to the root of the server (at the port) won't be able to explore all of your apps.

$ /bin/bash prepare_template.sh --disable-index

You can also customize the port, temporary folder, 'run_as' user, and base (if somewhere other than /srv/shiny-server).


3. Start server

Once you have that template, follow the instructions to run the container. The temporary folder is already created for you.

$ singularity run --bind /tmp/shiny-server.gG1X2Z/logs:/var/log/shiny \
  --bind /tmp/shiny-server.gG1X2Z/lib:/var/lib/shiny-server \
  --bind shiny-server.conf:/etc/shiny-server/shiny-server.conf CONREL.simg

[2018-04-07T00:14:17.403] [INFO] shiny-server - Shiny Server v1.5.7.890 (Node.js v8.10.0)
[2018-04-07T00:14:17.405] [INFO] shiny-server - Using config file '/etc/shiny-server/shiny-server.conf'
[2018-04-07T00:14:17.456] [INFO] shiny-server - Starting listener on 0.0.0.0:9870

Custom application

When you run the container, if you add a bind to a folder of your own apps in /srv/shiny-server/CONREL, you can add your custom applications. The bind would look something like:

--bind /path/to/apps/folder:/srv/shiny-server/CONREL

This is useful if you want to run the most recently updated version of the CONREL web app or modify it as you prefer. The shiny app is already installed in the singularity image; alternatively, you can download the source code of the app from the GitHub repository (github, CONREL) and bind it to the image.