API#

SCENIC+ object#

Preprocessing#

SCENIC+ semi-automated workflow using wrapper functions#

pycistarget wrapper#

Wrapper functions to run motif enrichment analysis using pycistarget

After sets of regions have been defined (e.g. topics or DARs), the complete pycistarget workflow can be run using a single function.

This function runs cistarget-based and DEM-based motif enrichment analysis, with or without promoter regions.

scenicplus.wrappers.run_pycistarget.run_pycistarget(region_sets: Dict[str, PyRanges], species: str, save_path: str, custom_annot: DataFrame | None = None, save_partial: bool = False, ctx_db_path: str | None = None, dem_db_path: str | None = None, run_without_promoters: bool = False, biomart_host: str = 'http://www.ensembl.org', promoter_space: int = 500, ctx_auc_threshold: float = 0.005, ctx_nes_threshold: float = 3.0, ctx_rank_threshold: float = 0.05, dem_log2fc_thr: float = 0.5, dem_motif_hit_thr: float = 3.0, dem_max_bg_regions: int = 500, annotation: List[str] = ['Direct_annot', 'Orthology_annot'], motif_similarity_fdr: float = 1e-06, path_to_motif_annotations: str | None = None, annotation_version: str = 'v9', n_cpu: int = 1, _temp_dir: str | None = None, exclude_motifs: str | None = None, exclude_collection: List[str] | None = None, **kwargs)[source]#

Wrapper function for pycistarget

Parameters:
  • region_sets (Mapping[str, pr.PyRanges]) – A dictionary of PyRanges containing region coordinates for the region sets to be analyzed.

  • species (str) – Species from which the genomic coordinates come. Options are: homo_sapiens, mus_musculus, drosophila_melanogaster and gallus_gallus.

  • save_path (str) – Directory in which to save outputs.

  • custom_annot (pd.DataFrame) –

    pandas DataFrame with genome annotation for a custom species (i.e. a species other than homo_sapiens, mus_musculus, drosophila_melanogaster or gallus_gallus). This DataFrame should (minimally) look like the example below and should contain only protein-coding genes:

    >>> custom_annot
           Chromosome      Start  Strand     Gene  Transcript_type
    8053         chrY   22490397       1      PRY   protein_coding
    8153         chrY   12662368       1    USP9Y   protein_coding
    8155         chrY   12701231       1    USP9Y   protein_coding
    8158         chrY   12847045       1    USP9Y   protein_coding
    8328         chrY   22096007      -1     PRY2   protein_coding
    ...           ...        ...     ...      ...              ...
    246958       chr1  181483738       1  CACNA1E   protein_coding
    246960       chr1  181732466       1  CACNA1E   protein_coding
    246962       chr1  181776101       1  CACNA1E   protein_coding
    246963       chr1  181793668       1  CACNA1E   protein_coding
    246965       chr1  203305519       1     BTG2   protein_coding

    [78812 rows x 5 columns]

  • save_partial (bool=False) – Whether to save the individual analyses as pkl. Useful to run analyses in chunks or add new settings.

  • ctx_db_path (str = None) – Path to cistarget database containing rankings of motif scores

  • dem_db_path (str = None) – Path to dem database containing motif scores

  • run_without_promoters (bool = False) – Boolean specifying whether the analysis should also be run without including promoter regions.

  • biomart_host (str = ‘http://www.ensembl.org’) – URL of the biomart host. Make sure this host matches your assembly.

  • promoter_space (int = 500) – integer defining space around the TSS to consider as promoter

  • ctx_auc_threshold (float = 0.005) – The fraction of the ranked genome to take into account for the calculation of the Area Under the recovery Curve

  • ctx_nes_threshold (float = 3.0) – The Normalized Enrichment Score (NES) threshold to select enriched features.

  • ctx_rank_threshold (float = 0.05) – The total number of ranked genes to take into account when creating a recovery curve.

  • dem_log2fc_thr (float = 0.5) – Log2 Fold-change threshold to consider a motif enriched.

  • dem_motif_hit_thr (float = 3.0) – Minimum mean signal in the foreground to consider a motif enriched.

  • dem_max_bg_regions (int = 500) – Maximum number of regions to use as background. When set to None, all regions are used.

  • annotation (List[str] = ['Direct_annot', 'Orthology_annot']) – Annotation to use for forming cistromes. It can be ‘Direct_annot’ (direct evidence that the motif is linked to that TF), ‘Motif_similarity_annot’ (based on tomtom motif similarity), ‘Orthology_annot’ (based on orthology with a TF that is directly linked to that motif) or ‘Motif_similarity_and_Orthology_annot’.

  • path_to_motif_annotations (str = None) – Path to motif annotations. If not provided, they will be downloaded from https://resources.aertslab.org based on the species name provided (only possible for mus_musculus, homo_sapiens and drosophila_melanogaster).

  • annotation_version (str = 'v9') – Motif collection version.

  • n_cpu (int = 1) – Number of cores to use.

  • _temp_dir (str = None) – temp_dir to use for ray.

  • exclude_motifs (str = None) – Path to a csv file containing motifs to exclude from the analysis.

  • exclude_collection (List[str] = None) – List of strings identifying which motif collections to exclude from analysis.
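
As a rough illustration of how the arguments above fit together, the sketch below assembles a keyword-argument dictionary for run_pycistarget and imports the wrapper only when the call is actually made. The helper names (build_pycistarget_kwargs, run), the placeholder paths and the species check are assumptions made for this example and are not part of SCENIC+.

```python
from typing import Any, Dict, Optional

# Species listed in the parameter description above.
SUPPORTED_SPECIES = {
    "homo_sapiens",
    "mus_musculus",
    "drosophila_melanogaster",
    "gallus_gallus",
}


def build_pycistarget_kwargs(
    region_sets: Dict[str, Any],
    species: str,
    save_path: str,
    ctx_db_path: Optional[str] = None,
    dem_db_path: Optional[str] = None,
    custom_annot: Any = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect arguments for run_pycistarget."""
    # Assumption of this sketch: an unlisted species needs custom_annot.
    if species not in SUPPORTED_SPECIES and custom_annot is None:
        raise ValueError(f"{species!r} is not a built-in species; pass custom_annot")
    if not region_sets:
        raise ValueError("region_sets must contain at least one region set")
    kwargs = dict(
        region_sets=region_sets,
        species=species,
        save_path=save_path,
        ctx_db_path=ctx_db_path,
        dem_db_path=dem_db_path,
        custom_annot=custom_annot,
    )
    # Any other documented parameter (n_cpu, run_without_promoters, ...) passes through.
    kwargs.update(extra)
    return kwargs


def run(kwargs: Dict[str, Any]) -> None:
    """Call the wrapper; requires scenicplus to be installed."""
    from scenicplus.wrappers.run_pycistarget import run_pycistarget

    run_pycistarget(**kwargs)


# Placeholder values: in practice region_sets maps names to PyRanges objects
# and the database paths point to real files.
example_kwargs = build_pycistarget_kwargs(
    region_sets={"topics": "<PyRanges placeholder>"},
    species="homo_sapiens",
    save_path="pycistarget_out",
    ctx_db_path="path/to/rankings_db",
    dem_db_path="path/to/scores_db",
    run_without_promoters=True,
    n_cpu=4,
)
```

Keeping the argument assembly separate from the call makes it possible to inspect or log the settings before starting a long-running analysis.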

SCENIC+ wrapper#

Tools for non-automated workflow#

Cistromes#

Enhancer-to-gene linking#

TF-to-gene linking#

eGRN building#

eRegulon Class#

GSEA based approach#

GSEA approach

Downstream analysis, export and plotting#

Marker genes and regions#

eRegulon enrichment in cells#

dimensionality reduction#

eRegulon specificity score (eRSS)#

Network#

Export eRegulons to an eGRN network and plot it.

scenicplus.networks.concentrical_layout(G, dist_genes=1, dist_TF=0.1)[source]#

Generate custom concentrical layout

Parameters:
  • G (Graph) – A networkx graph

  • dist_genes (int, optional) – Distance from the regions to the genes

  • dist_TF (float, optional) – Distance from the TF to the regions
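
To make the two distance parameters more concrete, here is a minimal sketch of a three-ring layout written with the standard library only. It is not the SCENIC+ implementation: the ordering of the rings (TFs inside, then regions, then genes), the inner radius and the function names are assumptions chosen for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

Position = Tuple[float, float]


def ring_positions(nodes: Iterable[str], radius: float) -> Dict[str, Position]:
    """Spread nodes evenly on a circle of the given radius around the origin."""
    nodes = list(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


def concentric_sketch(tfs, regions, genes, dist_TF=0.1, dist_genes=1.0):
    """Illustrative layout: each ring sits a fixed distance outside the previous one."""
    tf_radius = 0.5  # arbitrary inner radius chosen for this sketch
    region_radius = tf_radius + dist_TF
    gene_radius = region_radius + dist_genes
    pos = {}
    pos.update(ring_positions(tfs, tf_radius))
    pos.update(ring_positions(regions, region_radius))
    pos.update(ring_positions(genes, gene_radius))
    return pos


pos = concentric_sketch(["TF1"], ["region1", "region2"], ["geneA", "geneB", "geneC"])
```

In this sketch, increasing dist_genes pushes the gene ring outward without moving the TF or region rings.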

scenicplus.networks.create_nx_graph(nx_tables: Dict, use_edge_tables: List = ['TF2R', 'R2G'], color_edge_by: Dict = {}, transparency_edge_by: Dict = {}, width_edge_by: Dict = {}, color_node_by: Dict = {}, transparency_node_by: Dict = {}, size_node_by: Dict = {}, shape_node_by: Dict = {}, label_size_by: Dict = {}, label_color_by: Dict = {}, layout: str = 'concentrical_layout', lc_dist_genes: float = 0.8, lc_dist_TF: float = 0.1, scale_position_by: float = 250)[source]#

Format node/edge feature tables into a graph

Parameters:
  • nx_tables (Dict) – Dictionary with node/edge feature tables as produced by create_nx_tables

  • use_edge_tables (List, optional) – List of edge tables to use

  • color_edge_by (Dict, optional) – A dictionary containing, for a given edge key, the variable and color map to color edges by. If the variable is categorical, the entry ‘categorical_color’ can be provided as a dictionary with category: color. If it is a continuous variable, a color map can be provided as continuous_color, and the entries v_max and v_min can be provided to control the min and max values of the scale. Alternatively, one fixed color can be used by setting ‘fixed_color’ as the variable and adding an entry fixed_color: color to the dictionary.

  • transparency_edge_by (Dict, optional) – A dictionary containing, for a given edge key, the variable and the max and min alpha values. The variable name has to be provided (only continuous variables accepted), together with v_max/v_min parameters if desired. Alternatively, one fixed alpha can be used by setting ‘fixed_alpha’ as the variable and adding an entry fixed_alpha: alpha to the dictionary.

  • width_edge_by (Dict, optional) – A dictionary containing, for a given edge key, the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can be used by setting ‘fixed_size’ as the variable and adding an entry fixed_size: size to the dictionary.

  • color_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and color map to color nodes by. If the variable is categorical, the entry ‘categorical_color’ can be provided as a dictionary with category: color. If it is a continuous variable, a color map can be provided as continuous_color, and the entries v_max and v_min can be provided to control the min and max values of the scale. Alternatively, one fixed color can be used by setting ‘fixed_color’ as the variable and adding an entry fixed_color: color to the dictionary.

  • transparency_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and the max and min alpha values. The variable name has to be provided (only continuous variables accepted), together with v_max/v_min parameters if desired. Alternatively, one fixed alpha can be used by setting ‘fixed_alpha’ as the variable and adding an entry fixed_alpha: alpha to the dictionary.

  • size_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can be used by setting ‘fixed_size’ as the variable and adding an entry fixed_size: size to the dictionary.

  • shape_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and shapes. The variable name has to be provided (only categorical variables accepted). Alternatively, one fixed shape can be used by setting ‘fixed_shape’ as the variable and adding an entry fixed_shape: shape to the dictionary.

  • label_size_by (Dict, optional) – A dictionary containing, for a given node key, the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can be used by setting ‘fixed_label_size’ as the variable and adding an entry fixed_label_size: size to the dictionary.

  • label_color_by (Dict, optional) – A dictionary containing, for a given node key, the variable and a color dictionary. The variable name has to be provided (only categorical variables accepted), together with a color dictionary if desired. Alternatively, one fixed color can be used by setting ‘fixed_label_color’ as the variable and adding an entry fixed_label_color: color to the dictionary.

  • layout (str, optional) – Layout to use. Options are: ‘concentrical_layout’ (SCENIC+ custom layout) or kamada_kawai_layout (from networkx).

  • lc_dist_genes (float, optional) – Distance between regions and genes. Only used if using concentrical_layout.

  • lc_dist_TF (float, optional) – Distance between TF and regions. Only used if using concentrical_layout.

  • scale_position_by (int, optional) – Value to scale positions for visualization in pyvis.
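
The styling dictionaries described above are nested: the outer key names an edge or node type, and the inner dictionary names the variable plus its options. The following sketch shows one possible shape for them. The inner key name "variable", the node keys ("TF", "Region", "Gene"), the column name "R2G_rho" and the shape value are assumptions made for this example. The small checker is a hypothetical helper, not a SCENIC+ function.

```python
from typing import Dict, List

# Edge keys follow the default use_edge_tables (TF2R and R2G).
color_edge_by = {
    "TF2R": {"variable": "fixed_color", "fixed_color": "grey"},
    "R2G": {
        "variable": "R2G_rho",  # placeholder name for a continuous column
        "continuous_color": "viridis",
        "v_min": -1,
        "v_max": 1,
    },
}
transparency_edge_by = {
    "R2G": {"variable": "fixed_alpha", "fixed_alpha": 0.5},
}
size_node_by = {
    "TF": {"variable": "fixed_size", "fixed_size": 30},
    "Region": {"variable": "fixed_size", "fixed_size": 10},
    "Gene": {"variable": "fixed_size", "fixed_size": 15},
}
shape_node_by = {
    "TF": {"variable": "fixed_shape", "fixed_shape": "ellipse"},  # placeholder shape
}


def missing_fixed_entries(style: Dict[str, dict]) -> List[str]:
    """Return outer keys whose variable is fixed_* but lacks the matching entry."""
    missing = []
    for key, spec in style.items():
        variable = spec.get("variable", "")
        if variable.startswith("fixed_") and variable not in spec:
            missing.append(key)
    return missing


# Keyword arguments that could be passed on to create_nx_graph.
graph_kwargs = {
    "use_edge_tables": ["TF2R", "R2G"],
    "color_edge_by": color_edge_by,
    "transparency_edge_by": transparency_edge_by,
    "size_node_by": size_node_by,
    "shape_node_by": shape_node_by,
    "layout": "concentrical_layout",
}
```

Checking the dictionaries before building the graph catches the most common mistake in this kind of nested configuration, which is naming a fixed_* variable without supplying its value.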

scenicplus.networks.create_nx_tables(scplus_obj: SCENICPLUS, eRegulon_metadata_key: str = 'eRegulon_metadata', subset_eRegulons: List = None, subset_regions: List = None, subset_genes: List = None, add_differential_gene_expression: bool = False, add_differential_region_accessibility: bool = False, differential_variable: List = [])[source]#

A function to format eRegulon data into tables for plotting eGRNs.

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with eRegulons

  • eRegulon_metadata_key (str, optional) – Key where the eRegulon metadata dataframe is stored

  • subset_eRegulons (list, optional) – List of eRegulons to subset

  • subset_regions (list, optional) – List of regions to subset

  • subset_genes (list, optional) – List of genes to subset

  • add_differential_gene_expression (bool, optional) – Whether to calculate differential gene expression logFC for a given variable

  • add_differential_region_accessibility (bool, optional) – Whether to calculate differential region accessibility logFC for a given variable

  • differential_variable (list, optional) – Variable to calculate differential gene expression or region accessibility.

scenicplus.networks.export_to_cytoscape(G, pos, out_file: str, pos_scaling_factor: int = 200, size_scaling_factor: int = 1)[source]#

A function to export to Cytoscape

Parameters:
  • G (Graph) – A networkx graph.

  • pos – Coordinates of graph nodes, generated by running create_nx_graph.

  • out_file (str) – Path to which to save the export.

  • pos_scaling_factor (int, optional) – Factor by which to scale the graph node coordinates.

  • size_scaling_factor (int, optional) – Factor by which to scale the graph node sizes.
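
As an illustration of what a position scaling factor does, the sketch below multiplies every node coordinate by a constant. Whether export_to_cytoscape applies the factor in exactly this way is an assumption. The function scale_positions is a hypothetical helper written for this example.

```python
from typing import Dict, Tuple


def scale_positions(
    pos: Dict[str, Tuple[float, float]], factor: float = 200
) -> Dict[str, Tuple[float, float]]:
    """Multiply each (x, y) coordinate by the same factor."""
    return {node: (x * factor, y * factor) for node, (x, y) in pos.items()}


scaled = scale_positions({"TF1": (0.5, -1.0), "geneA": (0.0, 0.25)}, factor=200)
```

Layout coordinates are typically small values around the origin, so a uniform scale spreads the nodes out in a viewer while preserving their relative arrangement.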

scenicplus.networks.plot_networkx(G, pos)[source]#

A function to plot networks with networkx

Parameters:
  • G (Graph) – A networkx graph

  • pos (Dict) – Position values
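
The functions in this module are meant to be chained: tables, then graph, then plot or export. The sketch below shows one way to wire them together. It assumes that the first two items returned by create_nx_graph are the graph and the node positions, which this page does not state. The wrapper plot_egrn and its networks argument are hypothetical and exist only so that the chain can be exercised without SCENIC+ installed.

```python
def plot_egrn(scplus_obj, out_file=None, networks=None, **graph_kwargs):
    """Hypothetical wrapper chaining the scenicplus.networks functions."""
    if networks is None:
        # Requires scenicplus to be installed.
        import scenicplus.networks as networks

    # Format eRegulon data into node/edge tables.
    nx_tables = networks.create_nx_tables(scplus_obj)

    # Assumption: the first two returned items are the graph and the positions.
    result = networks.create_nx_graph(nx_tables, **graph_kwargs)
    G, pos = result[0], result[1]

    networks.plot_networkx(G, pos)
    if out_file is not None:
        networks.export_to_cytoscape(G, pos, out_file=out_file)
    return G, pos
```

Extra keyword arguments are forwarded to create_nx_graph, so styling dictionaries and layout options can be supplied at the call site.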

Correlation plot#

Coverage plot#

dotplot#

Export#

Export to loom#