API#
SCENIC+ object#
Preprocessing#
SCENIC+ semi-automated workflow using wrapper functions#
pycistarget wrapper#
Wrapper functions to run motif enrichment analysis using pycistarget
Once sets of regions have been defined (e.g. topics or DARs), the complete pycistarget workflow can be run using a single function.
This function runs cistarget-based and DEM-based motif enrichment analysis, with or without promoter regions.
- scenicplus.wrappers.run_pycistarget.run_pycistarget(region_sets: Dict[str, PyRanges], species: str, save_path: str, custom_annot: DataFrame | None = None, save_partial: bool = False, ctx_db_path: str | None = None, dem_db_path: str | None = None, run_without_promoters: bool = False, biomart_host: str = 'http://www.ensembl.org', promoter_space: int = 500, ctx_auc_threshold: float = 0.005, ctx_nes_threshold: float = 3.0, ctx_rank_threshold: float = 0.05, dem_log2fc_thr: float = 0.5, dem_motif_hit_thr: float = 3.0, dem_max_bg_regions: int = 500, annotation: List[str] = ['Direct_annot', 'Orthology_annot'], motif_similarity_fdr: float = 1e-06, path_to_motif_annotations: str | None = None, annotation_version: str = 'v9', n_cpu: int = 1, _temp_dir: str | None = None, exclude_motifs: str | None = None, exclude_collection: List[str] | None = None, **kwargs)[source]#
Wrapper function for pycistarget
- Parameters:
region_sets (Mapping[str, pr.PyRanges]) – A dictionary of PyRanges containing region coordinates for the region sets to be analyzed.
species (str) – Species from which the genomic coordinates come. Options are: homo_sapiens, mus_musculus, drosophila_melanogaster and gallus_gallus.
save_path (str) – Directory in which to save outputs.
custom_annot (pd.DataFrame) –
pandas DataFrame with genome annotation for custom species (i.e. for a species other than homo_sapiens, mus_musculus, drosophila_melanogaster or gallus_gallus). This DataFrame should (minimally) look like the example below, and only contains protein coding genes:
>>> custom_annot
       Chromosome      Start  Strand     Gene Transcript_type
8053         chrY   22490397       1      PRY  protein_coding
8153         chrY   12662368       1    USP9Y  protein_coding
8155         chrY   12701231       1    USP9Y  protein_coding
8158         chrY   12847045       1    USP9Y  protein_coding
8328         chrY   22096007      -1     PRY2  protein_coding
…               …          …       …        …               …
246958       chr1  181483738       1  CACNA1E  protein_coding
246960       chr1  181732466       1  CACNA1E  protein_coding
246962       chr1  181776101       1  CACNA1E  protein_coding
246963       chr1  181793668       1  CACNA1E  protein_coding
246965       chr1  203305519       1     BTG2  protein_coding
[78812 rows x 5 columns]
save_partial (bool=False) – Whether to save the individual analyses as pkl. Useful to run analyses in chunks or add new settings.
ctx_db_path (str = None) – Path to cistarget database containing rankings of motif scores
dem_db_path (str = None) – Path to dem database containing motif scores
run_without_promoters (bool = False) – Boolean specifying whether the analysis should also be run without including promoter regions.
biomart_host (str = ‘http://www.ensembl.org’) – URL of the BioMart host. Make sure this host matches your assembly.
promoter_space (int = 500) – integer defining space around the TSS to consider as promoter
ctx_auc_threshold (float = 0.005) – The fraction of the ranked genome to take into account for the calculation of the Area Under the recovery Curve
ctx_nes_threshold (float = 3.0) – The Normalized Enrichment Score (NES) threshold to select enriched features.
ctx_rank_threshold (float = 0.05) – The fraction of ranked regions to take into account when creating a recovery curve.
dem_log2fc_thr (float = 0.5) – Log2 Fold-change threshold to consider a motif enriched.
dem_motif_hit_thr (float = 3.0) – Minimum mean signal in the foreground to consider a motif enriched.
dem_max_bg_regions (int = 500) – Maximum number of regions to use as background. When set to None, all regions are used
annotation (List[str] = ['Direct_annot', 'Orthology_annot']) – Annotation to use for forming cistromes. It can be ‘Direct_annot’ (direct evidence that the motif is linked to that TF), ‘Motif_similarity_annot’ (based on tomtom motif similarity), ‘Orthology_annot’ (based on orthology with a TF that is directly linked to that motif) or ‘Motif_similarity_and_Orthology_annot’.
path_to_motif_annotations (str = None) – Path to motif annotations. If not provided, they will be downloaded from https://resources.aertslab.org based on the species name provided (only possible for mus_musculus, homo_sapiens and drosophila_melanogaster).
annotation_version (str = 'v9') – Motif collection version.
n_cpu (int = 1) – Number of cores to use.
_temp_dir (str = None) – temp_dir to use for ray.
exclude_motifs (str = None) – Path to a CSV file containing the motifs to exclude from the analysis.
exclude_collection (List[str] = None) – List of strings identifying which motif collections to exclude from analysis.
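As an illustration of how the parameters above fit together, the sketch below collects a set of keyword arguments and forwards them to run_pycistarget. The database paths, the output directory and the helper names build_pycistarget_kwargs and run_motif_enrichment are placeholders invented for this example. The check that at least one database path is given is a choice of this sketch, not documented behaviour. region_sets is assumed to be a dictionary of PyRanges objects prepared beforehand (e.g. from topics or DARs).

```python
from typing import Any, Dict, Optional


def build_pycistarget_kwargs(
    save_path: str,
    ctx_db_path: Optional[str] = None,
    dem_db_path: Optional[str] = None,
    n_cpu: int = 1,
) -> Dict[str, Any]:
    """Collect keyword arguments for run_pycistarget (illustrative values)."""
    # Choice of this sketch: refuse to build a call that has no database at all.
    if ctx_db_path is None and dem_db_path is None:
        raise ValueError("Provide at least one of ctx_db_path or dem_db_path.")
    return {
        "species": "homo_sapiens",
        "save_path": save_path,
        "ctx_db_path": ctx_db_path,
        "dem_db_path": dem_db_path,
        "run_without_promoters": True,  # also run the analysis without promoters
        "promoter_space": 500,          # documented default
        "annotation_version": "v9",     # documented default
        "n_cpu": n_cpu,
    }


def run_motif_enrichment(region_sets, **kwargs):
    """Forward region_sets and keyword arguments to run_pycistarget."""
    # Imported lazily: this requires scenicplus (and pycistarget) to be installed.
    from scenicplus.wrappers.run_pycistarget import run_pycistarget

    return run_pycistarget(region_sets=region_sets, **kwargs)


# Hypothetical paths, to be replaced with real locations.
kwargs = build_pycistarget_kwargs(
    save_path="motif_enrichment_out",
    ctx_db_path="path/to/rankings_database.feather",
    dem_db_path="path/to/scores_database.feather",
    n_cpu=4,
)
```

With SCENIC+ installed and region_sets available, the call would then be run_motif_enrichment(region_sets, **kwargs).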
SCENIC+ wrapper#
Tools for non-automated workflow#
Cistromes#
Enhancer-to-gene linking#
TF-to-gene linking#
eGRN building#
eRegulon Class#
GSEA based approach#
Downstream analysis, export and plotting#
Marker genes and regions#
eRegulon enrichment in cells#
dimensionality reduction#
eRegulon specificity score (eRSS)#
Network#
Export eRegulons to an eGRN network and plot it.
- scenicplus.networks.concentrical_layout(G, dist_genes=1, dist_TF=0.1)[source]#
Generate custom concentrical layout
- Parameters:
G (Graph) – A networkx graph
dist_genes (int, optional) – Distance from the regions to the genes
dist_TF (float, optional) – Distance from the TF to the regions
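To give an intuition for the two distance parameters, here is a toy sketch of a concentric arrangement for a single TF: the TF sits at the origin, its regions lie on a circle of radius dist_TF, and the genes lie on a circle a further dist_genes outward. This is not the SCENIC+ implementation, which takes a networkx graph and may arrange nodes differently. The function name toy_concentric_positions and the node names are invented for this example.

```python
import math
from typing import Dict, List, Tuple


def toy_concentric_positions(
    tf: str,
    regions: List[str],
    genes: List[str],
    dist_TF: float = 0.1,
    dist_genes: float = 1.0,
) -> Dict[str, Tuple[float, float]]:
    """Place one TF at the origin, regions on an inner ring, genes on an outer ring."""
    positions: Dict[str, Tuple[float, float]] = {tf: (0.0, 0.0)}

    def place_on_ring(names: List[str], radius: float) -> None:
        # Spread the nodes evenly over the circle of the given radius.
        for i, name in enumerate(names):
            angle = 2 * math.pi * i / len(names)
            positions[name] = (radius * math.cos(angle), radius * math.sin(angle))

    place_on_ring(regions, dist_TF)             # TF -> regions distance
    place_on_ring(genes, dist_TF + dist_genes)  # regions -> genes distance
    return positions


toy_pos = toy_concentric_positions(
    "TF1", ["region1", "region2"], ["geneA", "geneB", "geneC", "geneD"]
)
```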
- scenicplus.networks.create_nx_graph(nx_tables: Dict, use_edge_tables: List = ['TF2R', 'R2G'], color_edge_by: Dict = {}, transparency_edge_by: Dict = {}, width_edge_by: Dict = {}, color_node_by: Dict = {}, transparency_node_by: Dict = {}, size_node_by: Dict = {}, shape_node_by: Dict = {}, label_size_by: Dict = {}, label_color_by: Dict = {}, layout: str = 'concentrical_layout', lc_dist_genes: float = 0.8, lc_dist_TF: float = 0.1, scale_position_by: float = 250)[source]#
Format node/edge feature tables into a graph
- Parameters:
nx_tables (Dict) – Dictionary with node/edge feature tables as produced by create_nx_tables
use_edge_tables (List, optional) – List of edge tables to use
color_edge_by (Dict, optional) – A dictionary containing, for a given edge key, the variable and color map to color edges by. If the variable is categorical, the entry ‘categorical_color’ can be provided as a dictionary with category: color. If it is a continuous variable, a color map can be provided as continuous_color, and the entries v_max and v_min can be provided to control the max and min values of the scale. Alternatively, one fixed color can be used by setting ‘fixed_color’ as the variable and adding an entry fixed_color: color to the dictionary.
transparency_edge_by (Dict, optional) – A dictionary containing, for a given edge key, the variable and the max and min alpha values. The variable name has to be provided (only continuous variables accepted), together with v_max/v_min parameters if desired. Alternatively, one fixed alpha can be used by setting ‘fixed_alpha’ as the variable and adding an entry fixed_alpha: alpha to the dictionary.
width_edge_by (Dict, optional) – A dictionary containing, for a given edge key, the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can be used by setting ‘fixed_size’ as the variable and adding an entry fixed_size: size to the dictionary.
color_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and color map to color nodes by. If the variable is categorical, the entry ‘categorical_color’ can be provided as a dictionary with category: color. If it is a continuous variable, a color map can be provided as continuous_color, and the entries v_max and v_min can be provided to control the max and min values of the scale. Alternatively, one fixed color can be used by setting ‘fixed_color’ as the variable and adding an entry fixed_color: color to the dictionary.
transparency_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and the max and min alpha values. The variable name has to be provided (only continuous variables accepted), together with v_max/v_min parameters if desired. Alternatively, one fixed alpha can be used by setting ‘fixed_alpha’ as the variable and adding an entry fixed_alpha: alpha to the dictionary.
size_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can be used by setting ‘fixed_size’ as the variable and adding an entry fixed_size: size to the dictionary.
shape_node_by (Dict, optional) – A dictionary containing, for a given node key, the variable and shapes. The variable name has to be provided (only categorical variables accepted). Alternatively, one fixed shape can be used by setting ‘fixed_shape’ as the variable and adding an entry fixed_shape: shape to the dictionary.
label_size_by (Dict, optional) – A dictionary containing, for a given node key, the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can be used by setting ‘fixed_label_size’ as the variable and adding an entry fixed_label_size: size to the dictionary.
label_color_by (Dict, optional) – A dictionary containing, for a given node key, the variable and a color dictionary. The variable name has to be provided (only categorical variables accepted), together with a color dictionary if desired. Alternatively, one fixed color can be used by setting ‘fixed_label_color’ as the variable and adding an entry fixed_label_color: color to the dictionary.
layout (str, optional) – Layout to use. Options are: ‘concentrical_layout’ (SCENIC+ custom layout) or ‘kamada_kawai_layout’ (from networkx).
lc_dist_genes (float, optional) – Distance between regions and genes. Only used if using concentrical_layout.
lc_dist_TF (float, optional) – Distance between TF and regions. Only used if using concentrical_layout.
scale_position_by (float, optional) – Value to scale positions for visualization in pyvis.
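The styling dictionaries described above share a common shape: one entry per edge or node key, each holding a variable plus its options. The sketch below builds a few of them with plain Python. It rests on several assumptions: the inner key is assumed to be named 'variable'; the node keys 'TF' and 'Gene' and the column names 'R2G_rho' and 'R2G_importance' are illustrative guesses; and the helpers fixed_style and continuous_color are invented here. Check the exact key and column names against your installed version and your own tables.

```python
from typing import Any, Dict


def fixed_style(kind: str, value: Any) -> Dict[str, Any]:
    """Build a 'fixed_<kind>' entry, e.g. fixed_style('color', 'grey')."""
    key = f"fixed_{kind}"
    # Assumption: the variable is selected through a key named 'variable'.
    return {"variable": key, key: value}


def continuous_color(variable: str, cmap: str, v_min: float, v_max: float) -> Dict[str, Any]:
    """Build an entry that colors by a continuous variable with a color map."""
    return {"variable": variable, "continuous_color": cmap, "v_min": v_min, "v_max": v_max}


# Keyword arguments that could be passed on to create_nx_graph.
graph_style: Dict[str, Any] = {
    "use_edge_tables": ["TF2R", "R2G"],
    "color_edge_by": {
        "TF2R": fixed_style("color", "grey"),
        "R2G": continuous_color("R2G_rho", "viridis", -1.0, 1.0),  # hypothetical column
    },
    "width_edge_by": {
        "R2G": {"variable": "R2G_importance", "max_size": 1.5, "min_size": 1.0},
    },
    "size_node_by": {"TF": fixed_style("size", 30), "Gene": fixed_style("size", 15)},
    "shape_node_by": {"TF": fixed_style("shape", "ellipse")},
    "label_size_by": {"TF": fixed_style("label_size", 20.0)},
    "layout": "kamada_kawai_layout",
}
```

Given the output of create_nx_tables in a variable nx_tables, these settings could then be applied with create_nx_graph(nx_tables, **graph_style).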
- scenicplus.networks.create_nx_tables(scplus_obj: SCENICPLUS, eRegulon_metadata_key: str = 'eRegulon_metadata', subset_eRegulons: List = None, subset_regions: List = None, subset_genes: List = None, add_differential_gene_expression: bool = False, add_differential_region_accessibility: bool = False, differential_variable: List = [])[source]#
A function to format eRegulon data into tables for plotting eGRNs.
- Parameters:
scplus_obj (SCENICPLUS) – A SCENICPLUS object with eRegulons
eRegulon_metadata_key (str, optional) – Key where the eRegulon metadata dataframe is stored
subset_eRegulons (list, optional) – List of eRegulons to subset
subset_regions (list, optional) – List of regions to subset
subset_genes (list, optional) – List of genes to subset
add_differential_gene_expression (bool, optional) – Whether to calculate differential gene expression logFC for a given variable
add_differential_region_accessibility (bool, optional) – Whether to calculate differential region accessibility logFC for a given variable
differential_variable (list, optional) – Variable to calculate differential gene expression or region accessibility.
- scenicplus.networks.export_to_cytoscape(G, pos, out_file: str, pos_scaling_factor: int = 200, size_scaling_factor: int = 1)[source]#
A function to export to Cytoscape
- Parameters:
G (Graph) – A networkx graph.
pos – Coordinates of the graph nodes, generated by running create_nx_graph.
out_file (str) – Path to which to save the export.
pos_scaling_factor (int, optional) – Factor by which to scale the graph node coordinates.
size_scaling_factor (int, optional) – Factor by which to scale the graph node sizes.
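The three functions in this module are meant to be chained: tables, then graph, then export. The sketch below shows one possible way to do so. It assumes that create_nx_graph returns the graph and the node coordinates as its first two values; this page does not document the return values, so verify this against your installed version. The wrapper name export_egrn_to_cytoscape and the output file name are invented for this example, and the styling dictionaries are omitted for brevity, although a real call would usually pass them.

```python
from typing import List, Optional


def export_egrn_to_cytoscape(
    scplus_obj,
    out_file: str,
    eregulons: Optional[List[str]] = None,
):
    """Chain create_nx_tables, create_nx_graph and export_to_cytoscape."""
    # Imported lazily: this requires scenicplus to be installed.
    from scenicplus.networks import (
        create_nx_graph,
        create_nx_tables,
        export_to_cytoscape,
    )

    # 1. Format eRegulon data into node/edge feature tables.
    nx_tables = create_nx_tables(
        scplus_obj,
        eRegulon_metadata_key="eRegulon_metadata",
        subset_eRegulons=eregulons,
    )

    # 2. Turn the tables into a graph using the SCENIC+ custom layout.
    graph_result = create_nx_graph(
        nx_tables,
        use_edge_tables=["TF2R", "R2G"],
        layout="concentrical_layout",
        lc_dist_genes=0.8,
        lc_dist_TF=0.1,
    )
    # Assumption: the graph and the node coordinates come first in the result.
    G, pos = graph_result[0], graph_result[1]

    # 3. Write the graph to a file that can be opened in Cytoscape.
    export_to_cytoscape(
        G, pos, out_file=out_file, pos_scaling_factor=200, size_scaling_factor=1
    )
    return G, pos
```

A call might look like export_egrn_to_cytoscape(scplus_obj, "egrn_network.cyjs", eregulons=[...]), where scplus_obj is a SCENICPLUS object with eRegulons.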