API#

SCENIC+ object#

Combine single-cell expression data and single-cell accessibility into a single SCENIC+ object.

This object will be used for downstream analysis, including: region-to-gene and TF-to-gene linking, enhancer-driven-GRN building and further downstream analysis.

This object can be generated from both single-cell multi-omics data (i.e. gene expression and chromatin accessibility from the same cell), and seperate single-cell chromatin accessbility data and single-cell gene expression data from the same or similar sample.

In the second case both data modalities should have a common cell metadata field with values linking both modalities (e.g. common celltype annotation).

class scenicplus.scenicplus_class.SCENICPLUS(X_ACC: spmatrix | ndarray | DataFrame, X_EXP: spmatrix | ndarray | DataFrame, metadata_regions: DataFrame, metadata_genes: DataFrame, metadata_cell: DataFrame, menr: Mapping[str, Mapping[str, Any]], dr_cell: Mapping[str, iterable] | None = None, dr_region: Mapping[str, iterable] | None = None, uns: Mapping[str, Any] = {})[source]#

An object containing: gene expression, chromatin accessbility and motif enrichment data.

SCENICPLUS stores the data matrices X_ACC (chromatin accessbility) and X_EXP (gene expression) together with region annotation metadata_regions, gene annotation metadata_genes, cell annotation metadata_cell and motif enrichment data menr.

Use create_SCENICPLUS_object to generate instances of this class.

Parameters:
X_ACC: sparse.spmatrix, np.ndarray, pd.DataFrame

A #regions x #cells data matrix

X_EXP: sparse.spmatrix, np.ndarray, pd.DataFrame

A #cells x #genes data matrix

metadata_regions: pd.DataFrame

A pandas.DataFrame containing region metadata annotation of length #regions

metadata_genes: pd.DataFrame

A pandas.DataFrame containing gene metadata annotation of length #genes

metadata_cell: pd.DataFrame

A pandas.DataFrame containing cell metadata annotation of lenght #cells

menr: dict

A Dict containing motif enrichment results for topics of differentially accessbile regions (DARs), generate by pycistarget. Should take the form {‘region_set_name1’: {region_set1: result, ‘region_set2’: result}, ‘region_set_name2’: {‘region_set1’: result, ‘region_set2’: result}, ‘topic’: {‘topic1’: result, ‘topic2’: result}} region set names, which aren’t topics, should be columns in the metadata_cell dataframe

dr_cell: dict

A Dict containing dimmensional reduction coordinates of cells.

dr_region: dict

A Dict containing dimmensional reduction coordinates of regions.

Attributes:
n_cells: int

Returns number of cells.

n_genes: int

Returns number of genes.

n_regions: int

Returns number of regions.

cell_names: List[str]

Returns cell names

gene_names: List[str]

Returns gene names

region_names: List[str]

Returns region names

Methods

add_cell_data(cell_data)

Add cell metadata

add_gene_data(gene_data)

Add gene metadata

add_region_data(region_data)

Add region metadata

subset([cells, regions, genes, return_copy])

Subset object

to_df(layer)

Generate a DataFrame.

add_cell_data(cell_data: DataFrame)[source]#

Add cell metadata

Parameters:
cell_data: pd.DataFrame

A DataFrame containing cell metdata indexed with cell barcodes.

add_gene_data(gene_data: DataFrame)[source]#

Add gene metadata

Parameters:
gene_data: pd.DataFrame

A DataFrame containing gene metadata indexed with gene names.

add_region_data(region_data: DataFrame)[source]#

Add region metadata

Parameters:
region_data: pd.DataFrame

A DataFrame containing region metadata indexed with region names.

subset(cells=None, regions=None, genes=None, return_copy=False)[source]#

Subset object

Parameters:
cells: List[str]

A list of cells to keep default: None

regions: List[str]

A list of regions to keep default: None

genes: List[str]

A list of genes to keep default:None

return_copy: bool

A boolean specifying wether to update the object (False) or return a copy (True)

to_df(layer) DataFrame[source]#

Generate a DataFrame.

The data matrix X_EXP or X_ACC is returned as DataFrame, with cell_names as index, and gene_names or region_names as columns.

Parameters:
layer: str

ACC to return accessibility data and EXP to return expression data.

scenicplus.scenicplus_class.create_SCENICPLUS_object(GEX_anndata: AnnData, cisTopic_obj: CistopicObject, menr: Mapping[str, Mapping[str, Any]], multi_ome_mode: bool = True, nr_metacells: int | Mapping[str, int] | None = None, nr_cells_per_metacells: int | Mapping[str, int] = 10, meta_cell_split: str = '_', key_to_group_by: str | None = None, imputed_acc_obj: CistopicImputedFeatures | None = None, imputed_acc_kwargs: Mapping[str, Any] = {'scale_factor': 1000000}, normalize_imputed_acc: bool = False, normalize_imputed_acc_kwargs: Mapping[str, Any] = {'scale_factor': 10000}, cell_metadata: DataFrame | None = None, region_metadata: DataFrame | None = None, gene_metadata: DataFrame | None = None, bc_transform_func: Callable | None = None, ACC_prefix: str = 'ACC_', GEX_prefix: str = 'GEX_') SCENICPLUS[source]#

Function to create instances of SCENICPLUS

Parameters:
GEX_anndata: sc.AnnData

An instance of AnnData containing gene expression data and metadata.

cisTopic_obj: CistopicObject

An instance of pycisTopic.cistopic_class.CistopicObject containing chromatin accessibility data and metadata.

menr: dict

A dict mapping annotations to motif enrichment results

multi_ome_mode: bool

A boolean specifying wether data is multi-ome (i.e. combined scATAC-seq and scRNA-seq from the same cell) or not default: True

nr_metacells: int

For non multi_ome_mode, use this number of meta cells to link scRNA-seq and scATAC-seq If this is a single integer the same number of metacells will be used for all annotations. This can also be a mapping between an annotation and the number of metacells per annotation. default: None

nr_cells_per_metacells: int

For non multi_ome_mode, use this number of cells per metacell to link scRNA-seq and scATAC-seq. If this is a single integer the same number of cells will be used for all annotations. This can also be a mapping between an annotation and the number of cells per metacell per annotation. default: 10

meta_cell_split: str

Character which is used as seperator in metacell names default: ‘_’

key_to_group_by: str

For non multi_ome_mode, use this cell metadata key to generate metacells from scRNA-seq and scATAC-seq. Key should be common in scRNA-seq and scATAC-seq side default: None

imputed_acc_obj: CistopicImputedFeatures

An instance of CistopicImputedFeatures containing imputed chromatin accessibility. default: None

imputed_acc_kwargs: dict

Dict with keyword arguments for imputed chromatin accessibility. default: {‘scale_factor’: 10**6}

normalize_imputed_acc: bool

A boolean specifying wether or not to normalize imputed chromatin accessbility. default: False

normalize_imputed_acc_kwargs: dict

Dict with keyword arguments for normalizing imputed accessibility. default: {‘scale_factor’: 10 ** 4}

cell_metadata: pd.DataFrame

An instance of DataFrame containing extra cell metadata default: None

region_metadata: pd.DataFrame

An instance of DataFrame containing extra region metadata default: None

gene_metadata: pd.DataFrame

An instance of DataFrame containing extra gene metadata default: None

bc_transform_func: func

A function used to transform gene expression barcode layout to chromatin accessbility layout. default: None

ACC_prefix: str

String prefix to add to cell metadata coming from cisTopic_obj

GEX_prefix: str

String prefix to add to cell metadata coming from GEX_anndata

Examples

>>> scplus_obj = create_SCENICPLUS_object(GEX_anndata = rna_anndata, cisTopic_obj = cistopic_obj, menr = motif_enrichment_dict)

Preprocessing#

Filter outlier genes and regions.

scenicplus.preprocessing.filtering.filter_genes(SCENICPLUS_obj: SCENICPLUS, min_pct: int = 0, max_pct: int = 100, return_copy=False) SCENICPLUS[source]#

Filter scenciplus object genes

Parameters:
SCENICPLUS_obj

An instance of :class: ~scenicplus.scenicplus_class.SCENICPLUS.

min_pct

only keep genes which are expressed in at least min_pct of cells. default: 0

max_pct

only keep genes which are expressed in maximal max_pct of cells. default: 100

return_copy

If set to True a new SCENICPLUS object will be generated containing filtered data. default: False

scenicplus.preprocessing.filtering.filter_regions(SCENICPLUS_obj: SCENICPLUS, min_pct: int = 0, max_pct: int = 100, return_copy=False) SCENICPLUS[source]#

Filter scenciplus object regions

Parameters:
SCENICPLUS_obj

An instance of :class: ~scenicplus.scenicplus_class.SCENICPLUS.

min_pct

only keep regions which are accessible in at least min_pct of cells. default: 0

max_pct

only keep regions which are accessible in maximal max_pct of cells. default: 100

return_copy

If set to True a new SCENICPLUS object will be generated containing filtered data. default: False

SCENIC+ semi-automated workflow using wrapper functions#

pycistarget wrapper#

Wrapper functions to run motif enrichment analysis using pycistarget

After sets of regions have been defined (e.g. topics or DARs). The complete pycistarget workflo can be run using a single function.

this function will run cistarget based and DEM based motif enrichment analysis with or without promoter regions.

scenicplus.wrappers.run_pycistarget.run_pycistarget(region_sets: Dict[str, PyRanges], species: str, save_path: str, custom_annot: DataFrame | None = None, save_partial: bool = False, ctx_db_path: str | None = None, dem_db_path: str | None = None, run_without_promoters: bool = False, biomart_host: str = 'http://www.ensembl.org', promoter_space: int = 500, ctx_auc_threshold: float = 0.005, ctx_nes_threshold: float = 3.0, ctx_rank_threshold: float = 0.05, dem_log2fc_thr: float = 0.5, dem_motif_hit_thr: float = 3.0, dem_max_bg_regions: int = 500, annotation: List[str] = ['Direct_annot', 'Orthology_annot'], motif_similarity_fdr: float = 1e-06, path_to_motif_annotations: str | None = None, annotation_version: str = 'v9', n_cpu: int = 1, _temp_dir: str | None = None, exclude_motifs: str | None = None, exclude_collection: List[str] | None = None, **kwargs)[source]#

Wrapper function for pycistarget

Parameters:
  • region_sets (Mapping[str, pr.PyRanges]) – A dictionary of PyRanges containing region coordinates for the region sets to be analyzed.

  • species (str) – Species from which genomic coordinates come from, options are: homo_sapiens, mus_musculus, drosophila_melanogaster and gallus_gallus.

  • save_path (str) – Directory in which to save outputs.

  • custom_annot (pd.DataFrame) –

    pandas DataFrame with genome annotation for custom species (i.e. for a species other than homo_sapiens, mus_musculus, drosophila_melanogaster or gallus_gallus). This DataFrame should (minimally) look like the example below, and only contains protein coding genes: >>> custom_annot

    Chromosome Start Strand Gene Transcript_type

    8053 chrY 22490397 1 PRY protein_coding 8153 chrY 12662368 1 USP9Y protein_coding 8155 chrY 12701231 1 USP9Y protein_coding 8158 chrY 12847045 1 USP9Y protein_coding 8328 chrY 22096007 -1 PRY2 protein_coding … … … … … … 246958 chr1 181483738 1 CACNA1E protein_coding 246960 chr1 181732466 1 CACNA1E protein_coding 246962 chr1 181776101 1 CACNA1E protein_coding 246963 chr1 181793668 1 CACNA1E protein_coding 246965 chr1 203305519 1 BTG2 protein_coding

    [78812 rows x 5 columns]

  • save_partial (bool=False) – Whether to save the individual analyses as pkl. Useful to run analyses in chunks or add new settings.

  • ctx_db_path (str = None) – Path to cistarget database containing rankings of motif scores

  • dem_db_path (str = None) – Path to dem database containing motif scores

  • run_without_promoters (bool = False) – Boolean specifying wether the analysis should also be run without including promoter regions.

  • biomart_host (str = ‘http://www.ensembl.org’) – url to biomart host, make sure this host matches your assembly

  • promoter_space (int = 500) – integer defining space around the TSS to consider as promoter

  • ctx_auc_threshold (float = 0.005) – The fraction of the ranked genome to take into account for the calculation of the Area Under the recovery Curve

  • ctx_nes_threshold (float = 3.0) – The Normalized Enrichment Score (NES) threshold to select enriched features.

  • ctx_rank_threshold (float = 0.05) – The total number of ranked genes to take into account when creating a recovery curve.

  • dem_log2fc_thr (float = 0.5) – Log2 Fold-change threshold to consider a motif enriched.

  • dem_motif_hit_thr (float = 3.0) – Minimul mean signal in the foreground to consider a motif enriched.

  • dem_max_bg_regions (int = 500) – Maximum number of regions to use as background. When set to None, all regions are used

  • annotation (List[str] = ['Direct_annot', 'Orthology_annot']) – Annotation to use for forming cistromes. It can be ‘Direct_annot’ (direct evidence that the motif is linked to that TF), ‘Motif_similarity_annot’ (based on tomtom motif similarity), ‘Orthology_annot’ (based on orthology with a TF that is directly linked to that motif) or ‘Motif_similarity_and_Orthology_annot’.

  • path_to_motif_annotations (str = None) – Path to motif annotations. If not provided, they will be downloaded from https://resources.aertslab.org based on the specie name provided (only possible for mus_musculus, homo_sapiens and drosophila_melanogaster).

  • annotation_version (str = 'v9') – Motif collection version.

  • n_cpu (int = 1) – Number of cores to use.

  • _temp_dir (str = None) – temp_dir to use for ray.

  • exclude_motifs (str = None) – Path to csv file containing motif to exclude from the analysis.

  • exclude_collection (List[str] = None) – List of strings identifying which motif collections to exclude from analysis.

SCENIC+ wrapper#

Wrapper functions to run SCENIC+ analysis

After the SCENIC+ object has been generated the complete SCENIC+ workflow can be run with a single function.

Following operation will be done:

  1. Defining a search space surounding each gene

  2. Region to gene linking & TF to gene linking

  3. eGRN building

  4. eGRN scoring (target gene and region based AUC and RSS)

  5. Calculating TF-cistrome correlations

  6. Dimensionality reductions

  7. export to UCSC tracks and to loom file

In case the process is killed prematurely, the function can be restared and the workflow will resume from the last step that was succesfully completed.

scenicplus.wrappers.run_scenicplus.run_scenicplus(scplus_obj: SCENICPLUS, variable: List[str], species: str, assembly: str, tf_file: str, save_path: str, biomart_host: str | None = 'http://www.ensembl.org', upstream: List | None = [1000, 150000], downstream: List | None = [1000, 150000], region_ranking: CisTopicImputedFeatures | None = None, gene_ranking: CisTopicImputedFeatures | None = None, simplified_eGRN: bool | None = False, calculate_TF_eGRN_correlation: bool | None = True, calculate_DEGs_DARs: bool | None = True, export_to_loom_file: bool | None = True, export_to_UCSC_file: bool | None = True, tree_structure: Sequence[str] = (), path_bedToBigBed: str | None = None, n_cpu: int | None = 1, _temp_dir: str | None = '', save_partial: bool | None = False, **kwargs)[source]#

Wrapper to run SCENIC+

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object.

  • variables (List[str]) – Variables to use for RSS, TF-eGRN correlation and markers.

  • species (str) – Species from which data comes from. Possible values: ‘hsapiens’, ‘mmusculus’, ‘dmelanogaster’

  • assembly (str) – Genome assembly to which the data was mapped. Possible values: ‘hg38’

  • tf_file (str) – Path to file containing genes that are TFs

  • save_path (str) – Folder in which results will be saved

  • biomart_host (str, optional) – Path to biomart host. Make sure that the host matches your genome assembly

  • upstream (str, optional) – Upstream space to use for region to gene relationships

  • downstream (str, optional) – Upstream space to use for region to gene relationships

  • region_ranking (class::CisTopicImputedFeatures, optional) – Precomputed region ranking

  • gene_ranking (class::CisTopicImputedFeatures, optional) – Precomputed gene ranking

  • simplified_eGRN (bool, optional) – Whether to output simplified eGRNs (only TF-G sign rather than TF-G_R-G)

  • calculate_TF_eGRN_correlation (bool, optional) – Whether to calculate the TF-eGRN correlation based on the variables

  • calculate_DEGs_DARs (bool, optional) – Whether to calculate DARs/DEGs based on the variables

  • export_to_loom_file (bool, optional) – Whether to export data to loom files (gene based/region based)

  • export_to_UCSC_file (bool, optional) – Whether to export region-to-gene links and eregulons to bed files

  • tree_structure (sequence, optional) – Tree structure for loom files

  • path_bedToBigBed (str, optional) – Path to convert bed files to big bed when exporting to UCSC (required if files are meant to be used in a hub)

  • n_cpu (int, optional) – Number of cores to use

  • _temp_dir (str, optional) – Temporary directory for ray

Tools for non-automated workflow#

Cistromes#

Merging, scoring and assessing TF correlations of cistromes

scenicplus.cistromes.TF_cistrome_correlation(scplus_obj: SCENICPLUS, variable: str | None = None, use_pseudobulk: bool = True, auc_key: str = 'Cistromes_AUC', signature_key: str = 'Unfiltered', out_key: str = 'Unfiltered', subset: List[str] | None = None)[source]#

Get correlation between gene expression and cistrome accessibility

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with pseudobulk matrices (scplus_obj.uns[‘Pseudobulk’])

  • variable (str, optional) – Variable used to create the pseudobulks. Must be a key in scplus_obj.uns[‘Pseudobulk’]. Only required if use_pseudobulk is False.

  • use_pseudobulk (bool, optional) – Whether to use pseudobulk matrix or actual values.

  • cistromes_key (str, optional) – Key to retrieve the pseudobulk matrices. Cistrome accessibility will be retrieved from scplus_obj.uns[‘Pseudobulk’][variable][‘Cistromes_AUC’][cistromes_key] and gene expression from scplus_obj.uns[‘Pseudobulk’][variable][‘Expression’].

  • out_key (str, optional) – Ouput key. Correlations will be stored at scplus_obj.uns[‘TF_cistrome_correlation’][variable][out_key].

  • subset (List, optional) – Subset of cells to be used to calculate correlations. Default: None (All)

scenicplus.cistromes.cistrome_correlation(scplus_obj: SCENICPLUS, variable: str | None = None, use_pseudobulk: bool = True, auc_key: str = 'Cistromes_AUC', signature_key1: str = 'Gene_based', signature_key2: str = 'Region_based', out_key: str = 'Unfiltered', subset: List[str] | None = None)[source]#

Get correlation between gene-based and region-based AUC

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with pseudobulk matrices (scplus_obj.uns[‘Pseudobulk’])

  • variable (str, optional) – Variable used to create the pseudobulks. Must be a key in scplus_obj.uns[‘Pseudobulk’]. Only required if use_pseudobulk is False.

  • use_pseudobulk (bool, optional) – Whether to use pseudobulk matrix or actual values.

  • auc_key (str, optional) – Key to retrieve the pseudobulk matrices. Cistrome AUCs will be retrieved from scplus_obj.uns[‘Pseudobulk’][variable][‘Cistromes_AUC’][cistromes_key][signature_key1] and gene expression from scplus_obj.uns[‘Pseudobulk’][variable][‘Cistromes_AUC’][cistromes_key][signature_key2].

  • out_key (str, optional) – Output key. Correlations will be stored at scplus_obj.uns[‘cistrome_correlation’][variable][out_key].

  • subset (List, optional) – Subset of cells to be used to calculate correlations. Default: None (All)

scenicplus.cistromes.generate_pseudobulks(scplus_obj: SCENICPLUS, variable: str, normalize_expression: bool = True, auc_key: str = 'Cistromes_AUC', signature_key: str = 'Unfiltered', nr_cells: int = 10, nr_pseudobulks: int = 100, seed: int = 555)[source]#

Generate pseudobulks based on the cistrome AUC matrix and gene expression

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with scored cistromes (scplus_obj.uns[‘Cistromes_AUC’][cistromes_key])

  • variable (str, optional) – Variable to create pseudobulks by. It must ve a column in scplus_obj.metadata_cell

  • cistromes_key (str, optional) – Key to store where cistromes AUC values are stored. Cistromes AUC will retrieved from scplus_obj.uns[‘Cistromes_AUC’][cistromes_key] and the pseudobulk matrix will be stored at scplus_obj.uns[‘Pseudobulk’][‘Cistromes_AUC’][variable][cistromes_key] and scplus_obj.uns[‘Pseudobulk’][variable][‘Expression’]

  • nr_cells (int, optional) – Number of cells to include per pseudobulk.

  • nr_pseudobulks (int, optional) – Number of pseudobulks to generate per class

  • seed (int) – Seed to ensure that pseudobulk are reproducible.

scenicplus.cistromes.merge_cistromes(scplus_obj: SCENICPLUS, cistromes_key: str = 'Unfiltered', subset: PyRanges | None = None)[source]#

Generate cistromes from motif enrichment tables

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with motif enrichment results from pycistarget (scplus_obj.menr). Several analyses can be included in the slot (topics/DARs/other; and different methods [Homer/DEM/cistarget]).

  • cistromes_key (str, optional) – Key to store cistromes. Cistromes will stored at scplus_obj.uns[‘Cistromes’][siganture_key]

  • subset (list) – A PyRanges containing a set of regions that regions in cistromes must overlap. This is useful when aiming for cell type specific cistromes for example (e.g. providing the cell type’s MACS peaks)

scenicplus.cistromes.prune_plot(scplus_obj: SCENICPLUS, name: str, pseudobulk_variable: str | None = None, auc_key: str = 'Cistromes_AUC', signature_key: str = 'Unfiltered', use_pseudobulk: bool = True, show_dot_plot: bool = True, show_line_plot: bool = False, color_dict=None, subset=None, seed=555, ax: axes | None = None, **kwargs)[source]#

Plot cistrome accessibility versus TF expression

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object.

  • cistrome_name (str, optional) – Cistrome to plot. The TF name (or followed by extended) is enough to quert.

  • pseudobulk_variable (str, optional) – Key to retrieve the pseudobulk matrices. Cistrome accessibility will be retrieved from scplus_obj.uns[‘Pseudobulk’][pseudobulk_variable][‘Cistromes_AUC’][cistromes_key] and gene expression from scplus_obj.uns[‘Pseudobulk’][pseudobulk_variable][‘Expression’].

  • cistromes_key (str, optional) – Key to retrieve the pseudobulk matrices. Cistrome accessibility will be retrieved from scplus_obj.uns[‘Pseudobulk’][pseudobulk_variable][‘Cistromes_AUC’][cistromes_key] and gene expression from scplus_obj.uns[‘Pseudobulk’][pseudobulk_variable][‘Expression’].

  • use_pseudobulk (bool, optional) – Whether to use pseudobulk matrix or actual values.

  • show_dot_plot (bool, optional) – Whether to show dots in plot

  • show_line_plot (bool, optional) – Whether to show line fitting to plot

  • color_dict (Dict, optional) – Color dictionary to specify colors

  • subset (List, optional) – Subset of pseudobulks/cells to use

  • seed (int) – Seed to ensure that pseudobulk are reproducible.

  • ax (plt.axes, optional) – matplotlib axes to plot to.

  • **kwargs – Parameters for seaborn plotting.

scenicplus.cistromes.score_cistromes(scplus_obj: SCENICPLUS, ranking: CistopicImputedFeatures, cistromes_key: str = 'Unfiltered', enrichment_type: str = 'region', auc_threshold: float = 0.05, normalize: bool = False, n_cpu: int = 1)[source]#

Get enrichment of a region signature in cells using AUCell (Van de Sande et al., 2020)

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with motif enrichment results from pycistarget (scplus_obj.menr).

  • rankings (CistopicImputedFeatures) – A CistopicImputedFeatures object with ranking values

  • cistromes_key (str, optional) – Key to store where cistromes are stored. Cistromes will retrieved from scplus_obj.uns[‘Cistromes’][siganture_key] and AUC values will be stored in scplus_obj.uns[‘Cistromes_AUC’][cistromes_key].

  • enrichment_type (str) – Whether features are genes or regions

  • auc_threshold (float) – The fraction of the ranked genome to take into account for the calculation of the Area Under the recovery Curve. Default: 0.05

  • normalize (bool) – Normalize the AUC values to a maximum of 1.0 per regulon. Default: False

  • num_workers (int) – The number of cores to use. Default: 1

References

Van de Sande, B., Flerin, C., Davie, K., De Waegeneer, M., Hulselmans, G., Aibar, S., … & Aerts, S. (2020). A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nature Protocols, 15(7), 2247-2276.

Enhancer-to-gene linking#

Link enhancers to genes based on co-occurence of chromatin accessbility of the enhancer and gene expression.

Both linear methods (spearman or pearson correlation) and non-linear methods (random forrest or gradient boosting) are used to link enhancers to genes.

The correlation methods are used to seperate regions which are infered to have a positive influence on gene expression (i.e. positive correlation) and regions which are infered to have a negative influence on gene expression (i.e. negative correlation).

scenicplus.enhancer_to_gene.calculate_regions_to_genes_relationships(SCENICPLUS_obj: SCENICPLUS, search_space_key: str = 'search_space', mask_expr_dropout: bool = False, genes: List[str] | None = None, importance_scoring_method: str = 'GBM', importance_scoring_kwargs: dict = {'learning_rate': 0.01, 'max_features': 0.1, 'n_estimators': 500}, correlation_scoring_method: str = 'SR', ray_n_cpu: int | None = None, add_distance: bool = True, key_added: str = 'region_to_gene', inplace: bool = True, **kwargs)[source]#

Calculates region to gene relationships using non-linear regression methods and correlation

Parameters:
SCENICPLUS_obj: SCENICPLUS

instance of SCENICPLUS class containing expression data and chromatin accessbility data

search_space_key: str = ‘search_space’

a key in SCENICPLUS_obj.uns.keys pointing to a dataframe containing the search space surounding each gene.

mask_expr_dropout: bool = False

Wether or not to exclude cells which have zero counts for a gene from the calculations

genes: List[str] None

list of genes for which to calculate region gene scores. Default is None, i.e. all genes

importance_scoring_method: str = GBM

method used to score region to gene importances. Available regression analysis are: ‘RF’ (Random Forrest regression), ‘ET’ (Extra Trees regression), ‘GBM’ (Gradient Boostin regression).

importance_scoring_kwargs: dict = GBM_KWARGS

arguments to pass to the importance scoring function

correlation_scoring_method: str = SR

method used to calculate region to gene correlations. Available correlation analysis are: ‘PR’ (pearson correlation), ‘SR’ (spearman correlation).

ray_n_cpu: int = None

number of cores to use for ray multi-processing. Does not use ray when set to None

add_distance: bool = True

Wether or not to return region to gene distances

key_added: str = region_to_gene

Key in SCENICPLUS_obj.uns under which to store region to gene links, only stores when inplace = True

inplace: bool = True

Wether or not store the region to gene links in the SCENICPLUS_obj, if False a pd.DataFrame will be returned.

scenicplus.enhancer_to_gene.export_to_UCSC_interact(SCENICPLUS_obj: SCENICPLUS, species: str, outfile: str, region_to_gene_key: str = ' region_to_gene', pbm_host: str = 'http://www.ensembl.org', bigbed_outfile: str | None = None, path_bedToBigBed: str | None = None, assembly: str | None = None, ucsc_track_name: str = 'region_to_gene', ucsc_description: str = 'interaction file for region to gene', cmap_neg: str = 'Reds', cmap_pos: str = 'Greens', key_for_color: str = 'importance', vmin: int = 0, vmax: int = 1, scale_by_gene: bool = True, subset_for_eRegulons_regions: bool = True, eRegulons_key: str = 'eRegulons') DataFrame[source]#

Exports interaction dataframe to UCSC interaction file and (optionally) UCSC bigInteract file.

Parameters:
SCENICPLUS_obj: SCENICPLUS

An instance of class scenicplus_class.SCENICPLUS containing region to gene links in .uns.

species: str

Species corresponding to your datassets (e.g. hsapiens)

outfile: str

Path to output file

region_to_gene_key: str =’ region_to_gene’

Key in SCENICPLUS_obj.uns.keys() under which to find region to gene links.

pbm_host:str = ‘http://www.ensembl.org’

Url of biomart host relevant for your assembly.

bigbed_outfile:str = None

Path to which to write the bigbed output.

path_bedToBigBed: str= None

Path to bedToBigBed program, used to convert bed file to bigbed format. Can be downloaded from http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed

assembly: str = None

String identifying the assembly of your dataset (e.g. hg39).

ucsc_track_name: str = ‘region_to_gene’

Name of the exported UCSC track

ucsc_description: str = ‘interaction file for region to gene’

Description of the exported UCSC track

cmap_neg: str = ‘Reds’

Matplotlib colormap used to color negative region to gene links.

cmap_pos: str = ‘Greens’

Matplotlib colormap used to color positive region to gene links.

key_for_color: str = ‘importance’

Key pointing to column in region to gene links used to map cmap colors to.

vmin: int = 0

vmin of region to gene link colors.

vmax: int = 1

vmax of region to gene link colors.

scale_by_gene: bool = True

Boolean specifying wether to scale importance scores of regions linking to the same gene from 0 to 1

subset_for_eRegulons_regions: bool = True

Boolean specifying wether or not to subset region to gene links for regions and genes in eRegulons.

eRegulons_key: str = ‘eRegulons’

key in SCENICPLUS_obj.uns.keys() under which to find eRegulons.

Returns:
pd.DataFrame with region to gene links formatted in the UCSC interaction format.
scenicplus.enhancer_to_gene.get_search_space(SCENICPLUS_obj: SCENICPLUS, species=None, assembly=None, pr_annot=None, pr_chromsizes=None, predefined_boundaries=None, use_gene_boundaries=False, upstream=[1000, 150000], downstream=[1000, 150000], extend_tss=[10, 10], remove_promoters=False, biomart_host='http://www.ensembl.org', inplace=True, key_added='search_space', implode_entries=True)[source]#

Get search space surrounding genes to calculate enhancer to gene links

Parameters:
SCENICPLUS_obj: SCENICPLUS

a pr.SCENICPLUS.

species: string, optional

Name of the species (e.g. hsapiens) on whose reference genome the search space should be calculated. This will be used to retrieve gene annotations from biomart. Annotations can also be manually provided using the parameter [pr_annot]. Default: None

assembly: string, optional

Name of the assembly (e.g. hg38) of the reference genome on which the search space should be calculated. This will be used to retrieve chromosome sizes from the UCSC genome browser. Chromosome sizes can also be manually provided using the parameter [pr_chromsizes]. Default: None

pr_annot: pr.PyRanges, optional

A pr.PyRanges containing gene annotation, including Chromosome, Start, End, Strand (as ‘+’ and ‘-‘), Gene name and Transcription Start Site. Default: None

pr_chromsizes: pr.PyRanges, optional

A pr.PyRanges containing size of each chromosome, containing ‘Chromosome’, ‘Start’ and ‘End’ columns. Default: None

predefined_boundaries: pr.PyRanges, optional

A pr.PyRanges containing predefined genomic domain boundaries (e.g. TAD boundaries) to use as boundaries. If given, use_gene_boundaries will be ignored.

use_gene_boundaries: bool, optional

Whether to use the whole search space or stop when encountering another gene. Default: False

upstream: List, optional

Search space upstream. The minimum (first position) means that even if there is a gene right next to it these bp will be taken. The second position indicates the maximum distance. Default: [1000, 100000]

downstream: List, optional

Search space downstream. The minimum (first position) means that even if there is a gene right next to it these bp will be taken. The second position indicates the maximum distance. Default: [1000, 100000]

extend_tss: list, optional

Space around the TSS consider as promoter. Default: [10,10]

remove_promoters: bool, optional

Whether to remove promoters from the search space or not. Default: False

biomart_host: str, optional

Biomart host to use to download TSS annotation. Please make sure this host matches the expression data (i.e. matching gene names) otherwise a lot of genes are potentially lost.

inplace: bool, optional

If set to True, store results into scplus_obj, otherwise return results.

key_added: str, optional

Key under which to add the results under scplus.uns.

implode_entries: bool, optional

When a gene has multiple start/end sites it has multiple distances and gene width. If this parameter is set to True these multiple entries per region and gene will be put in a list, generating a single entry. If this parameter is set to False these multiple entries will be kept.

Return
——
pd.DataFrame

A data frame containing regions in the search space for each gene

TF-to-gene linking#

Link transcription factors (TFs) to genes based on co-expression of TF and target genes.

Both linear methods (spearman or pearson correlation) and non-linear methods (random forrest or gradient boosting) are used to link TF to genes.

The correlation methods are used to seperate TFs which are infered to have a positive influence on gene expression (i.e. positive correlation) and TFs which are infered to have a negative influence on gene expression (i.e. negative correlation).

scenicplus.TF_to_gene.calculate_TFs_to_genes_relationships(scplus_obj: SCENICPLUS, tf_file: str, method: str = 'GBM', ray_n_cpu: int = 1, key: str = 'TF2G_adj', **kwargs)[source]#

A function to calculate TF to gene relationships using arboreto and correlation

Parameters:
scplus_obj

An instance of SCENICPLUS

tf_file

Path to a file specifying with genes are TFs

method

Whether to use Gradient Boosting Machines (GBM) or random forest (RF)

ray_n_cpu

Number of cpus to use

key

String specifying where in the .uns slot to store the adjacencies matrix in :param:`SCENICPLUS_obj` default: “TF2G_adj”

**kwargs

Parameters to pass to ray.init

scenicplus.TF_to_gene.load_TF2G_adj_from_file(SCENICPLUS_obj: SCENICPLUS, f_adj: str, inplace=True, key='TF2G_adj', rho_threshold=0.03)[source]#

Function to load TF2G adjacencies from file

Parameters:
SCENICPLUS_obj

An instance of SCENICPLUS

f_adj

File path to TF2G adjacencies matrix

inplace

Boolean specifying wether or not to store adjacencies matrix in SCENICPLUS_obj under slot .uns[key]. Default: True

key_added

String specifying where in the .uns slot to store the adjacencies matrix in SCENICPLUS_obj Default: “TF2G_adj”

rho_threshold

A floating point number specifying from which absolute value to consider a correlation coefficient positive or negative. Default: 0.03

eGRN building#

eRegulon Class#

eRegulon class stores the transcription factor together with its target regions and genes.

scenicplus.grn_builder.modules.create_emodules(SCENICPLUS_obj: SCENICPLUS, region_to_gene_key: str = 'region_to_gene', cistromes_key: str = 'Unfiltered', order_regions_to_genes_by: str = 'importance', quantiles: tuple = (0.85, 0.9), top_n_regionTogenes_per_gene: tuple = (5, 10, 15), top_n_regionTogenes_per_region: tuple = (), binarize_using_basc: bool = False, min_regions_per_gene: int = 0, rho_dichotomize: bool = True, keep_only_activating: bool = False, rho_threshold: float = 0.03, keep_extended_motif_annot: bool = False, disable_tqdm=False, ray_n_cpu=None, **kwargs) Tuple[List[str], List[eRegulon]][source]#

A function to create e_modules of :class: ~scenicplus.grn_builder.modules.eRegulon. This function will binarize region-to-gene links using various parameters and couples these links to a TF based on infromation in the motif enrichment (menr) field of the SCENICPLUS_obj.

Parameters:
SCENICPLUS_obj

An instance of :class: ~scenicplus.scenicplus_class.SCENICPLUS, containing motif enrichment data under the slot .menr.

region_to_gene_key

A key specifying under which to find region-to-gene links in the .uns slot of SCENICPLUS_obj. Default: “region_to_gene”

cistromes_key

A key specifying which cistromes to use in .uns[‘Cistromes’] slot of SCENICPLUS_obj

quantiles

A tuple specifying the quantiles used to binarize region-to-gene links Default: (0.85, 0.90)

top_n_regionTogenes_per_gene

A tuple specifying the top n region-to-gene links to take PER GENE in order to binarize region-to-gene links. Default: (5, 10, 15)

top_n_regionTogenes_per_region

A tuple specifying the top n region-to-gene links to take PER REGION in order to binarize region-to-gene links. Default: ()

binarize_using_basc:

A boolean specifying wether or not to binarize region-to-gene links using BASC. Hopfensitz M, et al.

min_regions_per_gene:

An integer specifying a lower limit on regions per gene (after binarization) to consider for further analysis. Default: 0

rho_dichotomize:

A boolean specifying wether or not to split region-to-gene links based on postive/negative correlation coefficients. default: True

keep_only_activating:

A boolean specifying wether or not to only retain region-to-gene links with a positive correlation coefficient. default: False

rho_threshold:

A floating point number specifying from which absolute value to consider a correlation coefficient positive or negative. default: 0.03

keep_extended_motif_annot

A boolean specifying wether or not keep extended motif annotations for further analysis. default: False

ray_n_cpu

An integer specifying the number of parallel cores to use to binarize region to gene links default: None (i.e. run using single core)

**kwargs

Extra keyword arguments passed to ray.init function

Returns:
A set of relevant TFs, A list of :class: ~scenicplus.grn_builder.modules.eRegulon.

References

Hopfensitz M, et al. Multiscale binarization of gene expression data for reconstructing Boolean networks. IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):487-98.

class scenicplus.grn_builder.modules.eRegulon(transcription_factor: str, cistrome_name: str, is_extended: bool, regions2genes: List[namedtuple], context=frozenset({}), in_leading_edge: List[bool] | None = None, gsea_enrichment_score: float | None = None, gsea_pval: float | None = None, gsea_adj_pval: float | None = None)[source]#

An eRegulon is a gene signature that defines the target regions and genes of a Transcription Factor (TF).

Parameters:
transcription_factor

A string specifying the transcription factor of the eRegulon

cistrome_name

A string specifying the cistrome name

is_extended

A boolean specifying wether the cistromes comes from an extended annotation or not

regions2genes

A list of named tuples containing information on region-to-gene links (region, gene, importance score, correlation coefficient)

context

The context in which this eRegulon was created (this can be different threshold, activating / repressing, …). default: frozenset()

in_leading_edge

A list specifying which genes are in the leading edge of a gsea analysis. default: None

gsea_enrichment_score

A float containing the enrichment score of a gsea analysis. default: None

gsea_pval

A float containing the p value of a gsea analysis. default: None

gsea_adj_pval

A float containing an adjusted p value of a gsea analysis. default: None

Attributes:
target_genes

Return target genes of this eRegulon.

target_regions

Return target regions of this eRegulon.

n_target_genes

Return number of target genes.

n_target_regions

Return number of target regions.

Methods

subset_leading_edge([inplace])

Subset eReglon on leading edge.

property n_target_genes#

Return number of target genes.

property n_target_regions#

Return number of target regions.

subset_leading_edge(inplace=True)[source]#

Subset eReglon on leading edge.

Parameters:

inplace – if set to True update eRegulon else return new eRegulon after subset.

property target_genes#

Return target genes of this eRegulon.

property target_regions#

Return target regions of this eRegulon.

scenicplus.grn_builder.modules.merge_emodules(SCENICPLUS_obj: SCENICPLUS | None = None, e_modules: list | None = None, e_modules_key: str = 'eRegulons', rho_dichotomize: bool = True, key_to_add: str = 'eRegulons', inplace: bool = True)[source]#

Function to merge list of eRegulon

Parameters:
SCENICPLUS_obj

An instance of :class: ~scenicplus.scenicplus_class.SCENICPLUS, containing eRegulons in slot .uns. default: None

e_modules

A list of eRegulon. default: None

e_modules_key

A string key specifying where to find a list of eRegulon under .uns slot of`SCENICPLUS_obj`. default: “eRegulons”

rho_dichotomize

A boolean specifying wether or not to split region-to-gene links based on postive/negative correlation coefficients. default: True

key_to_add

A string key specifying where to store the list of merged eRegulon in the .uns slot of SCENICPLUS_obj.

inplace

A boolean if set to True update the SCENICPLUS_obj otherwise return list of merged eRegulon

GSEA based approach#

GSEA approach

Generate enhancer drive GRNs (eGRS) using the GSEA approach.

Using this approach we will test if the gene set obtained from region-to-gene links, where the regions have a high score for a motif of a certain TF (region indicated in black in the diagram: r8, r11, r18, r20, r22, r24), are enriched in the top of the ranking based on the TF-to-gene links of the same TF (bottom right panel).

Only genes, from the set, and the regions linked to these genes in the top of the ranking (i.e. leading edge) will be kept.

This aproach is done seperatly for positive and negative TF and region to gene links.

Generating following four combinations:

Possible eRegulons#

TF-to-gene relationship

region-to-gene relationship

biological role

positive (+)

positive (+)

TF opens chromatin and activates gene expression

positive (+)

negative (-)

When the TF is expressed the target gene is also expressed but the regions linked to the gene are closed.

negative (-)

positive (+)

When the TF is expressed the target gene is not expressed. When the target gene is expressed, regions linked to this gene are open. TF could be a chromatin closing repressor.

negative (-)

negative (-)

When the TF is expressed the target gene is not expressed. When the target gene is expressed, regions linked to this gene are closed.

Left panel indicates the TF-to-gene links (blue) and region-to-gene links (yellow). Witdh of arrows correspond the strength of the connections based on non-linear regression.

Top right panel indicates regions with a high score for a motif of a TF

Bottom right panel shows GSEA analysis, with on the left genes ranked by TF-to-gene connection strength and on the right the gene-set obtained from region-to-gene links. In the diagram: g2, g6, and g10 are located in the top of the TF-to-gene ranking (i.e. leading edge), only these genes and the regions linked to these genes: r20, r18, r11, r8, r24 and r22 will be kept.

scenicplus.grn_builder.gsea_approach.build_grn(SCENICPLUS_obj: SCENICPLUS, adj_key='TF2G_adj', cistromes_key='Unfiltered', region_to_gene_key='region_to_gene', order_regions_to_genes_by='importance', order_TFs_to_genes_by='importance', gsea_n_perm=1000, quantiles=(0.85, 0.9), top_n_regionTogenes_per_gene=(5, 10, 15), top_n_regionTogenes_per_region=(), binarize_using_basc=False, min_regions_per_gene=0, rho_dichotomize_tf2g=True, rho_dichotomize_r2g=True, rho_dichotomize_eregulon=True, keep_only_activating=False, rho_threshold=0.03, NES_thr=0, adj_pval_thr=1, min_target_genes=5, inplace=True, key_added='eRegulons', ray_n_cpu=None, merge_eRegulons=True, keep_extended_motif_annot=False, disable_tqdm=False, **kwargs)[source]#

Build GRN using GSEA approach

Parameters:
  • SCENICPLUS_obj – An instance of :class: ~scenicplus.scenicplus_class.SCENICPLUS

  • adj_key – Key under which to find TF2G adjacencies default: “TF2G_adj”

  • region_to_gene_key – Key under which to find R2G adjacnecies default: “region_to_gene”

  • gsea_n_perm – Int specifying number of gsea permutations to run for p value calculation default: 1000

  • quantiles – A tuple specifying the quantiles used to binarize region-to-gene links Default: (0.85, 0.90)

  • top_n_regionTogenes_per_gene – A tuple specifying the top n region-to-gene links to take PER GENE in order to binarize region-to-gene links. Default: (5, 10, 15)

  • top_n_regionTogenes_per_region – A tuple specifying the top n region-to-gene links to take PER REGION in order to binarize region-to-gene links. Default: ()

  • binarize_using_basc – A boolean specifying wether or not to binarize region-to-gene links using BASC. Hopfensitz M, et al.

  • min_regions_per_gene – An integer specifying a lower limit on regions per gene (after binarization) to consider for further analysis. Default: 0

  • rho_dichotomize – A boolean specifying wether or not to split region-to-gene links based on postive/negative correlation coefficients. default: True

  • keep_only_activating – A boolean specifying wether or not to only retain region-to-gene links with a positive correlation coefficient. default: False

  • rho_threshold – A floating point number specifying from which absolute value to consider a correlation coefficient positive or negative. default: 0.03

  • NES_thr – Float specifying threshold on gsea NES value defaut: 0

  • adj_pval_thr – Float specifying threshold on gsea adjusted p value default: 1

  • min_target_genes – Int specifying minumum number of target genes in leading edge default: 5

  • inplace – Boolean specifying wether to store results in SCENICPLUS_obj default: True

  • key_added – Key specifying in under which key to store result in SCENICPLUS_obj.uns default: “eRegulons”

  • ray_n_cpu – Int specifying number of cores to use default: None

  • merge_eRegulons – Boolean specifying wether to merge eRegulons form the same TF but different thresholding approaches default: True

  • keep_extended_motif_annot – A boolean specifying wether or not keep extended motif annotations for further analysis. default: False

  • **kwargs – Additional keyword arguments passed to ray.init

References

Hopfensitz M, et al. Multiscale binarization of gene expression data for reconstructing Boolean networks. IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):487-98.

Downstream analysis, export and plotting#

Marker genes and regions#

Calculate differentially expressed genes (DEGs) and differentially accessible regions (DARs).

scenicplus.diff_features.get_differential_features(scplus_obj: SCENICPLUS, variable, use_hvg: bool | None = True, contrast_type: List | None = ['DARs', 'DEGs'], adjpval_thr: float | None = 0.05, log2fc_thr: float | None = 0.5849625007211562, min_cells: int | None = 2)[source]#

Get DARs of DEGs given reference variable.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object.

  • variable (str) – Variable to compute DARs/DEGs by (has to be included in scplus_obj.metadata_cell)

  • use_hvg (bool, optional) – Whether to use only highly variable genes/regions

  • contrast_type (list, optional) – Wheter to compute DARs and/or DEGs per variable

  • adjpval_thr (float, optional) – P-value threshold

  • log2fc_thr – Log2FC threshold

eRegulon enrichment in cells#

Score eRegulon target genes and regions in cells using AUC algorithm.

scenicplus.eregulon_enrichment.binarize_AUC(scplus_obj: SCENICPLUS, auc_key: str | None = 'eRegulon_AUC', out_key: str | None = 'eRegulon_AUC_thresholds', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], n_cpu: int | None = 1)[source]#

Binarize eRegulons using AUCell

Parameters:
scplus_obj: `class::SCENICPLUS`

A SCENICPLUS object with eRegulons AUC.

auc_key: str, optional

Key where the AUC values are stored

out_key: str, optional

Key where the AUCell thresholds will be stored (in scplus_obj.uns)

signature_keys: List, optional

Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

n_cpu: int

The number of cores to use. Default: 1

scenicplus.eregulon_enrichment.get_eRegulons_as_signatures(scplus_obj: SCENICPLUS, eRegulon_metadata_key: str = 'eRegulon_metadata', key_added: str = 'eRegulon_signatures')[source]#

Format eRegulons for scoring

Parameters:
scplus_obj: `class::SCENICPLUS`

A SCENICPLUS object with eRegulons metadata computed.

eRegulon_metadata_key: str, optional

Key where the eRegulon metadata is stored (in scplus_obj.uns)

key_added: str, optional

Key where formated signatures will be stored (in scplus_obj.uns)

scenicplus.eregulon_enrichment.make_rankings(scplus_obj: SCENICPLUS, target: str = 'region', seed: int = 123)[source]#

A function to generate rankings per cell based on the imputed accessibility scores per region or the gene expression per cell.

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with motif enrichment results from pycistarget (scplus_obj.menr).

  • target (str, optional) – Whether rankings should be done based on gene expression or region accessibilty. Default: ‘region’

  • seed (int, optional) – Random seed to ensure reproducibility of the rankings when there are ties

scenicplus.eregulon_enrichment.score_eRegulons(scplus_obj: SCENICPLUS, ranking: CistopicImputedFeatures, inplace: bool = True, eRegulon_signatures_key: str = 'eRegulon_signatures', key_added: str = 'eRegulon_AUC', enrichment_type: str = 'region', auc_threshold: float = 0.05, normalize: bool = False, n_cpu: int = 1)[source]#

Score eRegulons using AUCell

Parameters:
scplus_obj: `class::SCENICPLUS`

A SCENICPLUS object with formatted eRegulons.

ranking: `class::CistopicImputedFeatures`

A CistopicImputedFeatures object containing rankings, generated using the function make_rankings.

inplace: bool, optional

If set to True store result in scplus_obj, otherwise it is returned.

eRegulon_signatures_key: str, optional

Key where formated signatures are stored (in scplus_obj.uns)

key_added: str, optional

Key where formated AUC values will be stored (in scplus_obj.uns)

enrichment_type: str, optional

Whether region or gene signatures are being used

auc_threshold: float

The fraction of the ranked genome to take into account for the calculation of the Area Under the recovery Curve. Default: 0.05

normalize: bool

Normalize the AUC values to a maximum of 1.0 per regulon. Default: False

n_cpu: int

The number of cores to use. Default: 1

dimensionality reduction#

Dimmensionality reduction and clustering based on target genes and target regions AUC.

scenicplus.dimensionality_reduction.find_clusters(scplus_obj: SCENICPLUS, auc_key: str | None = 'eRegulon_AUC', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], k: int | None = 10, res: List[float] | None = [0.6], seed: int | None = 555, scale: bool | None = True, prefix: str | None = '', selected_regulons: List[int] | None = None, selected_cells: List[str] | None = None, **kwargs)[source]#

Performing leiden cell or region clustering and add results to SCENICPLUS object’s metadata.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • k (int, optional) – Number of neighbours in the k-neighbours graph. Default: 10

  • res (float, optional) – Resolution parameter for the leiden algorithm step. Default: 0.6

  • seed (int, optional) – Seed parameter for the leiden algorithm step. Default: 555

  • scale (bool, optional) – Whether to scale the enrichment prior to the clustering. Default: False

  • prefix (str, optional) – Prefix to add to the clustering name when adding it to the correspondent metadata attribute. Default: ‘’

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • selected_cells (list, optional) – A list with selected features cells to cluster. Default: None (use all cells)

scenicplus.dimensionality_reduction.harmony(scplus_obj: SCENICPLUS, variable: str, auc_key: str = 'eRegulon_AUC', out_key: str = 'eRegulon_AUC_harmony', signature_keys: List[str] = ['Gene_based', 'Region_based'], scale: bool | None = True, random_state: int | None = 555, **kwargs)[source]#

Apply harmony batch effect correction (Korsunsky et al, 2019) over eRegulon AUC values :param scplus_obj: A SCENICPLUS object with eRegulon AUC values calculated. :type scplus_obj: class::SCENICPLUS :param variable: Variable in scplus.metadata_cell to correct by. :type variable: str :param auc_key: Key where AUC values are stored. :type auc_key: str, optional :param out_key: Key where corrected eRegulon values will be output :type out_key: str, optional :param signature_keys: Whether to scale probability matrix prior to correction. Default: True :type signature_keys: list, optional :param scale: Whether to scale the AUC values :type scale: bool, optional :param random_state: Random seed used to use with harmony. Default: 555 :type random_state: int, optional

References

Korsunsky, I., Millard, N., Fan, J., Slowikowski, K., Zhang, F., Wei, K., … & Raychaudhuri, S. (2019). Fast, sensitive and accurate integration of single-cell data with Harmony. Nature methods, 16(12), 1289-1296.

scenicplus.dimensionality_reduction.plot_AUC_given_ax(scplus_obj: ~scenicplus.scenicplus_class.SCENICPLUS, reduction_name: str, feature: str, ax: <module 'matplotlib.axes' from '/home/docs/checkouts/readthedocs.org/user_builds/scenicplus/envs/latest/lib/python3.8/site-packages/matplotlib/axes/__init__.py'>, auc_key: str = 'eRegulon_AUC', signature_key: str = 'Gene_based', cmap: <module 'matplotlib.cm' from '/home/docs/checkouts/readthedocs.org/user_builds/scenicplus/envs/latest/lib/python3.8/site-packages/matplotlib/cm.py'> = <matplotlib.colors.ListedColormap object>, dot_size: int = 10, alpha: float = 1, scale: bool = False, selected_cells: ~typing.List | None = None)[source]#

Plot eRegulon AUC values on dimmensionality reduction

Parameters:
scplus_obj: `class::SCENICPLUS`

A SCENICPLUS object with dimensionality reductions.

reduction_name: str

Name of the dimensionality reduction to use.

feature: str

eRegulon to plot, should be included in scplus_obj.uns[auc_key][signature_key] matrix.

ax: matplotlib.axes

matplotlib axes to which to plot.

auc_key: str, optional

key in scplus_obj.uns under which the AUC values are stored

signature_key: str, optional

key in scplus_obj.uns[auc_key] to plot (usually Gene_based or Region_based)

cmap: matplotlib.cm, optional

color map to use for plotting.

dot_size: int, optional

Dot size in the plot. Default: 10

alpha: float, optional

Transparency value for the dots in the plot. Default: 1

scale: bool, optional

Wether to scale AUC values before plotting

selected_cells: List, optional

A list with selected cells to plot.

scenicplus.dimensionality_reduction.plot_eRegulon(scplus_obj: ~scenicplus.scenicplus_class.SCENICPLUS, reduction_name: str, auc_key: str | None = 'eRegulon_AUC', signature_keys: ~typing.List[str] | None = ['Gene_based', 'Region_based'], normalize_tf_expression: bool | None = True, cmap: str | matplotlib.cm | None = <matplotlib.colors.ListedColormap object>, dot_size: int | None = 10, alpha: float | int | None = 1, scale: bool | None = False, selected_regulons: ~typing.List[int] | None = None, selected_cells: ~typing.List[str] | None = None, figsize: ~typing.Tuple[float, float] | None = (6.4, 4.8), num_columns: int | None = 3, save: str | None = None)[source]#

Plot TF expression and eRegulon AUC (gene and region based) into dimensionality reduction.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A cisTopic object with dimensionality reductions in class::CistopicObject.projections.

  • reduction_name (str) – Name of the dimensionality reduction to use

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • normalize_tf_expression (bool, optional) – Whether logCPM normalize TF expression. Default: True

  • cmap (str or 'matplotlib.cm', optional) – For continuous variables, color map to use for the legend color bar. Default: cm.viridis

  • dot_size (int, optional) – Dot size in the plot. Default: 10

  • alpha (float, optional) – Transparency value for the dots in the plot. Default: 1

  • scale (bool, optional) – Whether to scale the cell-topic or topic-regions contributions prior to plotting. Default: False

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • selected_cells (list, optional) – A list with selected features cells to cluster. Default: None (use all cells)

  • figsize (tuple, optional) – Size of the figure. If num_columns is 1, this is the size for each figure; if num_columns is above 1, this is the overall size of the figure (if keeping default, it will be the size of each subplot in the figure). Default: (6.4, 4.8)

  • num_columns (int, optional) – For multiplot figures, indicates the number of columns (the number of rows will be automatically determined based on the number of plots). Default: 1

  • save (str, optional) – Path to save plot. Default: None.

scenicplus.dimensionality_reduction.plot_metadata(scplus_obj: ~scenicplus.scenicplus_class.SCENICPLUS, reduction_name: str, variables: ~typing.List[str], remove_nan: bool | None = True, show_label: bool | None = True, show_legend: bool | None = False, cmap: str | matplotlib.cm | None = <matplotlib.colors.ListedColormap object>, dot_size: int | None = 10, text_size: int | None = 10, alpha: float | int | None = 1, seed: int | None = 555, color_dictionary: ~typing.Dict[str, str] | None = {}, figsize: ~typing.Tuple[float, float] | None = (6.4, 4.8), num_columns: int | None = 1, selected_cells: ~typing.List[str] | None = None, save: str | None = None)[source]#

Plot categorical and continuous metadata into dimensionality reduction.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with dimensionality reductions.

  • reduction_name (str) – Name of the dimensionality reduction to use

  • variables (list) – List of variables to plot. They should be included in class::SCENICPLUS.metadata_cell.

  • remove_nan (bool, optional) – Whether to remove data points for which the variable value is ‘nan’. Default: True

  • show_label (bool, optional) – For categorical variables, whether to show the label in the plot. Default: True

  • show_legend (bool, optional) – For categorical variables, whether to show the legend next to the plot. Default: False

  • cmap (str or 'matplotlib.cm', optional) – For continuous variables, color map to use for the legend color bar. Default: cm.viridis

  • dot_size (int, optional) – Dot size in the plot. Default: 10

  • text_size (int, optional) – For categorical variables and if show_label is True, size of the labels in the plot. Default: 10

  • alpha (float, optional) – Transparency value for the dots in the plot. Default: 1

  • seed (int, optional) – Random seed used to select random colors. Default: 555

  • color_dictionary (dict, optional) – A dictionary containing an entry per variable, whose values are dictionaries with variable levels as keys and corresponding colors as values. Default: None

  • figsize (tuple, optional) – Size of the figure. If num_columns is 1, this is the size for each figure; if num_columns is above 1, this is the overall size of the figure (if keeping default, it will be the size of each subplot in the figure). Default: (6.4, 4.8)

  • num_columns (int, optional) – For multiplot figures, indicates the number of columns (the number of rows will be automatically determined based on the number of plots). Default: 1

  • selected_cells (list, optional) – A list with selected cells to plot.

  • save (str, optional) – Path to save plot. Default: None.

scenicplus.dimensionality_reduction.plot_metadata_given_ax(scplus_obj, ax: <module 'matplotlib.axes' from '/home/docs/checkouts/readthedocs.org/user_builds/scenicplus/envs/latest/lib/python3.8/site-packages/matplotlib/axes/__init__.py'>, reduction_name: str, variable: str, remove_nan: bool | None = True, show_label: bool | None = True, show_legend: bool | None = False, cmap: str | matplotlib.cm | None = <matplotlib.colors.ListedColormap object>, dot_size: int | None = 10, text_size: int | None = 10, alpha: float | int | None = 1, seed: int | None = 555, color_dictionary: ~typing.Dict[str, str] | None = {}, selected_cells: ~typing.List[str] | None = None)[source]#

Plot categorical and continuous metadata into dimensionality reduction.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with dimensionality reductions.

  • ax (matplotlib.axes) – Axes to which to plot metadata.

  • reduction_name (str) – Name of the dimensionality reduction to use

  • variable (Str) – Variable to plot. It should be included in class::SCENICPLUS.metadata_cell.

  • remove_nan (bool, optional) – Whether to remove data points for which the variable value is ‘nan’. Default: True

  • show_label (bool, optional) – For categorical variables, whether to show the label in the plot. Default: True

  • show_legend (bool, optional) – For categorical variables, whether to show the legend next to the plot. Default: False

  • cmap (str or 'matplotlib.cm', optional) – For continuous variables, color map to use for the legend color bar. Default: cm.viridis

  • dot_size (int, optional) – Dot size in the plot. Default: 10

  • text_size (int, optional) – For categorical variables and if show_label is True, size of the labels in the plot. Default: 10

  • alpha (float, optional) – Transparency value for the dots in the plot. Default: 1

  • seed (int, optional) – Random seed used to select random colors. Default: 555

  • color_dictionary (dict, optional) – A dictionary containing an entry per variable, whose values are dictionaries with variable levels as keys and corresponding colors as values. Default: None

  • figsize (tuple, optional) – Size of the figure. If num_columns is 1, this is the size for each figure; if num_columns is above 1, this is the overall size of the figure (if keeping default, it will be the size of each subplot in the figure). Default: (6.4, 4.8)

  • num_columns (int, optional) – For multiplot figures, indicates the number of columns (the number of rows will be automatically determined based on the number of plots). Default: 1

  • selected_cells (list, optional) – A list with selected cells to plot.

  • save (str, optional) – Path to save plot. Default: None.

scenicplus.dimensionality_reduction.run_eRegulons_pca(scplus_obj: SCENICPLUS, scale: bool | None = True, auc_key: str | None = 'eRegulon_AUC', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], reduction_name: str | None = 'eRegulons_PCA', random_state: int | None = 555, selected_regulons: List[int] | None = None, selected_cells: List[str] | None = None, n_pcs: int | None = 50, **kwargs)[source]#

Run UMAP and add it to the dimensionality reduction dictionary.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • scale (bool, optional) – Whether to scale the cell-topic or topic-regions contributions prior to the dimensionality reduction. Default: False

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • reduction_name (str, optional) – Key used to store dimensionality reduction in scplud_obj.dr_cell. Default: eRegulon_AUC.

  • random_state (int, optional) – Seed parameter for running UMAP. Default: 555

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • selected_cells (list, optional) – A list with selected features cells to cluster. Default: None (use all cells)

  • n_pcs (int, optional) – Number of principle components to calculate. Default: 50

  • **kwargs – Parameters to pass to umap.UMAP.

scenicplus.dimensionality_reduction.run_eRegulons_tsne(scplus_obj: SCENICPLUS, scale: bool | None = True, auc_key: str | None = 'eRegulon_AUC', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], reduction_name: str | None = 'eRegulons_tSNE', random_state: int | None = 555, selected_regulons: List[int] | None = None, selected_cells: List[str] | None = None, **kwargs)[source]#

Run TSNE and add it to the dimensionality reduction dictionary.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • scale (bool, optional) – Whether to scale the enrichments prior to the dimensionality reduction. Default: False

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • reduction_name (str, optional) – Key used to store dimensionality reduction in scplud_obj.dr_cell. Default: eRegulon_AUC.

  • random_state (int, optional) – Seed parameter for running UMAP. Default: 555

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • selected_cells (list, optional) – A list with selected features cells to cluster. Default: None (use all cells)

  • **kwargs – Parameters to pass to the tSNE functions.

scenicplus.dimensionality_reduction.run_eRegulons_umap(scplus_obj: SCENICPLUS, scale: bool | None = True, auc_key: str | None = 'eRegulon_AUC', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], reduction_name: str | None = 'eRegulons_UMAP', random_state: int | None = 555, selected_regulons: List[int] | None = None, selected_cells: List[str] | None = None, **kwargs)[source]#

Run UMAP and add it to the dimensionality reduction dictionary.

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • scale (bool, optional) – Whether to scale the cell-topic or topic-regions contributions prior to the dimensionality reduction. Default: False

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • reduction_name (str, optional) – Key used to store dimensionality reduction in scplud_obj.dr_cell. Default: eRegulon_AUC.

  • random_state (int, optional) – Seed parameter for running UMAP. Default: 555

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • selected_cells (list, optional) – A list with selected features cells to cluster. Default: None (use all cells)

  • **kwargs – Parameters to pass to umap.UMAP.

eRegulon specificity score (eRSS)#

Calculate the specificty of eRegulons in clusters of cells.

Calculates the distance between the real distribution of eRegulon AUC values and a fictional distribution where the eRegulon is only expressed/accessible in cells of a certain cluster.

scenicplus.RSS.plot_rss(scplus_obj: SCENICPLUS, rss_key: str, top_n: int | None = 5, selected_groups: List[str] | None = None, num_columns: int | None = 1, figsize: Tuple[float, float] | None = (6.4, 4.8), fontsize: int | None = 12, save: str | None = None)[source]#

Plot RSS values per group

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • rss_key (str, optional) – Key to extract RSS values from.

  • top_n (int, optional) – Number of top eRegulons to highlight.

  • selected_groups (List, optional) – Groups to plot. Default: None (all)

  • num_columns (int, optional) – Number of columns for multiplotting

  • figsize (tuple, optional) – Size of the figure. If num_columns is 1, this is the size for each figure; if num_columns is above 1, this is the overall size of the figure (if keeping default, it will be the size of each subplot in the figure). Default: (6.4, 4.8)

  • fontsize (int, optional) – Size of the eRegulons names in plot.

  • save (str, optional) – Path to save plot. Default: None.

scenicplus.RSS.regulon_specificity_scores(scplus_obj: SCENICPLUS, variable: str, auc_key: str | None = 'eRegulon_AUC', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], selected_regulons: List[int] | None = None, scale: bool | None = False, out_key_suffix: str | None = '')[source]#

Calculate the Regulon Specificty Scores (RSS). [doi: 10.1016/j.celrep.2018.10.045]

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • variable (str) – Variable to calculate the RSS values for.

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • scale (bool, optional) – Whether to scale the enrichment prior to the clustering. Default: False

  • out_key_suffix (str, optional) – Suffix to add to the variable name to store the values (at scplus_obj.uns[‘RSS’])

Network#

export eRegulons to eGRN network and plot.

scenicplus.networks.concentrical_layout(G, dist_genes=1, dist_TF=0.1)[source]#

Generate custom concentrical layout

Parameters:
  • G (Graph) – A networkx graph

  • dist_genes (int, optional) – Distance from the regions to the genes

  • dist_TF – Distance from the TF to the regions

scenicplus.networks.create_nx_graph(nx_tables: Dict, use_edge_tables: List = ['TF2R', 'R2G'], color_edge_by: Dict = {}, transparency_edge_by: Dict = {}, width_edge_by: Dict = {}, color_node_by: Dict = {}, transparency_node_by: Dict = {}, size_node_by: Dict = {}, shape_node_by: Dict = {}, label_size_by: Dict = {}, label_color_by: Dict = {}, layout: str = 'concentrical_layout', lc_dist_genes: float = 0.8, lc_dist_TF: float = 0.1, scale_position_by: float = 250)[source]#

Format node/edge feature tables into a graph

Parameters:
  • nx_tables (Dict) – Dictionary with node/edge feature tables as produced by create_nx_tables

  • use_edge_tables (List, optional) – List of edge tables to use

  • color_edge_by (Dict, optional) – A dictionary containing for a given edge key the variable and color map to color edges by. If the variable is categorical, the entry ‘categorical_color’ can be provided as a dictionary with category: color. If it is a continuous variable a color map can be provided as continuous_color and entried v_max and v_min can be provided to control the min and max values of the scale. Alternatively, one fixed color can use by using ‘fixed_color’ as variable, alterntively adding an entry fixed_color: color to the dictionary.

  • transparency_edge_by (Dict, optional) – A dictionary containing for a given edge key the variable and the max and min alpha values. The variable name has to be provided (only continuous variables accepted), together with v_max/v_mix parameters if desired. Alternatively, one fixed alpha can use by using ‘fixed_alpha’ as variable, alterntively adding an entry fixed_alpha: size to the dictionary.

  • width_edge_by (Dict, optional) – A dictionary containing for a given edge key the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can use by using ‘fixed_size’ as variable, alterntively adding an entry fixed_size: size to the dictionary.

  • color_node_by (Dict, optional) – A dictionary containing for a given node key the variable and color map to color edges by. If the variable is categorical, the entry ‘categorical_color’ can be provided as a dictionary with category: color. If it is a continuous variable a color map can be provided as continuous_color and entried v_max and v_min can be provided to control the min and max values of the scale. Alternatively, one fixed color can use by using ‘fixed_color’ as variable, alterntively adding an entry fixed_color: color to the dictionary.

  • transparency_node_by (Dict, optional) – A dictionary containing for a given node key the variable and the max and min alpha values. The variable name has to be provided (only continuous variables accepted), together with v_max/v_mix parameters if desired. Alternatively, one fixed alpha can use by using ‘fixed_alpha’ as variable, alterntively adding an entry fixed_alpha: size to the dictionary.

  • size_node_by (Dict, optional) – A dictionary containing for a given node key the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can use by using ‘fixed_size’ as variable, alterntively adding an entry fixed_size: size to the dictionary.

  • shape_node_by (Dict, optional) – A dictionary containing for a given node key the variable and shapes. The variable name has to be provided (only categorical variables accepted). Alternatively, one fixed shape can use by using ‘fixed_shape’ as variable, alterntively adding an entry fixed_shape: size to the dictionary.

  • label_size_by (Dict, optional) – A dictionary containing for a given node key the variable and the max and min sizes. The variable name has to be provided (only continuous variables accepted), together with max_size/min_size parameters if desired. Alternatively, one fixed size can use by using ‘fixed_label_size’ as variable, alterntively adding an entry fixed_label_size: size to the dictionary.

  • label_color_by (Dict, optional) – A dictionary containing for a given node key the variable and a color dictionary. The variable name has to be provided (only categorical variables accepted), together with a color dictionary if desired. Alternatively, one fixed color can use by using ‘fixed_label_color’ as variable, alterntively adding an entry fixed_label_color: size to the dictionary.

  • layout (str, optional) – Layout to use. Options are: ‘concentrical_layout’ (SCENIC+ custom layout) or kamada_kawai_layout (from networkx).

  • lc_dist_genes (float, optional) – Distance between regions and genes. Only used if using concentrical_layout.

  • lc_dist_TF (float, optional) – Distance between TF and regions. Only used if using concentrical_layout.

  • scale_position_by (int, optional) – Value to scale positions for visualization in pyvis.

scenicplus.networks.create_nx_tables(scplus_obj: SCENICPLUS, eRegulon_metadata_key: str = 'eRegulon_metadata', subset_eRegulons: List = None, subset_regions: List = None, subset_genes: List = None, add_differential_gene_expression: bool = False, add_differential_region_accessibility: bool = False, differential_variable: List = [])[source]#

A function to format eRegulon data into tables for plotting eGRNs.

Parameters:
  • scplus_obj (SCENICPLUS) – A SCENICPLUS object with eRegulons

  • eRegulon_metadata_key (str, optional) – Key where the eRegulon metadata dataframe is stored

  • subset_eRegulons (list, optional) – List of eRegulons to subset

  • subset_regions (list, optional) – List of regions to subset

  • subset_genes (list, optional) – List of genes to subset

  • add_differential_gene_expression (bool, optional) – Whether to calculate differential gene expression logFC for a given variable

  • add_differential_region_accessibility (bool, optional) – Whether to calculate differential region accessibility logFC for a given variable

  • differential_variable (list, optional) – Variable to calculate differential gene expression or region accessibility.

scenicplus.networks.export_to_cytoscape(G, pos, out_file: str, pos_scaling_factor: int = 200, size_scaling_factor: int = 1)[source]#

A function to export to cytoscape :param G: A networkx graph. :type G: Graph :param Pos: generated by running create_nx_graph. :type Pos: coordinates of graph nodes :param out_file: Path to wich to save the export. :type out_file: str :param pos_scaling_factor: Factor by which to scale the graph node coordinates. :type pos_scaling_factor: int, optional :param size_scaling_factor: Factor by which tos cale the graph node sizes. :type size_scaling_factor: int, optional

scenicplus.networks.plot_networkx(G, pos)[source]#

A function to plot networks with networkx

Parameters:
  • G (Graph) – A networkx graph

  • pos (Dict) – Position values

Correlation plot#

Plot correlation and overlap of eRegulons

scenicplus.plotting.correlation_plot.correlation_heatmap(scplus_obj: SCENICPLUS, auc_key: str | None = 'eRegulon_AUC', signature_keys: List[str] | None = ['Gene_based', 'Region_based'], scale: bool | None = False, linkage_method: str | None = 'average', fcluster_threshold: float | None = 0.1, selected_regulons: List[int] | None = None, cmap: str | None = 'viridis', plotly_height: int | None = 1000, fontsize: int | None = 3, save: str | None = None, use_plotly: int | None = True, figsize: Tuple[int, int] | None = (20, 20))[source]#

Plot correlation between eRegulons enrichment,

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulons AUC computed.

  • auc_key (str, optional) – Key to extract AUC values from. Default: ‘eRegulon_AUC’

  • signature_keys (List, optional) – Keys to extract AUC values from. Default: [‘Gene_based’, ‘Region_based’]

  • scale (bool, optional) – Whether to scale the enrichments prior to the dimensionality reduction. Default: False

  • linkage_method (str, optional) – Linkage method to use for clustering. See scipy.cluster.hierarchy.linkage.

  • fcluster_threshold (float, optional) – Threshold to use to divide hierarchical clustering into clusters. See scipy.cluster.hierarchy.fcluster.

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • cmap (str or 'matplotlib.cm', optional) – For continuous variables, color map to use for the legend color bar. Default: cm.viridis

  • plotly_height (int, optional) – Height of the plotly plot. Width will be adjusted accordingly

  • fontsize (int, optional) – Labels fontsize

  • save (str, optional) – Path to save heatmap as file

  • use_plotly (bool, optional) – Use plotly or seaborn to generate the image

  • figsize (tupe, optional) – Matplotlib figsize, used only if use_plotly == False

scenicplus.plotting.correlation_plot.fisher_exact_test_heatmap(scplus_obj, gene_or_region_based: str = 'Gene_based', signature_key: str | None = 'eRegulon_signatures', selected_regulons: List[int] | None = None, linkage_method: str | None = 'average', fcluster_threshold: float | None = 0.1, cmap: str | None = 'viridis', plotly_height: int | None = 1000, fontsize: int | None = 3, save: str | None = None, use_plotly: int | None = True, figsize: Tuple[int, int] | None = (20, 20), vmin=None, vmax=None, return_data=False)[source]#

Plot jaccard index of regions/genes

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulon signatures.

  • method (str) – Wether to use Jaccard (jaccard) or normalized intersection (intersect) as metric

  • gene_or_region_based (str) – Gene_based or Region_based eRegulon signatures to use.

  • signature_key (List, optional) – Key to extract eRegulon signatures from

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • linkage_method (str, optional) – Linkage method to use for clustering. See scipy.cluster.hierarchy.linkage.

  • fcluster_threshold (float, optional) – Threshold to use to divide hierarchical clustering into clusters. See scipy.cluster.hierarchy.fcluster.

  • cmap (str or 'matplotlib.cm', optional) – For continuous variables, color map to use for the legend color bar. Default: cm.viridis

  • plotly_height (int, optional) – Height of the plotly plot. Width will be adjusted accordingly

  • fontsize (int, optional) – Labels fontsize

  • save (str, optional) – Path to save heatmap as file

  • use_plotly (bool, optional) – Use plotly or seaborn to generate the image

  • figsize (tupe, optional) – Matplotlib figsize, used only if use_plotly == False

  • return_data (boolean, optional) – Return data

  • plot_dendrogram (boolean, optional) –

scenicplus.plotting.correlation_plot.jaccard_heatmap(scplus_obj: SCENICPLUS, method: str = 'jaccard', gene_or_region_based: str = 'Gene_based', signature_key: str | None = 'eRegulon_signatures', selected_regulons: List[int] | None = None, linkage_method: str | None = 'average', fcluster_threshold: float | None = 0.1, cmap: str | None = 'viridis', plotly_height: int | None = 1000, fontsize: int | None = 3, save: str | None = None, use_plotly: int | None = True, figsize: Tuple[int, int] | None = (20, 20), vmin=None, vmax=None, return_data=False)[source]#

Plot jaccard index of regions/genes

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with eRegulon signatures.

  • method (str) – Wether to use Jaccard (jaccard) or normalized intersection (intersect) as metric

  • gene_or_region_based (str) – Gene_based or Region_based eRegulon signatures to use.

  • signature_key (List, optional) – Key to extract eRegulon signatures from

  • selected_regulons (list, optional) – A list with selected regulons to be used for clustering. Default: None (use all regulons)

  • linkage_method (str, optional) – Linkage method to use for clustering. See scipy.cluster.hierarchy.linkage.

  • fcluster_threshold (float, optional) – Threshold to use to divide hierarchical clustering into clusters. See scipy.cluster.hierarchy.fcluster.

  • cmap (str or 'matplotlib.cm', optional) – For continuous variables, color map to use for the legend color bar. Default: cm.viridis

  • plotly_height (int, optional) – Height of the plotly plot. Width will be adjusted accordingly

  • fontsize (int, optional) – Labels fontsize

  • save (str, optional) – Path to save heatmap as file

  • use_plotly (bool, optional) – Use plotly or seaborn to generate the image

  • figsize (tupe, optional) – Matplotlib figsize, used only if use_plotly == False

  • return_data (boolean, optional) – Return data

  • plot_dendrogram (boolean, optional) –

Coverage plot#

Plot chromatin accessibility profiles and region to gene arcs.

scenicplus.plotting.coverageplot.coverage_plot(SCENICPLUS_obj: SCENICPLUS, bw_dict: Mapping[str, str], region: str, genes_violin_plot: str | List | None = None, genes_arcs: str | List | None = None, gene_height: int = 1, exon_height: int = 4, meta_data_key: str | None = None, pr_consensus_bed: PyRanges | None = None, region_bed_height: int = 1, pr_gtf: PyRanges | None = None, pr_interact: PyRanges | None = None, bw_ymax: float | None = None, color_dict: Mapping[str, str] | None = None, cmap='tab20', plot_order: list | None = None, figsize: tuple = (6, 8), fontsize_dict={'bigwig_label': 9, 'bigwig_tick_label': 5, 'gene_label': 9, 'title': 12, 'violinplots_xlabel': 9, 'violinplots_ylabel': 9}, gene_label_offset=3, arc_rad=0.5, arc_lw=1, cmap_violinplots='Greys', violinplots_means_color='black', violinplots_edge_color='black', violoinplots_alpha=1, width_ratios_dict={'bigwig': 3, 'violinplots': 1}, height_ratios_dict={'arcs': 5, 'bigwig_violin': 1, 'custom_ax': 2, 'genes': 0.5}, sort_vln_plots: bool = False, add_custom_ax: int | None = None) Figure[source]#

Inspired by: https://satijalab.org/signac/reference/coverageplot

Generates a figure showing chromatin accessibility coverage tracks, gene expression violin plots and region to gene interactions.

Parameters:
SCENICPLUS_obj

An instance of class ~scenicplus.scenicplus_class.SCENICPLUS.

bw_dict

A dict containing celltype annotations/cell groups as keys and paths to bigwig files as values.

region

A string specifying the region of interest, in chr:start-end format.

genes_violin_plot

A list or string specifying for which gene(s) to plot gene expression violin plots. default: None

genes_arcs

A list or string specifying for which gene(s) to plot arcs default: None (i.e. all genes in window)

gene_height

An int specifying the size of non-exon parts of a gene (shown underneath the coverage plot). default: 1

exon_height

An int specifying the size of nexon parts of a gene (shown underneath the coverage plot). default: 2

meta_data_key

A key specifying were to find annotations corresponding to the keys in bw_dict in SCENICPLUS_obj.metadata_cell default: None

pr_consensus_bed

An instance of class pyranges.PyRanges containing consensus peaks. Names of these peaks should correspond to annotations. default: None

region_bed_height

An int specifying with which height to draw lines corresponding to regions in pr_consensus_bed default: 1

pr_gtf

An instance of class pyranges.PyRanges containing gtf-formatted information about gene structure. default: None

pr_interact

An instance of class pyranges.PyRanges containing region to gene link information in ucsc interact format (use scenicplus.utils.get_interaction_pr to create such an object). default: None

bw_ymax

A float specifying the maximum height at which to draw coverage plots. default: None

color_dict

A dict specifying colors for each key in bw_dict. default: None

cmap

A string specifying a matplotlib.cm colormap to use for coloring coverage plots. default: ‘tab20’

plot_order

A list specifying the order to use for plotting coverage plots and violin plots. default: None

figsize

A tuple specifying the fig size/ default: (6, 8)

fontsize_dict

A dictionary specifying the fontsize of various labels in the plot. default: {‘bigwig_label’: 9, ‘gene_label’: 9, ‘violinplots_xlabel’: 9, ‘title’: 12, ‘bigwig_tick_label’: 5}

gene_label_offset

The y-offset of the label underneath each gene. default: 3

arc_rad

The amount of radians the region-to-gene arc should be bend. default: 0.5

arc_lw

The linewidth of the region-to-gene arcs. default: 1

cmap_violinplots

A string specifying a matplotlib.cm colormap to use for coloring violinplots. default: ‘Greys’

violinplots_means_color

A string specifying which color to use to indicate the mean value in violinplots. default: ‘black’

violinplots_edge_color

A string specifying which color to use for the edge of the violinplots. default: ‘black’

violoinplots_alpha

The alpha value of the violinplots facecolor. default: 1

width_ratios_dict

A dict specifying the ratio in vertical direction each part of the plot should use. default: {‘bigwig’: 3, ‘violinplots’: 1}

height_ratios_dict

A dict specifying the ratio in horizontal direction each part of the plot should use. default: {‘bigwig_violin’: 1, ‘genes’: 0.5, ‘arcs’: 5}

dotplot#

Plot TF expression, motif enrichment and AUC values of target genes and regions in a dotplot.

scenicplus.plotting.dotplot.dotplot(df_dotplot: DataFrame, ax: axes | None = None, region_set_key: str = 'Name', size_var: str = 'TF_expression', color_var: str = 'Cistrome_AUC', order_group: list | None = None, order_cistromes: list | None = None, order_cistromes_by_max: str = 'TF_expression', cluster: str | None = None, n_clust: int = 2, min_point_size: float = 3, max_point_size: float = 30, cmap: str = 'viridis', vmin: float = 0, vmax: float | None = None, x_tick_rotation: float = 45, x_tick_ha: str = 'right', fontsize: float = 9, z_score_expr: bool = True, z_score_enr: bool = True, grid_color='grey', grid_lw=0.5, highlight=None, highlight_lw=1, highlight_lc='black', figsize: Tuple[float, float] | None = (10, 10), use_plotly=True, plotly_height=1000, save=None)[source]#

DEPRECATED Function to generate dotplot

Parameters:
  • df_dotplot (pd.DataFrame) – A pd.DataFrame with a column Group and varaibles to use for the dotplot

  • ax (plt.axes) – An ax object, only require if use_plotly == False.

  • enrichment_variable (str, optional) – Variable to use for motif/TF enrichment. Default: ‘Cistrome_AUC;

  • region_set_key (str, optional) – Name of the column where TF/cistromes are. Default: ‘Cistrome’

  • size_var (str, optional) – Variable in df_dotplot to code the dot size with. Default: ‘Cistrome_AUC’

  • color_var (str, optional) – Variable in df_dotplot to code the dot color with. Default: ‘Cistrome_AUC’

  • order_group (list, optional) – Order to plot the groups by. Default: None (given)

  • order_cistromes (list, optional) – Order to plot the cistromes by. Default: None (given)

  • order_cistromes_by_max (str, optional) – Variable to order the cistromes with by values in group, to form a diagonal. Default: None

  • cluster (str, optional) – Whether to cluster cistromes (or TFs)/groups. It will be overwritten if order_cistromes/order_group/order_cistromes_by_max are specified.

  • n_clust (int, optional) – Number of clusters to form if cluster is specified. Default: 2

  • min_point_size (int, optional) – Minimal point size. Only available with Cistrome AUC values.

  • max_point_size (int, optional) – Maximum point size. Only available with Cistrome AUC values.

  • cmap (str, optional) – Color map to use.

  • vmin (int, optional) – Minimal color value.

  • vmax (int, optional) – Maximum color value

  • x_tick_rotation (int, optional) – Rotation of the x-axis label. Only if use_plotly = False

  • fontsize (int, optional) – Labels fontsize

  • z_score_expr (bool, optional) – Whether to z-normalize expression values

  • z_score_enr (bool, optional) – Whether to z-normalize enrichment values

  • grid_color (str, optional) – Color for grid. Only if use_plotly = False

  • grid_lw (float, optional) – Line width of grid. Only if use_plotly = False

  • highlight (list, optional) – Whether to highlight specific TFs/cistromes. Only if use_plotly = False

  • highlight_lw (float, optional) – Line width of highlight. Only if use_plotly = False

  • highlight_lc (str, optional) – Color of highlight. Only if use_plotly = False

  • figsize (tuple, optional) – Figure size. Only if use_plotly = False

  • use_plotly (bool, optional) – Whether to use plotly for plotting. Default: True

  • plotly_height (int, optional) – Height of the plotly plot. Width will be adjusted accordingly

  • save (str, optional) – Path to save dotplot as file

scenicplus.plotting.dotplot.generate_dotplot_df(scplus_obj: SCENICPLUS, size_matrix: DataFrame, color_matrix: DataFrame, scale_size_matrix: bool = True, scale_color_matrix: bool = True, group_variable: str | None = None, subset_eRegulons: list | None = None) DataFrame[source]#

Function to generate dotplot dataframe from cistrome AUC enrichment

Parameters:
scplus_obj: `class::SCENICPLUS`

A SCENICPLUS object.

size_matrix: pd.DataFrame

A pd.DataFrame containing values to plot using size scale.

color_matrix

A pd.DataFrame containing values to plot using color scale.

scale_size_matrix: bool

Scale size matrix between 0 and 1 along index.

scale_color_matrix: bool

Scale color matrix between 0 and 1 along index.

group_variable: str:

Variable by which to group cell barcodes by (needed if the index of size or color matrix are cells.)

subset_eRegulons: List

List of eRegulons to plot.

Returns
——-
pd.DataFrame
scenicplus.plotting.dotplot.generate_dotplot_df_AUC(scplus_obj: SCENICPLUS, auc_key: str, enrichment_key: str, group_variable: str, subset_cells: list | None = None, subset_eRegulons: list | None = None, use_pseudobulk: bool = False, normalize_expression: bool = True, standardize_expression: bool = False, standardize_auc: bool = False)[source]#

DEPRECATED Function to generate dotplot dataframe from cistrome AUC enrichment

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with motif enrichment results from pycistarget (scplus_obj.menr).

  • enrichment_key (str) – Key of the motif enrichment result to use.

  • group_variable (str) – Group variable to use to calculate TF expression per group. Levels of this variable should match with the entries in the selected motif enrichment dictionary.

  • subset_cells (list, optional) – Subset of classes to use. Default: None (use all)

  • subset_eRegulons (list, optional) – List of cistromes to use. Default: None (use all)

  • use_pseudobulk (bool, optional) – Whether to use pseudobulk to calculate gene expression. Only available if there is pseudobulk profiles for group_variable. Default: False

  • normalize_expression (bool, optional) – Whether to log(CPM) normalize gene expression. Default: True

  • standardize_expression (bool, optional) – Wheter to standarize gene expression between o and 1. Default: False

  • standardize_auc (bool, optional) – Wheter to standarize AUC values between o and 1. Default: False

Returns:
A data frame with enrichment values per TF and cell group to use as input for dotplot().
scenicplus.plotting.dotplot.generate_dotplot_df_motif_enrichment(scplus_obj: SCENICPLUS, enrichment_key: str, group_variable: str | None = None, barcode_groups: dict | None = None, subset: list | None = None, subset_TFs: list | None = None, use_pseudobulk: bool = False, use_only_direct: bool = False, normalize_expression: bool = True, standardize_expression: bool = False)[source]#

DEPRECATED Function to generate dotplot dataframe from motif enrichment results

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENICPLUS object with motif enrichment results from pycistarget (scplus_obj.menr).

  • enrichment_key (str) – Key of the motif enrichment result to use.

  • group_variable (str, optional) – Group variable to use to calculate TF expression per group. Levels of this variable should match with the entries in the selected motif enrichment dictionary. Only required if barcode groups is not provided.

  • barcode_groups (Dict, optional) – A dictionary containing cell barcodes per class to calculate TF expression per group. Keys of the dictionary should match the keys in the selected motif enrichment dictionary. Only required if group_variable is not provided.

  • subset (list, optional) – Subset of classes to use. Default: None (use all)

  • subset_TFs (list, optional) – List of TFs to use. Default: None (use all)

  • use_pseudobulk (bool, optional) – Whether to use pseudobulk to calculate gene expression. Only available if there is pseudobulk profiles for group_variable. Default: False

  • use_only_direct (bool, optional) – Whether to only use NES values of directly annotated motifs. Default: False

  • normalize_expression (bool, optional) – Whether to log(CPM) normalize gene expression. Default: True

  • standardize_expression (bool, optional) – Wheter to standarize gene expression between o and 1. Default: False

Returns:
A data frame with enrichment values per TF and cell group to use as input for dotplot().
scenicplus.plotting.dotplot.heatmap_dotplot(scplus_obj: SCENICPLUS, size_matrix: DataFrame, color_matrix: DataFrame, scale_size_matrix: bool = True, scale_color_matrix: bool = True, group_variable: str | None = None, subset_eRegulons: list | None = None, sort_by: str = 'color_val', index_order: list | None = None, save: str | None = None, figsize: tuple = (5, 8), split_repressor_activator: bool = True, orientation: str = 'vertical')[source]#

Function to generate dotplot dataframe from cistrome AUC enrichment

Parameters:
scplus_obj: `class::SCENICPLUS`

A SCENICPLUS object.

size_matrix: pd.DataFrame

A pd.DataFrame containing values to plot using size scale.

color_matrix

A pd.DataFrame containing values to plot using color scale.

scale_size_matrix: bool

Scale size matrix between 0 and 1 along index.

scale_color_matrix: bool

Scale color matrix between 0 and 1 along index.

group_variable: str:

Variable by which to group cell barcodes by (needed if the index of size or color matrix are cells.)

subset_eRegulons: List

List of eRegulons to plot.

sort_by: str

Sort by color_val or size_val.

index_order: list

Order of index to plot.

figsize: tuple

size of the figure (x, y).

split_repressor_activator: bool

Wether to split the plot on repressors/activators.

orientation: str

Plot in horizontal or vertical orientation

Export#

Export to loom#

export SCENIC+ object to target genes and target regions loom file.

These files can be visualized in the SCope single cell viewer.

scenicplus.loom.export_to_loom(scplus_obj: SCENICPLUS, signature_key: str, out_fname: str, eRegulon_metadata_key: str | None = 'eRegulon_metadata', auc_key: str | None = 'eRegulon_AUC', auc_thr_key: str | None = 'eRegulon_AUC_thresholds', keep_direct_and_extended_if_not_direct: bool | None = False, selected_features: List[str] | None = None, selected_cells: List[str] | None = None, cluster_annotation: List[str] | None = None, tree_structure: Sequence[str] = (), title: str | None = None, nomenclature: str = 'Unknown')[source]#

Create SCope [Davie et al, 2018] compatible loom files

Parameters:
  • scplus_obj (class::SCENICPLUS) – A SCENIC+ object with eRegulons, AUCs, and AUC thresholds computed

  • signature_key (str) – Whether a ‘Gene_based’ or a ‘Region_based’ file should be produced. Possible values: ‘Gene_based’ or ‘Region_based’

  • out_fname (str) – Path to output file.

  • eRegulon_metadata_key (str, optional) – Slot where the eRegulon metadata is stored.

  • auc_key (str, optional) – Slot where the eRegulon AUC are stored

  • auc_thr_key (str, optional) – Slot where AUC thresholds are stored

  • keep_direct_and_extended_if_not_direct (bool, optional) – Keep only direct eregulons and add extended ones only if there is not a direct one.

  • selected_features (List, optional) – A list with selected genes/region to use.

  • selected_cells (List, optional) – A list with selected cells to use.

  • cluster_annotations (List, optional) – A list with variables to use as cluster annotations. By default, those included in the DEGs/DARs dictionary will be used (plus those specificed here)

  • tree_structure (sequence, optional) – A sequence of strings that defines the category tree structure. Needs to be a sequence of strings with three elements. Default: ()

  • title (str, optional) – The title for this loom file. If None than the basename of the filename is used as the title. Default: None

  • nomenclature (str, optional) – The name of the genome. Default: ‘Unknown’

References

Davie, K., Janssens, J., Koldere, D., De Waegeneer, M., Pech, U., Kreft, Ł., … & Aerts, S. (2018). A single-cell transcriptome atlas of the aging Drosophila brain. Cell, 174(4), 982-998. Van de Sande, B., Flerin, C., Davie, K., De Waegeneer, M., Hulselmans, G., Aibar, S., … & Aerts, S. (2020). A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nature Protocols, 15(7), 2247-2276.