
Annotation Module

flyqma.annotation provides several tools for labeling distinct subpopulations of cells within an image. Subpopulations are identified on the basis of their clonal marker expression level using a novel unsupervised classification strategy. Please see the Fly-QMA manuscript for a detailed description of the annotation strategy and its various parameters.

class flyqma.annotation.labelers.AttributeLabeler(label, attribute, labels)[source]

Assigns label to cell measurement data based on an existing attribute.

Attributes:

label (str) - name of label field to be added

attribute (str) - existing cell attribute used to determine labels

labeler (vectorized func) - callable that maps attribute values to labels

assign_labels(data)[source]

Assign labels by adding <label> field to cell measurement data.

Args:

data (pd.DataFrame) - cell measurement data with <attribute> field
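As a minimal sketch of the idea, and not the library's implementation, labeling by an existing attribute amounts to a lookup applied to every row. The list of dictionaries stands in for a DataFrame of cell measurements, and the helper name, the genotype values, and the cell type codes are all hypothetical.

```python
def assign_labels(records, label, attribute, labels):
    """Add a <label> field to each record by mapping its <attribute> value."""
    for record in records:
        record[label] = labels[record[attribute]]
    return records

# Hypothetical cell measurements, standing in for rows of a DataFrame.
cells = [{'genotype': 0}, {'genotype': 2}, {'genotype': 1}]

# Hypothetical mapping from genotype to a cell type code.
mapping = {0: 'm', 1: 'h', 2: 'w'}

labeled = assign_labels(cells, label='celltype', attribute='genotype', labels=mapping)
celltypes = [cell['celltype'] for cell in labeled]
```

An attribute value missing from the mapping raises a KeyError in this sketch; how the library treats unmapped values is not stated here.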

class flyqma.annotation.labelers.CelltypeLabeler(label='celltype', attribute='genotype', labels=None)[source]

Assigns <celltype> to cell measurement data based on <genotype> attribute.

Attributes:

label (str) - name of label field to be added

attribute (str) - existing cell attribute used to determine labels

labeler (vectorized func) - callable that maps attribute values to labels

class flyqma.annotation.annotation.Annotation(attribute, sampler_type='radial', sampler_kwargs={}, min_num_components=3, max_num_components=10, num_labels=3)[source]

Object for assigning labels to measurements. Object is trained on one or more graphs by fitting a bivariate mixture model and using a model selection procedure to select an optimal number of components.

The trained model may then be used to label measurements in other graphs, either through direct prediction via the bivariate mixture model or through a hybrid prediction combining the bivariate mixture model with a marginal univariate model.

Attributes:

classifier (Classifier derivative) - callable object

attribute (str) - attribute used to determine labels

sampler_type (str) - either ‘radial’, ‘neighbors’ or ‘community’

sampler_kwargs (dict) - keyword arguments for sampler

min_num_components (int) - minimum number of mixture components

max_num_components (int) - maximum number of mixture components

num_labels (int) - maximum number of unique labels to be assigned

Parameters:

kwargs: keyword arguments for Classifier

annotate(graph, bivariate_only=False, threshold=0.8, alpha=0.9, sampler_type=None, sampler_kwargs=None)[source]

Annotate graph of measurements.

Args:

graph (spatial.WeightedGraph)

bivariate_only (bool) - if True, only use posteriors evaluated using the bivariate mixture model. Otherwise, use the marginal univariate posterior by default, replacing uncertain values with their counterparts estimated by the bivariate model.

threshold (float) - minimum marginal posterior probability of a given label before spatial context is considered

alpha (float) - attenuation factor

sampler_type (str) - either ‘radial’, ‘neighbors’ or ‘community’

sampler_kwargs (dict) - keyword arguments for sampling

Returns:

labels (np.ndarray[int]) - labels for each measurement in graph

combine_posteriors(posterior, marginal_posterior, threshold=0.8)[source]

Replace uncertain marginal posterior probabilities with their more certain bivariate counterparts. If the maximum marginal posterior probability for a given sample does not meet the specified threshold while the maximum bivariate posterior probability does, the latter value is used. Otherwise, the marginal value is used.

Args:

posterior (np.ndarray[float]) - posterior probabilities of each label

marginal_posterior (np.ndarray[float]) - marginal posterior probabilities of each label

threshold (float) - minimum marginal posterior probability of a given label before spatial context is considered

Returns:

combined (np.ndarray[float])
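The selection rule described for combine_posteriors can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's code: nested lists stand in for the NumPy arrays, and treating "meets the threshold" as greater than or equal to is an assumption.

```python
def combine_posteriors(posterior, marginal_posterior, threshold=0.8):
    """Choose, row by row, between the bivariate and marginal posteriors."""
    combined = []
    for bivariate_row, marginal_row in zip(posterior, marginal_posterior):
        marginal_uncertain = max(marginal_row) < threshold
        bivariate_confident = max(bivariate_row) >= threshold
        if marginal_uncertain and bivariate_confident:
            # Marginal model is unsure but the bivariate model is not.
            combined.append(list(bivariate_row))
        else:
            # Default: keep the marginal estimate.
            combined.append(list(marginal_row))
    return combined

# Hypothetical posteriors for three measurements and two labels.
posterior = [[0.9, 0.1], [0.6, 0.4], [0.95, 0.05]]
marginal = [[0.55, 0.45], [0.5, 0.5], [0.1, 0.9]]
combined = combine_posteriors(posterior, marginal)
```

Only the first measurement switches to the bivariate estimate. The second stays marginal because neither model is confident, and the third stays marginal because the marginal model already meets the threshold.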

classmethod copy(src)[source]

Instantiate from another <source> annotator instance.

static diffuse_posteriors(graph, posterior, alpha=0.9)[source]

Diffuse estimated posterior probabilities of each label along the weighted edges of the graph.

Args:

graph (Graph) - graph connecting adjacent measurements

posterior (np.ndarray[float]) - posterior probability of each label

alpha (float) - attenuation factor

Returns:

diffused_posteriors (np.ndarray[float])
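To give a feel for what diffusing posteriors along weighted edges can mean, the sketch below uses a generic iterative scheme: each node repeatedly blends its original posterior with the weighted average of its neighbors' current posteriors. The update rule, the reading of alpha as the share given to the neighborhood, and the iteration count are all assumptions made for illustration. The procedure used by diffuse_posteriors itself is not described here and may differ.

```python
def diffuse(edges, posterior, alpha=0.9, num_iterations=50):
    """Blend each node's posterior with its weighted neighborhood average.

    edges: list of (i, j, weight) tuples for an undirected graph
    posterior: list of per-node lists of label probabilities
    """
    num_nodes = len(posterior)
    num_labels = len(posterior[0])

    # Symmetric weighted neighbor lists.
    neighbors = [[] for _ in range(num_nodes)]
    for i, j, weight in edges:
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))

    current = [list(row) for row in posterior]
    for _ in range(num_iterations):
        updated = []
        for i in range(num_nodes):
            total = sum(weight for _, weight in neighbors[i])
            row = []
            for k in range(num_labels):
                if total > 0:
                    context = sum(w * current[j][k] for j, w in neighbors[i]) / total
                else:
                    context = posterior[i][k]  # isolated node keeps its own estimate
                row.append((1 - alpha) * posterior[i][k] + alpha * context)
            updated.append(row)
        current = updated
    return current

# Hypothetical chain of three nodes; the middle node is uncertain.
edges = [(0, 1, 1.0), (1, 2, 1.0)]
posterior = [[1.0, 0.0], [0.5, 0.5], [1.0, 0.0]]
diffused = diffuse(edges, posterior)
```

In this toy example the uncertain middle node is pulled toward the label favored by both of its neighbors, and every row still sums to one.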

evaluate_marginal_posterior(sample, margin)[source]

Evaluates posterior probability of each label using only the specified marginal distribution.

Args:

sample (np.ndarray[float]) - sample values

margin (int) - index of desired margin

Returns:

marginal_posterior (np.ndarray[float])

classmethod from_data(data, attribute, xykey=None, **kwargs)[source]

Instantiate annotation object from measurement data.

Args:

data (pd.DataFrame) - measurement data containing <attribute>, as well as <xykey> fields

attribute (str) - name of attribute used to classify cells

xykey (list) - names of attributes defining measurement x/y position

kwargs: keyword arguments for Annotation

Returns:

annotator (Annotation derivative)

classmethod from_layer(layer, attribute, **kwargs)[source]

Instantiate from layer.

Args:

layer (data.Layer) - image layer instance

attribute (str) - name of attribute used to classify cells

kwargs: keyword arguments for Annotation

Returns:

annotator (Annotation derivative)

get_sample(graph, sampler_type, sampler_kwargs)[source]

Get sample to be annotated. A sample consists of a column of measured levels adjoined to a column of levels averaged over the neighborhood of each measurement.

Args:

graph (spatial.WeightedGraph)

sampler_type (str) - either ‘radial’, ‘neighbors’ or ‘community’

sampler_kwargs (dict) - keyword arguments for sampling

Returns:

sample (np.ndarray[float]) - sampled levels
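The two-column layout of a sample can be illustrated with a small standalone sketch. It assumes a neighborhood is given as an explicit list of adjacent node indices and that the node itself is excluded from its own average. Both are assumptions, since the actual neighborhood depends on the chosen sampler.

```python
def build_sample(levels, neighbors):
    """Pair each measured level with the mean level of its neighborhood."""
    sample = []
    for i, level in enumerate(levels):
        adjacent = neighbors.get(i, [])
        if adjacent:
            context = sum(levels[j] for j in adjacent) / len(adjacent)
        else:
            context = level  # isolated node: fall back to its own level
        sample.append((level, context))
    return sample

# Hypothetical levels for three cells arranged in a chain.
levels = [1.0, 3.0, 5.0]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
sample = build_sample(levels, neighbors)
```

Each row of the result is one point for a bivariate model: the first coordinate is the cell's own level and the second is its spatial context.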

get_sampler(graph, sampler_type=None, sampler_kwargs=None)[source]

Instantiate sampler.

Args:

graph (spatial.WeightedGraph)

sampler_type (str) - either ‘radial’, ‘neighbors’ or ‘community’

sampler_kwargs (dict) - keyword arguments for sampling

Returns:

sampler

train(*graphs)[source]

Train classifier on a series of graphs.

Args:

graphs (Graph or WeightedGraph) - graphs of adjacent measurements

class flyqma.annotation.annotation.AnnotationIO[source]

Methods for saving and loading an Annotation instance.

classmethod load(path)[source]

Load annotator from file.

Args:

path (str) - path to annotation directory

Returns:

annotator (Annotation derivative)

property parameters

Dictionary of parameter values.

save(dirpath, data=False, image=False, **kwargs)[source]

Save annotator to specified path.

Args:

dirpath (str) - directory in which annotator is to be saved

data (bool) - if True, save training data

image (bool) - if True, save classifier image

kwargs: keyword arguments for image rendering

Mixture Models

Tools for fitting univariate and bivariate Gaussian mixture models.

class flyqma.annotation.mixtures.univariate.MixtureProperties[source]

Properties for Gaussian mixture models.

property AIC

AIC score.

property BIC

BIC score.

property bounds

Lower and upper bounds of support.

property component_pdfs

Returns stacked array of component PDFs.

property components

Individual model components.

property lbound

Lower bound of support.

property log_likelihood

Maximized log likelihood.

property means

Mean value of each component.

property num_components

Number of model components.

property num_samples

Number of samples.

property pdf

Gaussian Mixture PDF.

property scale_factor

Scaling factor for log-transformed support.

property stds

Standard deviation of each component.

property support

Distribution support.

property support_size

Size of support.

property ubound

Upper bound of support.

class flyqma.annotation.mixtures.univariate.UnivariateMixture(*args, values=None, **kwargs)[source]

Univariate Gaussian mixture model.

Attributes:

values (array like) - values to which model was fit

Inherited attributes:

See sklearn.mixture.GaussianMixture

estimate_required_samples(SNR=5.0)[source]

Returns minimum number of averaged samples required to achieve the specified signal-to-noise ratio (SNR).

classmethod from_logsample(sample, n=3, max_iter=10000, tol=1e-08, covariance_type='diag', n_init=10)[source]

Instantiate from log-transformed sample.

classmethod from_parameters(mu, sigma, weights=None, values=None, **kwargs)[source]

Instantiate model from parameter vectors.

classmethod from_sample(sample, n, **kwargs)[source]

Instantiate from log-normally distributed sample.

get_component_pdf(idx, weighted=True)[source]

Returns PDF for indexed component.

logsample(N)[source]

Returns <N> samples of log-transformed variable.

multi_logsample(N, m=10)[source]

Returns <N> log-transformed samples as well as <N> log-transformed samples averaged over <m> other samples from the same component.

multi_sample(N, m=10)[source]

Returns <N> samples as well as <N> samples averaged over <m> other samples from the same component.

sample(N)[source]

Returns <N> samples of variable.

sample_component(component_idx, N)[source]

Returns <N> log-transformed samples from indexed component.

class flyqma.annotation.mixtures.bivariate.BivariateMixture(*args, values=None, **kwargs)[source]

Bivariate Gaussian mixture model.

Inherited attributes:

values (array like) - values to which model was fit

See sklearn.mixture.GaussianMixture

get_marginal_mixture(margin)[source]

Returns univariate mixture model for specified <margin>.

class flyqma.annotation.mixtures.bivariate.BivariateMixtureProperties[source]

Extension properties for bivariate mixtures.

property extent

Extent for x and y axes.

property supportx

Support for first dimension.

class flyqma.annotation.mixtures.visualization.BivariateVisualization[source]

Visualization methods for bivariate mixture models.

property tick_positions

Tick positions.

visualize(size_ratio=4, figsize=(2, 2), contours=None, **kwargs)[source]

Visualize joint and marginal distributions.

class flyqma.annotation.mixtures.visualization.MixtureVisualization[source]

Visualization methods for mixture models.

property summary

Returns text-based summary of mixture model.

flyqma.annotation.mixtures.visualization.figure(func)[source]

Decorator for creating axis.

flyqma.annotation.mixtures.visualization.surface_figure(func)[source]

Decorator for creating joint axis.

Model Selection

Tools for statistical model selection.

class flyqma.annotation.model_selection.univariate.SelectionIO[source]

Methods for saving and loading a model selection instance.

classmethod load(path)[source]

Load model selection instance from file.

Args:

path (str) - model selection directory

Returns:

selector (UnivariateModelSelection derivative)

static load_model(path)[source]

Load model from <path> directory.

save(dirpath, image=False, **kwargs)[source]

Save classifier to specified path.

Args:

dirpath (str) - directory in which classifier is to be saved

image (bool) - if True, save model image

kwargs: keyword arguments for image rendering

Returns:

path (str) - model selection directory

class flyqma.annotation.model_selection.univariate.UnivariateModelSelection(values, attribute, min_num_components=3, max_num_components=8, num_labels=3, models=None)[source]

Class for performing univariate mixture model selection. The optimal model is chosen based on BIC score.

property AIC

AIC scores for each model.

property AIC_optimal

Model with AIC optimal number of components.

property BIC

BIC scores for each model.

property BIC_optimal

Model with BIC optimal number of components.
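For orientation, the standard definitions are BIC = k ln(n) - 2 ln(L) and AIC = 2k - 2 ln(L), where k is the number of free parameters, n the number of samples, and L the maximized likelihood. The sketch below applies them to a univariate Gaussian mixture, for which k = 3m - 1 with m components (m means, m variances, and m - 1 independent weights). The log-likelihood values are made up for illustration, and the helper names are not part of the library.

```python
import math

def num_parameters(num_components):
    """Free parameters of a univariate Gaussian mixture."""
    return 3 * num_components - 1

def bic(log_likelihood, num_components, num_samples):
    k = num_parameters(num_components)
    return k * math.log(num_samples) - 2 * log_likelihood

def aic(log_likelihood, num_components):
    k = num_parameters(num_components)
    return 2 * k - 2 * log_likelihood

# Hypothetical maximized log-likelihoods for models with 3 to 6 components.
log_likelihoods = {3: -1520.0, 4: -1490.0, 5: -1486.0, 6: -1484.0}
num_samples = 1000

bic_scores = {m: bic(ll, m, num_samples) for m, ll in log_likelihoods.items()}
bic_optimal = min(bic_scores, key=bic_scores.get)
```

With these numbers the four-component model wins: adding components beyond four improves the likelihood too little to offset the ln(n) penalty per parameter.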

static fit_model(values, num_components, num_labels, **kwargs)[source]

Fit model with specified number of components.

fit_models()[source]

Fit model with each number of components.

property models

List of models ordered by number of components.

property parameters

Dictionary of instance parameters.

class flyqma.annotation.model_selection.bivariate.BivariateModelSelection(values, attribute, min_num_components=3, max_num_components=8, num_labels=3, models=None)[source]

Bivariate extension for model selection.

static fit_model(values, num_components, num_labels, **kwargs)[source]

Fit model with specified number of components.

static load_model(path)[source]

Load model from <path> directory.

class flyqma.annotation.model_selection.visualization.ModelSelectionVisualization[source]

Methods for visualizing model selection procedure.

plot_models(panelsize=(3, 2), **kwargs)[source]

Plot model for each number of components.

Label Assignment

Tools for unsupervised classification of cell measurements.

class flyqma.annotation.classification.classifiers.Classifier(values, attribute=None, num_labels=3, log=True, cmap=None)[source]

Classifier base class. Children of this class must possess a means attribute, as well as a predict method.

Attributes:

values (array like) - basis for clustering

attribute (str or list) - attribute(s) used to determine labels

log (bool) - indicates whether clustering performed on log values

num_labels (int) - number of output labels

classifier (vectorized func) - maps value to label_id

labels (np.ndarray[int]) - predicted labels

cmap (matplotlib.colors.ColorMap) - colormap for label_id

parameters (dict) - {param name: param value} pairs

fig (matplotlib.figures.Figure) - histogram figure

build_classifier()[source]

Build function that returns the most probable label for each of a series of values.

build_colormap(cmap, vmin=-1)[source]

Build normalized colormap for class labels.

Args:

cmap (matplotlib.colormap)

vmin (float) - lower bound for colorscale

Returns:

colormap (func) - function mapping class labels to colors

evaluate_classifier(data)[source]

Assign class labels to <data>.

Args:

data (pd.DataFrame) - must contain necessary attributes

Returns:

labels (np.ndarray[int])

classmethod from_grouped_measurements(data, attribute, groupby=None, **kwargs)[source]

Fit classifier to data grouped by a specified feature.

Args:

data (pd.DataFrame) - measurement data

groupby (str) - attribute used to group measurement data

attribute (str or list) - attribute(s) on which to cluster

kwargs: keyword arguments for classifier

Returns:

classifier (Classifier derivative)

classmethod from_measurements(data, attribute, **kwargs)[source]

Fit classifier to data.

Args:

data (pd.DataFrame) - measurement data

attribute (str or list) - attribute(s) on which to cluster

kwargs: keyword arguments for classifier

Returns:

classifier (Classifier derivative)

set_cmap(cmap=None, vmin=0, vmax=None)[source]

Set colormap for class labels.

Args:

cmap (matplotlib.colormap)

vmin (int) - lower bound for color scale

vmax (int) - upper bound for color scale

show()[source]

Visualize classification.

class flyqma.annotation.classification.classifiers.ClassifierIO[source]

Methods for saving and loading classifier objects.

classmethod load(path)[source]

Load classifier from file.

Args:

path (str) - path to classifier directory

Returns:

classifier (Classifier derivative)

save(dirpath, data=False, image=True, extension=None, **kwargs)[source]

Save classifier to specified path.

Args:

dirpath (str) - directory in which classifier is to be saved

data (bool) - if True, save training data

image (bool) - if True, save labeled histogram image

extension (str) - directory name extension

kwargs: keyword arguments for image rendering

class flyqma.annotation.classification.classifiers.ClassifierProperties[source]

Properties for classifier objects.

property centroids

Means of each component (not log transformed).

property component_groups

List of lists of components for each label.

property component_to_label

Returns dictionary mapping components to labels. Mapping is achieved by k-means clustering the model centroids (linear scale).
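The mapping from components to labels can be pictured with a plain one-dimensional k-means over the component centroids. The sketch below is a self-contained illustration, not the library's code: the deterministic initialization, the ordering of labels from low to high centroid, and the function names are assumptions.

```python
def assign(values, centers):
    """Index of the nearest center for each value."""
    return [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values]

def kmeans_1d(values, k, max_iterations=100):
    """Plain 1D k-means (k >= 2) with a deterministic initialization."""
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(max_iterations):
        assignment = assign(values, centers)
        updated = []
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)

def map_components_to_labels(centroids, num_labels):
    """Group component centroids into num_labels ordered labels."""
    centers, assignment = kmeans_1d(centroids, num_labels)
    # Rank clusters by center so that label 0 is the lowest group.
    by_center = sorted(range(num_labels), key=lambda c: centers[c])
    rank = {cluster: label for label, cluster in enumerate(by_center)}
    return {component: rank[cluster] for component, cluster in enumerate(assignment)}

# Hypothetical centroids (linear scale) of a five-component mixture.
centroids = [10.0, 12.0, 50.0, 55.0, 200.0]
component_to_label = map_components_to_labels(centroids, num_labels=3)
```

Here the two lowest components share label 0, the next two share label 1, and the brightest component receives label 2.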

property num_samples

Number of samples.

property order

Ordered component indices (low to high).

property values

Values for classifier.

class flyqma.annotation.classification.kmeans.KMeansClassifier(values, num_components=3, groups=None, log=True, **kwargs)[source]

K-means classifier.

Attributes:

groups (dict) - {cluster_id: label_id} pairs for merging clusters

component_to_label (vectorized func) - maps cluster_id to label_id

km (sklearn.cluster.KMeans) - kmeans object

classifier (vectorized func) - maps value to label_id

labels (np.ndarray[int]) - predicted labels

Inherited attributes:

values (array like) - basis for clustering

attribute (str or list) - attribute(s) on which to cluster

log (bool) - indicates whether clustering performed on log values

cmap (matplotlib.colors.ColorMap) - colormap for label_id

parameters (dict) - {param name: param value} pairs

fig (matplotlib.figures.Figure) - histogram figure

static fit(values, n)[source]

Fit <n> clusters to <values>.

property means

Mean of each cluster.

predict(values)[source]

Predict which component each of <values> belongs to.

class flyqma.annotation.classification.mixtures.BivariateMixtureClassifier(values, num_components=3, num_labels=3, fit_kw={}, model=None, **kwargs)[source]

Bivariate mixed log-normal model classifier.

Attributes:

model (mixtures.BivariateMixture) - frozen bivariate mixture model

Inherited attributes:

values (np.ndarray[float]) - basis for clustering

attribute (list) - attributes on which to cluster

num_labels (int) - number of labels

num_components (int) - number of mixture components

classifier (vectorized func) - maps values to labels

labels (np.ndarray[int]) - predicted labels

log (bool) - indicates whether clustering performed on log values

cmap (matplotlib.colors.ColorMap) - colormap for labels

parameters (dict) - {param name: param value} pairs

static fit(values, num_components=3, **kwargs)[source]

Fit bivariate Gaussian mixture model.

Args:

values (np.ndarray[float]) - 1D array of log-transformed values

num_components (int) - number of model components

kwargs: keyword arguments for fitting

Returns:

model (mixtures.BivariateMixture)

marginalize(margin)[source]

Returns UnivariateMixtureClassifier for specified margin.

class flyqma.annotation.classification.mixtures.MixtureModelIO[source]

Methods for saving and loading classifier objects.

classmethod load(path)[source]

Load classifier from file.

Args:

path (str) - path to classifier directory

Returns:

classifier (Classifier derivative)

save(dirpath, data=False, image=True, extension=None, **kwargs)[source]

Save classifier to specified path.

Args:

dirpath (str) - directory in which classifier is to be saved

data (bool) - if True, save training data

image (bool) - if True, save labeled histogram image

extension (str) - directory name extension

kwargs: keyword arguments for image rendering

class flyqma.annotation.classification.mixtures.UnivariateMixtureClassifier(values, num_components=3, num_labels=3, fit_kw={}, model=None, **kwargs)[source]

Univariate mixed log-normal model classifier.

Attributes:

model (mixtures.UnivariateMixture) - frozen univariate mixture model

num_components (int) - number of mixture components

classifier (vectorized func) - maps values to labels

labels (np.ndarray[int]) - predicted labels

Inherited attributes:

values (np.ndarray[float]) - basis for clustering

num_labels (int) - number of output labels

log (bool) - indicates whether clustering performed on log values

cmap (matplotlib.colors.ColorMap) - colormap for labels

parameters (dict) - {param name: param value} pairs

build_classifier()[source]

Build function that returns the most probable label for each of a series of values.

build_posterior()[source]

Build function that returns the posterior probability of each label given a series of values.

evaluate_posterior(data)[source]

Returns posterior across components for <data>.

static fit(values, num_components=3, **kwargs)[source]

Fit univariate Gaussian mixture model.

Args:

values (np.ndarray[float]) - 1D array of log-transformed values

num_components (int) - number of model components

kwargs: keyword arguments for fitting

Returns:

model (mixtures.UnivariateMixture)

property means

Mean of each component.

property num_components

Number of model components.

predict(values)[source]

Predict which component each of <values> belongs to.

predict_proba(values)[source]

Predict the posterior probability with which each of <values> belongs to each component.
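The posterior over components follows from Bayes' rule: each component's weighted density at a value is divided by the sum over all components. A posterior over labels can then be formed by summing the components assigned to each label. The sketch below illustrates both steps for a univariate Gaussian mixture on log-transformed values. The parameter values, the component-to-label mapping, and the function names are hypothetical, and summing component posteriors per label is an assumption about how labels are aggregated.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def component_posterior(x, weights, means, stds):
    """Posterior probability of each mixture component given x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [p / total for p in joint]

def label_posterior(x, weights, means, stds, component_to_label, num_labels):
    """Sum component posteriors over the components mapped to each label."""
    posterior = [0.0] * num_labels
    for k, p in enumerate(component_posterior(x, weights, means, stds)):
        posterior[component_to_label[k]] += p
    return posterior

# Hypothetical three-component model in which two components share label 0.
weights, means, stds = [0.5, 0.3, 0.2], [0.0, 1.0, 4.0], [0.5, 0.5, 0.5]
component_to_label = {0: 0, 1: 0, 2: 1}

low = label_posterior(0.2, weights, means, stds, component_to_label, 2)
high = label_posterior(3.9, weights, means, stds, component_to_label, 2)
```

Taking the label with the highest posterior probability for each value gives the most probable label.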

class flyqma.annotation.classification.visualization.MixtureVisualization[source]

Methods for visualizing a mixture-model based classifier.

property component_cdfs

Returns weighted CDF of each component over support.

property component_pdfs

Weighted component PDFs over support.

property ecdf

Empirical CDF over support.

property epdf

Empirical PDF over support.

property esupport

Empirical support vector (sorted values).

property label_colors

RGB color for each class label.

property pdf

Model PDF over support.

property support

Model support.

property support_labels

Labels for support vector.

Spatial Analysis

Tools for analyzing the 2D spatial arrangement of cells.

class flyqma.annotation.spatial.triangulation.LocalTriangulation(*args, **kwargs)[source]

Triangulation with edge distance filter.

Attributes:

edge_list (np.ndarray[int]) - (from, to) node pairs

edge_lengths (np.ndarray[float]) - euclidean length of each edge

property angle_threshold

Predicted upper bound on edge angles.

property angles

Angle on [0, 2π] interval.

compile_edge_list()[source]

Returns list of (node_from, node_to) tuples.

property edge_angles

Angular distance of each edge about origin.

property edge_radii

Minimum node radius in each edge.

property edges

Filtered edges.

static evaluate_edge_lengths(edge_list, x, y)[source]

Returns 1D array of edge lengths.

classmethod filter_edges(nodes, edges, lengths, max_length=0.1)[source]

Returns all edges less than <max_length>, with at least one edge containing each node.

filter_hull(edges)[source]

Returns all edges not on the convex hull.

filter_longest_edge(edges, edge_lengths)[source]

Returns all edges except the longest edge in each triangle.

classmethod filter_outliers(nodes, edges, lengths)[source]

Returns all edges whose lengths are not outliers, with at least one edge containing each node.

static find_disconnected_nodes(nodes, edges)[source]

Returns boolean array of nodes not included in edges.

static find_first_edge(edges, node)[source]

Returns index of first edge containing <node>.

property hull

Convex hull.

static is_outlier(points, threshold=3.0)[source]

Returns a boolean array with True if points are outliers and False otherwise.

Args:

points (np.ndarray[float]) - 1-D array of observations

threshold (float) - Maximum modified z-score. Observations with a modified z-score (based on the median absolute deviation) greater than this value are classified as outliers.

Returns:

mask (np.ndarray[bool])

References:

Boris Iglewicz and David Hoaglin (1993), “Volume 16: How to Detect and Handle Outliers”, The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
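The modified z-score from the reference above is M_i = 0.6745 (x_i - median) / MAD, where MAD is the median absolute deviation from the median. A standalone sketch using only the standard library is shown below. Plain lists stand in for NumPy arrays, and returning no outliers when MAD is zero is an assumption.

```python
import statistics

def is_outlier(points, threshold=3.0):
    """Flag points whose modified z-score exceeds the threshold."""
    median = statistics.median(points)
    abs_deviation = [abs(p - median) for p in points]
    mad = statistics.median(abs_deviation)
    if mad == 0:
        # Degenerate case: more than half of the points are identical.
        return [False] * len(points)
    return [0.6745 * d / mad > threshold for d in abs_deviation]

# Hypothetical edge lengths with one unusually long edge.
lengths = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
mask = is_outlier(lengths)
```

Because both the center and the spread are medians, the single long edge barely affects the scale estimate, so it stands out clearly.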

property nodes

All nodes.

property num_triangles

Number of triangles.

property radii

Radius.

property size

Number of points.

class flyqma.annotation.spatial.graphs.CommunityDetection[source]

Methods for detecting communities in a Graph.

assign_community(level=None, key='community')[source]

Assign communities using InfoMap clustering.

Args:

level (int) - module level at which aggregation occurs, starting from the finest resolution

key (str) - name of community attribute

detect_communities(**kwargs)[source]

Detect communities using InfoMap clustering.

Accepts keyword arguments for InfoMap, including:

twolevel (bool) - if True, perform two-level clustering, otherwise defaults to multi-level clustering

N (int) - number of trials

class flyqma.annotation.spatial.graphs.Graph(data, xykey=None)[source]

Object provides an undirected unweighted graph connecting adjacent cells.

Attributes:

data (pd.DataFrame) - cell measurement data (nodes)

xykey (list) - attribute keys for node x/y positions

G (nx.Graph) - undirected graph instance

nodes (np.ndarray[int]) - node indices

edges (np.ndarray[int]) - pairs of connected node indices

node_map (vectorized func) - maps positional index to node index

position_map (vectorized func) - maps node index to positional index

tri (matplotlib.tri.Triangulation) - triangulation of node positions

copy()[source]

Returns deep copy of graph instance.

get_correlations(attribute, log=True)[source]

Returns SpatialCorrelation object for <attribute>.

Args:

attribute (str) - name of attribute

log (bool) - if True, log-transform attribute values

Returns:

correlations (SpatialCorrelation)

get_networkx(*node_attributes)[source]

Returns networkx instance of graph.

Args:

node_attributes (str) - attributes to be added for each node

get_subgraph(ind)[source]

Instantiate subgraph from DataFrame indices.

class flyqma.annotation.spatial.graphs.GraphVisualizationMethods[source]

Methods for visualizing a Graph instance.

label_triangles(label_by='genotype')[source]

Label each triangle with most common node attribute value.

Args:

label_by (str) - node attribute used to label each triangle

Returns:

labels (np.ndarray[int]) - labels for each triangle
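Labeling a triangle by its most common node value is a majority vote over its three vertices. The sketch below shows the idea with hypothetical triangles and node labels. When all three vertices differ, the first vertex's value is returned here, and that tie-breaking rule is an assumption.

```python
from collections import Counter

def label_triangles(triangles, node_labels):
    """Assign each triangle the most common label among its vertices."""
    labels = []
    for triangle in triangles:
        counts = Counter(node_labels[node] for node in triangle)
        labels.append(counts.most_common(1)[0][0])
    return labels

# Two hypothetical triangles sharing the edge between nodes 1 and 2.
triangles = [(0, 1, 2), (1, 2, 3)]
node_labels = {0: 0, 1: 0, 2: 2, 3: 2}
triangle_labels = label_triangles(triangles, node_labels)
```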

plot_edges(ax=None, **kwargs)[source]

Plot triangulation edges.

Args:

ax (matplotlib.axes.AxesSubplot)

kwargs: keyword arguments for matplotlib.pyplot.triplot

plot_triangles(label_by='genotype', cmap=None, ax=None, **kwargs)[source]

Plot triangle faces using tripcolor.

Args:

label_by (str) - data attribute used to color each triangle

cmap (matplotlib.colors.ColorMap) - colormap for attribute values

ax (matplotlib.axes.AxesSubplot)

kwargs: keyword arguments for plt.tripcolor

show(ax=None, colorby=None, disconnect=False, **kwargs)[source]

Visualize graph.

Args:

ax (matplotlib.axes.AxesSubplot) - if None, create figure

colorby (str) - node attribute used to assign node/edge colors

disconnect (bool) - if True, remove edges between nodes whose colorby values differ

kwargs: keyword arguments for NetworkxGraphVisualization.draw

class flyqma.annotation.spatial.graphs.NetworkxGraphVisualization(G, pos)[source]

Object for visualizing a NetworkX Graph object.

Attributes:

G (nx.Graph) - networkx graph object

pos (np.ndarray[float]) - 2D node positions

build_cmap(colorby)[source]

Build colormap.

draw(ax=None, colorby=None, edge_color='k', node_color='k', cmap=None, **kwargs)[source]

Draw graph.

Args:

ax (matplotlib.axes.AxesSubplot) - axis on which to draw graph

colorby (str) - node attribute on which nodes/edges are colored

edge_color, node_color (str) - edge/node colors, overrides colorby

node_cmap (matplotlib.colors.ColorMap) - node colormap

class flyqma.annotation.spatial.graphs.SpatialProperties[source]

Spatial properties for Graph objects.

property distance_matrix

Euclidean distance matrix between all nodes.

property edge_lengths

Unique edge lengths.

static evaluate_fluctuations(values)[source]

Construct pairwise fluctuation matrix for <values>.

Args:

values (1D np.ndarray[float]) - attribute values

Returns:

fluctuations (2D np.ndarray[float]) - pairwise fluctuations

get_fluctuations_matrix(attribute, log=True)[source]

Returns normalized pairwise fluctuations of <attribute> value for each node in the graph.

Args:

attribute (str) - name of attribute

log (bool) - if True, log-transform attribute values

Returns:

fluctuations (2D np.ndarray[float]) - pairwise fluctuations
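One common definition of a normalized pairwise fluctuation is C_ij = (v_i - mean)(v_j - mean) / variance. The sketch below uses that definition for illustration. Whether it matches the library's normalization exactly is an assumption, and the optional log transform is left out.

```python
import statistics

def evaluate_fluctuations(values):
    """Pairwise products of deviations from the mean, scaled by the variance."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    deviations = [v - mean for v in values]
    return [[di * dj / variance for dj in deviations] for di in deviations]

# Hypothetical attribute values for three nodes.
values = [1.0, 2.0, 3.0]
fluctuations = evaluate_fluctuations(values)
```

Under this definition the matrix is symmetric, its diagonal averages to one, and entries are negative for pairs of nodes that deviate from the mean in opposite directions.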

static get_matrix_upper(matrix)[source]

Return upper triangular portion of a 2-D matrix.

Parameters:

matrix (2D np.ndarray)

Returns:

upper (1D np.ndarray) - upper triangle, ordered row then column

property median_edge_length

Median edge length.

property node_positions

Assign 2D coordinate positions to nodes.

property node_positions_arr

N x 2 array of node coordinates, ordered by positional index.

property unique_distances

Upper triangular portion of euclidean distance matrix.

class flyqma.annotation.spatial.graphs.TopologicalProperties[source]

Topological properties for Graph objects.

property adjacency

Adjacency matrix ordered by <self.nodes>.

property adjacency_positional

Adjacency matrix ordered by positional index in <self.data>.

property edge_list

Distance-filtered edges as (from, to) tuples.

property edges

Distance-filtered edges.

property nodes

Unique nodes in graph.

property nodes_order

Indices that sort nodes by positional index in <self.data>.

property num_nodes

Number of nodes.

class flyqma.annotation.spatial.graphs.WeightFunction(data, weighted_by='r', distance=False)[source]

Object for weighting graph edges by similarity.

Attributes:

data (pd.DataFrame) - nodes data

weighted_by (str) - node attribute used to assess similarity

values (pd.Series) - node attribute values

distance (bool) - if True, weights edges by distance

assess_weights(edges, logratio=False)[source]

Evaluate edge weights normalized by mean difference in node values.

Args:

edges (list of (i, j) tuples) - edges between nodes i and j

logratio (bool) - if True, weight edges by logratio

Returns:

weights (np.ndarray[float]) - edge weights

difference(i, j)[source]

Evaluate difference in values between nodes i and j.

Args:

i, j (int) - node indices

Returns:

difference (float)

logratio(i, j)[source]

Evaluate log ratio between nodes i and j.

Args:

i, j (int) - node indices

Returns:

logratio (float)
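As a rough illustration of similarity-based edge weighting, the sketch below measures the dissimilarity of each edge as either an absolute difference or an absolute log ratio, normalizes it by the mean over all edges, and converts it to a weight with an exponential kernel. The normalization by the mean follows the description of assess_weights. The exponential kernel and the handling of identical values are assumptions, not a statement of the library's formula.

```python
import math

def assess_weights(values, edges, logratio=False):
    """Weight edges so that similar node values give weights near one."""
    if logratio:
        energy = [abs(math.log(values[i] / values[j])) for i, j in edges]
    else:
        energy = [abs(values[i] - values[j]) for i, j in edges]
    mean_energy = sum(energy) / len(energy)
    if mean_energy == 0:
        return [1.0] * len(edges)  # all connected nodes are identical
    return [math.exp(-e / mean_energy) for e in energy]

# Hypothetical expression levels: nodes 0 and 1 are similar, node 2 is not.
values = {0: 1.0, 1: 1.1, 2: 4.0}
edges = [(0, 1), (1, 2)]
weights = assess_weights(values, edges, logratio=True)
```

The edge joining the two similar nodes receives the larger weight. With the log ratio option, dissimilarity depends on fold change rather than absolute difference, which suits log-normally distributed expression levels.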

class flyqma.annotation.spatial.graphs.WeightedGraph(data, weighted_by, xykey=None, logratio=True, distance=False)[source]

Object provides an undirected weighted graph connecting adjacent cells. Edge weights are evaluated based on the similarity of expression between pairs of connected nodes. Node similarity is based on the cell measurement data attribute specified by the ‘weighted_by’ parameter.

Attributes:

weighted_by (str) - data attribute used to weight edges

imap (spatial.InfoMap) - community detection

community_labels (np.ndarray[int]) - community label for each node

logratio (bool) - if True, weight edges by log ratio

distance (bool) - if True, weights edges by distance rather than similarity

Inherited attributes:

data (pd.DataFrame) - cell measurement data (nodes)

xykey (list) - attribute keys for node x/y positions

nodes (np.ndarray[int]) - node indices

edges (np.ndarray[int]) - pairs of connected node indices

node_map (vectorized func) - maps positional index to node index

position_map (vectorized func) - maps node index to positional index

tri (matplotlib.tri.Triangulation) - triangulation of node positions

property edge_list

Distance-filtered edges as (from, to, weight) tuples.

evaluate_edge_weights()[source]

Evaluate edge weights.

Returns:

weights (np.ndarray[float]) - edge weights

class flyqma.annotation.spatial.correlation.CharacteristicLength(correlation, fraction_of_max=0.01)[source]

Class for determining the characteristic length over which correlations decay.

property characteristic_length

Characteristic decay length.

static extract_decay(correlation, fraction_of_max)[source]

Extract decay.

static fit(x, y, model)[source]

Fit exponential decay model to decay vectors.

static model(x, a, b)[source]

Exponential decay model.

plot_fit(ax, **kwargs)[source]

Plot model fit.

plot_measured(ax, **kwargs)[source]

Plot measured correlation decay.

property x_normed

Distance vector normalized by maximum value.

property yp

Predicted correlation values.

class flyqma.annotation.spatial.correlation.CorrelationVisualization[source]

Visualization methods for SpatialCorrelation.

class flyqma.annotation.spatial.correlation.SpatialCorrelation(d_ij=None, C_ij=None)[source]

Container for correlations between 1-D timeseries.

Attributes:

d_ij (np array) - pairwise separation distances between measurements

C_ij (np array) - normalized pairwise fluctuations between measurements
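
The two attributes can be pictured with a small sketch, assuming that fluctuations are z-scored values and that each pair contributes the product of its two fluctuations. The helper `pairwise_terms` is hypothetical and the normalization used by the library may differ.

```python
import numpy as np

def pairwise_terms(positions, values):
    """Hypothetical helper returning (d_ij, C_ij) for all unique pairs i < j."""
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    # fluctuations about the mean, scaled to unit variance
    z = (values - values.mean()) / values.std()
    i, j = np.triu_indices(len(values), k=1)
    d_ij = np.linalg.norm(positions[i] - positions[j], axis=1)
    C_ij = z[i] * z[j]
    return d_ij, C_ij

positions = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
values = np.array([1.0, 2.0, 3.0])
d_ij, C_ij = pairwise_terms(positions, values)
```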

class flyqma.annotation.spatial.infomap.CommunityAggregator(infomap)[source]

Tool for hierarchical aggregation of communities.

class flyqma.annotation.spatial.infomap.InfoMap(edges, **kwargs)[source]

Object for performing infomap flow-based community detection.

Attributes:

infomap (infomap.Infomap) - infomap object

node_to_module (dict) - {node: module} pairs

classifier (vectorized func) - maps nodes to modules

aggregator (CommunityAggregator)

build_classifier()[source]

Construct node to module classifier.

Returns:

node_to_module (dict) - {node: module} pairs

classifier (vectorized func) - maps nodes to modules
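
One simple way to obtain a vectorized node-to-module mapping from a dictionary is `numpy.vectorize`. This is a sketch with made-up assignments, not necessarily how build_classifier is implemented.

```python
import numpy as np

# hypothetical node-to-module assignments
node_to_module = {0: 1, 1: 1, 2: 2, 3: 3}

# vectorized lookup that accepts a scalar or an array of node indices
classifier = np.vectorize(node_to_module.get)

modules = classifier(np.array([3, 0, 2]))
```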

static build_network(edges, twolevel=False, N=25)[source]

Compile InfoMap object from graph edges.

Args:

twolevel (bool) - if True, perform two-level clustering

N (int) - number of trials

property max_depth

Maximum tree depth.

run(report=False)[source]

Run infomap community detection.

Args:

report (bool) - if True, print number of modules found

class flyqma.annotation.spatial.sampling.CommunitySampler(graph, attr, depth=1.0, log=True, twolevel=False)[source]

Class for sampling node attributes averaged over local community.

Attributes:

graph (spatial.Graph) - graph instance

G (nx.Graph) - graph with node attribute

attr (str) - attribute to be averaged over neighbors

depth (int) - mean correlation lifetime

level (int) - hierarchical level at which clusters are merged

log (bool) - if True, log-transform values before averaging

twolevel (bool) - if True, use two-level community clustering

autocorrelate(include_distances=False)[source]

Returns autocorrelation versus community depth.

Args:

include_distances (bool) - if True, also return mean separation distances

Returns:

levels (list) - clustering depths, starting from finest resolution

correlations (list) - mean correlation within communities

<optional> distances (list) - mean pairwise separation distance

average_over_neighbors()[source]

Average attribute value over all members of the community encompassing each node.

property averaged_attr

Name of averaged attribute.

property clustering_level

Highest clustering level at which the mean correlation remains above <self.depth> multiples of the decay constant.

property neighbors

Dictionary of neighbor indices keyed by node indices.

property size_attr

Neighborhood size attribute name.

property z_attr

Name of z-scored attribute.
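
Community-level averaging can be sketched as a grouped mean, assuming each node already carries a community label. The helper `community_average` is hypothetical; it returns both the averaged values and the community size for each node.

```python
import numpy as np

def community_average(values, labels, log=True):
    """Hypothetical helper: replace each value by the mean over its community."""
    values = np.asarray(values, dtype=float)
    if log:
        values = np.log(values)
    # map arbitrary community labels onto 0..K-1
    _, inverse = np.unique(labels, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    sizes = np.bincount(inverse)
    return (sums / sizes)[inverse], sizes[inverse]

values = np.array([1.0, 3.0, 10.0, 20.0, 30.0])
labels = np.array([7, 7, 9, 9, 9])
averaged, size = community_average(values, labels, log=False)
```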

class flyqma.annotation.spatial.sampling.NeighborSampler(graph, attr, depth=1, log=True)[source]

Class for sampling node attributes averaged over neighbors.

Attributes:

graph (spatial.Graph) - graph instance

G (nx.Graph) - graph with node attribute

attr (str) - attribute to be averaged over neighbors

depth (int) - maximum number of edges connecting neighbors

log (bool) - if True, log-transform values before averaging

property G

NetworkX graph instance.

add_attribute_to_graph()[source]

Add attribute to networkx graph object.

property attr_used

Name of attribute used to access graph data.

average_over_neighbors()[source]

Average attribute value over all neighbors adjacent to each node.

property averaged_attr

Name of averaged attribute.

property data

Graph data.

property keys

List of attribute names.

classmethod multisample(attr, *graphs, **kwargs)[source]

Generate composite sample from one or more <graphs>.

Args:

attr (str) - attribute to be averaged over neighbors

graphs (spatial.Graph) - one or more graph instances

kwargs: keyword arguments for sampler

Returns:

sample (np.ndarray[float]) - 2D array of sampled values; the first column contains cell measurements and the second column contains measurements averaged over the neighbors of each cell

keys (list of str) - attribute keys for sampled data

property neighbors

Dictionary of neighbor indices keyed by node indices.

property node_values

Vector of attribute values for each node.

property node_values_dict

Dictionary of attribute values, keyed by node index.

property num_nodes

Number of nodes.

property sample

Returns bivariate sample combining each node’s attribute value with the average attribute value in its neighborhood.

property size_attr

Neighborhood size attribute name.
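
The bivariate sample can be illustrated without NetworkX, using a plain adjacency dictionary and a breadth-first search limited to `depth` edges. The helper names are hypothetical, and excluding each node from its own neighborhood is an assumption of this sketch.

```python
import numpy as np
from collections import deque

def neighbors_within(adjacency, node, depth=1):
    """Nodes reachable from <node> in at most <depth> edges, excluding itself."""
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue
        for nbr in adjacency[current]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return sorted(seen - {node})

def bivariate_sample(values, adjacency, depth=1, log=True):
    """Column 0 holds each node's value, column 1 the mean over its neighbors."""
    x = np.log(values) if log else np.asarray(values, dtype=float)
    means = [x[neighbors_within(adjacency, n, depth)].mean() for n in range(len(x))]
    return np.column_stack([x, means])

# path graph 0 - 1 - 2 - 3
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = np.array([1.0, 2.0, 4.0, 8.0])
sample = bivariate_sample(values, adjacency, depth=1, log=False)
```

Nodes without any neighbors would yield NaN in the second column, so isolated nodes need separate handling in practice.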

class flyqma.annotation.spatial.sampling.RadialSampler(graph, attr, depth=1.0, log=True)[source]

Class for sampling node attributes averaged within a predetermined radius of each node.

Attributes:

graph (spatial.Graph) - graph instance

G (nx.Graph) - graph with node attribute

attr (str) - attribute to be averaged over neighbors

depth (int) - hierarchical level to which communities are merged

log (bool) - if True, log-transform values before averaging

length_scale (float) - characteristic length scale of the graph

radius (float) - radius of sampling region surrounding each measurement

average_over_neighbors()[source]

Average attribute value over all nodes within the specified radius of each node.

property averaged_attr

Name of averaged attribute.

property distance_matrix

Euclidean distance matrix between nodes (ordered by position in <self.data>).

property neighbors

Dictionary of neighbor positional indices keyed by node indices.

property size_attr

Neighborhood size attribute name.
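
Radius-based averaging can be sketched with a dense Euclidean distance matrix. The helper `radial_average` is hypothetical, and counting each node as part of its own neighborhood is an assumption of this example.

```python
import numpy as np

def radial_average(xy, values, radius):
    """Hypothetical helper: mean value over all nodes within <radius> of each node."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    # Euclidean distance matrix via broadcasting, shape (N, N)
    delta = xy[:, None, :] - xy[None, :, :]
    distance_matrix = np.sqrt((delta ** 2).sum(axis=-1))
    # each node is within distance zero of itself, so it is included
    within = distance_matrix <= radius
    sizes = within.sum(axis=1)
    return (within @ values) / sizes, sizes

xy = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
values = np.array([2.0, 4.0, 10.0])
averaged, sizes = radial_average(xy, values, radius=1.5)
```

The dense matrix costs memory quadratic in the number of nodes, which is acceptable for a sketch but may call for a spatial index on large images.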

flyqma.annotation.spatial.timeseries.apply_custom_roller(func, x, **kwargs)[source]

Apply function to rolling window.

Args:

func (function) - function applied to each window, returns 1 x N_out

x (np.ndarray) - ordered samples, length N

kwargs: keyword arguments for window specification

Returns:

fx (np.ndarray) - function output for each window, N/resolution x N_out

flyqma.annotation.spatial.timeseries.bootstrap(x, func=<function mean>, confidence=95, N=1000)[source]

Returns confidence interval for a point estimate obtained by bootstrap resampling.

Args:

x (np.ndarray) - ordered samples, length N

func (function) - function applied to each bootstrap sample

confidence (float) - confidence interval, between 0 and 100

N (int) - number of bootstrap samples

Returns:

interval (np.ndarray) - confidence interval bounds
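
A minimal sketch of a percentile bootstrap interval, assuming resampling with replacement. The `seed` argument is added here for reproducibility and is not part of the documented signature; the library's procedure may differ in detail.

```python
import numpy as np

def bootstrap_interval(x, func=np.mean, confidence=95, N=1000, seed=0):
    """Percentile bootstrap interval for func(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    # draw N resamples of len(x) indices each, with replacement
    indices = rng.integers(0, len(x), size=(N, len(x)))
    estimates = np.array([func(x[idx]) for idx in indices])
    # symmetric tails around the requested confidence level
    tail = (100 - confidence) / 2
    return np.percentile(estimates, [tail, 100 - tail])

x = np.random.default_rng(1).normal(loc=5.0, scale=1.0, size=200)
interval = bootstrap_interval(x)
```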

flyqma.annotation.spatial.timeseries.detrend_signal(x, window_size=99, order=1)[source]

Detrend and scale fluctuations using first-order univariate spline.

Args:

x (np array) - ordered samples

window_size (int) - size of interpolation window for lowpass filter

order (int) - spline order

Returns:

residuals (np array) - detrended residuals

trend (np array) - spline fit to signal

flyqma.annotation.spatial.timeseries.get_binned_mean(x, window_size=100)[source]

Returns mean values within non-overlapping sequential windows.

Args:

x (np.ndarray) - ordered samples, length N

window_size (int) - size of window, W

Returns:

means (np.ndarray) - bin means, N/W x 1
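
Binned means over non-overlapping windows can be sketched with a reshape. Dropping the trailing samples that do not fill a complete window is an assumption of this example.

```python
import numpy as np

def binned_mean(x, window_size=100):
    """Mean within consecutive non-overlapping windows; trailing remainder is dropped."""
    x = np.asarray(x, dtype=float)
    n_bins = len(x) // window_size
    return x[:n_bins * window_size].reshape(n_bins, window_size).mean(axis=1)

# ten samples, window of three: bins are [0,1,2], [3,4,5], [6,7,8]
means = binned_mean(np.arange(10), window_size=3)
```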

flyqma.annotation.spatial.timeseries.get_rolling_gaussian(x, window_size=100, resolution=10)[source]

Returns Gaussian fit within sliding window.

Args:

x (np.ndarray) - ordered samples

window_size (int) - size of window

resolution (int) - sampling interval

Returns:

model (scipy.stats.norm)

flyqma.annotation.spatial.timeseries.get_rolling_mean(x, **kw)[source]

Compute rolling mean. This implementation permits flexible sampling intervals and multi-dimensional time series, but is slower than get_running_mean for 1D time series.

Args:

x (np.ndarray) - ordered samples, length N

kw: arguments for window specification

Returns:

means (np.ndarray) - moving average of x, N/resolution x 1

flyqma.annotation.spatial.timeseries.get_rolling_mean_interval(x, window_size=100, resolution=1, confidence=95, nbootstraps=1000)[source]

Evaluate confidence interval for moving average of ordered values.

Args:

x (np.ndarray) - ordered samples, length N

window_size (int) - size of window, W

resolution (int) - sampling interval

confidence (float) - confidence interval, between 0 and 100

nbootstraps (int) - number of bootstrap samples

Returns:

interval (np.ndarray) - confidence interval bounds, N/resolution x 2

flyqma.annotation.spatial.timeseries.get_rolling_window(x, window_size=100, resolution=1)[source]

Return array slices within a rolling window.

Args:

x (np.ndarray) - ordered samples, length N

window_size (int) - size of window, W

resolution (int) - sampling interval

Returns:

windows (np.ndarray) - sampled values, N/resolution x W
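
Rolling windows with a sampling interval can be sketched with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later). This version returns only complete windows, so its first dimension is shorter than the documented N/resolution; how the library treats the array edges is not assumed here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_window(x, window_size=100, resolution=1):
    """Windows of length <window_size>, keeping every <resolution>-th start position."""
    x = np.asarray(x)
    return sliding_window_view(x, window_size)[::resolution]

windows = rolling_window(np.arange(7), window_size=3, resolution=2)
# any per-window statistic can then be taken along axis 1
rolling_means = windows.mean(axis=1)
```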

flyqma.annotation.spatial.timeseries.get_running_mean(x, window_size=100)[source]

Returns running mean for a 1D vector. This is the fastest implementation, but is limited to one-dimensional arrays and doesn’t permit interval specification.

Args:

x (np.ndarray) - ordered samples, length N

window_size (int) - size of window, W

Returns:

means (np.ndarray) - moving average of x
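
A fast running mean for a one-dimensional vector is commonly computed from a cumulative sum. The sketch below shows that approach under the assumption that only complete windows are reported; it is not claimed to match the library's implementation.

```python
import numpy as np

def running_mean(x, window_size=100):
    """Moving average over complete windows, computed from a cumulative sum."""
    x = np.asarray(x, dtype=float)
    # prepend zero so that csum[k] is the sum of the first k samples
    csum = np.cumsum(np.insert(x, 0, 0.0))
    return (csum[window_size:] - csum[:-window_size]) / window_size

means = running_mean(np.arange(6), window_size=3)
```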

flyqma.annotation.spatial.timeseries.plot_mean(ax, x, y, label=None, ma_type='sliding', window_size=100, resolution=1, line_color='k', line_width=1, line_alpha=1, linestyle=None, markersize=2, smooth=False, **kw)[source]

Plot moving average.

Args:

x, y (array like) - timeseries data

ax (matplotlib.axes.AxesSubplot) - axis to which line is added

label (str) - data label

ma_type (str) - type of average used, either sliding, binned, or savgol

window_size (int) - size of window

resolution (int) - sampling interval

line_color, line_width, line_alpha, linestyle - formatting parameters

smooth (bool) - if True, apply secondary savgol filter

Returns:

line (matplotlib.lines.Line2D)

flyqma.annotation.spatial.timeseries.plot_mean_interval(ax, x, y, ma_type='sliding', window_size=100, resolution=10, nbootstraps=1000, confidence=95, color='grey', alpha=0.25, error_bars=False, lw=0.0)[source]

Adds confidence interval for line average (sliding window or binned) to existing axes.

Args:

x, y (array like) - data

ax (axes) - axis to which interval is added

ma_type (str) - type of average used, either ‘sliding’ or ‘binned’

window_size (int) - size of sliding window or bin (num of cells)

resolution (int) - sampling resolution for confidence interval

nbootstraps (int) - number of bootstraps

confidence (float) - confidence interval, between 0 and 100

color, alpha - formatting parameters

flyqma.annotation.spatial.timeseries.savgol(x, window_size=100, polyorder=1)[source]

Perform Savitzky-Golay filtration of 1-D array.

Args:

x (np.ndarray) - ordered samples

window_size (int) - filter size

polyorder (int) - polynomial order

Returns:

trend (np.ndarray) - smoothed values

flyqma.annotation.spatial.timeseries.smooth(x, window)[source]

Returns smoothed moving average of <x> within <window>.

flyqma.annotation.spatial.timeseries.subsample(x, frac=1)[source]

Subsample array with replacement.

Args:

x (np.ndarray) - ordered samples, length N

frac (float) - sample size (fraction of array)

Returns:

sample (np.ndarray) - subsampled values