Data Module

flyqma.data provides three levels of organization for managing cell measurement data:

  1. Layer: a 2D cross-sectional image of an eye disc

  2. Stack: a set of layers obtained from the same eye disc

  3. Experiment: a collection of eye discs obtained under similar conditions
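As a rough sketch of how the three levels nest, the function below walks from an Experiment down to a single Layer using only the constructors and loaders documented in this section. The function name, the directory path, and the choice of the first stack and layer are placeholders. flyqma is imported lazily, so the definition itself can be read and run without the package installed. The returned data may be empty or missing if the layer has not been segmented.

```python
def load_first_layer(experiment_path, stack_id=0, layer_id=0):
    """Walk Experiment -> Stack -> Layer and return the layer's processed data."""
    # Imported lazily; calling this function requires flyqma to be installed.
    from flyqma.data.experiments import Experiment

    experiment = Experiment(experiment_path)            # collection of stacks
    stack = experiment.load_stack(stack_id, full=True)  # one eye disc
    layer = stack.load_layer(layer_id)                  # one cross section
    return layer.data                                   # processed measurements
```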

Images

Images are 2D arrays of pixel intensities recorded within one or more fluorescence channels.

class flyqma.data.images.ImageMultichromatic(im, labels=None)[source]

Object represents a multichromatic image.

Attributes:

im (np.ndarray[float]) - 2D array of pixel values in WHC format

Inherited attributes:

shape (array like) - image dimensions

mask (np.ndarray[bool]) - image mask

labels (np.ndarray[int]) - segment ID mask

get_channel(channel, copy=True)[source]

Returns monochrome image of specified color channel.

Args:

channel (int) - desired channel

copy (bool) - if True, instantiate from image copy

Returns:

image (ImageScalar) - monochrome image

to_RGB(channels_dict=None, copy=True)[source]

Returns RGB image of specified color channels.

Args:

channels_dict (dict) - RGB channels keyed by channel index

copy (bool) - if True, instantiate from image copy

Returns:

image (ImageMultichromatic) - RGB image

class flyqma.data.images.ImageScalar(im, labels=None)[source]

Object represents a monochrome image.

Attributes:

im (np.ndarray[float]) - 2D array of pixel values

shape (array like) - image dimensions

mask (np.ndarray[bool]) - image mask

labels (np.ndarray[int]) - segment ID mask

add_contour(ax, mask, lw=1, color='r')[source]

Adds border of specified contour.

add_contours(ax, lw=1, color='r', rasterized=False)[source]

Adds borders of all contours.

clahe(factor=8, clip_limit=0.01, nbins=256)[source]

Run CLAHE on reflection-padded image.

Args:

factor (float or int) - number of segments per dimension

clip_limit (float) - clip limit for CLAHE

nbins (int) - number of grey-scale bins for histogram

gaussian_filter(sigma=(1.0, 1.0))[source]

Apply 2D gaussian filter.

median_filter(radius=0, structure_dim=1)[source]

Apply 2D median filter.

preprocess(median_radius=2, gaussian_sigma=(2, 2), clip_limit=0.03, clip_factor=20)[source]

Preprocess image.

Args:

median_radius (int) - median filter size, px

gaussian_sigma (tuple) - gaussian filter size, px std dev

clip_limit (float) - CLAHE clip limit

clip_factor (int) - CLAHE clip factor

set_mean_mask()[source]

Mask values below mean.

set_otsu_mask()[source]

Mask values below Otsu threshold.
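For readers unfamiliar with Otsu thresholding, the following is a minimal pure-Python sketch of the technique: it picks the intensity cut that maximizes the between-class variance of a histogram. It is not the library's implementation, and treating values at or above the threshold as foreground is an assumption made for illustration.

```python
def otsu_threshold(values, nbins=256):
    """Return the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / nbins
    # Histogram of the pixel intensities.
    counts = [0] * nbins
    for v in values:
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(nbins)]

    total = len(values)
    total_sum = sum(c * x for c, x in zip(counts, centers))
    best_var, best_t = -1.0, lo
    w0 = 0       # number of pixels in bins up to and including bin i
    sum0 = 0.0   # approximate intensity sum of those pixels
    for i in range(nbins - 1):
        w0 += counts[i]
        sum0 += counts[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, lo + (i + 1) * width
    return best_t


pixels = [0.05, 0.1, 0.08, 0.12, 0.9, 0.85, 0.95, 0.8]
threshold = otsu_threshold(pixels)
mask = [p >= threshold for p in pixels]  # True marks foreground (assumed)
```

With two well-separated intensity clusters, as in the toy data above, the threshold falls in the gap between them.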

show(segments=True, cmap=None, vmin=0, vmax=1, figsize=(10, 10), ax=None, **kwargs)[source]

Render image.

Args:

segments (bool) - if True, include cell segment contours

cmap (matplotlib.colors.Colormap or str) - colormap or RGB channel

vmin, vmax (float) - bounds for color scale

figsize (tuple) - figure size

ax (matplotlib.axes.AxesSubplot) - if None, create axis

kwargs: keyword arguments for add_contours

Returns:

fig (matplotlib.figure.Figure)

Layers

Layers are 2D cross-sectional images of an eye disc.

class flyqma.data.layers.Layer(path, im=None, annotator=None)[source]

Object represents a single imaged layer.

Attributes:

measurements (pd.DataFrame) - raw cell measurement data

data (pd.DataFrame) - processed cell measurement data

path (str) - path to layer directory

_id (int) - layer ID, must be an integer value

subdirs (dict) - {name: path} pairs for all subdirectories

metadata (dict) - layer metadata

labels (np.ndarray[int]) - segment ID mask

annotator (Annotation) - object that assigns labels to measurements

graph (Graph) - graph connecting cell centroids

include (bool) - if True, layer was manually marked for inclusion

Inherited attributes:

im (np.ndarray[float]) - 3D array of pixel values

shape (array like) - image dimensions

mask (np.ndarray[bool]) - image mask

labels (np.ndarray[int]) - segment ID mask

Properties:

color_depth (int) - number of fluorescence channels

num_cells (int) - number of cells detected by segmentation

bg_key (str) - key for channel used to generate segmentation

is_segmented (bool) - if True, layer has been segmented

has_trained_annotator (bool) - if True, layer has a trained annotator

build_graph(weighted_by, **graph_kw)[source]

Compile weighted graph connecting adjacent cells.

Args:

weighted_by (str) - attribute used to weight edges

graph_kw: keyword arguments, including:

xykey (list) - attribute keys for node x/y positions

logratio (bool) - if True, weight edges by log ratio

distance (bool) - if True, weights edges by distance

initialize()[source]

Initialize layer directory by:

  • Creating a layer directory

  • Removing existing segmentation directory

  • Saving metadata to file

process_measurements(measurements)[source]

Augment measurements by:

  1. incorporating manual selection boundary

  2. correcting for fluorescence bleedthrough

  3. assigning measurement labels

  4. marking clone boundaries

  5. assigning label concurrency information

Operations 3-5 require construction of a WeightedGraph object.

Args:

measurements (pd.DataFrame) - raw measurement data

Returns:

data (pd.DataFrame) - processed measurement data

class flyqma.data.layers.LayerAnnotation[source]

Annotation related methods for Layer class.

annotate()[source]

Annotate measurement data in place, also labeling boundaries between labeled regions and marking regions in which each label occurs.

apply_annotation(label='genotype', **kwargs)[source]

Assign labels to cell measurements in place.

Args:

label (str) - attribute name for predicted genotype

kwargs: keyword arguments for Annotator.annotate()

apply_concurrency(basis='genotype', min_pop=5, max_distance=10, **kwargs)[source]

Add boolean ‘concurrent_<basis>’ field to measurement data for each unique value of <basis> attribute.

Args:

basis (str) - attribute on which concurrency is established

min_pop (int) - minimum population size for inclusion of cell type

max_distance (float) - maximum distance threshold for inclusion

kwargs: keyword arguments for ConcurrencyLabeler

mark_boundaries(basis='genotype', max_edges=0)[source]

Mark boundaries between cells with disparate labels by assigning a boundary label to all cells that share an edge with another cell with a different label.

Args:

basis (str) - attribute used to define label

max_edges (int) - maximum number of edges for interior cells
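The boundary rule described above can be sketched on a plain edge list. This is an illustration under stated assumptions, not the library's code: cells and labels are held in a dictionary, adjacency is a list of pairs, and max_edges is read as the largest number of cross-label edges a cell may have while still counting as interior.

```python
def mark_boundaries(labels, edges, max_edges=0):
    """Flag cells that share edges with differently labeled cells.

    labels: {cell_id: label}; edges: iterable of (cell_a, cell_b) pairs.
    A cell is flagged when its number of cross-label edges exceeds max_edges
    (this reading of max_edges is an assumption).
    """
    cross = {cell: 0 for cell in labels}
    for a, b in edges:
        if labels[a] != labels[b]:
            cross[a] += 1
            cross[b] += 1
    return {cell: count > max_edges for cell, count in cross.items()}


labels = {0: "wt", 1: "wt", 2: "mutant", 3: "mutant"}
edges = [(0, 1), (1, 2), (2, 3)]  # a chain of four adjacent cells
boundary = mark_boundaries(labels, edges)
```

In the chain above, only cells 1 and 2 touch a cell with a different label, so only they are marked.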

show_annotation(channel, label, interior_only=False, selection_only=False, cmap=None, figsize=(8, 4), **kwargs)[source]

Visualize annotation by overlaying <label> attribute on the image of the specified fluorescence <channel>.

Args:

channel (str) - fluorescence channel to visualize

label (str) - attribute containing cell type labels

interior_only (bool) - if True, exclude border regions

selection_only (bool) - if True, only add contours within ROI

cmap (matplotlib.colors.ListedColormap) - color scheme for celltype labels

figsize (tuple) - figure dimensions

kwargs: keyword arguments for plt.scatter

Returns:

fig (matplotlib.figure.Figure)

train_annotator(attribute, save=False, logratio=True, num_labels=3, **kwargs)[source]

Train an Annotation model on the measurements in this layer.

Args:

attribute (str) - measured attribute used to determine labels

save (bool) - if True, save model selection routine

logratio (bool) - if True, weight edges by relative attribute value

num_labels (int) - number of allowable unique labels

kwargs: keyword arguments for Annotation, including:

sampler_type (str) - either ‘radial’, ‘neighbors’, ‘community’

sampler_kwargs (dict) - keyword arguments for sampler

min_num_components (int) - minimum number of mixture components

max_num_components (int) - maximum number of mixture components

addtl_kwargs: keyword arguments for Classifier

Returns:

selector (ModelSelection object)

class flyqma.data.layers.LayerCorrection[source]

Bleedthrough correction related methods for Layer class.

apply_correction(data)[source]

Adds bleedthrough-corrected fluorescence levels to the measurements dataframe.

Args:

data (pd.DataFrame) - processed cell measurement data

class flyqma.data.layers.LayerIO[source]

Methods for saving and loading Layer objects and their subcomponents.

add_subdir(dirname, dirpath)[source]

Add subdirectory.

find_subdirs()[source]

Find all subdirectories.

load(use_cache=True, graph=True)[source]

Load layer.

Args:

use_cache (bool) - if True, use cached measurement data, otherwise re-process the measurement data

graph (bool) - if True, load weighted graph

load_annotator()[source]

Load annotator instance.

load_correction()[source]

Load linear background correction.

Returns:

correction (LayerCorrection)

load_inclusion()[source]

Load inclusion flag.

load_labels()[source]

Load segment labels if they are available.

load_measurements()[source]

Load raw measurements.

load_metadata()[source]

Load metadata.

load_processed_data()[source]

Load processed data from file.

make_subdir(dirname)[source]

Make subdirectory.

save(segmentation=True, measurements=True, processed_data=True, annotator=False, segmentation_image=False, annotation_image=False)[source]

Save segmentation parameters and results.

Args:

segmentation (bool) - if True, save segmentation

measurements (bool) - if True, save measurement data

processed_data (bool) - if True, save processed measurement data

annotator (bool) - if True, save annotator

segmentation_image (bool) - if True, save segmentation image

annotation_image (bool) - if True, save annotation image

save_annotator(image=False, **kwargs)[source]

Save annotator instance.

Args:

image (bool) - if True, save annotation images

kwargs: keyword arguments for image rendering

save_measurements()[source]

Save raw measurements.

save_metadata()[source]

Save metadata.

save_processed_data()[source]

Save processed measurement data.

save_segmentation(image, **kwargs)[source]

Save segment labels, and optionally save a segmentation image.

Args:

image (bool) - if True, save segmentation image

kwargs: keyword arguments for image rendering

class flyqma.data.layers.LayerMeasurement[source]

Measurement related methods for Layer class.

apply_normalization(data)[source]

Normalize fluorescence intensity measurements by measured background channel intensity.

Args:

data (pd.DataFrame) - processed cell measurement data

import_segmentation_mask(path, channel, save=True, save_image=True)[source]

Import external segmentation mask and use it to generate measurements.

Provided mask must contain a 2-D array of non-negative integers in which a value of zero denotes the image background.

Args:

path (str) - path to segmentation mask

channel (int) - fluorescence channel used for segmentation

save (bool) - if True, copy segmentation to stack directory

save_image (bool) - if True, save segmentation image

measure()[source]

Measure properties of cell segments. Raw measurements are stored in the ‘measurements’ attribute, while processed measurements are stored in the ‘data’ attribute.

segment(channel, preprocessing_kws={}, seed_kws={}, seg_kws={}, min_area=250)[source]

Identify nuclear contours by running watershed segmentation on specified background channel.

Args:

channel (int) - channel index on which to segment image

preprocessing_kws (dict) - keyword arguments for image preprocessing

seed_kws (dict) - keyword arguments for seed detection

seg_kws (dict) - keyword arguments for segmentation

min_area (int) - threshold for minimum segment size, px

Returns:

background (ImageScalar) - background image (after processing)

class flyqma.data.layers.LayerProperties[source]

Properties for Layer class:

color_depth (int) - number of fluorescence channels

num_cells (int) - number of cells detected by segmentation

bg_key (str) - key for channel used to generate segmentation

has_image (bool) - if True, image is loaded into memory

is_segmented (bool) - if True, layer has been segmented

has_trained_annotator (bool) - if True, layer has a trained annotator

property bg_key

DataFrame key for background channel.

property color_depth

Number of color channels.

property has_image

True if image is available.

property has_trained_annotator

Returns True if trained annotator is available.

property is_segmented

True if measurement data are available.

property num_cells

Number of cells detected by segmentation.

class flyqma.data.layers.LayerROI[source]

ROI related methods for Layer class.

define_roi(data)[source]

Adds a “selected” attribute to measurements dataframe. The attribute is True for cells that fall within the ROI.

Args:

data (pd.DataFrame) - processed measurement data
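One common way to decide whether a cell falls within an ROI is a ray-casting point-in-polygon test on its centroid. The sketch below assumes the ROI is available as a list of polygon vertices and uses hypothetical field names (centroid_x, centroid_y); it is not necessarily how the library performs the test.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


roi = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical ROI vertices
cells = [{"centroid_x": 5, "centroid_y": 5},
         {"centroid_x": 15, "centroid_y": 5}]
for cell in cells:
    cell["selected"] = point_in_polygon(cell["centroid_x"], cell["centroid_y"], roi)
```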

import_roi_mask(path, save=True)[source]

Import external ROI mask and use it to label measurement data.

Provided mask must contain a 2-D boolean array with the same dimensions as the raw image. True values denote the ROI. The mask may only contain a single contiguous ROI.

Args:

path (str) - path to ROI mask

save (bool) - if True, copy ROI mask to stack directory

classmethod mask_to_vertices(mask)[source]

Convert boolean mask to a list of vertices defining the border around the largest contiguous region.

Args:

mask (np.ndarray[bool]) - ROI mask, where True denotes the region. Note that the mask may only contain one contiguous component.

Returns:

vertices (np.ndarray[int]) - N x 2 array of vertices

static sort_clockwise(xycoords)[source]

Returns clockwise-sorted xy coordinates.
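A clockwise ordering can be obtained by sorting points by their angle about the centroid. The sketch below is one possible approach, assuming a conventional y-up coordinate system; in image coordinates, where y increases downward, the same ordering appears counterclockwise on screen.

```python
import math


def sort_clockwise(xycoords):
    """Order points clockwise about their centroid (y axis pointing up)."""
    cx = sum(x for x, _ in xycoords) / len(xycoords)
    cy = sum(y for _, y in xycoords) / len(xycoords)
    # atan2 grows counterclockwise, so sort by descending angle.
    return sorted(xycoords,
                  key=lambda p: math.atan2(p[1] - cy, p[0] - cx),
                  reverse=True)


square = [(0, 0), (1, 1), (1, 0), (0, 1)]
ordered = sort_clockwise(square)
```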

class flyqma.data.layers.LayerVisualization[source]

Methods for visualizing a layer.

build_attribute_mask(attribute, interior_only=False, selection_only=False, **kwargs)[source]

Use <attribute> value for each segment to construct an image mask.

Args:

attribute (str) - attribute used to label each segment

interior_only (bool) - if True, excludes clone borders

selection_only (bool) - if True, only include selected region

Returns:

mask (np.ma.MaskedArray) - masked image in which foreground segments are replaced with the attribute values

build_classifier_mask(classifier, interior_only=False, selection_only=False, **kwargs)[source]

Use segment <classifier> to construct an image mask.

Args:

classifier (annotation.Classifier object)

interior_only (bool) - if True, excludes clone borders

selection_only (bool) - if True, only include selected region

Returns:

mask (np.ma.MaskedArray) - masked image in which foreground segments are replaced with the assigned labels

plot_boundaries(ax, label_by='genotype', cmap=<matplotlib.colors.LinearSegmentedColormap object>, alpha=70, **kwargs)[source]

Plot boundaries of all <label_by> groups on <ax>.

plot_boundary(ax, label, label_by='genotype', color='r', alpha=70, **kwargs)[source]

Plot boundary of <label_by> groups with <label> on <ax>.

Stacks

Stacks are sets of layers obtained from the same eye disc.

class flyqma.data.stacks.Stack(path, bit_depth=None)[source]

Object represents a 3D RGB image stack.

Attributes:

path (str) - path to stack directory

_id (str) - stack ID

stack (np.ndarray[float]) - 3D RGB image stack

shape (tuple) - stack dimensions, (depth, X, Y, 3)

bit_depth (int) - bit depth of raw tif image

stack_depth (int) - number of layers in stack

color_depth (int) - number of fluorescence channels in stack

annotator (Annotation) - object that assigns labels to measurements

metadata (dict) - stack metadata

tif_path (str) - path to multilayer RGB tiff file

layers_path (str) - path to layers directory

annotator_path (str) - path to annotation directory

aggregate_measurements(selected_only=False, exclude_boundary=False, raw=False, use_cache=True)[source]

Aggregate measurements from each included layer.

Args:

selected_only (bool) - if True, exclude cells not marked for inclusion

exclude_boundary (bool) - if True, exclude cells on clone boundaries

raw (bool) - if True, aggregate raw measurements

use_cache (bool) - if True, use available cached measurement data

Returns:

data (pd.DataFrame) - measurement data (None if unavailable)
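The effect of the selected_only and exclude_boundary flags can be illustrated with plain lists of records standing in for the per-layer DataFrames. This is a sketch only: the "boundary" and "layer" field names and the (included, records) tuples are assumptions made for the example, while "selected" follows the attribute added by define_roi.

```python
def aggregate(layers, selected_only=False, exclude_boundary=False):
    """Concatenate per-layer records, tagging each with its layer index."""
    rows = []
    for layer_id, (included, records) in enumerate(layers):
        if not included:
            continue  # skip layers not marked for inclusion
        for rec in records:
            if selected_only and not rec["selected"]:
                continue
            if exclude_boundary and rec["boundary"]:
                continue
            rows.append({"layer": layer_id, **rec})
    return rows or None  # None when no measurements are available


layers = [
    (True, [{"segment_id": 1, "selected": True, "boundary": False},
            {"segment_id": 2, "selected": False, "boundary": False}]),
    (False, [{"segment_id": 1, "selected": True, "boundary": False}]),  # excluded layer
    (True, [{"segment_id": 1, "selected": True, "boundary": True}]),
]
everything = aggregate(layers)
interior = aggregate(layers, selected_only=True, exclude_boundary=True)
```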

property bit_depth

Bit depth of raw image.

property color_depth

Number of fluorescence channels in stack.

property filename

Stack filename.

get_included_layers()[source]

Returns indices of included layers.

property included

Indices of included layers.

initialize(bit_depth)[source]

Initialize stack directory.

Args:

bit_depth (int) - bit depth of raw tif (e.g. 12 or 16)

property is_annotated

True if annotation is complete.

property is_initialized

Returns True if Stack has been initialized.

property is_segmented

True if segmentation is complete.

load_layer(layer_id=0, graph=True, use_cache=True, full=True)[source]

Load individual layer.

Args:

layer_id (int) - layer index

graph (bool) - if True, load layer graph

use_cache (bool) - if True, use cached layer measurement data

full (bool) - if True, load fully labeled RGB image

Returns:

layer (Layer)

prompt_initialization()[source]

Ask user whether to initialize all stack directories.

restore_directory()[source]

Restore stack directory to original state.

segment(channel, preprocessing_kws={}, seed_kws={}, seg_kws={}, min_area=250, save=True)[source]

Segment all layers using watershed strategy.

Args:

channel (int) - channel index on which to segment image

preprocessing_kws (dict) - keyword arguments for image preprocessing

seed_kws (dict) - keyword arguments for seed detection

seg_kws (dict) - keyword arguments for segmentation

min_area (int) - threshold for minimum segment size, px

save (bool) - if True, save measurement data for each layer

property selector_path

Path to model selection object.

property stack_depth

Number of layers in stack.

train_annotator(attribute, save=False, logratio=True, num_labels=3, **kwargs)[source]

Train an Annotation model on all layers in this stack.

Args:

attribute (str) - measured attribute used to determine labels

save (bool) - if True, save annotator and model selection routine

logratio (bool) - if True, weight edges by relative attribute value

num_labels (int) - number of allowable unique labels

kwargs: keyword arguments for Annotation, including:

sampler_type (str) - either ‘radial’, ‘neighbors’, ‘community’

sampler_kwargs (dict) - keyword arguments for sampler

min_num_components (int) - minimum number of mixture components

max_num_components (int) - maximum number of mixture components

addtl_kwargs: keyword arguments for Classifier

class flyqma.data.stacks.StackIO[source]

Methods for saving and loading a Stack instance.

static from_silhouette(filepath, bit_depth)[source]

Initialize stack from silhouette <filepath>.

Args:

filepath (str) - path to silhouette file

bit_depth (int) - bit depth of raw tif (e.g. 12 or 16)

Returns:

stack (flyqma.Stack)

static from_tif(filepath, bit_depth)[source]

Initialize stack from tif <filepath>.

Args:

filepath (str) - path to tif image file

bit_depth (int) - bit depth of raw tif (e.g. 12 or 16)

Returns:

stack (flyqma.Stack)

load_annotator()[source]

Load annotator from annotation directory.

load_image()[source]

Load 3D image from tif file.

load_metadata()[source]

Load available metadata.

save()[source]

Save stack metadata and annotator.

save_annotator(data=True)[source]

Save annotator to annotation directory.

save_metadata()[source]

Save metadata.

Experiments

Experiments are collections of stacks obtained under similar conditions.

class flyqma.data.experiments.Experiment(path)[source]

Object represents a collection of 3D RGB image stacks collected under the same experimental conditions.

Attributes:

path (str) - path to experiment directory

_id (str) - name of experiment

stack_ids (list of str) - unique stack ids within experiment

stack_dirs (dict) - {stack_id: stack_directory} pairs

count (int) - counter for stack iteration

aggregate_measurements(selected_only=False, exclude_boundary=False, raw=False, use_cache=True)[source]

Aggregate measurements from each stack.

Args:

selected_only (bool) - if True, exclude cells outside the ROI

exclude_boundary (bool) - if True, exclude cells on the border of labeled regions

raw (bool) - if True, use raw measurements from included discs

use_cache (bool) - if True, use available cached measurement data

Returns:

data (pd.DataFrame) - curated cell measurement data, which is None if no measurement data are found

initialize(bit_depth)[source]

Initialize a collection of image stacks.

Args:

bit_depth (int) - bit depth of raw tif (e.g. 12 or 16). Value will be read from the stack metadata if None is provided. An error is raised if no value is found.

property is_initialized

Returns True if Experiment has been initialized.

load_stack(stack_id, full=False, **kwargs)[source]

Load 3D RGB image stack.

Args:

stack_id (str or int) - desired stack

full (bool) - if True, load full 3D image from tif file

Returns:

stack (Stack)

prompt_initialization()[source]

Ask user whether to initialize all stack directories.

Silhouette Interface

Fly-QMA provides several tools for seamlessly exchanging data with NU FlyEye Silhouette.

class flyqma.data.silhouette_read.ReadSilhouette(path)[source]

Read-only interface to a FlyEye Silhouette file.

Attributes:

path (str) - path to Silhouette file

feed (dict) - feed file containing layer IDs

feud (dict) - feud file containing cell type labels

Properties:

is_flipped_about_yz (bool) - if True, invert about YZ plane

is_flipped_about_xy (bool) - if True, invert about XY plane

read_json(filename)[source]

Read contents of specified JSON file.

Args:

filename (str) - filename

Returns:

out (dict) - file contents

class flyqma.data.silhouette_read.ReadSilhouetteData(path, recompile=False)[source]

Read-only interface to data within a FlyEye Silhouette file.

Upon instantiation, individual cell measurements are aggregated into a data.cells.Cells compatible DataFrame.

Measurement data must be read on a layer-by-layer basis the first time a Silhouette object is instantiated. Following this initial reading, the aggregated measurement data are serialized and stored within the silhouette file. These serialized measurements may then be accessed directly during future use. The recompile flag indicates whether the serialized measurements should be ignored upon instantiation.
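The compile-once, reuse-afterwards behaviour described above follows a familiar caching pattern, sketched here with the standard library. The file names (feed.json, one JSON file per layer ID, measurements.json for the cache) and the function itself are assumptions for illustration; only the "layer_ids" and "contours" keys come from the Silhouette layout documented in this section.

```python
import json
import os
import tempfile


def load_measurements(silhouette_dir, recompile=False, cache_name="measurements.json"):
    """Return aggregated measurements, reusing a serialized copy when allowed."""
    cache_path = os.path.join(silhouette_dir, cache_name)
    if os.path.exists(cache_path) and not recompile:
        with open(cache_path) as f:  # fast path: read the serialized copy
            return json.load(f), "cache"

    # Slow path: read every layer file listed in the feed, one at a time.
    with open(os.path.join(silhouette_dir, "feed.json")) as f:
        feed = json.load(f)
    records = []
    for layer_id in feed["layer_ids"]:
        with open(os.path.join(silhouette_dir, f"{layer_id}.json")) as f:
            layer = json.load(f)
        for contour in layer["contours"]:
            records.append({"layer": layer_id, **contour})

    with open(cache_path, "w") as f:  # serialize for future use
        json.dump(records, f)
    return records, "compiled"


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "feed.json"), "w") as f:
        json.dump({"layer_ids": [0]}, f)
    with open(os.path.join(tmp, "0.json"), "w") as f:
        json.dump({"id": 0, "contours": [{"id": 1, "pixel_count": 300}]}, f)
    first, source_first = load_measurements(tmp)
    second, source_second = load_measurements(tmp)
    third, source_third = load_measurements(tmp, recompile=True)
```

The first call compiles from the layer files, the second reads the serialized copy, and the third ignores the copy because recompile is set.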

Attributes:

df (pd.DataFrame) - cell measurement data

Inherited attributes:

path (str) - path to Silhouette file

feed (dict) - feed file containing layer IDs

feud (dict) - feud file containing cell type labels

compile_measurements()[source]

Compile measurements from all layers (slow access).

property labels

pd.Series of labels keyed by (layer_id, segment_id).

load(recompile=False)[source]

Read all contour and orientation data from silhouette file.

Args:

recompile (bool) - if True, recompile measurements from all layers

load_measurements()[source]

Load serialized measurements (fast access).

static parse_contour(contour)[source]

Convert contour to list format.

Args:

contour (dict) - contour from silhouette file

Returns:

ctr_list (list) - values in data.cells.Cells compatible list format

read_contours(all_labels={}, include_unlabeled=False)[source]

Read contours from silhouette file.

Args:

all_labels (dict) - {layer_id: {contour_id: label}} for each layer

include_unlabeled (bool) - if True, include unlabeled segments

Returns:

df (pd.DataFrame) - data.cells.Cells compatible dataframe of contours

read_labels()[source]

Load segment labels from silhouette file.

Returns:

labels (dict) - {layer_id: {contour_id: label}} entries for each layer
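The nested dictionary returned here and the (layer_id, segment_id) keying used by the labels property describe the same information in two shapes. A minimal sketch of the conversion, using a plain dict in place of a pd.Series and a hypothetical helper name:

```python
def flatten_labels(all_labels):
    """Map {layer_id: {contour_id: label}} to {(layer_id, contour_id): label}."""
    return {
        (layer_id, contour_id): label
        for layer_id, layer_labels in all_labels.items()
        for contour_id, label in layer_labels.items()
    }


nested = {0: {1: "wt", 2: "mutant"}, 1: {1: "wt"}}
flat = flatten_labels(nested)
```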

save_measurements()[source]

Save serialized measurements for fast access.

class flyqma.data.silhouette_write.WriteSilhouette[source]

Methods for writing a stack to Silhouette readable format.

The Silhouette container includes a FEED file:

FEED.json

{ "orientation": {"flip_about_xy": false, "flip_about_yz": false},
"layer_ids": [ 0,1,2... ],
"params": { param_name: param_value ... } }

build_feud(label=None)[source]

Compile feud file with <label> field serving as annotations.

load_silhouette_labels()[source]

Load manually assigned labels from file.

Returns:

labels (pd.Series) - labels keyed by (layer_id, segment_id)

property silhouette_path

Path to Silhouette directory.

write_silhouette(dst=None, label=None, include_image=True, channel_dict=None)[source]

Write silhouette file.

Args:

dst (str) - destination directory

label (str) - field containing cell type annotations

include_image (bool) - save RGB image of each layer

channel_dict (dict) - RGB channel names, keyed by channel index. If none provided, defaults to the first three channels in RGB order.

class flyqma.data.silhouette_write.WriteSilhouetteLayer[source]

Methods for writing a Layer to Silhouette readable format. A layer file is structured as follows:

LAYER_ID.json :

{ "id": LAYER_ID,
"imageFilename": "LAYER_ID.png",
"contours": [ … contours … ] }

Each entry in "contours" has the form:

{ "centroid": [CONTOUR_CENTROID_X, CONTOUR_CENTROID_Y],
"color_avg": {"b": X, "g": X, "r": X},
"color_std": {"b": X, "g": X, "r": X},
"id": CONTOUR_ID,
"pixel_count": CONTOUR_AREA,
"points": [[x1, y1], [x2, y2] … ] }

build_contours(channel_dict)[source]

Convert dataframe to a list of contours (Silhouette format).

Args:

channel_dict (dict) - RGB channel names, keyed by channel index

Returns:

contours (list) - list of contour dictionaries
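To make the contour layout described above concrete, the sketch below assembles one contour dictionary from a measurement record and wraps it in a layer file. The helper name and the record field names (centroid_x, ch0, ch0_std, and so on) are hypothetical; only the output keys follow the documented Silhouette format.

```python
import json


def build_contour(record, points, channel_dict):
    """Assemble one contour dictionary in the Silhouette layer-file layout.

    record: per-cell measurements (hypothetical field names);
    channel_dict: {channel_index: 'r' | 'g' | 'b'}.
    """
    return {
        "centroid": [record["centroid_x"], record["centroid_y"]],
        "color_avg": {c: record[f"ch{i}"] for i, c in channel_dict.items()},
        "color_std": {c: record[f"ch{i}_std"] for i, c in channel_dict.items()},
        "id": record["segment_id"],
        "pixel_count": record["pixel_count"],
        "points": [list(p) for p in points],
    }


record = {"segment_id": 7, "centroid_x": 12.5, "centroid_y": 40.0,
          "pixel_count": 310, "ch0": 0.2, "ch0_std": 0.05,
          "ch1": 0.6, "ch1_std": 0.1, "ch2": 0.1, "ch2_std": 0.02}
contour = build_contour(record, [(10, 38), (15, 38), (15, 42), (10, 42)],
                        channel_dict={0: "r", 1: "g", 2: "b"})
layer_file = {"id": 0, "imageFilename": "0.png", "contours": [contour]}
text = json.dumps(layer_file)  # JSON text for a single layer file
```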

write_silhouette(dst, layer_id=None, include_image=True, channel_dict=None)[source]

Write silhouette compatible JSON to target directory.

Args:

dst (str) - destination directory

layer_id (int) - ID optionally used to override true layer ID

include_image (bool) - save layer image as png

channel_dict (dict) - RGB channel names, keyed by channel index. If none provided, defaults to the first three channels in RGB order.