.. image:: graphics/Northwestern_purple_RGB.png
   :width: 30%
   :align: right
   :alt: nulogo
   :target: https://amaral.northwestern.edu/


.. _start:

Getting Started
===============

The fastest way to familiarize yourself with Fly-QMA is to start with a working example. We recommend starting with the Fly-QMA `Tutorial <https://github.com/sbernasek/flyqma/blob/master/tutorial.ipynb>`_.

We also recommend reading the sections below before working with your own microscopy data.


Pipeline Overview
-----------------

.. figure:: graphics/pipeline.png
   :align: center
   :alt: flyqma-pipeline


Preparing Images
----------------

Fly-QMA uses a hierarchical :ref:`file structure <filestructure>` that is organized into three levels:

 1. **EXPERIMENT**: One or more tissue samples imaged under the same conditions.

 2. **STACK**: All images of a particular tissue sample, such as an individual z-stack.

 3. **LAYER**: All analysis relevant to a single 2-D image, such as an individual layer.

Before using Fly-QMA, microscopy data should be manually arranged into a collection of **STACK** directories that reside within a particular **EXPERIMENT** directory. Note that the actual names of these directories don't matter, but their hierarchical positions do:

.. code-block:: bash

   EXPERIMENT
   │
   ├── STACK 0         # First STACK directory
   ├── STACK 1
   └── ... STACK N     # Nth STACK directory

Each **STACK** directory should contain one or more 2-D images of a unique tissue sample. Images must be supplied in ``.tif`` format with ZXYC orientation. Each image file may depict a single layer, a regularly-spaced z-stack, or an irregularly-spaced collections of layers.

.. code-block:: bash

   EXPERIMENT
   │
   ├── STACK 0
   │   └── image.tif   # 3-D image with ZXYC orientation
   │
   ├── STACK 1
   │   └── image.tif
   │
   └── ... STACK N
           └── image.tif

.. warning::
   Image segmentation is performed on a layer-by-layer basis. Because cells often span several adjacent layers in a confocal z-stack, individual layers must be spaced far enough apart to avoid measuring the same cells twice. Overlapping layers may also be manually excluded using the provided :ref:`ROI Selector <selection_docs>`.


Loading Images
--------------

Next, instantiate an ``Experiment`` using the **EXPERIMENT** directory path:

.. code-block:: python

   >>> from flyqma.data import Experiment
   >>> experiment = Experiment('./EXPERIMENT')

This instance will serve as the entry-point for managing all of the data in the **EXPERIMENT** directory. Lower levels of the data hierarchy may then be accessed in a top-down manner. To access an individual stack:

.. code-block:: python

    # load specific stack
    stack = experiment.load_stack(stack_id)

    # alternatively, by sequential iteration
    for stack in experiment:
      stack.do_stuff()

The ``experiment.load_stack()`` method includes a ``full`` keyword argument that may be set to False in order to skip loading the stack's ``.tif`` file into memory. This offers some performance benefit when only saved measurement data are needed. Of course, loading the image data is necessary if any segmentation, measurement, ROI definition, or bleedthrough correction operations are to be performed.

To begin analyzing an image stack, layers must be added to the corresponding stack directory. Calling ``stack.initialize()`` creates a ``layers`` subdirectory containing an additional subdirectory for each 2-D layer in the 3-D image stack. A stack metadata file is also added to the **STACK** directory at this time, resulting in:

.. code-block:: bash

   EXPERIMENT
   │
   ├── STACK 0
   │   ├── image.tif
   │   ├── metadata.json   # stack metadata (number of layers, image bit depth, etc.)
   │   └── layers
   │       ├── 0           # first LAYER directory
   │       ├── 1
   │       └── ... M       # Mth LAYER directory
   │
   ├── STACK 1
   └── ... STACK N


Image layers may now be analyzed individually:

.. code-block:: python

    # load specific layer
    layer = stack.load_layer(layer_id)

    # alternatively, by sequential iteration
    for layer in stack:
      layer.do_stuff()

Methods acting upon lower level Stack or Layer instances are executed in place, meaning you won't lose progress by iterating across instances or by coming back to a given instance at a different time. This peristence is possible because new subdirectories and files are automatically added to the appropriate **STACK** or **LAYER** directory each time a segmentation, measurement, annotation, bleedthrough correction, or ROI selection is saved, overwriting any existing files of the same type.


Segmenting Images
-----------------

See the measurement :ref:`documentation <measurement_docs>` for a list of the specific parameters needed to customize the segmentation routine to suit your data. At a minimum, users must specify the background ``channel`` - that is, the index of the fluorescence channel used to identify cells or nuclei.

To segment an image layer, measure the segment properties, and save the results:

.. code-block:: python

   >>> channel = 2
   >>> layer.segment(channel)
   >>> layer.save()

Alternatively, to segment all layers within an image stack:

.. code-block:: python

   >>> channel = 2
   >>> stack.segment(channel, save=True)

In both cases, measurement data are generated on a layer-by-layer basis. To ensure that the segmentation results and corresponding measurement data will remain available after the session is terminated, specify ``save=True`` or call ``layer.save()``. This will save the segmentation parameters within a layer metadata file and create a ``segmentation`` subdirectory containing a segment labels mask. It will also create a ``measurements`` subdirectory containing the corresponding raw expression measurement data (measurements.hdf), as well as a duplicate version that is subject to all subsequent processing operations (processed.hdf). The raw measurements will remain the same until a new segmentation is executed and saved, while the processed measurements are updated each time a new operation is applied and saved. Following segmentation, each **LAYER** directory will resemble:

.. code-block:: bash

   EXPERIMENT
   │
   ├── STACK 0
   │   ├── image.tif
   │   ├── metadata.json
   │   └── layers
   │       ├── 0
   │       │   ├── metadata.json          # layer metadata (background channel, parameter values, etc.)
   │       │   ├── segmentation
   │       │   │   ├── labels.npy         # segmentation mask (np.ndarray[int])
   │       │   │   └── segmentation.png   # layer image overlayed with segment contours (optional)
   │       │   └── measurements
   │       │       ├── measurements.hdf   # raw expression measurements
   │       │       └── processed.hdf      # processed expression measurements
   │       ├── 1
   │       └── ... M
   ├── STACK 1
   └── ... STACK N


Measurement Data
----------------

Raw and processed measurement data are accessed via the ``Layer.measurements`` and ``Layer.data`` attributes, respectively. Both are stored in `Pandas DataFrames <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_ in which each sample (row) reflects an individual segment. Columns depict a mixture of continuous and categorical features, including:

 - **segment_id:** unique integer identifier assigned to the segment
 - **pixel_count:** total number of pixels within the segment
 - **centroid_x:** mean x-coordinate of all pixels
 - **centroid_y:** mean y-coordinate of all pixels
 - **chN:** mean intensity of the Nth channel across all pixels
 - **chN_std:** standard deviation of the Nth channel across all pixels
 - **chN_normalized:** normalized mean intensity of the Nth channel

To aggregate processed measurement data across all layers in an image stack:

.. code-block:: python

   >>> stack_data = stack.aggregate_measurements()

Similarly, to aggregate across an entire experiment:

.. code-block:: python

   >>> experiment_data = experiment.aggregate_measurements()

Each of these operations returns measurement data in the same DataFrame format. However, in order to preserve the unique identity of each measurement the index is replaced by a hierarchical index depicting the unique layer and/or stack from which each segment was derived.


Analysis
--------

The measurement data stored in the ``layer.measurements`` attribute and ``measurements.hdf`` file reflect raw measurements of mean pixel intensity for each segment. These measurements may then be subject to one or more processing operations such as:

  - ROI definition
  - Bleedthrough correction
  - Automated annotation
  - Manual annotation

The objects that perform these operations all behave in a similar manner. They are manually defined for each disc (see the `Tutorial <https://github.com/sbernasek/flyqma/blob/master/tutorial.ipynb>`_ for examples), but may then be saved for repeated use. When saved, each object creates its own subdirectory within the corresponding **LAYER** directory:

.. code-block:: bash

    EXPERIMENT
    │
    ├── STACK 0
    │   ├── image.tif
    │   ├── metadata.json
    │   └── layers
    │       ├── 0
    │       │   ├── metadata.json
    │       │   ├── segmentation
    │       │   │   └── ...
    │       │   ├── measurements
    │       │   │   └── ...
    │       │   ├── annotation
    │       │   │   └── ...
    │       │   ├── correction
    │       │   │   └── ...
    │       │   └── selection
    │       │       └── ...
    │       ├── 1
    │       └── ... M
    ├── STACK 1
    └── ... STACK N

The added subdirectories include all the files and metadata necessary to load and execute the data processing operations performed by the respective object. Saved operations are automatically applied to the raw measurement data each time a layer is loaded, appending a number of additional features to the ``layer.data`` DataFrame:

 - **chN_predicted:** estimated bleedthrough contribution into the Nth channel
 - **chNc:** bleedthrough-corrected intensity of the Nth channel
 - **chNc_normalized:** normalized bleedthrough-corrected intensity of the Nth channel
 - **selected:** boolean flag indicating whether the segment falls within the ROI
 - **boundary:**  boolean flag indicating whether the segment lies within a boundary region
 - **manual_label:** segment label manually assigned using  `FlyEye Silhouette <https://www.silhouette.amaral.northwestern.edu/>`_

Furthermore, the annotation module may be used to assign one or more labels to each segment. Users are free to specify the names of these additional features as they please.