📊 Data Analysis & Gating

A comprehensive guide to interpreting flow cytometry data — from FCS files and plot types to advanced high-dimensional analysis.

📚 Table of Contents

  1. The FCS File Format
  2. Display Types
  3. Scaling & Transformation
  4. Gating Strategies
  5. Common Gating Hierarchies
  6. Statistical Measures
  7. Quality Control
  8. High-Dimensional Analysis
  9. Batch Analysis & Standardization
  10. Software Overview
  11. Reporting & Publication Standards

1. The FCS File Format

Flow cytometry data is stored in the Flow Cytometry Standard (FCS) file format, maintained by the International Society for Advancement of Cytometry (ISAC). FCS 3.1 (published 2010) is the version most instruments and analysis packages target; ISAC has since published FCS 3.2.

File Structure

An FCS file has four segments:

  1. HEADER: Fixed-position ASCII block at the start of the file. The first 58 bytes hold the version string (e.g., "FCS3.1") followed by the byte offsets of the TEXT, DATA, and ANALYSIS segments.
  2. TEXT: Keyword-value pairs containing metadata: parameter names ($PnN), event count ($TOT), parameter count ($PAR), acquisition date, laser settings, detector voltages, compensation matrix, etc. The first character of the segment defines the delimiter that separates keywords and values.
  3. DATA: The actual measurements. List-mode data where each row is one event and each column is one parameter. Stored as integers (for older files) or floating-point values. Typical parameters: FSC-A, FSC-H, SSC-A, SSC-H, FL1-A, FL2-A, etc., plus a Time parameter.
  4. ANALYSIS: Optional segment for gating results (rarely used; most analysis is done externally in software like FlowJo).
Key Metadata Keywords: $TOT (total events), $PAR (number of parameters), $PnN (short parameter name for channel n, e.g., FL1-A), $PnS (long or stain name, e.g., the marker and fluorochrome), $PnR (range), $PnB (bits per value), $PnE (log amplification), and $SPILLOVER (FCS 3.1) or $COMP (FCS 3.0) for the compensation matrix.
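As a rough illustration of the layout described above, the following Python sketch reads the HEADER offsets and splits the TEXT segment of a tiny synthetic file built in memory. The function names (parse_fcs_header, parse_fcs_text) and the sample keywords are invented for this example, and the parser assumes the delimiter never appears inside a value. Real files are better read with an established library.

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Read the version string and segment offsets from the HEADER segment."""
    version = raw[0:6].decode("ascii")
    names = ["text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end"]
    # Offsets are right-justified ASCII integers in 8-byte fields from byte 10.
    fields = [int(raw[i:i + 8].decode("ascii").strip()) for i in range(10, 58, 8)]
    return {"version": version, **dict(zip(names, fields))}


def parse_fcs_text(raw: bytes, start: int, end: int) -> dict:
    """Split the TEXT segment into keyword/value pairs.

    Simplification (assumption): the delimiter is never escaped inside a value.
    """
    segment = raw[start:end + 1].decode("utf-8")
    delimiter = segment[0]               # first character defines the delimiter
    tokens = segment[1:].split(delimiter)
    if tokens and tokens[-1] == "":
        tokens.pop()                     # empty token after the final delimiter
    return dict(zip(tokens[0::2], tokens[1::2]))


# Build a tiny synthetic file (HEADER + TEXT only, no DATA segment).
text = "|$TOT|2|$PAR|3|$P1N|FSC-A|$P2N|SSC-A|$P3N|Time|"
text_start = 58
text_end = text_start + len(text) - 1
offsets = (text_start, text_end, 0, 0, 0, 0)
header = "FCS3.1" + " " * 4 + "".join(f"{n:>8}" for n in offsets)
raw = header.encode("ascii") + text.encode("ascii")

hdr = parse_fcs_header(raw)
keywords = parse_fcs_text(raw, hdr["text_start"], hdr["text_end"])
print(hdr["version"], keywords["$TOT"], keywords["$P1N"])
```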

2. Display Types

Choosing the right plot type is critical for identifying populations, setting gates, and presenting results.

| Plot Type | Parameters | Best For | Limitations |
| --- | --- | --- | --- |
| Dot Plot | 2 | Exploring data; identifying outliers; seeing every event | Overplotting at high event counts (dense regions appear solid) |
| Density / Pseudocolor Plot | 2 | Resolving dense populations; publication figures | Colors can obscure rare events; color scales need context |
| Contour Plot | 2 | Clean population boundaries; publication | Rare events (<1%) may be invisible between contour lines |
| Zebra Plot | 2 | Hybrid of contour and dot plot; shows rare events as dots outside contour lines | Can be busy with complex data |
| Histogram | 1 | Single parameter expression; overlays; comparing conditions | Only shows one dimension; misses relationships between parameters |
| Overlay Histogram | 1 | Comparing staining intensity across samples or conditions | More than 3–4 overlays becomes difficult to read |
Tip: For publications, pseudocolor or contour plots are preferred over dot plots because they better represent population density. Always include axis labels with the parameter name and fluorochrome/marker.

3. Scaling & Transformation

Flow cytometry data spans many orders of magnitude (from near-zero background to bright positive staining). Proper scaling is essential to visualize and gate data correctly.

Linear Scale

Values are displayed with equal spacing. Best for: scatter parameters (FSC, SSC), DNA content (cell cycle), and any parameter where the populations don't span more than ~1 decade.

Logarithmic Scale

Traditional 4-decade log scale compresses upper values and expands lower values. Good for most fluorescence parameters where positive and negative populations span several decades. Problem: Log scale cannot display zero or negative values — events get piled up on the axis.

Biexponential (Logicle / Hyperlog)

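The biexponential transformation is the most important display transformation in modern flow cytometry. It behaves linearly near and below zero and logarithmically at higher values. This matters because compensation and background subtraction place many dim events at or below zero, where a log scale cannot display them.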

Common Mistake: Using log scale for compensated multi-color data will cause axis compression artifacts. Always use biexponential transformation for compensated fluorescence data. Most modern software (FlowJo, FCS Express) applies this by default.

Transformation Parameters

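Biexponential transforms have adjustable parameters (width, positive/negative decades) that control the linear-to-log transition. In the Logicle formulation these are T (top of scale), W (width of the linear region, in decades), M (total decades displayed), and A (additional negative decades); FlowJo exposes comparable width and decade settings in its transformation dialog. If populations look compressed against the axis, try adjusting the width parameter.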
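To make the linear-near-zero, log-at-high-values behavior concrete, here is a minimal sketch that compares a clamped log display with an arcsinh display. Note that arcsinh is a related transform used here as a simple stand-in, not the Logicle function itself, and the cofactor of 150 and the sample values are assumptions chosen for illustration.

```python
import math


def log_display(x, floor=1.0):
    """Classic log scale: anything at or below `floor` piles up on the axis."""
    return math.log10(max(x, floor))


def arcsinh_display(x, cofactor=150.0):
    """arcsinh(x / cofactor): roughly linear for |x| << cofactor,
    roughly logarithmic for |x| >> cofactor, and defined for negatives."""
    return math.asinh(x / cofactor)


# Hypothetical compensated intensities, including zero and negative values.
values = [-200.0, -50.0, 0.0, 50.0, 1_000.0, 100_000.0]
log_scaled = [log_display(v) for v in values]
asinh_scaled = [arcsinh_display(v) for v in values]

for v, lg, ah in zip(values, log_scaled, asinh_scaled):
    print(f"{v:>10.1f}  log={lg:6.3f}  arcsinh={ah:7.3f}")
```

On the log display the first three values collapse onto the same axis position, while the arcsinh display keeps them distinct and ordered.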

4. Gating Strategies

Gating is the process of selecting subsets of events for analysis. Gates are drawn on plots as regions (rectangles, polygons, ellipses, or quadrants) that define populations of interest.

The Standard Gating Hierarchy

Almost every flow cytometry analysis follows this core gating sequence before any marker-specific analysis:

Universal Pre-Gating Hierarchy

  1. Time Gate: Remove fluidic artifacts
  2. Scatter Gate: FSC vs SSC; exclude debris
  3. Singlet Gate: FSC-H vs FSC-A
  4. Live/Dead Gate: Viability dye negative
  5. Lineage Gate: Population of interest
Standard pre-gating sequence. Steps 1–4 should be applied to virtually every experiment before analyzing markers of interest.
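The pre-gating sequence can be sketched as a chain of filters, each applied to the events that survived the previous gate. Everything below is illustrative: the event values, field names, and thresholds are hypothetical, and real gates are usually drawn as polygons on plots rather than fixed cutoffs.

```python
# Six hypothetical events; comments note which gate is expected to remove each.
events = [
    {"time": 1.0, "fsc_a": 90_000, "fsc_h": 88_000, "ssc_a": 30_000, "viability": 120},
    {"time": 2.0, "fsc_a": 5_000, "fsc_h": 4_900, "ssc_a": 2_000, "viability": 100},       # debris
    {"time": 3.0, "fsc_a": 160_000, "fsc_h": 90_000, "ssc_a": 50_000, "viability": 150},   # doublet
    {"time": 4.0, "fsc_a": 85_000, "fsc_h": 83_000, "ssc_a": 28_000, "viability": 9_000},  # dead
    {"time": 55.0, "fsc_a": 92_000, "fsc_h": 90_000, "ssc_a": 31_000, "viability": 110},   # unstable time
    {"time": 6.0, "fsc_a": 95_000, "fsc_h": 93_000, "ssc_a": 33_000, "viability": 90},
]

# Ordered gates; all thresholds are assumptions for this toy data set.
GATES = [
    ("time", lambda e: e["time"] <= 50.0),
    ("scatter", lambda e: e["fsc_a"] > 20_000 and e["ssc_a"] > 5_000),
    ("singlets", lambda e: e["fsc_h"] / e["fsc_a"] > 0.85),
    ("live", lambda e: e["viability"] < 1_000),
]


def apply_hierarchy(events, gates):
    """Apply gates in order; return surviving events and per-gate counts."""
    counts = [("all events", len(events))]
    current = events
    for name, keep in gates:
        current = [e for e in current if keep(e)]
        counts.append((name, len(current)))
    return current, counts


live_singlets, counts = apply_hierarchy(events, GATES)
for name, n in counts:
    print(f"{name:>10}: {n}")
```

Reporting the count after every gate, as done here, also makes it easy to see where events are being lost.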

Gate Types

Singlet Discrimination Deep Dive

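Cell doublets (two cells stuck together) cause false positive events. They appear between populations on dot plots and corrupt statistics. Singlet discrimination removes them by comparing pulse height with pulse area (FSC-H vs FSC-A): single cells fall along a diagonal, while doublets show a larger area for a given height and fall off it.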
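One simple way to express a singlet gate numerically is to keep events whose height-to-area ratio stays close to the typical ratio of the sample. The sketch below assumes this ratio-based approach; the function name singlet_mask, the 15% tolerance, and the example values are hypothetical, and many analysts draw a polygon gate by eye instead.

```python
from statistics import median


def singlet_mask(fsc_a, fsc_h, tolerance=0.15):
    """Flag events whose FSC-H / FSC-A ratio lies within `tolerance`
    (as a fraction) of the median ratio. Doublets have extra area for
    a similar height, so their ratio falls well below the median."""
    ratios = [h / a for a, h in zip(fsc_a, fsc_h)]
    center = median(ratios)
    return [abs(r - center) <= tolerance * center for r in ratios]


# Hypothetical events; the fourth has roughly double the area for its height.
fsc_a = [50_000, 60_000, 70_000, 120_000, 55_000]
fsc_h = [48_000, 57_000, 67_000, 65_000, 52_500]

mask = singlet_mask(fsc_a, fsc_h)
print(mask)
```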

5. Common Gating Hierarchies

Below are standard gating strategies for the most commonly analyzed cell populations. All assume you've already applied the pre-gating hierarchy (time, scatter, singlets, live/dead).

T Cell Subsets

  1. CD3+: T lymphocytes
  2. CD4+ vs CD8+: Helper vs cytotoxic
  3. CD45RA vs CCR7: Naïve/CM/EM/EMRA

CD3+CD4+ = helper T cells. CD3+CD8+ = cytotoxic T cells. Further subdivision by memory markers: CCR7+CD45RA+ (naïve), CCR7+CD45RA− (central memory), CCR7−CD45RA− (effector memory), CCR7−CD45RA+ (TEMRA).
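The marker combinations above amount to a small lookup table. The following sketch encodes them; the threshold values and the classify_t_cell helper are hypothetical, and in practice positivity cutoffs come from controls such as FMOs rather than fixed numbers.

```python
# (CCR7 positive, CD45RA positive) -> memory subset, as listed in the text.
MEMORY_SUBSETS = {
    (True, True): "naive",
    (True, False): "central memory",
    (False, False): "effector memory",
    (False, True): "TEMRA",
}


def classify_t_cell(event, thresholds):
    """Label one event from its marker intensities (dicts keyed by marker)."""
    pos = {marker: event[marker] >= cut for marker, cut in thresholds.items()}
    if not pos["CD3"]:
        return "non-T"
    if pos["CD4"] and not pos["CD8"]:
        lineage = "CD4+ helper"
    elif pos["CD8"] and not pos["CD4"]:
        lineage = "CD8+ cytotoxic"
    else:
        lineage = "CD4/CD8 double-positive or double-negative"
    return f"{lineage}, {MEMORY_SUBSETS[(pos['CCR7'], pos['CD45RA'])]}"


# Hypothetical positivity cutoffs on transformed intensity.
thresholds = {"CD3": 500, "CD4": 500, "CD8": 500, "CCR7": 300, "CD45RA": 300}

naive_cd4 = {"CD3": 4000, "CD4": 3000, "CD8": 50, "CCR7": 900, "CD45RA": 1200}
temra_cd8 = {"CD3": 4000, "CD4": 40, "CD8": 5000, "CCR7": 20, "CD45RA": 1500}
b_cell = {"CD3": 30, "CD4": 10, "CD8": 10, "CCR7": 800, "CD45RA": 2000}

for event in (naive_cd4, temra_cd8, b_cell):
    print(classify_t_cell(event, thresholds))
```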

B Cells

  1. CD19+ or CD20+: B lymphocytes
  2. CD27 vs IgD: Naïve/Memory/Switched

NK Cells

  1. CD3−: Exclude T cells
  2. CD56+ and/or CD16+: NK cells

CD56brightCD16dim/− = regulatory/cytokine-producing NK. CD56dimCD16+ = cytotoxic NK.

Monocyte Subsets

  1. CD14 vs CD16: On HLA-DR+ gate

CD14++CD16− = classical (~85% of monocytes). CD14++CD16+ = intermediate (~5%). CD14+CD16++ = non-classical (~10%).

Dendritic Cells

  1. Lin− HLA-DR+: Exclude lineage markers
  2. CD11c vs CD123: mDC vs pDC

Lineage cocktail (dump channel): CD3, CD14, CD16, CD19, CD20, CD56. Lin−HLA-DR+CD11c+ = myeloid DC. Lin−HLA-DR+CD123+ = plasmacytoid DC.

6. Statistical Measures

| Statistic | Definition | When to Use |
| --- | --- | --- |
| % Positive | Fraction of events within a gate, as percentage of parent population | Reporting frequency of cell subsets |
| MFI (Mean) | Arithmetic mean fluorescence intensity of a gated population | Expression level comparison; sensitive to outliers |
| MFI (Median) | Median fluorescence intensity; 50th percentile value | Preferred for skewed distributions (most flow data); robust to outliers |
| gMFI (Geometric Mean) | Geometric mean; equivalent to the back-transformed mean of log-transformed data | Log-normally distributed data; traditional flow metric |
| CV (%) | Coefficient of variation = (SD/Mean) × 100% | Instrument QC (bead CV); assessing population spread |
| Robust CV (rCV) | (Robust SD / Median) × 100%, where the robust SD is estimated from percentiles (commonly half the spread between the 84.13th and 15.87th percentiles) | More stable for non-Gaussian data; used by some QC software |
| Stain Index | (MFIpos − MFIneg) / (2 × SDneg) | Evaluating antibody titration; comparing reagent performance; panel optimization |
Tip: Always report median (not mean) fluorescence intensity for compensated flow cytometry data. Compensation spreads populations asymmetrically, making the arithmetic mean a poor measure of central tendency. Median is robust to this spreading effect.
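The statistics in the table can be computed with a few lines of standard-library Python. This is a sketch under stated assumptions: the helper names are invented, the percentile function uses simple linear interpolation, the robust CV uses the half-spread percentile definition (software packages differ), and the data are synthetic.

```python
import math
from statistics import mean, median, stdev


def percentile(values, pct):
    """Percentile with linear interpolation between ranked values."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def geometric_mean(values):
    """Back-transformed mean of log values; requires positive inputs."""
    return math.exp(mean([math.log(v) for v in values]))


def cv_percent(values):
    return 100.0 * stdev(values) / mean(values)


def robust_cv_percent(values):
    # Assumed definition: half the 84.13th-to-15.87th percentile spread.
    robust_sd = 0.5 * (percentile(values, 84.13) - percentile(values, 15.87))
    return 100.0 * robust_sd / median(values)


def stain_index(positive, negative):
    # Medians used as the MFI; sample SD of the negative population.
    return (median(positive) - median(negative)) / (2.0 * stdev(negative))


# Synthetic population: 99 tightly spread events plus one bright outlier.
population = [1000.0 + i for i in range(-49, 50)] + [100_000.0]
print(f"mean={mean(population):.1f} median={median(population):.1f}")
print(f"CV={cv_percent(population):.1f}% rCV={robust_cv_percent(population):.2f}%")
print(f"stain index={stain_index([1000.0] * 3, [90.0, 100.0, 110.0]):.1f}")
```

A single outlier roughly doubles the mean and inflates the CV, while the median and robust CV barely move, which is the behavior the tip above relies on.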

7. Quality Control

The Time Parameter

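The time parameter records the acquisition time of each event. Plotting any fluorescence parameter vs. time reveals fluidic instabilities such as clogs, air bubbles, or flow-rate changes, which show up as gaps, sudden shifts, or drift in the signal.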

Apply a time gate to exclude unstable regions before any analysis.
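A time gate can be approximated programmatically by binning events along the time axis and flagging bins whose median signal departs from the run as a whole. The sketch below assumes that approach; the function name, the 10-unit bin width, and the 20% tolerance are hypothetical choices, and the data simulate a transient signal drop.

```python
from statistics import median


def stable_time_bins(times, signal, bin_width=10.0, max_shift=0.2):
    """Map each time-bin index to True if its median signal is within
    `max_shift` (as a fraction) of the overall median."""
    bins = {}
    for t, s in zip(times, signal):
        bins.setdefault(int(t // bin_width), []).append(s)
    overall = median(signal)
    return {
        index: abs(median(values) - overall) <= max_shift * overall
        for index, values in sorted(bins.items())
    }


# Synthetic run: the signal drops between t=20 and t=30 (e.g., a partial clog).
times = [float(t) for t in range(40)]
signal = [400.0 if 20 <= t < 30 else 1000.0 for t in times]

stable = stable_time_bins(times, signal)
kept_times = [t for t in times if stable[int(t // 10.0)]]
print(stable)
print(len(kept_times), "of", len(times), "events kept")
```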

QC Beads

| Bead Type | Purpose | Frequency |
| --- | --- | --- |
| CS&T Beads (BD) | Daily QC; sets PMT voltages; tracks instrument performance over time via Levey-Jennings plots | Daily (before experiments) |
| Rainbow Beads (Spherotech) | Multi-peak beads spanning 3–4 decades; verify linearity and PMT response | Daily or weekly |
| MESF Beads | Molecules of Equivalent Soluble Fluorochrome; quantitate absolute fluorescence | For quantitative studies |
| ABC Beads (Quantum Simply Cellular) | Antibody Binding Capacity; quantitate antibody molecules per cell | For receptor density quantitation |
| Compensation Beads | Anti-Ig capture beads for single-stain compensation controls | Every experiment with multiple colors |

8. High-Dimensional Analysis

As panels grow beyond 15–20 parameters, manual sequential gating becomes impractical and biased. High-dimensional computational tools offer unbiased population discovery.

| Algorithm | Type | What It Does | Common Software |
| --- | --- | --- | --- |
| tSNE | Dimensionality reduction | Non-linear projection of high-dimensional data to 2D. Preserves local structure; similar cells cluster together. Good for visualization. | FlowJo, Cytobank, OMIQ, R |
| UMAP | Dimensionality reduction | Similar to tSNE but faster and better at preserving global structure. Increasingly preferred over tSNE. | FlowJo, OMIQ, Python, R |
| FlowSOM | Clustering | Self-Organizing Map + hierarchical clustering. Groups events into metaclusters representing cell populations. Fast; handles millions of events. | R (FlowSOM package), OMIQ, Cytobank |
| PhenoGraph | Clustering | k-nearest neighbor graph-based clustering. Often produces more granular clusters than FlowSOM. Good for rare populations. | R, Python, OMIQ |
| SPADE | Tree visualization | Spanning-tree Progression of Density-normalized Events. Creates a tree structure showing population relationships. | Cytobank, R |
| Diffusion Map | Trajectory analysis | Reveals differentiation trajectories and developmental pathways. | R (destiny), Python (scanpy) |
| PCA | Dimensionality reduction | Linear projection; finds axes of maximum variance. Good for initial exploration and removing correlated dimensions. | All platforms |
Important: tSNE and UMAP are visualization tools, not clustering tools. You should NOT draw gates on tSNE/UMAP plots — they distort distances and densities. Use them to discover populations, then define those populations using the original marker space.

9. Batch Analysis & Standardization

Cross-Sample Normalization

When comparing MFI values across experiments, dates, or sites, raw MFI is unreliable because it depends on instrument-specific factors (PMT voltage, laser power, optical alignment). Solutions:

Batch Effects

Common sources of batch effects include: reagent lot changes (especially tandem dyes), instrument service/recalibration, operator differences, sample processing time variations, and ambient temperature. Mitigate by: randomizing sample order, including internal controls, and using harmonization algorithms (e.g., CytoNorm, CytofBatchAdjust for CyTOF).
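As a minimal illustration of using an internal control, the sketch below rescales each batch so that its control sample matches a chosen reference value. This simple linear scaling is an assumption made for the example and is not how CytoNorm or CytofBatchAdjust work; the batch names and MFI values are invented.

```python
def normalize_to_control(sample_mfi, control_mfi, reference_control_mfi):
    """Scale a sample MFI by the ratio of the reference control value to the
    control value measured in the same batch."""
    return sample_mfi * (reference_control_mfi / control_mfi)


# Hypothetical batches: the same internal control reads lower on day 2.
batches = {
    "day1": {"control": 1000.0, "samples": [2000.0, 500.0]},
    "day2": {"control": 800.0, "samples": [1600.0, 400.0]},
}
reference_control = 1000.0

normalized = {
    name: [normalize_to_control(s, batch["control"], reference_control)
           for s in batch["samples"]]
    for name, batch in batches.items()
}
print(normalized)
```

After scaling, the two days report the same values for samples that differed only by the instrument drift captured in the control.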

10. Software Overview

| Software | Type | Platform | Key Features |
| --- | --- | --- | --- |
| FlowJo (BD) | Desktop | Mac/Windows | Industry standard; intuitive gating; tSNE built in, with UMAP and FlowSOM available as plugins; extensive plugin ecosystem; workspace sharing |
| FCS Express (De Novo) | Desktop | Windows | Direct export to PowerPoint; batch analysis; GxP compliance module; image cytometry support |
| Kaluza (Beckman Coulter) | Desktop | Windows | Radar plots; optimized for CytoFLEX data; merge/overlay capabilities |
| Cytobank | Cloud | Browser | Collaborative cloud platform; viSNE, SPADE, FlowSOM; ideal for multi-site studies |
| OMIQ | Cloud | Browser | Modern cloud platform; opt-SNE, FlowSOM, PhenoGraph, UMAP; AI-assisted gating |
| SpectroFlo (Cytek) | Acquisition + Analysis | Windows | Spectral unmixing; reference spectrum management; spectral cytometry-optimized |
| R (flowCore/openCyto) | Programming | Cross-platform | Full programmatic control; reproducible pipelines; integration with Bioconductor |
| Python (FlowCytometryTools, CytoFlow) | Programming | Cross-platform | Machine learning integration; scripting; Jupyter notebooks |

11. Reporting & Publication Standards

MIFlowCyt (Minimum Information about Flow Cytometry)

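MIFlowCyt is the ISAC-endorsed reporting standard for flow cytometry experiments. When publishing, your methods section should cover its four areas: an experiment overview, sample and reagent details, instrument details, and data analysis details (compensation, transformation, and gating strategy).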

Figure Preparation

Data Sharing: ISAC encourages depositing raw FCS files in public repositories like FlowRepository (flowrepository.org) and ImmPort (immport.org). Include the FlowRepository ID in your publication for reproducibility.