---
title: "04. Differential Expression Analysis"
format:
  html:
    toc: true
    toc-expand: 2
    toc-location: left
    code-fold: true
    code-summary: "Show code"
vignette: >
  %\VignetteIndexEntry{04. Differential Expression Analysis}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
#| echo: false
#| results: asis
in_pkgdown <- nzchar(Sys.getenv("IN_PKGDOWN"))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 8,
  fig.height = 6
)
if (!in_pkgdown) library(targets)

# Shared vignette utilities (safe_tar_read, show_target, render helpers)
utils_path <- system.file("vignette_utils.R", package = "coMMpass")
if (utils_path == "") {
  utils_path <- if (file.exists("../inst/vignette_utils.R")) "../inst/vignette_utils.R"
  else if (file.exists("inst/vignette_utils.R")) "inst/vignette_utils.R"
  else stop("Cannot find vignette_utils.R")
}
source(utils_path, local = TRUE)
```

```{r pkgdown-banner}
#| results: asis
#| eval: !expr in_pkgdown
#| echo: false
cat("::: {.callout-note}\n## Online documentation\nThis vignette displays pre-computed results. Run the targets pipeline locally for interactive analysis.\n:::\n")
```

## Overview

> See the [Glossary](glossary.html) for term definitions used throughout this project.

- Compares tumor vs normal or baseline vs relapse samples
- Three complementary methods: **DESeq2**, **edgeR**, **limma-voom**
- Consensus genes (significant in all three) used for pathway analysis
- Visualizations: PCA, volcano plot, MA plot, heatmap, method comparison

> **Note:** This vignette was built in CI with `sample_limit=20`. Local builds
> default to 200 samples. Numbers below reflect the CI subset.

## Method Comparison

The table and bar plot below summarize the number of differentially expressed (DE) genes detected by each method (DESeq2, edgeR, limma-voom) and the size of their consensus overlap.

```{r method-summary, results='asis'}
#| echo: false
show_target("vig_de_method_table")
```

```{r method-barplot, results='asis'}
#| echo: false
show_target("vig_de_method_barplot")
```

## Principal Component Analysis

- Computed from the 500 most variable genes after the variance-stabilizing transformation ([VST](https://bioconductor.org/packages/DESeq2/)) from DESeq2
- VST makes the variance roughly independent of the mean across the expression range
- Preferred over raw counts or logCPM for exploratory visualization

```{r pca-plot}
#| echo: false
#| results: asis
show_target("vig_pca_plot")
```
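
The gene selection and projection described above can be sketched in base R. This is a minimal illustration, not the pipeline's code: `vst_mat` is a simulated stand-in for a VST-transformed matrix (genes in rows, samples in columns), and all object names are assumptions.

```r
# Simulated stand-in for a VST-transformed matrix (genes x samples).
set.seed(1)
vst_mat <- matrix(
  rnorm(2000 * 12), nrow = 2000,
  dimnames = list(paste0("gene", 1:2000), paste0("s", 1:12))
)

# Rank genes by variance across samples and keep the top 500.
gene_var <- apply(vst_mat, 1, var)
top_idx <- order(gene_var, decreasing = TRUE)[seq_len(min(500, nrow(vst_mat)))]

# prcomp() expects samples in rows, so transpose; each gene is centred.
pca <- prcomp(t(vst_mat[top_idx, ]), center = TRUE, scale. = FALSE)

# Percent variance explained per component, e.g. for axis labels.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
head(round(pct_var, 1))
```

Sample coordinates are in `pca$x`; the first two columns are what a PC1 vs PC2 scatter plot would show.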

## Volcano Plot

- Plots statistical significance (-log10 adjusted p-value, y-axis) against effect size (log2 fold change, x-axis)
- Genes in the upper-left and upper-right corners combine a small adjusted p-value with a large absolute fold change, making them the strongest DE candidates

```{r volcano}
#| echo: false
#| results: asis
show_target("vig_volcano_plot")
```
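
As a rough sketch of how volcano-plot coordinates and point classes can be derived, the example below uses a made-up results table. The column names follow DESeq2-style output, and the thresholds (`padj_cut`, `lfc_cut`) are illustrative assumptions, not the values the pipeline necessarily uses.

```r
# Hypothetical DE results; values are invented for illustration.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -1.8, 0.2, 3.0, -0.4),
  padj = c(1e-6, 0.003, 0.60, 0.20, NA)
)

padj_cut <- 0.05  # assumed significance threshold
lfc_cut <- 1      # assumed |log2 fold change| threshold

# y-axis of the volcano plot
res$neg_log10_padj <- -log10(res$padj)

# Genes with a missing adjusted p-value are treated as not significant.
sig <- !is.na(res$padj) & res$padj < padj_cut
res$status <- ifelse(
  sig & res$log2FoldChange >= lfc_cut, "up",
  ifelse(sig & res$log2FoldChange <= -lfc_cut, "down", "ns")
)
table(res$status)
```

Note that `g4` has a large fold change but is labelled `ns` because its adjusted p-value exceeds the cutoff: both criteria must hold for a gene to land in an upper corner.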

## MA Plot

- A Bland-Altman-style plot: mean expression (x-axis) vs log2 fold change (y-axis)
- Reveals whether DE signals concentrate at particular expression levels
- Helps detect expression-dependent bias in fold-change estimates, such as inflated fold changes among low-count genes

```{r ma-plot}
#| echo: false
#| results: asis
show_target("vig_ma_plot")
```
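
The two axes can be illustrated with the classical M (difference) and A (average) definitions on log2 expression. The values below are invented, and the pipeline's plot may instead use a method-specific mean (for example, DESeq2 reports `baseMean` on the count scale), so treat this only as a sketch of the idea.

```r
# Hypothetical mean log2 expression per gene in two conditions.
log2_a <- c(g1 = 4.0, g2 = 8.0, g3 = 12.0)
log2_b <- c(g1 = 5.0, g2 = 8.5, g3 = 11.0)

ma <- data.frame(
  gene = names(log2_a),
  A = (log2_a + log2_b) / 2,  # average log2 expression (x-axis)
  M = log2_b - log2_a         # log2 fold change, b vs a (y-axis)
)
ma
```

In an unbiased comparison, most points scatter symmetrically around `M = 0` at every value of `A`.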

## Heatmap of Top DE Genes

The heatmap shows Z-score scaled VST expression for the most significant
genes. Rows (genes) and columns (samples) are hierarchically clustered.

```{r heatmap, fig.height = 10}
#| echo: false
#| results: asis
show_target("vig_heatmap_plot")
```
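
The scaling and clustering steps can be sketched in base R as follows. The matrix is simulated, the scaling is assumed to be per gene (row-wise), and Euclidean distance with complete linkage are simply the `dist()` and `hclust()` defaults rather than confirmed pipeline settings.

```r
# Simulated VST expression for 5 genes across 6 samples.
set.seed(42)
vst_top <- matrix(
  rnorm(5 * 6, mean = 8), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)

# scale() standardizes columns, so transpose, scale, and transpose back
# to obtain per-gene Z-scores.
z <- t(scale(t(vst_top)))

# Hierarchical clustering of genes (rows) and samples (columns).
row_hc <- hclust(dist(z))
col_hc <- hclust(dist(t(z)))

# Reorder the matrix the way a clustered heatmap would display it.
z_ordered <- z[row_hc$order, col_hc$order]
round(z_ordered, 2)
```

After Z-scoring, each gene has mean 0 and standard deviation 1, so colors reflect relative rather than absolute expression.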

## Consensus Genes

Genes identified as significant by all three methods (DESeq2, edgeR, limma-voom)
represent the highest-confidence DE candidates.

```{r consensus, results='asis'}
#| echo: false
show_target("vig_consensus_text")
```
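
Conceptually, the consensus is a set intersection of the per-method significant gene lists. A minimal sketch, using made-up gene IDs and hypothetical vector names:

```r
# Hypothetical significant gene sets from each method (IDs are invented).
deseq2_sig <- c("ENSG01", "ENSG02", "ENSG03", "ENSG05")
edger_sig  <- c("ENSG02", "ENSG03", "ENSG04", "ENSG05")
limma_sig  <- c("ENSG03", "ENSG05", "ENSG06")

# Keep only genes called significant by all three methods.
consensus <- Reduce(intersect, list(deseq2_sig, edger_sig, limma_sig))
consensus
#> [1] "ENSG03" "ENSG05"
```

`Reduce()` extends naturally if more methods are added to the list.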

## Paired Longitudinal DE

For patients with samples at multiple timepoints (baseline and relapse),
a paired design (`~ patient_id + visit`) controls for inter-patient
variability and tests for within-patient expression changes over time.

```{r paired-de-text, results='asis'}
#| echo: false
show_target("vig_paired_de_text")
```

```{r paired-de-plot}
#| echo: false
#| results: asis
show_target("vig_paired_de_plot")
```
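
To see what the `~ patient_id + visit` formula encodes, the sketch below builds the design matrix for a toy cohort of three patients, each sampled at baseline and relapse. The sample table is invented; only the formula comes from the text above.

```r
# Toy sample table: three patients, two visits each.
coldata <- data.frame(
  patient_id = factor(rep(c("P1", "P2", "P3"), each = 2)),
  visit = factor(
    rep(c("baseline", "relapse"), times = 3),
    levels = c("baseline", "relapse")  # baseline is the reference level
  )
)

# One column per non-reference patient absorbs patient-specific baselines;
# the final column is the within-patient relapse vs baseline effect.
design <- model.matrix(~ patient_id + visit, data = coldata)
colnames(design)
#> [1] "(Intercept)"  "patient_idP2" "patient_idP3" "visitrelapse"
```

The coefficient of interest is `visitrelapse`. Patients with only one timepoint contribute no information to it, which is why the paired analysis is restricted to patients sampled at both visits.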

## Annotated DE Results

Ensembl IDs are hard to read on their own, so DE results are annotated with
gene symbols. The `annotate_genes()` function maps Ensembl IDs to HGNC symbols using
[MSigDB gene mappings](https://www.gsea-msigdb.org/gsea/msigdb/).

```{r annotated-de, results='asis'}
#| echo: false
show_target("vig_annotated_de_table")
```

## Pathway Enrichment Visualizations

### GSEA Enrichment Dot Plot

Gene Set Enrichment Analysis (GSEA) tests the full ranked gene list against MSigDB Hallmark pathways.

```{r gsea-dotplot}
#| echo: false
#| results: asis
show_target("vig_gsea_dotplot")
```

### ORA Enrichment Bar Plot

Over-representation analysis (ORA) tests whether consensus DE genes are enriched in specific pathways.

```{r ora-barplot}
#| echo: false
#| results: asis
show_target("vig_ora_barplot")
```
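
Over-representation is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below shows that calculation for a single pathway; all counts are invented, and the pipeline's enrichment function may differ in its choice of gene universe and multiple-testing correction.

```r
# Hypothetical counts for one pathway.
universe_n <- 20000  # genes tested (background)
pathway_n  <- 200    # background genes annotated to the pathway
de_n       <- 500    # consensus DE genes
overlap_n  <- 15     # DE genes that fall in the pathway

# Overlap expected by chance alone.
expected <- de_n * pathway_n / universe_n

# P(overlap >= overlap_n) when drawing de_n genes at random.
p_value <- phyper(
  overlap_n - 1, pathway_n, universe_n - pathway_n, de_n,
  lower.tail = FALSE
)
c(expected = expected, p_value = p_value)
```

Subtracting 1 from `overlap_n` matters: `phyper(..., lower.tail = FALSE)` returns the probability of strictly more than its first argument, so this gives the probability of observing 15 or more. In practice the p-values from all tested pathways are then adjusted together, for example with `p.adjust(method = "BH")`.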

## Next Steps

- **Pathway analysis**: Consensus DE genes are tested for enrichment in
  MSigDB Hallmark, KEGG, and other collections. See the
  [pathway analysis vignette](pathway-analysis.html).
- **Survival stratification**: DE gene signatures can inform survival
  analysis. See the [survival analysis vignette](survival-analysis.html).
- **Cytogenetic context**: See the [EDA vignette](exploratory-analysis.html)
  for the cytogenetic landscape underlying these expression changes.

## Data Sources

Results in this vignette are derived from the
[MMRF CoMMpass study](https://portal.gdc.cancer.gov/projects/MMRF-COMMPASS)
(MMRF-COMMPASS, ~1,143 patients), downloaded via
[TCGAbiolinks](https://bioconductor.org/packages/TCGAbiolinks/).
The pipeline runs with a configurable `sample_limit` (default 200; CI uses 20).

For full citations, data access tiers, and the distinction between
pipeline data and synthetic test data, see the
[Data Sources](data-sources.html) vignette.

## Recent Changes

Recent project commits with lines added, files changed, and change categories.

```{r changelog}
#| echo: false
#| results: asis
show_target("vig_git_changelog")
```

## Reproducibility

<details>
<summary>Session Info (click to expand)</summary>

```{r session-info, eval=TRUE}
sessionInfo()
```

</details>
