---
title: "00. Project Overview"
format:
  html:
    toc: true
    toc-expand: 2
    toc-location: left
    code-fold: true
    code-summary: "Show code"
vignette: >
  %\VignetteIndexEntry{00. Project Overview}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

This vignette gives a visual overview of the coMMpass-analysis pipeline:
its data flow, its layer architecture, and a reading guide to the other
vignettes. Most diagram nodes are clickable and open the relevant vignette
or external resource in a new tab.

```{=html}
<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs';
  mermaid.initialize({
    startOnLoad: false,
    securityLevel: 'loose',
    theme: 'dark',
    themeVariables: {
      darkMode: true,
      background: '#1a1a2e',
      primaryColor: '#1a3a5c',
      primaryTextColor: '#ffffff',
      primaryBorderColor: '#3498db',
      lineColor: '#aaaaaa',
      textColor: '#ffffff',
      mainBkg: '#1a3a5c',
      nodeBorder: '#3498db'
    }
  });
  // Render all mermaid diagrams
  document.querySelectorAll('pre.mermaid').forEach(async (el) => {
    const id = el.id || 'mermaid-' + Math.random().toString(36).slice(2);
    const source = el.querySelector('script[type="text/plain"]');
    const graphDef = source ? source.textContent : el.textContent;
    const { svg } = await mermaid.render(id + '-svg', graphDef);
    el.innerHTML = svg;
  });
</script>
<style>
  pre.mermaid { background: transparent; border: none; text-align: center; }
  .mermaid-diagram { margin: 1em 0; }
</style>
```

## Data Flow Pipeline

::: {#fig-dataflow}

<pre class="mermaid" id="dataflow">
<script type="text/plain">
graph LR
  subgraph Acquisition
    GDC["GDC Portal"]
    RNA["RNA-seq Counts"]
    Clin["Clinical Data"]
  end

  subgraph Cleaning
    Clean["Data Cleaning"]
    QC["Quality Control"]
  end

  subgraph Analysis
    DE["Differential<br/>Expression"]
    Surv["Survival<br/>Analysis"]
    Path["Pathway<br/>Analysis"]
    EDA["Exploratory<br/>Analysis"]
    Cyto["Cytogenetics"]
  end

  subgraph Outputs
    Report["Gene Reports"]
    API["API Endpoints"]
    Site["pkgdown Website"]
  end

  GDC --> RNA
  GDC --> Clin
  RNA --> Clean
  Clin --> Clean
  Clean --> QC
  QC --> DE
  QC --> Surv
  QC --> Path
  QC --> EDA
  QC --> Cyto
  DE --> Report
  Surv --> Report
  Path --> Report
  EDA --> Site
  Cyto --> Site
  Report --> Site
  API --> Site

  click GDC "https://portal.gdc.cancer.gov/projects/MMRF-COMMPASS" _blank
  click RNA "data-acquisition.html" _blank
  click Clin "data-acquisition.html" _blank
  click Clean "data-sources.html" _blank
  click QC "data-sources.html" _blank
  click DE "differential-expression.html" _blank
  click Surv "survival-analysis.html" _blank
  click Path "differential-expression.html#pathway-enrichment-visualizations" _blank
  click EDA "exploratory-analysis.html" _blank
  click Cyto "exploratory-analysis.html#cytogenetic-landscape" _blank
  click Report "gene-report.html" _blank
  click API "api-usage.html" _blank
  click Site "https://JohnGavin.github.io/coMMpass-analysis/" _blank

  style GDC fill:#1a3a5c,stroke:#3498db,color:#fff
  style RNA fill:#1a3a5c,stroke:#3498db,color:#fff
  style Clin fill:#1a3a5c,stroke:#3498db,color:#fff
  style Clean fill:#4a3800,stroke:#f39c12,color:#fff
  style QC fill:#4a3800,stroke:#f39c12,color:#fff
  style DE fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style Surv fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style Path fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style EDA fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style Cyto fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style Report fill:#1a4a2e,stroke:#27ae60,color:#fff
  style API fill:#1a4a2e,stroke:#27ae60,color:#fff
  style Site fill:#1a4a2e,stroke:#27ae60,color:#fff
</script>
</pre>

**Data flow pipeline for coMMpass-analysis.**
Data flows left to right from the [GDC Portal](https://portal.gdc.cancer.gov/projects/MMRF-COMMPASS)
through acquisition, cleaning, and QC into five parallel analysis tracks
([DE](differential-expression.html), [survival](survival-analysis.html),
[pathway](differential-expression.html#pathway-enrichment-visualizations),
[EDA](exploratory-analysis.html),
[cytogenetics](exploratory-analysis.html#cytogenetic-landscape)).
DE, survival, and pathway results feed the [gene reports](gene-report.html);
EDA, cytogenetics, the gene reports, and the [API endpoints](api-usage.html)
all feed the [pkgdown website](https://JohnGavin.github.io/coMMpass-analysis/).
The pipeline comprises 12 layers and ~183 [targets](glossary.html#targets-pipeline).
Colour key: blue = acquisition, yellow = cleaning/QC, red = analysis, green = outputs.
Source: layer definitions in `R/tar_plans/plan_dag_validation.R`.

:::
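
The edges in the diagram above can be transcribed into a small edge list to check which outputs each analysis track ultimately feeds. The following is a minimal base-R sketch: the `edges` data frame is hand-copied from the figure, and `reachable()` is a hypothetical helper written for this illustration. Neither is part of the package, and the pipeline's real dependencies are defined by its targets plans.

```r
# Edge list hand-transcribed from the data-flow diagram (illustrative only).
edges <- data.frame(
  from = c("GDC", "GDC", "RNA", "Clin", "Clean",
           "QC", "QC", "QC", "QC", "QC",
           "DE", "Surv", "Path", "EDA", "Cyto", "Report", "API"),
  to   = c("RNA", "Clin", "Clean", "Clean", "QC",
           "DE", "Surv", "Path", "EDA", "Cyto",
           "Report", "Report", "Report", "Site", "Site", "Site", "Site"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every node downstream of `start`,
# found by breadth-first search over the edge list.
reachable <- function(start, edges) {
  seen <- character(0)
  queue <- start
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    # Children of the current node that have not been visited yet
    nxt <- setdiff(edges$to[edges$from == node], seen)
    seen <- c(seen, nxt)
    queue <- c(queue, nxt)
  }
  seen
}

tracks <- c("DE", "Surv", "Path", "EDA", "Cyto")
downstream <- lapply(setNames(tracks, tracks), reachable, edges = edges)
print(downstream)
```

In this transcription, DE, survival, and pathway reach the website only through the gene reports, whereas EDA and cytogenetics feed it directly. The `API` node has no upstream node in the figure.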

## Layer Dependency Graph

::: {#fig-layers}

<pre class="mermaid" id="layers">
<script type="text/plain">
graph TD
  DA["data-acquisition"]
  DC["data-cleaning"]
  CY["cytogenetics"]
  QC["quality-control"]
  DEX["differential-<br/>expression"]
  SU["survival"]
  PW["pathway"]
  ED["eda"]
  ST["storage"]
  AP["api"]
  DOC["documentation"]
  INF["infrastructure"]

  DA --> DC
  DC --> CY
  DC --> QC
  QC --> DEX
  QC --> SU
  QC --> PW
  QC --> ED
  DEX --> ST
  SU --> ST
  PW --> ST
  ED --> ST
  CY --> ST
  ST --> AP
  DEX --> DOC
  SU --> DOC
  PW --> DOC
  ED --> DOC
  CY --> DOC
  AP --> DOC
  INF -.-> DA

  click DA "data-acquisition.html" _blank
  click DC "data-sources.html" _blank
  click QC "data-sources.html" _blank
  click CY "exploratory-analysis.html#cytogenetic-landscape" _blank
  click DEX "differential-expression.html" _blank
  click SU "survival-analysis.html" _blank
  click PW "differential-expression.html#pathway-enrichment-visualizations" _blank
  click ED "exploratory-analysis.html" _blank
  click AP "api-usage.html" _blank
  click DOC "pipeline-dag.html" _blank
  click INF "telemetry.html" _blank

  style DA fill:#1a3a5c,stroke:#3498db,color:#fff
  style DC fill:#4a3800,stroke:#f39c12,color:#fff
  style CY fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style QC fill:#4a3800,stroke:#f39c12,color:#fff
  style DEX fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style SU fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style PW fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style ED fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style ST fill:#1a4a2e,stroke:#27ae60,color:#fff
  style AP fill:#1a4a2e,stroke:#27ae60,color:#fff
  style DOC fill:#3a1a4a,stroke:#8e44ad,color:#fff
  style INF fill:#333333,stroke:#7f8c8d,color:#fff
</script>
</pre>

**Layer dependency graph showing the 12 pipeline layers.**
Each node represents a pipeline layer from
[`plan_dag_validation.R`](https://github.com/JohnGavin/coMMpass-analysis/blob/main/R/tar_plans/plan_dag_validation.R).
Solid arrows indicate allowed dependencies: data flows top-down from
acquisition through cleaning into QC and the parallel analysis tracks
(cytogenetics branches directly from data-cleaning), then into storage,
API, and documentation. The dashed arrow from infrastructure to
data-acquisition reflects the Nix/config bootstrap.
[Documentation](pipeline-dag.html) is the only terminal layer and aggregates
all outputs; [infrastructure](telemetry.html) is otherwise standalone.
Colour key: blue = acquisition, yellow = cleaning/QC, red = analysis,
green = storage/API, purple = documentation, grey = infrastructure.
Source: `R/tar_plans/plan_dag_validation.R`, lines 13-48.

:::
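
One way to sanity-check a layer graph like the one above is to topologically sort it: a valid ordering exists only if the graph has no cycles. The sketch below uses Kahn's algorithm in base R on edges hand-transcribed from the figure, using the node IDs from the diagram. `layer_edges` and `topo_order()` are illustrative names, the dashed infrastructure arrow is treated as an ordinary edge, and this is not the validation logic in `plan_dag_validation.R`.

```r
# Layer edges hand-transcribed from the figure (illustrative only).
layer_edges <- data.frame(
  from = c("DA", "DC", "DC", "QC", "QC", "QC", "QC",
           "DEX", "SU", "PW", "ED", "CY", "ST",
           "DEX", "SU", "PW", "ED", "CY", "AP", "INF"),
  to   = c("DC", "CY", "QC", "DEX", "SU", "PW", "ED",
           "ST", "ST", "ST", "ST", "ST", "AP",
           "DOC", "DOC", "DOC", "DOC", "DOC", "DOC", "DA"),
  stringsAsFactors = FALSE
)

# Kahn's algorithm: repeatedly emit a node with no remaining
# incoming edges, then remove its outgoing edges.
topo_order <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  indeg <- setNames(integer(length(nodes)), nodes)
  counts <- table(edges$to)
  indeg[names(counts)] <- as.integer(counts)

  ready <- names(indeg)[indeg == 0]
  sorted <- character(0)
  while (length(ready) > 0) {
    node <- ready[1]
    ready <- ready[-1]
    sorted <- c(sorted, node)
    for (child in edges$to[edges$from == node]) {
      indeg[child] <- indeg[child] - 1L
      if (indeg[child] == 0) ready <- c(ready, child)
    }
  }
  # Any node left unsorted must sit on a cycle
  if (length(sorted) < length(nodes)) stop("cycle detected among layers")
  sorted
}

layer_order <- topo_order(layer_edges)
print(layer_order)
```

Because `INF` is the only node without incoming edges and `DOC` is the only node without outgoing edges, any valid ordering of this transcription starts with infrastructure and ends with documentation.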

## Vignette Reading Guide

::: {#fig-readingguide}

<pre class="mermaid" id="readingguide">
<script type="text/plain">
graph LR
  subgraph Clinical["Clinical Path"]
    direction LR
    C1["Data Sources"]
    C2["Exploratory<br/>Analysis"]
    C3["Survival<br/>Analysis"]
    C1 --> C2 --> C3
  end

  subgraph Genomics["Genomics Path"]
    direction LR
    G1["Data Sources"]
    G2["Data<br/>Acquisition"]
    G3["Differential<br/>Expression"]
    G4["Gene Report"]
    G1 --> G2 --> G3 --> G4
  end

  subgraph DevOps["Developer / Pipeline Path"]
    direction LR
    D1["Data Sources"]
    D2["Pipeline DAG"]
    D3["Telemetry"]
    D4["API Usage"]
    D1 --> D2 --> D3 --> D4
  end

  click C1 "data-sources.html" _blank
  click C2 "exploratory-analysis.html" _blank
  click C3 "survival-analysis.html" _blank
  click G1 "data-sources.html" _blank
  click G2 "data-acquisition.html" _blank
  click G3 "differential-expression.html" _blank
  click G4 "gene-report.html" _blank
  click D1 "data-sources.html" _blank
  click D2 "pipeline-dag.html" _blank
  click D3 "telemetry.html" _blank
  click D4 "api-usage.html" _blank

  style C1 fill:#1a3a5c,stroke:#3498db,color:#fff
  style C2 fill:#1a3a5c,stroke:#3498db,color:#fff
  style C3 fill:#1a3a5c,stroke:#3498db,color:#fff
  style G1 fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style G2 fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style G3 fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style G4 fill:#5c1a1a,stroke:#e74c3c,color:#fff
  style D1 fill:#1a4a2e,stroke:#27ae60,color:#fff
  style D2 fill:#1a4a2e,stroke:#27ae60,color:#fff
  style D3 fill:#1a4a2e,stroke:#27ae60,color:#fff
  style D4 fill:#1a4a2e,stroke:#27ae60,color:#fff
</script>
</pre>

**Recommended reading paths through the 11 vignettes.**
Three paths serve different audiences: **Clinical** (blue) covers
patient-level [exploratory analysis](exploratory-analysis.html) and
[survival modelling](survival-analysis.html);
**Genomics** (red) covers [RNA-seq acquisition](data-acquisition.html),
[differential expression](differential-expression.html), and
[gene-level reporting](gene-report.html);
**Developer/Pipeline** (green) covers the [targets DAG](pipeline-dag.html),
[pipeline telemetry](telemetry.html), and [API endpoints](api-usage.html).
All paths begin at [Data Sources](data-sources.html).
Cross-cutting references: [glossary](glossary.html) (65 terms),
[data dictionary](data-dictionary.html).

:::

## Session Information

```{r sessioninfo, eval = TRUE}
sessionInfo()
```
