---
title: "05. Survival Analysis"
format:
  html:
    toc: true
    toc-expand: 2
    toc-location: left
    code-fold: true
    code-summary: "Show code"
vignette: >
  %\VignetteIndexEntry{05. Survival Analysis}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
#| echo: false
#| results: asis
in_pkgdown <- nzchar(Sys.getenv("IN_PKGDOWN"))
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 8,
  fig.height = 6
)
if (!in_pkgdown) library(targets)

# Shared vignette utilities (safe_tar_read, show_target, render helpers)
utils_path <- system.file("vignette_utils.R", package = "coMMpass")
if (utils_path == "") {
  utils_path <- if (file.exists("../inst/vignette_utils.R")) "../inst/vignette_utils.R"
  else if (file.exists("inst/vignette_utils.R")) "inst/vignette_utils.R"
  else stop("Cannot find vignette_utils.R")
}
source(utils_path, local = TRUE)
```

```{r pkgdown-banner}
#| results: asis
#| eval: !expr in_pkgdown
#| echo: false
cat("::: {.callout-note}\n## Online documentation\nThis vignette displays pre-computed results. Run the targets pipeline locally for interactive analysis.\n:::\n")
```

## Overview

> See the [Glossary](glossary.html) for term definitions used throughout this project.

- Survival analysis of the CoMMpass cohort stratified by clinical and cytogenetic features
- **Kaplan-Meier curves** with log-rank tests for group comparisons
- **Cox proportional hazards models** for multivariable analysis
- **Forest plots** of hazard ratios with 95% confidence intervals

> **Note:** This vignette was built in CI with `sample_limit=20`. Local builds
> default to 200 samples. Numbers below reflect the CI subset.

## Overall Survival

Kaplan-Meier estimate of the overall survival function for the full patient cohort.

```{r km-overall}
#| echo: false
#| results: asis
show_target("vig_km_overall")
```

```{r km-overall-text, results='asis'}
#| echo: false
show_target("vig_km_overall_text")
```
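
As an illustration of the estimator (not the pipeline's own code), a Kaplan-Meier fit can be sketched with the `survival` package. The `lung` dataset that ships with `survival` is used purely as a stand-in for the CoMMpass clinical table, and the object names are assumptions.

```r
library(survival)

# Example data only: `lung` ships with survival (status: 1 = censored, 2 = dead)
fit_overall <- survfit(Surv(time, status) ~ 1, data = lung)

# Median survival time with its 95% confidence limits
km_summary <- summary(fit_overall)$table
km_summary[c("median", "0.95LCL", "0.95UCL")]

# Estimated survival probability at 1 year (365 days)
surv_1yr <- summary(fit_overall, times = 365)$surv
surv_1yr
```

The formula `~ 1` requests a single curve for the whole cohort; replacing `1` with a grouping variable yields one curve per group.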

## Survival by ISS Stage

- [ISS](https://doi.org/10.1200/JCO.2005.04.242) stratifies by serum albumin and beta-2 microglobulin
- ISS III is expected to have worse survival than ISS I
- See the [EDA vignette](exploratory-analysis.html) for ISS stage distributions

```{r km-iss}
#| echo: false
#| results: asis
show_target("vig_km_iss")
```
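
A stratified Kaplan-Meier fit with a log-rank test might look like the sketch below. It assumes the `survival` package and again uses the bundled `lung` data; ECOG performance status stands in for ISS stage only because it is a three-level ordinal variable in that dataset.

```r
library(survival)

# Example data only: keep ECOG 0, 1, 2 as a stand-in for a three-level stage variable
dat <- subset(lung, !is.na(ph.ecog) & ph.ecog < 3)
dat$group <- factor(dat$ph.ecog)

# One Kaplan-Meier curve per group
fit_group <- survfit(Surv(time, status) ~ group, data = dat)

# Log-rank test of equal survival across groups
logrank <- survdiff(Surv(time, status) ~ group, data = dat)

# p-value from the chi-square statistic (df = number of groups - 1)
p_logrank <- pchisq(logrank$chisq, df = nlevels(dat$group) - 1, lower.tail = FALSE)
p_logrank
```

With more than two groups the log-rank test is a global test: it indicates that at least one group differs, not which pairs differ.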

## Survival by Cytogenetic Risk Group

Cytogenetic risk per [IMWG 2014 criteria](https://doi.org/10.1200/JCO.2014.55.1519):
**high-risk** = t(4;14) OR t(14;16) OR del(17p); **standard-risk** = all others.

```{r km-risk}
#| echo: false
#| results: asis
show_target("vig_km_risk")
```
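
The classification rule above can be sketched in a few lines of base R. The column names and the five example patients are hypothetical; the pipeline's actual variable names may differ.

```r
# Hypothetical marker flags for five patients (TRUE = alteration detected)
cyto <- data.frame(
  t_4_14  = c(FALSE, TRUE,  FALSE, FALSE, FALSE),
  t_14_16 = c(FALSE, FALSE, FALSE, TRUE,  FALSE),
  del_17p = c(FALSE, FALSE, TRUE,  TRUE,  FALSE)
)

# High-risk if any of the three alterations is present, otherwise standard-risk
cyto$risk_group <- factor(
  ifelse(cyto$t_4_14 | cyto$t_14_16 | cyto$del_17p, "high", "standard"),
  levels = c("standard", "high")
)

table(cyto$risk_group)
```

Putting `"standard"` first in `levels` makes it the reference category, so a Cox model reports the hazard ratio for high-risk relative to standard-risk. Note that a missing flag propagates as `NA` unless another marker is `TRUE`, so missing calls need an explicit policy.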

## Survival by Individual Cytogenetic Markers

Each marker is analyzed separately, comparing patients with and without the alteration.

```{r km-markers, fig.height = 5}
#| echo: false
#| results: asis
show_target("vig_km_markers")
```

## Cox Proportional Hazards

### Basic Model: Age + Gender

Hazard ratios for age and gender from a minimal Cox PH model.

```{r cox-basic, results='asis'}
#| echo: false
show_target("vig_cox_basic_table")
```
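
A minimal age + gender model can be sketched with `survival::coxph`. The `lung` data are a stand-in for the CoMMpass clinical table, and the variable names are those of that example dataset rather than the pipeline's.

```r
library(survival)

# Example data only: `lung` from the survival package (sex: 1 = male, 2 = female)
dat <- transform(lung, sex = factor(sex, labels = c("male", "female")))

cox_basic <- coxph(Surv(time, status) ~ age + sex, data = dat)

# Hazard ratios (exponentiated coefficients) with 95% confidence intervals
hr_table <- cbind(HR = exp(coef(cox_basic)), exp(confint(cox_basic)))
round(hr_table, 2)
```

The hazard ratio for `age` is per one-year increase; the row for `sexfemale` compares women with the reference level, men.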

### Full Model: Age + Gender + ISS + Cytogenetic Risk

Multivariable Cox model adjusting for ISS stage and IMWG cytogenetic risk classification.

```{r cox-full, results='asis'}
#| echo: false
show_target("vig_cox_full_table")
```

### Forest Plot

Forest plot of hazard ratios from the full multivariable Cox model. Each row represents a covariate; the point is the estimated HR and the bars span its 95% CI. HR > 1 indicates a higher hazard of death and HR < 1 a lower hazard, relative to the reference level. The dashed vertical line marks HR = 1 (no effect); a CI that crosses this line is not significant at the 5% level.

```{r forest-plot}
#| echo: false
#| results: asis
show_target("vig_forest_plot")
```
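
For reference, the plotted quantities relate to the fitted Cox coefficients as follows, assuming Wald-type intervals (the kind reported by `summary()` on a `survival::coxph` fit; the pipeline's plotting code is not shown here):

$$
\widehat{\mathrm{HR}}_j = \exp\!\left(\hat\beta_j\right),
\qquad
\mathrm{CI}_{95\%} = \exp\!\left(\hat\beta_j \pm 1.96 \cdot \mathrm{SE}(\hat\beta_j)\right)
$$

Because the interval is symmetric around $\hat\beta_j$ on the log scale, forest plots are usually drawn with a logarithmic x-axis.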

### Proportional Hazards Assumption

Schoenfeld residual test of the proportional hazards (PH) assumption. A p-value below 0.05 suggests that the covariate's effect changes over time, which violates the assumption. Covariates that fail the test may need time-varying coefficients or stratification.

```{r ph-test, results='asis'}
#| echo: false
show_target("vig_ph_test_table")
```
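
The test can be sketched with `survival::cox.zph`, again on the bundled `lung` data as a stand-in. The `table` element of the result holds one row per covariate plus a `GLOBAL` row for the model as a whole.

```r
library(survival)

# Example data only: three covariates from the `lung` dataset
cox_fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Score test based on scaled Schoenfeld residuals
ph_test <- cox.zph(cox_fit)
ph_test$table

# Rows (covariates or GLOBAL) flagged at the 0.05 level
flagged <- rownames(ph_test$table)[ph_test$table[, "p"] < 0.05]
flagged
```

Calling `plot(ph_test)` draws the smoothed residuals against time, which helps judge whether a flagged departure is practically relevant.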

## Model Comparison

Comparison of nested Cox models using a likelihood ratio test, AIC, and the concordance index. The full model includes ISS stage and cytogenetic risk in addition to age and gender. A significant likelihood ratio test indicates that the additional covariates improve model fit; a lower AIC and a higher concordance index point the same way.

```{r model-comparison, results='asis'}
#| echo: false
show_target("vig_cox_comparison")
```
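
The three comparison criteria can be computed as in the sketch below, which assumes the `survival` package and uses `lung` covariates as stand-ins for the basic and full models.

```r
library(survival)

# Fit both models on the same complete cases so that they are truly nested
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

cox_small <- coxph(Surv(time, status) ~ age + sex, data = dat)
cox_large <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = dat)

lrt <- anova(cox_small, cox_large)  # likelihood ratio test
aic <- AIC(cox_small, cox_large)    # lower is better
cidx <- c(
  small = concordance(cox_small)$concordance,
  large = concordance(cox_large)$concordance
)

lrt
aic
cidx
```

Restricting both fits to the same rows matters: if the larger model silently drops patients with missing covariates, the likelihoods are no longer comparable.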

## Survival by Gene Expression

Patients are split at the median VST expression of each top DE gene into
"High" and "Low" groups. This connects differential expression results to
clinical outcomes.

> **Note on multiple testing:** With 5 genes tested, the Bonferroni-corrected
> significance threshold is p < 0.01 (0.05 / 5).

```{r km-expression, fig.height = 5}
#| echo: false
#| results: asis
show_target("vig_km_by_expression")
```
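
The median split and Bonferroni correction described in this section can be sketched as follows. The expression matrix, gene names, and survival outcomes are simulated stand-ins, and `split_pvalue()` is a hypothetical helper, not a function from the pipeline.

```r
library(survival)

set.seed(1)
n <- 100
# Simulated stand-ins: VST-like expression for 5 hypothetical genes
expr <- matrix(rnorm(n * 5, mean = 8), nrow = n,
               dimnames = list(NULL, paste0("gene", 1:5)))
time <- rexp(n, rate = 0.01)   # follow-up time
event <- rbinom(n, 1, 0.7)     # 1 = event observed, 0 = censored

# Log-rank p-value for a median split of one gene (hypothetical helper)
split_pvalue <- function(x, time, event) {
  d <- data.frame(
    time = time,
    event = event,
    group = factor(ifelse(x > median(x), "High", "Low"), levels = c("Low", "High"))
  )
  fit <- survdiff(Surv(time, event) ~ group, data = d)
  pchisq(fit$chisq, df = 1, lower.tail = FALSE)
}

p_raw <- apply(expr, 2, split_pvalue, time = time, event = event)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")
```

Comparing `p_bonf` against 0.05 is equivalent to comparing `p_raw` against 0.05 / 5 = 0.01.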

## Expression by Cytogenetic Subtype

Violin/box plots showing gene expression (VST) stratified by cytogenetic
marker status. These show whether specific alterations are associated with
expression changes in the top DE genes.

```{r expr-by-subtype, fig.height = 6}
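plots <- safe_tar_read("vig_expr_by_subtype")
# A single plot object is itself a list internally, so wrap it before iterating
if (inherits(plots, c("ggplot", "gtable", "grob"))) plots <- list(plots)
if (is.list(plots)) {
  for (p in plots) {
    if (inherits(p, c("gtable", "grob"))) {
      # Grid objects need a fresh page before drawing
      grid::grid.newpage()
      grid::grid.draw(p)
    } else {
      print(p)
    }
  }
}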
```

## Next Steps

- **Time-varying coefficients**: For covariates violating the PH assumption.
- **Cure models**: If a plateau is observed in KM curves, consider
  mixture cure models.
- See the [cytogenetic landscape](exploratory-analysis.html) for alteration
  frequencies underlying risk groups.
- See the [DE results](differential-expression.html) for gene signatures
  that could inform survival stratification.

## Data Sources

Results in this vignette are derived from the
[MMRF CoMMpass study](https://portal.gdc.cancer.gov/projects/MMRF-COMMPASS)
(MMRF-COMMPASS, ~1,143 patients), downloaded via
[TCGAbiolinks](https://bioconductor.org/packages/TCGAbiolinks/).
The pipeline runs with a configurable `sample_limit` (default 200; CI uses 20).

For full citations, data access tiers, and the distinction between
pipeline data and synthetic test data, see the
[Data Sources](data-sources.html) vignette.

## Recent Changes

Recent project commits with lines added, files changed, and change categories.

```{r changelog}
#| echo: false
#| results: asis
show_target("vig_git_changelog")
```

## Bayesian Survival Models

Bayesian Cox PH models fit with [brms](https://paul-buerkner.github.io/brms/)
using weakly informative priors. These complement the frequentist models above
by providing posterior distributions, credible intervals, and natural
hierarchical structure (e.g., ISS-stage random intercepts).

::: {.callout-note}
Bayesian models use `cue = "never"`, so they run only when explicitly requested
via `tar_make(names = bayes_cox_basic)`. Model compilation and MCMC sampling take several minutes.
:::

### Frequentist vs Bayesian Comparison

```{r bayes-comparison}
#| echo: false
#| results: asis
show_target("vig_bayes_freq_table")
```

## Reproducibility

<details>
<summary>Session Info (click to expand)</summary>

```{r session-info, eval=TRUE}
sessionInfo()
```

</details>
