| Title: | Analyze Irish Weather Buoy Network Data |
|---|---|
| Description: | Provides tools to download, process, and analyze data from the Irish Weather Buoy Network. Includes functions for accessing real-time and historical data via the Marine Institute's ERDDAP server, storing data in DuckDB for efficient querying, and building predictive models for wave height and weather conditions. |
| Authors: | John Gavin [aut, cre] |
| Maintainer: | John Gavin <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-06-07 10:34:31 UTC |
| Source: | https://github.com/JohnGavin/irishbuoys |
Adds calculated wave metrics including rogue wave flag and steepness.
add_wave_metrics(data, rogue_threshold = 2)
data |
Data frame with wave_height, hmax, and wave_period columns |
rogue_threshold |
Threshold for rogue wave classification (default: 2.0) |
Data frame with additional columns: rogue_ratio, is_rogue, steepness, danger_level
data <- data.frame(
  wave_height = c(2.5, 3.0, 1.8),
  hmax = c(4.5, 6.5, 3.2),
  wave_period = c(8, 10, 7)
)
add_wave_metrics(data)
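The derived columns can be approximated in base R. This is a minimal sketch rather than the package's implementation: it assumes `rogue_ratio` is `hmax / wave_height`, that the flag uses a strict `>` comparison, and that steepness is wave height divided by the deep-water wavelength `g * T^2 / (2 * pi)`. The name `sketch_wave_metrics()` is hypothetical, and `danger_level` is left out because its rule is not documented here.

```r
# Hypothetical helper, not part of the package
sketch_wave_metrics <- function(data, rogue_threshold = 2) {
  # Ratio of maximum individual wave height to significant wave height
  data$rogue_ratio <- data$hmax / data$wave_height
  # Assumed rule: flag observations whose ratio strictly exceeds the threshold
  data$is_rogue <- data$rogue_ratio > rogue_threshold
  # Deep-water wavelength L = g * T^2 / (2 * pi), with g = 9.81 m/s^2
  wavelength <- 9.81 * data$wave_period^2 / (2 * pi)
  data$steepness <- data$wave_height / wavelength
  data
}

waves <- data.frame(
  wave_height = c(2.5, 3.0, 1.8),
  hmax = c(4.5, 6.5, 3.2),
  wave_period = c(8, 10, 7)
)
out <- sketch_wave_metrics(waves)
out[, c("rogue_ratio", "is_rogue", "steepness")]
```

Only the second observation (6.5 / 3.0, about 2.17) exceeds the default threshold of 2.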
Analyzes the ratio of peak gust to sustained wind speed. This is the wind equivalent of the wave Hmax/Hs ratio.
analyze_gust_factor(data, min_wind_speed = 5)
data |
Data frame with gust and wind_speed columns |
min_wind_speed |
Minimum sustained wind speed to consider (default: 5 m/s) |
List with:
summary: summary statistics of gust factor
extreme_gusts: observations with high gust factors
by_wind_category: gust factor by wind speed category
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, variables = c("time", "wind_speed", "gust"))
gust_analysis <- analyze_gust_factor(data)
DBI::dbDisconnect(con)
## End(Not run)
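The gust factor itself is a simple ratio. The sketch below uses made-up values and assumes observations are kept when `wind_speed >= min_wind_speed`; whether the package uses `>=` or `>` is not stated here.

```r
# Toy observations (values invented for illustration)
wind <- data.frame(
  wind_speed = c(3, 8, 12, 20),  # sustained wind, m/s
  gust = c(9, 12, 15, 52)        # peak gust, m/s
)
min_wind_speed <- 5

# Drop light-wind records, where the ratio is unstable
kept <- wind[wind$wind_speed >= min_wind_speed, ]

# Gust factor: peak gust divided by sustained wind speed
kept$gust_factor <- kept$gust / kept$wind_speed
summary(kept$gust_factor)
```

The 3 m/s record is excluded; without the filter its gust factor of 3 would dominate the summary even though the wind is light.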
Analyzes how often extreme events co-occur at multiple stations.
analyze_joint_extremes(data, variable = "wave_height", threshold_quantile = 0.95)
data |
Data frame with columns: time, station_id, and the variable |
variable |
Variable to analyze (default: "wave_height") |
threshold_quantile |
Quantile threshold for "extreme" (default: 0.95) |
List with:
joint_extreme_counts: matrix of joint extreme event counts
conditional_probs: P(station j extreme | station i extreme)
extreme_events: data frame of all extreme events
## Not run:
data <- data.frame(
  time = rep(seq(as.POSIXct("2024-01-01"), by = "hour", length.out = 100), 2),
  station_id = rep(c("M2", "M3"), each = 100),
  wave_height = c(rnorm(100, 3, 1), rnorm(100, 2.5, 0.8))
)
analyze_joint_extremes(data)
## End(Not run)
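To make the joint counts and conditional probabilities concrete, here is a base-R sketch on simulated data. It assumes per-station thresholds at the chosen quantile and time-aligned observations in wide form (one column per station); the package's internals may differ.

```r
set.seed(1)
n <- 500
common <- rnorm(n)  # shared "storm" signal seen by both stations
wide <- data.frame(
  M2 = 3 + common + rnorm(n, 0, 0.5),
  M3 = 2.5 + 0.8 * common + rnorm(n, 0, 0.5)
)

# Per-station thresholds at the 0.95 quantile
thr <- vapply(wide, quantile, numeric(1), probs = 0.95)

# 0/1 indicator matrix: is each observation above its station threshold?
is_ext <- sweep(as.matrix(wide), 2, thr, ">") * 1

# joint[i, j] = number of time steps where stations i and j are both extreme
joint <- crossprod(is_ext)

# cond[i, j] = P(station j extreme | station i extreme)
cond <- joint / diag(joint)
cond
```

Dividing the matrix by `diag(joint)` recycles down the columns, so each row `i` is scaled by the number of extremes at station `i`.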
Get statistics about Parquet file storage.
analyze_parquet_storage(data_path = "inst/extdata/parquet")
data_path |
Base path for Parquet files |
Computes statistics on rogue wave occurrence rates and associated conditions.
analyze_rogue_statistics(con, threshold = 2, min_wave_height = 2)
con |
DBI connection to DuckDB database |
threshold |
Hmax/WaveHeight ratio threshold (default: 2.0) |
min_wave_height |
Minimum significant wave height (default: 2m) |
List containing rogue wave statistics by station and overall
## Not run:
con <- connect_duckdb()
stats <- analyze_rogue_statistics(con)
print(stats$by_station)
DBI::dbDisconnect(con)
## End(Not run)
Computes cross-correlations for all unique station pairs.
analyze_station_pairs(data, variable = "wave_height", max_lag = 48)
data |
Data frame with columns: time, station_id, and the variable |
variable |
Variable to analyze (default: "wave_height") |
max_lag |
Maximum lag in hours (default: 48) |
Data frame with one row per station pair containing:
station1, station2: station pair
distance_km: distance between stations
optimal_lag: lag with max correlation
max_correlation: correlation at optimal lag
expected_lag: expected lag based on wave propagation (~30 km/h)
## Not run:
data <- data.frame(
  time = rep(seq(as.POSIXct("2024-01-01"), by = "hour", length.out = 100), 2),
  station_id = rep(c("M2", "M3"), each = 100),
  wave_height = c(rnorm(100, 3, 1), rnorm(100, 2.5, 0.8))
)
analyze_station_pairs(data)
## End(Not run)
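The optimal lag for one pair can be illustrated with `stats::ccf()`. This sketch assumes hourly, gap-free, time-aligned series and a made-up station separation of 150 km; the sign convention of the lag reported by the package is not documented here and may differ.

```r
set.seed(1)
n <- 300
signal <- as.numeric(arima.sim(list(ar = 0.9), n = n + 5))

s1 <- signal[6:(n + 5)]                # station 1 sees the signal first
s2 <- signal[1:n] + rnorm(n, 0, 0.1)   # station 2 sees it 5 hours later

# ccf(x, y) at lag k estimates cor(x[t + k], y[t]);
# a positive peak here means s2 trails s1 by that many hours
cc <- stats::ccf(s2, s1, lag.max = 48, plot = FALSE)
optimal_lag <- cc$lag[which.max(cc$acf)]
max_correlation <- max(cc$acf)

# Expected lag from the ~30 km/h propagation speed
distance_km <- 150                     # hypothetical separation
expected_lag <- distance_km / 30

c(optimal_lag = optimal_lag, expected_lag = expected_lag)
```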
Functions for running a live REST API that serves pre-built static JSON data from the targets pipeline.
Other api:
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
Functions for generating static JSON API files served via GitHub Pages.
These are written to docs/api/v1/ and updated 6-hourly by CI.
Other api:
api_plumber,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
Maps Beaufort scale integers (0-12) to standard descriptions.
beaufort_to_description(beaufort)
beaufort |
Integer vector of Beaufort numbers (0-12). |
Character vector of Beaufort descriptions.
Other storm-alert:
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
beaufort_to_description(0:12)
Returns a lazy dplyr tibble reference to the buoy_data table. All dplyr operations are translated to SQL and executed in DuckDB. Call collect() to retrieve results as a data frame.
buoy_tbl(con, table_name = "buoy_data")
con |
DBI connection to DuckDB database |
table_name |
Name of the table (default: "buoy_data") |
A lazy tibble (tbl_dbi) for use with dplyr verbs
## Not run:
con <- connect_duckdb()
buoy_tbl(con) |>
  dplyr::filter(station_id == "M3", wave_height > 5) |>
  dplyr::select(time, wave_height, hmax) |>
  dplyr::collect()
DBI::dbDisconnect(con)
## End(Not run)
Calculates annual statistics and fits a linear trend to detect long-term changes in the data.
calculate_annual_trends(data, variable = "wave_height", time_col = "time")
data |
Data frame with time and value columns |
variable |
Name of the variable (default: "wave_height") |
time_col |
Name of the time column (default: "time") |
List with:
annual_stats: annual mean, max, etc.
trend_model: linear model for trend
trend_per_decade: change per decade with significance
set.seed(1)
data <- data.frame(
  time = seq(as.POSIXct("2020-01-01"), by = "hour", length.out = 1000),
  wave_height = 2 + sin(seq(0, 20, length.out = 1000)) + rnorm(1000, 0, 0.3)
)
result <- calculate_annual_trends(data)
result$trend_per_decade
result$p_value
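The trend step amounts to a linear regression of an annual statistic on the year. A minimal sketch with invented annual means, assuming the per-decade change is ten times the fitted slope:

```r
# Invented annual means: 0.02 m/year trend plus a small alternating wiggle
annual <- data.frame(
  year = 2015:2024,
  mean = 2 + 0.02 * (0:9) + rep(c(0.01, -0.01), 5)
)

fit <- lm(mean ~ year, data = annual)

# Slope is change per year; scale to change per decade
trend_per_decade <- 10 * unname(coef(fit)["year"])

# p-value for the slope, as a significance check on the trend
p_value <- summary(fit)$coefficients["year", "Pr(>|t|)"]

c(trend_per_decade = trend_per_decade, p_value = p_value)
```

Note that a multi-year record is needed for this to be meaningful; the 1000-hour series in the example above spans only a few weeks.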
Computes return levels from a GPD (Generalized Pareto Distribution) fit
produced by mev::fit.gpd(). Uses the standard GPD return level formula
with delta method confidence intervals.
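The formula is:
$$z_T = u + \frac{\sigma}{\xi}\left[(\lambda T)^{\xi} - 1\right]$$
where $u$ is the threshold, $\sigma$ is the scale, $\xi$ is
the shape, $\lambda$ is the exceedance rate, and $T$ is the
return period in years.
When shape is approximately zero, the exponential fallback is used:
$$z_T = u + \sigma \log(\lambda T)$$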
calculate_gpd_return_levels(
  gpd_fit,
  return_periods = c(1, 5, 10),
  n_obs_per_year = 8760,
  n_total = NULL,
  exceedance_rate = NULL,
  conf_level = 0.95
)
gpd_fit |
A list from the per-station GPD fitting targets, with elements:
|
return_periods |
Numeric vector of return periods in years (default: c(1, 5, 10)) |
n_obs_per_year |
Number of observations per year for exceedance rate calculation (default: 8760 for hourly data) |
n_total |
Total number of observations. If NULL, estimated from
|
exceedance_rate |
Pre-computed exceedance rate (lambda). If NULL,
estimated as |
conf_level |
Confidence level for intervals (default: 0.95) |
Data frame with columns: return_period, return_level, lower,
upper, threshold_value, method. Returns NA return levels if the fit
has an error or missing parameters.
fit <- list(u = 5.0, scale = 1.2, shape = 0.1, n_exceed = 500)
calculate_gpd_return_levels(fit, return_periods = c(10, 50, 100))
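For orientation, the point estimate can be sketched directly from the standard GPD return level expression, `u + scale / shape * ((lambda * T)^shape - 1)`, with `u + scale * log(lambda * T)` as the limit when the shape is near zero. The sketch assumes `lambda` is expressed in exceedances per year; `sketch_gpd_return_level()`, the tolerance, and the value `lambda = 50` are hypothetical, and the delta-method intervals are not reproduced.

```r
# Hypothetical helper, not the package function
sketch_gpd_return_level <- function(u, scale, shape, lambda, T_years, tol = 1e-6) {
  m <- lambda * T_years  # expected number of exceedances in T years
  if (abs(shape) < tol) {
    u + scale * log(m)                    # exponential (shape ~ 0) fallback
  } else {
    u + scale / shape * (m^shape - 1)     # general GPD return level
  }
}

# Parameters from the example above, with an assumed 50 exceedances per year
rl <- sketch_gpd_return_level(
  u = 5.0, scale = 1.2, shape = 0.1,
  lambda = 50, T_years = c(10, 50, 100)
)
rl
```

Return levels increase with the return period, and a positive shape makes them grow faster than the logarithmic fallback would.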
Calculates significant wave height (Hs) from a time series of surface elevation measurements using the spectral method: Hs = 4 * sigma, where sigma is the standard deviation of the elevation.
calculate_hs_from_elevation(elevations)
elevations |
Numeric vector of surface elevation measurements (m) |
Significant wave height in meters
# Simulated wave elevation time series
t <- seq(0, 1000, by = 0.5) # 1000 seconds at 2 Hz
elevation <- 0.5 * sin(2*pi*t/8) + 0.3 * sin(2*pi*t/12) + rnorm(length(t), 0, 0.1)
hs <- calculate_hs_from_elevation(elevation)
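The 4 * sigma rule is easy to check by hand. For a single sinusoid of amplitude `a`, the standard deviation is `a / sqrt(2)`, so Hs is about `2.83 * a`. A short sketch, assuming plain `sd()` is an adequate estimate of sigma:

```r
t <- seq(0, 1000, by = 0.5)            # 1000 seconds sampled at 2 Hz
elevation <- 0.5 * sin(2 * pi * t / 8) # single wave, amplitude 0.5 m, period 8 s

# Spectral estimate of significant wave height: four standard deviations
hs <- 4 * sd(elevation)
hs  # close to 4 * 0.5 / sqrt(2), i.e. about 1.41 m
```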
Calculates return levels for specified return periods from a fitted extreme value model.
calculate_return_levels(fit, return_periods = c(10, 50, 100), conf_level = 0.95)
fit |
Result from fit_gev_annual_maxima or fit_gpd_threshold |
return_periods |
Numeric vector of return periods in years (default: c(10, 50, 100)) |
conf_level |
Confidence level for intervals (default: 0.95) |
Data frame with:
return_period: return period in years
return_level: estimated return level
lower: lower confidence bound
upper: upper confidence bound
## Not run:
gev_result <- fit_gev_annual_maxima(data)
levels <- calculate_return_levels(gev_result, c(10, 50, 100))
print(levels)
## End(Not run)
Calculates the Root Mean Square wave height, which is related to wave energy content.
calculate_rms_wave_height(wave_heights)
wave_heights |
Numeric vector of individual wave heights (m) |
H_rms = sqrt(mean(H^2))
Relationship to Hs (for Rayleigh distribution): H_rms = Hs / sqrt(2) ~ 0.707 * Hs
RMS wave height in meters
heights <- c(1.2, 2.1, 0.8, 3.5, 1.9, 2.8)
h_rms <- calculate_rms_wave_height(heights)
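The 0.707 ratio can be checked numerically. The sketch below simulates Rayleigh-distributed wave heights by inverse transform and takes Hs as the mean of the highest third of waves, which is an assumption about the definition of Hs used for the comparison.

```r
set.seed(42)
# Rayleigh-distributed heights (scale 1) via inverse-transform sampling
h <- sqrt(-2 * log(runif(1e5)))

# Root mean square wave height
h_rms <- sqrt(mean(h^2))

# Significant wave height as the mean of the highest one-third of waves
h_third <- mean(sort(h, decreasing = TRUE)[seq_len(length(h) %/% 3)])

h_rms / h_third  # close to 1 / sqrt(2)
```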
Calculates mean values by month and season for a variable.
calculate_seasonal_means(data, variable = "wave_height", time_col = "time")
data |
Data frame with time and value columns |
variable |
Name of the variable (default: "wave_height") |
time_col |
Name of the time column (default: "time") |
List with:
monthly: mean values by month
seasonal: mean values by season (DJF, MAM, JJA, SON)
set.seed(1)
data <- data.frame(
  time = seq(as.POSIXct("2020-01-01"), by = "hour", length.out = 1000),
  wave_height = 2 + sin(seq(0, 20, length.out = 1000)) + rnorm(1000, 0, 0.3)
)
result <- calculate_seasonal_means(data)
result$monthly
result$seasonal
Calculates wave steepness, an important safety metric. Steepness > 0.07 indicates breaking waves (dangerous).
calculate_wave_steepness(wave_height, wave_period)
wave_height |
Significant wave height in meters |
wave_period |
Wave period in seconds |
Wave steepness = H / L, where L = g * T^2 / (2 * pi) is the deep-water wavelength.
Simplified: steepness = H / (1.56 * T^2)
Wave steepness (dimensionless)
# 3m wave with 8 second period
steepness <- calculate_wave_steepness(3, 8)
# steepness = 0.03 (safe)

# 3m wave with 4 second period
steepness <- calculate_wave_steepness(3, 4)
# steepness = 0.12 (dangerous - breaking waves)
Non-parametric bootstrap (optionally block bootstrap) confidence intervals for GPD-based return levels. Resamples the raw data, refits the GPD, and computes return levels for each replicate.
ci_bootstrap_return_levels(
  data,
  variable,
  return_periods = c(1, 5, 10),
  n_boot = 500,
  conf_level = 0.95,
  block_size = NULL,
  threshold_quantile = 0.95,
  n_obs_per_year = 8760,
  seed = 42
)
data |
Data frame containing the variable to analyse. |
variable |
Character name of the column (e.g. |
return_periods |
Numeric vector of return periods in years
(default |
n_boot |
Number of bootstrap replicates (default 500). |
conf_level |
Confidence level (default 0.95). |
block_size |
Integer block size for block bootstrap (observations, not
hours). |
threshold_quantile |
Quantile for the POT threshold (default 0.95). |
n_obs_per_year |
Observations per year for return level calculation (default 8760 for hourly). |
seed |
Random seed for reproducibility (default 42). |
For each bootstrap replicate:
Resample observations (iid or block bootstrap)
Compute the threshold as the threshold_quantile of the resample
Fit GPD via mev::fit.gpd() to exceedances
Compute return levels via calculate_gpd_return_levels()
The percentile method is used: CIs are the alpha/2 and 1-alpha/2
quantiles of the bootstrap distribution of return levels.
A data.frame with columns: return_period, return_level,
lower, upper, n_success, method.
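The resampling and percentile steps can be sketched without the GPD refit. In the sketch below the 0.99 quantile stands in for the return level, `block_resample_idx()` is a hypothetical helper implementing a moving-block bootstrap, and the series is simulated; the package's own blocking scheme may differ.

```r
# Hypothetical helper: indices for a moving-block bootstrap resample of length n
block_resample_idx <- function(n, block_size) {
  n_blocks <- ceiling(n / block_size)
  starts <- sample.int(n - block_size + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_size - 1)))
  idx[seq_len(n)]  # trim to the original length
}

set.seed(42)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 2000))  # autocorrelated toy series

n_boot <- 200
boot_stat <- replicate(n_boot, {
  xb <- x[block_resample_idx(length(x), block_size = 24)]
  unname(quantile(xb, 0.99))  # stand-in statistic; a GPD refit would go here
})

# Percentile method: alpha/2 and 1 - alpha/2 quantiles of the replicates
conf_level <- 0.95
alpha <- 1 - conf_level
ci <- quantile(boot_stat, c(alpha / 2, 1 - alpha / 2))
ci
```

Resampling whole blocks keeps the short-range dependence of hourly data inside each block, which iid resampling would destroy.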
Distribution-free confidence intervals for population quantiles using
order statistics. Uses the Beta distribution to find order-statistic
indices j,k such that (X_(j), X_(k)) covers the p-th quantile with
at least the specified confidence level.
ci_order_statistics(x, probs, conf_level = 0.95)
x |
Numeric vector of observations (NAs removed internally). |
probs |
Numeric vector of probabilities for which to compute CIs
(e.g. |
conf_level |
Confidence level (default 0.95). |
For a sample of size n, the probability that the interval (X_(j), X_(k))
contains the p-th quantile is pbeta(p, j, n-j+1) - pbeta(p, k, n-k+1).
We search for the tightest such interval achieving at least conf_level
coverage.
This method is distribution-free: it requires no parametric assumptions. With ~8 years of hourly data (~70k observations), order-statistic CIs are well-defined even for extreme quantiles like the 99th percentile.
A data.frame with columns: probability, quantile, lower,
upper, j, k, actual_coverage, method.
set.seed(42)
x <- rnorm(1000)
ci_order_statistics(x, probs = c(0.95, 0.99))
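The coverage formula can be exercised with a simplified index choice. The sketch below picks equal-tailed indices from binomial quantiles instead of searching for the tightest interval, so it is an approximation of the approach described, not the package's algorithm; `sketch_order_stat_ci()` is a hypothetical name.

```r
# Hypothetical helper: equal-tailed order-statistic CI for the p-th quantile
sketch_order_stat_ci <- function(x, p, conf_level = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  alpha <- 1 - conf_level
  # Indices from binomial quantiles of the count of observations below the quantile
  j <- max(1, qbinom(alpha / 2, n, p))
  k <- min(n, qbinom(1 - alpha / 2, n, p) + 1)
  # Exact coverage of (X_(j), X_(k)) from the Beta distribution of order statistics
  coverage <- pbeta(p, j, n - j + 1) - pbeta(p, k, n - k + 1)
  data.frame(
    probability = p,
    quantile = unname(quantile(x, p)),
    lower = x[j],
    upper = x[k],
    j = j,
    k = k,
    actual_coverage = coverage
  )
}

set.seed(42)
res <- sketch_order_stat_ci(rnorm(1000), p = 0.95)
res
```

Because the indices are integers, the achieved coverage is at least the requested level rather than exactly equal to it.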
Simulate from a fitted GPD, refit, and compute return levels to obtain parametric bootstrap confidence intervals.
ci_parametric_bootstrap(
  gpd_fit,
  n_boot = 500,
  return_periods = c(1, 5, 10),
  conf_level = 0.95,
  n_obs_per_year = 8760,
  seed = 42
)
gpd_fit |
A list with elements |
n_boot |
Number of bootstrap replicates (default 500). |
return_periods |
Numeric vector of return periods in years. |
conf_level |
Confidence level (default 0.95). |
n_obs_per_year |
Observations per year (default 8760). |
seed |
Random seed (default 42). |
For each replicate:
Simulate n_exceed exceedances from GPD(scale, shape)
Add threshold to obtain values above u
Refit GPD via mev::fit.gpd()
Compute return levels
Uses the percentile method for CIs.
A data.frame with columns: return_period, return_level,
lower, upper, n_success, method.
Compares the occurrence rates of rogue waves (Hmax/Hs > 2) and extreme gusts (gust/wind > 2.6).
compare_rogue_wave_gust(data)
data |
Data frame with wave_height, hmax, wind_speed, gust columns |
Data frame comparing occurrence rates
data <- data.frame(
  wave_height = c(3, 4, 5, 2.5),
  hmax = c(5, 9, 8, 4),
  wind_speed = c(10, 15, 20, 8),
  gust = c(15, 40, 30, 12)
)
compare_rogue_wave_gust(data)
Computes autocorrelation function values and returns them as a tibble. Useful for examining temporal dependence structure in buoy data.
compute_acf_summary(data, variable = "wave_height", max_lag = 48)
data |
Data frame with the variable to analyze |
variable |
Name of the variable (default: "wave_height") |
max_lag |
Maximum number of lags (default: 48) |
A tibble with columns lag and acf.
set.seed(1)
data <- data.frame(
  time = seq(as.POSIXct("2020-01-01"), by = "hour", length.out = 1000),
  wave_height = 2 + sin(seq(0, 20, length.out = 1000)) + rnorm(1000, 0, 0.3)
)
acf_result <- compute_acf_summary(data)
head(acf_result)
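A comparable lag/acf table can be built from `stats::acf()`. This sketch returns a plain data frame and includes lag 0; whether the package keeps lag 0 or returns a tibble with exactly these values is an assumption.

```r
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.8), n = 500))  # toy autocorrelated series

# Compute the autocorrelation function without plotting
a <- stats::acf(x, lag.max = 48, plot = FALSE)

# Flatten the array outputs into two plain columns
acf_df <- data.frame(lag = as.vector(a$lag), acf = as.vector(a$acf))
head(acf_df)
```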
Compute calibration metrics for predictions
compute_calibration(predictions)
predictions |
Tibble from |
List with: brier_score, accuracy, calibration_by_bucket, rolling_brier, n_total, n_resolved
preds <- tibble::tibble(
  prediction_id = paste0("pred_", 1:5),
  p_success = c(0.9, 0.7, 0.3, 0.8, 0.5),
  outcome = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  outcome_binary = c(1, 1, 0, 1, 0),
  recorded_at = as.character(Sys.time() - (5:1) * 86400)
)
cal <- compute_calibration(preds)
cal$brier_score
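The Brier score is the mean squared difference between the forecast probability and the binary outcome. Computed by hand for the five predictions in the example (assuming the package uses this standard definition):

```r
p_success <- c(0.9, 0.7, 0.3, 0.8, 0.5)
outcome_binary <- c(1, 1, 0, 1, 0)

# Squared errors are 0.01, 0.09, 0.09, 0.04 and 0.25
brier_score <- mean((p_success - outcome_binary)^2)
brier_score  # 0.096
```

Lower is better: 0 is a perfect forecast, and always predicting 0.5 scores 0.25.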
Computes temporal coverage and gap analysis for buoy stations. Uses dplyr only (no raw SQL).
compute_data_coverage(con, start_date, end_date)
con |
DBI connection to DuckDB database |
start_date |
Start date for analysis |
end_date |
End date for analysis |
List with coverage tibble and gaps tibble
Estimates the upper tail dependence coefficient (lambda_U) for all unique station pairs using a Gumbel copula. For Gumbel copula with parameter alpha, lambda_U = 2 - 2^(1/alpha). Bootstrap confidence intervals assess whether lambda_U is significantly greater than zero (H1: spatial coherence of extremes).
Also computes empirical chi statistics at multiple quantile levels and Kendall's tau for overall rank dependence.
compute_extremal_dependence(
  data,
  variable = "wave_height",
  threshold_quantile = seq(0.9, 0.99, by = 0.01),
  n_bootstrap = 100,
  boot_subsample = 5000,
  station_info = NULL
)
data |
Data frame with columns: |
variable |
Variable to analyze (default: |
threshold_quantile |
Quantile levels at which to compute empirical chi
(default: |
n_bootstrap |
Number of bootstrap replicates for lambda CI (default: 100). |
boot_subsample |
Maximum observations per bootstrap replicate. Subsampling speeds computation for large datasets (default: 5000). |
station_info |
Optional data frame with station metadata (from
|
List with:
Data frame with columns: station1, station2,
distance_km, kendall_tau, lambda_upper, lambda_lower,
lambda_upper_ci_low, lambda_upper_ci_high, n_concurrent,
copula_alpha, chi_q95, chi_q99, h1_significant (logical).
Character: "gumbel_copula".
Integer: number of bootstrap replicates used.
Numeric vector of quantile levels for chi.
If the copula package is unavailable or no valid pairs exist, returns
a list with an error field.
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, variables = c("time", "station_id", "wave_height"))
result <- compute_extremal_dependence(data)
result$dependence_table
DBI::dbDisconnect(con)
## End(Not run)
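The quantities involved can be illustrated in base R for a single pair. This sketch estimates the Gumbel parameter by inverting Kendall's tau (`tau = 1 - 1/alpha` for a Gumbel copula), which is a swapped-in shortcut and not necessarily how the package fits the copula, and it uses one common empirical estimator of chi; the simulated data and variable names are illustrative only.

```r
set.seed(1)
n <- 2000
z <- rnorm(n)                  # shared component driving both stations
a <- z + rnorm(n, 0, 0.7)
b <- z + rnorm(n, 0, 0.7)

# Overall rank dependence
kendall_tau <- cor(a, b, method = "kendall")

# Gumbel copula parameter by inversion of Kendall's tau (assumed estimator)
copula_alpha <- 1 / (1 - kendall_tau)

# Upper tail dependence coefficient for the Gumbel copula
lambda_upper <- 2 - 2^(1 / copula_alpha)

# Empirical chi at q = 0.95: P(b extreme | a extreme) on the rank scale
q <- 0.95
ua <- rank(a) / (n + 1)
ub <- rank(b) / (n + 1)
chi_q95 <- mean(ua > q & ub > q) / (1 - q)

c(kendall_tau = kendall_tau, copula_alpha = copula_alpha,
  lambda_upper = lambda_upper, chi_q95 = chi_q95)
```

Comparing `lambda_upper` with the empirical chi values is a useful sanity check, since a Gumbel fit always implies some upper tail dependence when `alpha > 1`.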
Maps observation age in hours to a confidence multiplier in [0.1, 1].
Confidence is 1.0 while data is fresh, then decays linearly between
breakpoints. The floor is 0.1: we never claim zero information.
Default schedule (chosen to match the buoy fetch cadence):
0 - 6 h : 1.00 (well within the 6-h fetch cycle)
6 - 24 h : 1.00 -> 0.50 (one missed fetch up to one day)
24 - 72 h : 0.50 -> 0.25 (one to three missed days)
> 72 h : 0.10 (floor)
compute_obs_confidence(age_hours)
age_hours |
Numeric vector of ages in hours. NA in -> NA out. |
Numeric vector in [0.1, 1], same length as age_hours.
Other obs-confidence:
obs_status_label(),
widen_ci()
compute_obs_confidence(c(0, 6, 12, 24, 48, 72, 168))
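The default schedule can be reproduced with piecewise-linear interpolation. This is a sketch of the schedule as tabulated, not the package source: `sketch_obs_confidence()` is a hypothetical name, and it assumes an age of exactly 72 h still gets 0.25, with the 0.10 floor applying only beyond 72 h.

```r
# Hypothetical helper mirroring the documented default schedule
sketch_obs_confidence <- function(age_hours) {
  # Linear interpolation between the breakpoints; NA ages stay NA
  conf <- stats::approx(
    x = c(0, 6, 24, 72),
    y = c(1, 1, 0.5, 0.25),
    xout = age_hours,
    rule = 2
  )$y
  # Beyond 72 h the confidence drops to the floor
  conf[!is.na(age_hours) & age_hours > 72] <- 0.1
  conf
}

ages <- c(0, 6, 12, 24, 48, 72, 168)
conf <- sketch_obs_confidence(ages)
round(conf, 3)
```

Under these assumptions a 12-hour-old observation gets about 0.833 and a 48-hour-old one 0.375.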
Creates a new DuckDB database or connects to an existing one for storing Irish Weather Buoy Network data. Sets up the schema if creating new.
connect_duckdb(db_path = "inst/extdata/irish_buoys.duckdb", create_new = FALSE)
db_path |
Character, path to database file (default: "inst/extdata/irish_buoys.duckdb") |
create_new |
Logical, whether to create new database (default: FALSE) |
DBI connection object to the DuckDB database
# Connect to existing database
con <- connect_duckdb()

# Create new database
con <- connect_duckdb(create_new = TRUE)

# Don't forget to disconnect when done
DBI::dbDisconnect(con)
One-time conversion from DuckDB database to Parquet files.
convert_duckdb_to_parquet(
  db_path = "inst/extdata/irish_buoys.duckdb",
  data_path = "inst/extdata/parquet"
)
db_path |
Path to existing DuckDB database |
data_path |
Output path for Parquet files |
Creates and returns a plumber router without starting the server. Useful for testing and programmatic access.
create_api_router()
A plumber router object.
Other api:
api_plumber,
api_static,
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
## Not run:
pr <- create_api_router()
# Test endpoints programmatically
## End(Not run)
Creates the necessary tables and indexes for efficient storage and querying of buoy data.
create_buoy_schema(con)
con |
DBI connection object |
Invisible NULL
Formats the weekly summary as an HTML email using blastula.
create_email_summary(summary)
summary |
Summary object from generate_weekly_summary() |
blastula email object
Create Annual Trends Line Plot
create_plot_annual_trends(annual_trends, date_caption = NULL)
annual_trends |
Annual trends from calculate_annual_trends |
date_caption |
Date range caption |
plotly object
## Not run:
trends <- list(
  annual_stats = data.frame(
    year = 2018:2024,
    mean = runif(7, 1.5, 3.5),
    sd = runif(7, 0.3, 1.0)
  )
)
create_plot_annual_trends(trends)
## End(Not run)
Create Gust Factor by Category Plot
create_plot_gust_by_category(gust_analysis, date_caption = NULL)
gust_analysis |
Gust factor analysis results |
date_caption |
Date range caption |
plotly object
## Not run:
gust <- list(
  by_station_category = data.frame(
    station_id = rep(c("M2", "M3"), each = 3),
    wind_category = rep(c("0-10", "10-20", "20+"), 2),
    mean_gf = runif(6, 1.1, 1.8),
    p95_gf = runif(6, 1.5, 2.5),
    n = sample(50:500, 6)
  )
)
create_plot_gust_by_category(gust)
## End(Not run)
Create Rogue Gusts vs Rogue Waves Scatter Plot
create_plot_gusts_vs_waves(analysis_data)
analysis_data |
Full analysis data with both ratios computed |
plotly object
## Not run:
data <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:20) * 3600,
  station_id = rep(c("M2", "M3"), 10),
  gust = runif(20, 15, 50),
  wind_speed = runif(20, 8, 25),
  hmax = runif(20, 5, 15),
  wave_height = runif(20, 2, 6)
)
create_plot_gusts_vs_waves(data)
## End(Not run)
Create Monthly Wave Height Bar Plot
create_plot_monthly_wave(seasonal_means_wave, date_caption = NULL)
seasonal_means_wave |
Seasonal means from calculate_seasonal_means |
date_caption |
Date range caption |
plotly object
## Not run:
seasonal <- list(
  monthly = data.frame(
    month_name = month.abb,
    mean = runif(12, 1, 4),
    sd = runif(12, 0.3, 1.0)
  )
)
create_plot_monthly_wave(seasonal)
## End(Not run)
Create Monthly Wind Speed Bar Plot
create_plot_monthly_wind(seasonal_means_wind, date_caption = NULL)
seasonal_means_wind |
Seasonal means from calculate_seasonal_means |
date_caption |
Date range caption |
plotly object
## Not run:
seasonal <- list(
  monthly = data.frame(
    month_name = month.abb,
    mean = runif(12, 5, 15),
    sd = runif(12, 1, 4)
  )
)
create_plot_monthly_wind(seasonal)
## End(Not run)
Create Return Levels Plot
create_plot_return_levels(return_levels, variable = "wave", date_caption = NULL)
return_levels |
Return levels data frame |
variable |
Variable name for title ("wave", "wind", or "hmax") |
date_caption |
Date range caption |
plotly object
## Not run:
rl <- data.frame(
  return_period = c(1, 5, 10, 25, 50, 100),
  return_level = c(5.2, 7.1, 8.3, 9.8, 11.0, 12.5),
  lower = c(4.8, 6.3, 7.2, 8.1, 8.9, 9.6),
  upper = c(5.6, 7.9, 9.4, 11.5, 13.1, 15.4)
)
create_plot_return_levels(rl, variable = "wave")
## End(Not run)
Creates a horizontal dotplot showing GPD return levels by station for a given variable, with error bars for confidence intervals. Text labels at each point replace the legend for clarity.
create_plot_return_levels_per_station(return_levels_df, variable_filter)
return_levels_df |
Data frame from |
variable_filter |
Character, which variable to plot (one of "avg_wave", "rogue_wave", "avg_wind", "wind_gust") |
plotly object, or NULL if no data for the requested variable
## Not run:
rl_df <- data.frame(
  station = rep(c("M2", "M3", "M4"), each = 3),
  variable = "avg_wave",
  variable_label = "Avg Wave Height (m)",
  return_period = rep(c(1, 5, 10), 3),
  return_level = runif(9, 4, 12),
  lower = runif(9, 3, 8),
  upper = runif(9, 9, 16)
)
create_plot_return_levels_per_station(rl_df, "avg_wave")
## End(Not run)
Create Rogue Wave All Stations Plot
create_plot_rogue_all(rogue_events)
rogue_events |
Data frame of rogue wave events |
plotly object
## Not run:
rogue <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:10) * 3600,
  station_id = rep(c("M2", "M3"), 5),
  rogue_ratio = runif(10, 2.0, 2.5),
  hmax = runif(10, 8, 15),
  wave_height = runif(10, 3, 6),
  wind_speed = runif(10, 10, 30),
  gust = runif(10, 15, 45)
)
create_plot_rogue_all(rogue)
## End(Not run)
Create Rogue Wave By Station Subplot
create_plot_rogue_by_station(rogue_events, date_caption = NULL)
rogue_events |
Data frame of rogue wave events |
date_caption |
Date range caption |
plotly object
## Not run:
rogue <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:10) * 3600,
  station_id = rep(c("M2", "M3"), 5),
  rogue_ratio = runif(10, 2.0, 2.5),
  hmax = runif(10, 8, 15),
  wave_height = runif(10, 3, 6),
  wind_speed = runif(10, 10, 30)
)
create_plot_rogue_by_station(rogue)
## End(Not run)
Create Rogue Gusts by Station Plot
create_plot_rogue_gusts(gust_analysis)
gust_analysis |
Gust factor analysis results |
plotly object
## Not run:
gust <- list(
  rogue_gust_threshold = 1.5,
  by_station = data.frame(
    station_id = c("M2", "M3", "M4"),
    n = c(1000, 800, 600),
    n_rogue = c(15, 12, 8),
    pct_rogue = c(1.5, 1.5, 1.3),
    mean_gf = c(1.25, 1.30, 1.22),
    max_gf = c(2.8, 3.1, 2.5)
  )
)
create_plot_rogue_gusts(gust)
## End(Not run)
Create Rogue Gusts All Stations Plot
create_plot_rogue_gusts_all(rogue_gust_events)
rogue_gust_events |
Data frame of rogue gust events |
plotly object
## Not run:
gusts <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:10) * 3600,
  station_id = rep(c("M2", "M3"), 5),
  gust_ratio = runif(10, 1.5, 3.0),
  gust = runif(10, 20, 50),
  wind_speed = runif(10, 10, 25),
  wave_height = runif(10, 2, 6)
)
create_plot_rogue_gusts_all(gusts)
## End(Not run)
Create Rogue Gusts By Station Subplot
create_plot_rogue_gusts_by_station(rogue_gust_events, date_caption = NULL)
rogue_gust_events |
Data frame of rogue gust events |
date_caption |
Date range caption |
plotly object
## Not run:
gusts <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:10) * 3600,
  station_id = rep(c("M2", "M3"), 5),
  gust_ratio = runif(10, 1.5, 3.0),
  gust = runif(10, 20, 50),
  wind_speed = runif(10, 10, 25)
)
create_plot_rogue_gusts_by_station(gusts)
## End(Not run)
Create STL Decomposition Plot
create_plot_stl(wave_stl, date_caption = NULL)
wave_stl |
STL decomposition from calculate_wave_seasonality |
date_caption |
Date range caption |
ggplot2 object
## Not run:
stl_data <- list(
  components = data.frame(
    time = as.POSIXct("2024-01-01") + (1:100) * 86400,
    original = sin(1:100 / 10) + rnorm(100, 0, 0.2) + 2,
    seasonal = sin(1:100 / 10),
    trend = seq(1.8, 2.2, length.out = 100),
    remainder = rnorm(100, 0, 0.2)
  )
)
create_plot_stl(stl_data)
## End(Not run)
Create Time of Day Bar Plot
create_plot_time_of_day(rogue_conditions, date_caption = NULL)
rogue_conditions |
Data frame with rogue wave conditions |
date_caption |
Date range caption |
plotly object
## Not run:
conditions <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:20) * 3600,
  time_of_day = rep(c("Morning", "Afternoon", "Evening", "Night"), 5),
  rogue_ratio = runif(20, 2.0, 2.5)
)
create_plot_time_of_day(conditions)
## End(Not run)
Create Week of Year Stacked Bar Plot
create_plot_week_of_year(rogue_conditions, date_caption = NULL)
rogue_conditions |
Data frame with rogue wave conditions |
date_caption |
Date range caption |
plotly object
## Not run:
conditions <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:30) * 86400,
  rogue_ratio = runif(30, 2.0, 2.5),
  hmax = runif(30, 8, 15)
)
create_plot_week_of_year(conditions)
## End(Not run)
Create Wind Speed by Beaufort Scale Plot
create_plot_wind_beaufort(rogue_conditions, date_caption = NULL)
rogue_conditions |
Data frame with rogue wave conditions |
date_caption |
Date range caption |
plotly object
## Not run:
conditions <- data.frame(
  time = as.POSIXct("2024-01-01") + (1:10) * 3600,
  station_id = rep(c("M2", "M3"), 5),
  wind_speed = runif(10, 5, 35),
  rogue_ratio = runif(10, 2.0, 2.5),
  hmax = runif(10, 8, 15),
  wave_height = runif(10, 3, 6)
)
create_plot_wind_beaufort(conditions)
## End(Not run)
Generates data for a return level plot showing the fitted distribution and confidence intervals.
create_return_level_plot_data(fit, max_return_period = 200, n_points = 100)
fit |
Result from fit_gev_annual_maxima or fit_gpd_threshold |
max_return_period |
Maximum return period to plot (default: 200) |
n_points |
Number of points for the curve (default: 100) |
Data frame suitable for plotting
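As a rough illustration of what return-level plot data can look like, the sketch below evaluates the standard GEV return-level formula on a log-spaced grid of return periods. The helper `gev_return_level()` and the parameter values are hypothetical and chosen only for illustration; they are not the package's internals, and confidence intervals are omitted.

```r
# Hypothetical helper: GEV return level for a return period of T years.
# Standard formula: z_T = mu - (sigma / xi) * (1 - y^(-xi)), y = -log(1 - 1/T).
gev_return_level <- function(T, location, scale, shape) {
  y <- -log(1 - 1 / T)
  if (abs(shape) < 1e-8) {
    location - scale * log(y)                      # Gumbel limit (shape = 0)
  } else {
    location - (scale / shape) * (1 - y^(-shape))
  }
}

# Log-spaced return periods, mirroring max_return_period = 200, n_points = 100.
return_period <- exp(seq(log(1.1), log(200), length.out = 100))
plot_data <- data.frame(
  return_period = return_period,
  return_level = gev_return_level(return_period,
                                  location = 5, scale = 1.2, shape = 0.1)
)
head(plot_data)
```

A log-spaced grid is used here because return-level curves are usually drawn with a logarithmic x axis.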
Composes an HTML email showing forecasts for all stations, with storm stations highlighted. Stations are sorted by maximum Beaufort number, in descending order.
create_storm_alert_email(
  storm_events,
  station_info = get_station_info(),
  all_forecasts = NULL,
  threshold_knots = 41,
  met_warnings = NULL,
  forecast_rogue_summary = NULL
)
storm_events |
Tibble from |
station_info |
Data frame from |
all_forecasts |
Full forecast tibble from |
threshold_knots |
Numeric threshold used for triggering (default 41). |
met_warnings |
Character vector from |
A blastula email object.
Other storm-alert:
beaufort_to_description(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
## Not run:
events <- tibble::tibble(
  station_id = "M2",
  time = Sys.time(),
  wind_speed_kn = 40,
  wind_gust_kn = 55,
  beaufort = 8L,
  description = "Gale",
  is_gust_driven = FALSE
)
create_storm_alert_email(events)
## End(Not run)
Generates a summary of all validation results that can be included in dashboards or reports.
create_validation_summary(...)
... |
Named validation agents from interrogate() |
A tibble summarizing validation results
Computes cross-correlation function (CCF) between two stations for a given variable, identifying the optimal lag for prediction.
cross_correlation_stations(
  data,
  station1,
  station2,
  variable = "wave_height",
  max_lag = 48
)
data |
Data frame with columns: time, station_id, and the variable |
station1, station2 |
Station IDs to compare |
variable |
Variable to analyze (default: "wave_height") |
max_lag |
Maximum lag in hours to test (default: 48) |
List with:
ccf: cross-correlation values at each lag
optimal_lag: lag (hours) with maximum correlation
max_correlation: correlation at optimal lag
lag_hours: vector of lag values
## Not run:
data <- data.frame(
  time = rep(seq(as.POSIXct("2024-01-01"), by = "hour", length.out = 100), 2),
  station_id = rep(c("M2", "M3"), each = 100),
  wave_height = c(rnorm(100, 3, 1), rnorm(100, 2.5, 0.8))
)
cross_correlation_stations(data, "M2", "M3")
## End(Not run)
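The idea of picking an optimal lag from a cross-correlation function can be sketched with base R's `stats::ccf()`. This is an assumption about the general technique, not a copy of the package code; the two synthetic series below stand in for hourly observations at two stations, with the second series being the first delayed by three hours.

```r
set.seed(42)
base <- rnorm(203)
upstream <- base[4:203]     # hypothetical station that sees the signal first
downstream <- base[1:200]   # same signal, arriving 3 hours later

cc <- stats::ccf(upstream, downstream, lag.max = 48, plot = FALSE)
lag_hours <- as.numeric(cc$lag)
ccf_values <- as.numeric(cc$acf)
best <- which.max(ccf_values)

result <- list(
  ccf = ccf_values,
  lag_hours = lag_hours,
  optimal_lag = lag_hours[best],
  max_correlation = ccf_values[best]
)
result$optimal_lag
```

The sign of the reported lag follows the `stats::ccf()` convention, in which lag k pairs `x[t + k]` with `y[t]`, so it is worth checking which station leads before using the lag for prediction.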
Applies Seasonal-Trend decomposition using Loess (STL) to a time series. This separates the signal into seasonal, trend, and remainder components.
decompose_stl(
  data,
  variable = "wave_height",
  time_col = "time",
  frequency = "daily"
)
data |
Data frame with time and value columns |
variable |
Name of the variable to decompose (default: "wave_height") |
time_col |
Name of the time column (default: "time") |
frequency |
Seasonal frequency (default: "daily" = 24 hours) |
List with:
decomposition: stl object
components: data frame with time, seasonal, trend, remainder
summary: summary statistics of each component
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, stations = "M3")
stl_result <- decompose_stl(data)
DBI::dbDisconnect(con)
## End(Not run)
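For readers unfamiliar with STL, the following sketch shows the underlying base R call on synthetic hourly data with a daily cycle. It assumes that a "daily" frequency corresponds to `frequency = 24` on an hourly series, as the argument description suggests; the data and object names are made up for illustration.

```r
set.seed(1)
n_hours <- 24 * 30
hourly <- 2 + 0.5 * sin(2 * pi * seq_len(n_hours) / 24) + rnorm(n_hours, 0, 0.2)

# frequency = 24 encodes a daily cycle in hourly data
series <- stats::ts(hourly, frequency = 24)
fit <- stats::stl(series, s.window = "periodic")

components <- data.frame(
  seasonal = as.numeric(fit$time.series[, "seasonal"]),
  trend = as.numeric(fit$time.series[, "trend"]),
  remainder = as.numeric(fit$time.series[, "remainder"])
)
summary(components)
```

The three components add back up to the original series, which is a quick sanity check on any decomposition.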
Identifies anomalous values using standard deviation thresholds relative to seasonal norms.
detect_anomalies(
  data,
  variable = "wave_height",
  time_col = "time",
  threshold = 3
)
data |
Data frame with time and value columns |
variable |
Name of the variable (default: "wave_height") |
time_col |
Name of the time column (default: "time") |
threshold |
Number of standard deviations for anomaly detection (default: 3) |
List with:
anomalies: data frame of anomalous observations
seasonal_norms: monthly mean and sd used as baseline
summary: count of anomalies by month
set.seed(1)
data <- data.frame(
  time = seq(as.POSIXct("2020-01-01"), by = "hour", length.out = 1000),
  wave_height = 2 + sin(seq(0, 20, length.out = 1000)) + rnorm(1000, 0, 0.3)
)
result <- detect_anomalies(data)
nrow(result$anomalies)
result$summary
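The approach described above (deviation from a monthly mean, measured in monthly standard deviations) can be sketched in base R as follows. This is an illustrative reconstruction under that assumption, not the package implementation; the data and the two injected spikes are synthetic.

```r
set.seed(1)
obs <- data.frame(
  time = seq(as.POSIXct("2020-01-01", tz = "UTC"), by = "hour", length.out = 2000),
  wave_height = rnorm(2000, mean = 2, sd = 0.3)
)
obs$wave_height[c(100, 1500)] <- c(6, 7)   # two injected spikes

# Monthly mean and sd act as the seasonal baseline
month <- format(obs$time, "%m")
month_mean <- ave(obs$wave_height, month, FUN = mean)
month_sd <- ave(obs$wave_height, month, FUN = sd)

threshold <- 3
z <- (obs$wave_height - month_mean) / month_sd
anomalies <- obs[abs(z) > threshold, ]
table(format(anomalies$time, "%m"))        # count of anomalies by month
```

Because the baseline is computed from the same data, very large outliers inflate the monthly standard deviation; a robust baseline (median and MAD) is a common alternative when that matters.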
Identifies outliers using the interquartile range (IQR) method. Values beyond Q1 - multiplier * IQR or Q3 + multiplier * IQR are flagged.
detect_outliers_iqr(data, variable = "wave_height", multiplier = 1.5)
data |
Data frame with the variable to check |
variable |
Name of the variable (default: "wave_height") |
multiplier |
IQR multiplier for outlier threshold (default: 1.5) |
The input data frame with an additional is_outlier logical column.
data <- data.frame(x = c(1:20, 100))
detect_outliers_iqr(data, variable = "x")
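The IQR rule itself takes only a few lines of base R. The sketch below applies it to the same vector as the example above; it illustrates the rule rather than the package's code.

```r
x <- c(1:20, 100)
multiplier <- 1.5

q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - multiplier * iqr
upper <- q[2] + multiplier * iqr

is_outlier <- x < lower | x > upper
x[is_outlier]
```

With the default quantile type the quartiles are 6 and 16, so the fences sit at -9 and 31 and only the value 100 is flagged.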
Functions for detecting and analyzing rogue waves from buoy data. Rogue waves are defined as waves where Hmax > threshold * WaveHeight.
Standard definition: Hmax > 2.0 * significant wave height
Extreme definition: Hmax > 2.2 * significant wave height
Detect Rogue Waves in Buoy Data
Identifies rogue wave events based on the ratio of maximum wave height (Hmax) to significant wave height (WaveHeight). Uses dplyr verbs translated to SQL for efficient DuckDB execution.
detect_rogue_waves(
  con,
  threshold = 2,
  min_wave_height = 2,
  start_date = NULL,
  end_date = NULL,
  stations = NULL
)
con |
DBI connection to DuckDB database |
threshold |
Hmax/WaveHeight ratio threshold (default: 2.0) |
min_wave_height |
Minimum significant wave height to consider (default: 2m) |
start_date |
Optional start date filter |
end_date |
Optional end date filter |
stations |
Optional vector of station IDs to filter |
Data frame of rogue wave events with associated conditions
## Not run:
con <- connect_duckdb()
rogues <- detect_rogue_waves(con, threshold = 2.0)
DBI::dbDisconnect(con)
## End(Not run)
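The rogue wave criterion can be illustrated on an in-memory data frame without a database. The sketch below assumes that `min_wave_height` is an inclusive lower bound, which the documentation does not state explicitly; the observations are invented.

```r
buoy <- data.frame(
  wave_height = c(2.5, 3.0, 1.8, 4.0),   # significant wave height (m)
  hmax = c(4.5, 6.5, 4.0, 8.8)           # maximum wave height (m)
)
threshold <- 2.0
min_wave_height <- 2

buoy$rogue_ratio <- buoy$hmax / buoy$wave_height
rogues <- buoy[buoy$wave_height >= min_wave_height &
                 buoy$hmax > threshold * buoy$wave_height, ]
rogues
```

The third observation has a ratio above 2 but is dropped because its significant wave height is below the minimum, which is the purpose of that filter: small seas produce noisy ratios.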
Filters forecast data for wind speeds at or above the storm threshold.
Threshold is resolved in order: threshold_knots parameter, then
STORM_ALERT_THRESHOLD_KNOTS env var, then default of 41 knots (Beaufort 9).
detect_storm_events(forecasts, threshold_knots = NULL, use_gusts = FALSE)
forecasts |
Tibble from |
threshold_knots |
Numeric threshold in knots (default NULL, uses env var or 41). |
use_gusts |
Logical; if TRUE, also flag rows where gusts exceed threshold. Default FALSE — only sustained wind speed triggers alerts. |
Tibble with columns: station_id, time, wind_speed_kn, wind_gust_kn, beaufort, description, is_gust_driven. Empty tibble if no storms detected.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
forecasts <- tibble::tibble(
  station_id = "M2",
  time = Sys.time() + 3600 * 1:3,
  wind_speed_kn = c(20, 38, 50),
  wind_gust_kn = c(25, 45, 60)
)
detect_storm_events(forecasts)
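The threshold resolution order described above (explicit argument, then environment variable, then 41 knots) could be written along these lines. `resolve_threshold()` is a hypothetical helper name used for illustration; only the environment variable name and the default come from the documentation.

```r
# Hypothetical helper illustrating the documented resolution order.
resolve_threshold <- function(threshold_knots = NULL) {
  if (!is.null(threshold_knots)) {
    return(threshold_knots)                       # 1. explicit argument
  }
  env_value <- suppressWarnings(
    as.numeric(Sys.getenv("STORM_ALERT_THRESHOLD_KNOTS", unset = NA))
  )
  if (!is.na(env_value)) {
    return(env_value)                             # 2. environment variable
  }
  41                                              # 3. default (Beaufort 9)
}

forecasts <- data.frame(
  wind_speed_kn = c(20, 38, 50),
  wind_gust_kn = c(25, 45, 60)
)
limit <- resolve_threshold()
storms <- forecasts[forecasts$wind_speed_kn >= limit, ]
storms
```

An unset or non-numeric environment variable falls through to the default, so a misconfigured deployment still alerts at 41 knots rather than failing.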
Downloads data from the Marine Institute's ERDDAP server for the Irish Weather Buoy Network. Supports filtering by date range, stations, and variables.
download_buoy_data(
  start_date = Sys.Date() - 30,
  end_date = Sys.Date(),
  stations = NULL,
  variables = NULL,
  format = "csv"
)
start_date |
Character or Date, start of date range (default: 30 days ago) |
end_date |
Character or Date, end of date range (default: today) |
stations |
Character vector of station IDs (default: all stations) |
variables |
Character vector of variable names (default: all variables) |
format |
Character, output format: "csv", "json", or "tsv" (default: "csv") |
Data frame containing the requested buoy data
## Not run:
# Get last 7 days of data for all stations
data <- download_buoy_data(
  start_date = Sys.Date() - 7,
  end_date = Sys.Date()
)

# Get specific variables for M3 buoy
wave_data <- download_buoy_data(
  stations = "M3",
  variables = c("time", "WaveHeight", "WavePeriod", "Hmax")
)
## End(Not run)
Evaluates model performance on test data.
evaluate_wave_model(model_result, data, target = "wave_height")
model_result |
Result from train_wave_model |
data |
Full prepared data frame |
target |
Target variable name (default: "wave_height") |
Data frame with performance metrics
Educational function explaining how raw measurements become hourly values.
explain_hourly_averaging()
Character string with explanation
cat(explain_hourly_averaging())
Educational function explaining the physical and statistical basis for the relationship Hs = 4 * sigma.
explain_hs_formula()
Character string with explanation
cat(explain_hs_formula())
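The relationship Hs = 4 * sigma can be checked numerically: sigma is the standard deviation of the sea surface elevation over the measurement period. The sketch below uses a synthetic elevation record rescaled to a standard deviation of 0.5 m; it is illustrative only and does not model real wave spectra.

```r
set.seed(7)
# Synthetic surface elevation (m), rescaled so that its sd is 0.5 m
eta <- as.numeric(scale(rnorm(5000))) * 0.5

sigma <- sd(eta)
hs <- 4 * sigma
hs
```

In practice the elevation record comes from the buoy's motion sensor, and Hs estimated this way is usually close to the mean height of the highest third of individual waves.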
Educational function explaining why wave measurements use specific time periods for statistical validity.
explain_measurement_period()
Character string with explanation
cat(explain_measurement_period())
Educational function explaining how individual wave heights like Hmax are measured, and how this differs from the statistical Hs calculation.
explain_wave_height_measurement()
Character string with explanation
cat(explain_wave_height_measurement())
Loops over all stations from get_station_info() and fetches wind forecasts.
fetch_all_forecasts(
  station_info = get_station_info(),
  forecast_days = 7,
  timeout = 30
)
station_info |
Data frame with station_id, lat, lon columns
(default from |
forecast_days |
Integer number of forecast days (default 7). |
timeout |
Numeric request timeout in seconds (default 30). |
Combined tibble of all station forecasts.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
## Not run:
fetch_all_forecasts()
## End(Not run)
Loops over all stations from get_station_info() and fetches Open-Meteo
Marine API forecasts. Soft dependency: any per-station failure is logged
and skipped, never aborts.
fetch_all_marine_forecasts(
  station_info = get_station_info(),
  forecast_days = 7,
  timeout = 30
)
station_info |
Data frame with station_id, lat, lon columns. |
forecast_days |
Integer number of forecast days (default 7). |
timeout |
Numeric request timeout in seconds (default 30). |
Combined tibble of all station marine forecasts. Empty tibble if every station failed.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
## Not run:
fetch_all_marine_forecasts()
## End(Not run)
Fetches the latest marine forecast/warning text from Met Eireann's open data. Returns NULL on any error (best-effort supplementary info).
fetch_met_eireann_warnings(timeout = 10)
timeout |
Numeric request timeout in seconds (default 10). |
Character vector of warning lines, or NULL if unavailable.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
## Not run:
fetch_met_eireann_warnings()
## End(Not run)
Queries the Open-Meteo API for hourly wind speed and gust forecasts at a given latitude/longitude. Returns an empty tibble on error.
fetch_open_meteo_forecast(
  lat,
  lon,
  station_id,
  forecast_days = 7,
  timeout = 30
)
lat |
Latitude in decimal degrees. |
lon |
Longitude in decimal degrees. |
station_id |
Character station identifier (e.g. "M2"). |
forecast_days |
Integer number of forecast days (1-16, default 7). |
timeout |
Numeric request timeout in seconds (default 30). |
Tibble with columns: station_id, time, wind_speed_kn, wind_gust_kn, forecast_fetched_at. Empty tibble on error.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
## Not run:
fetch_open_meteo_forecast(51.22, -9.99, "M2", forecast_days = 1)
## End(Not run)
Queries the Open-Meteo Marine Weather API for hourly significant wave height, wave period, wind-wave and swell components at a given lat/lon. Returns an empty tibble on error (soft dependency — never aborts the pipeline).
Source: https://open-meteo.com/en/docs/marine-weather-api. Underlying models are DWD EWAM (European) and GWAM (global), ~25 km grid.
fetch_open_meteo_marine(lat, lon, station_id, forecast_days = 7, timeout = 30)
lat |
Latitude in decimal degrees. |
lon |
Longitude in decimal degrees. |
station_id |
Character station identifier (e.g. "M2"). |
forecast_days |
Integer number of forecast days (1-8 for marine, default 7). |
timeout |
Numeric request timeout in seconds (default 30). |
Tibble with columns: station_id, time, wave_height_m, wave_period_s, wind_wave_height_m, swell_wave_height_m, forecast_fetched_at. Empty tibble on error.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
## Not run:
fetch_open_meteo_marine(51.22, -9.99, "M2", forecast_days = 1)
## End(Not run)
Fits a copula model to capture the joint dependence structure between two stations, especially in the tails (extremes).
fit_bivariate_copula(
  data,
  station1,
  station2,
  variable = "wave_height",
  copula_family = "gumbel"
)
data |
Data frame with columns: time, station_id, and the variable |
station1, station2 |
Station IDs to analyze |
variable |
Variable to analyze (default: "wave_height") |
copula_family |
Copula family: "gaussian", "t", "clayton", "gumbel", "frank" |
List with:
copula: fitted copula object
parameters: copula parameters
tau: Kendall's tau (rank correlation)
tail_dependence: lower and upper tail dependence coefficients
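To make the link between Kendall's tau and tail dependence concrete, the sketch below uses the closed-form relations for the Gumbel copula: theta = 1 / (1 - tau) and upper tail dependence 2 - 2^(1 / theta), with no lower tail dependence. This is a moment-style inversion of tau shown for intuition; it is not the package's fitting routine, and `gumbel_from_tau()` is a hypothetical helper.

```r
# Hypothetical helper: Gumbel copula summaries implied by Kendall's tau.
gumbel_from_tau <- function(tau) {
  stopifnot(tau >= 0, tau < 1)          # Gumbel only models positive dependence
  theta <- 1 / (1 - tau)
  list(
    theta = theta,
    lower_tail = 0,
    upper_tail = 2 - 2^(1 / theta)
  )
}

set.seed(3)
shared <- rnorm(500)                    # common storm signal at both stations
station_a <- shared + rnorm(500, sd = 0.5)
station_b <- shared + rnorm(500, sd = 0.5)

tau <- stats::cor(station_a, station_b, method = "kendall")
gumbel_from_tau(tau)
```

A nonzero upper tail coefficient means that extremes at one station make extremes at the other more likely, which is why a Gumbel family is a natural default for storm-driven variables.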
Fits a Generalized Extreme Value distribution to annual maximum values. This is the Block Maxima approach to extreme value analysis.
fit_gev_annual_maxima(
  data,
  variable = "wave_height",
  time_col = "time",
  min_years = 5
)
data |
Data frame with columns: time, value (the variable to analyze) |
variable |
Name of the variable column (default: "wave_height") |
time_col |
Name of the time column (default: "time") |
min_years |
Minimum years of data required (default: 5) |
List with:
fit: extRemes fevd object
annual_maxima: data frame of annual maxima
parameters: GEV parameters (location, scale, shape)
diagnostics: model diagnostic information
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, variables = c("time", "wave_height"))
gev_result <- fit_gev_annual_maxima(data)
DBI::dbDisconnect(con)
## End(Not run)
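The block maxima step, reducing a long record to one maximum per year and checking that enough years are available, can be sketched in base R. The data below are synthetic and the object names are illustrative.

```r
set.seed(11)
obs <- data.frame(
  time = seq(as.POSIXct("2015-01-01", tz = "UTC"), by = "day",
             length.out = 365 * 6),
  wave_height = rexp(365 * 6, rate = 0.5)
)

# One block per calendar year, keeping the largest observation in each
obs$year <- as.integer(format(obs$time, "%Y"))
annual_maxima <- aggregate(wave_height ~ year, data = obs, FUN = max)

min_years <- 5
stopifnot(nrow(annual_maxima) >= min_years)
annual_maxima
```

The resulting vector of annual maxima is what a GEV fit consumes, for example via `extRemes::fevd(annual_maxima$wave_height, type = "GEV")`, which is consistent with the `fevd` object listed in the return value above.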
Fits a Generalized Pareto Distribution to values exceeding a threshold. This is the Peaks Over Threshold (POT) approach.
fit_gpd_threshold(
  data,
  variable = "wave_height",
  threshold = NULL,
  decluster = TRUE,
  decluster_hours = 48
)
data |
Data frame with the variable to analyze |
variable |
Name of the variable column (default: "wave_height") |
threshold |
Threshold value (default: NULL, uses 95th percentile) |
decluster |
Logical, whether to decluster exceedances (default: TRUE) |
decluster_hours |
Minimum hours between independent exceedances (default: 48) |
List with:
fit: extRemes fevd object
exceedances: data frame of exceedances
threshold: threshold used
parameters: GPD parameters (scale, shape)
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, variables = c("time", "wave_height"))
gpd_result <- fit_gpd_threshold(data, threshold = 6)
DBI::dbDisconnect(con)
## End(Not run)
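Threshold selection and declustering are the two preprocessing steps in a peaks-over-threshold analysis. The sketch below uses a simple runs rule: exceedances separated by less than a minimum gap belong to the same storm, and only the largest value per storm is kept. `decluster_peaks()` is a hypothetical helper, and the package's own declustering rule may differ in detail.

```r
# Hypothetical helper: runs declustering of threshold exceedances.
# Assumes `time` is sorted in increasing order.
decluster_peaks <- function(time, value, threshold, min_gap_hours = 48) {
  keep <- which(value > threshold)
  if (length(keep) == 0) {
    return(data.frame(time = time[0], value = value[0]))
  }
  t_exc <- time[keep]
  v_exc <- value[keep]
  gap_hours <- c(Inf, diff(as.numeric(t_exc)) / 3600)
  cluster <- cumsum(gap_hours >= min_gap_hours)   # new cluster after a long gap
  peak_idx <- as.integer(tapply(
    seq_along(v_exc), cluster,
    function(i) i[which.max(v_exc[i])]
  ))
  data.frame(time = t_exc[peak_idx], value = v_exc[peak_idx])
}

set.seed(5)
time <- seq(as.POSIXct("2024-01-01", tz = "UTC"), by = "hour", length.out = 2000)
wave_height <- rexp(2000, rate = 0.5)

threshold <- stats::quantile(wave_height, 0.95, names = FALSE)  # 95th percentile
peaks <- decluster_peaks(time, wave_height, threshold, min_gap_hours = 48)
nrow(peaks)
```

Declustering matters because the GPD fit assumes independent exceedances, while a single storm typically produces many consecutive hours above the threshold.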
Fits a Brown-Resnick max-stable process model to annual block maxima of wave heights across multiple stations. Margins are first transformed to unit Frechet using the empirical CDF. If the Brown-Resnick model fails to converge, a Schlather model (Whittle-Matern covariance) is tried as fallback.
Limitation: Max-stable models require many spatial locations (typically >= 20) for reliable estimation. With only 5 buoy stations, results are illustrative and the information matrix is often singular.
fit_spatial_maxstable(
  data,
  variable = "wave_height",
  station_info = NULL,
  min_years = 5
)
data |
Data frame with columns: |
variable |
Variable to analyze (default: |
station_info |
Optional data frame with station metadata (from
|
min_years |
Minimum number of complete years required across all stations (default: 5). |
List with:
Logical: whether a max-stable model was successfully fitted.
The fitted model object (from SpatialExtremes::fitmaxstab), or NULL if fitting failed.
Character: "brown_resnick", "schlather", or NA.
Named numeric vector of fitted parameters, or NULL.
Data frame of annual maxima per station (long format).
Coordinate matrix (lon, lat) used for fitting.
Character string describing the illustrative nature of results with few stations.
If fitting fails entirely, fitted = FALSE and a reason field explains why.
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, variables = c("time", "station_id", "wave_height"))
result <- fit_spatial_maxstable(data)
if (result$fitted) print(result$parameters)
DBI::dbDisconnect(con)
## End(Not run)
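The marginal transformation to unit Frechet via the empirical CDF can be sketched as below. The plotting position rank / (n + 1) is an assumption made here to keep the CDF strictly inside (0, 1); the package may use a different convention, and `to_unit_frechet()` is a hypothetical helper.

```r
# Hypothetical helper: empirical-CDF transform to unit Frechet margins.
# If U is uniform on (0, 1), then -1 / log(U) is unit Frechet.
to_unit_frechet <- function(x) {
  u <- rank(x) / (length(x) + 1)   # empirical CDF, strictly inside (0, 1)
  -1 / log(u)
}

annual_max <- c(6.1, 7.4, 5.8, 9.2, 6.9)   # invented annual maxima (m)
z <- to_unit_frechet(annual_max)
z
```

The transform is monotone, so the ranking of years is preserved while the margins are placed on the common scale that max-stable models expect.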
Main function that generates the summary and sends it by email. Requires the GMAIL_USERNAME and GMAIL_APP_PASSWORD environment variables.
generate_and_send_summary(
  recipient = Sys.getenv("GMAIL_USERNAME"),
  sender = Sys.getenv("GMAIL_USERNAME")
)
recipient |
Email recipient (default from GMAIL_USERNAME env var) |
sender |
Email sender (default from GMAIL_USERNAME env var) |
Returns STL decomposition results per station, downsampled to daily resolution to keep JSON under 1MB.
generate_api_decomposition(decomp_per_station)
decomp_per_station |
Named list of per-station decomposition results.
Each element should have |
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
Combines GPD return levels and CI comparison (delta, bootstrap, order-statistics) into a single endpoint.
generate_api_extremes(return_levels_per_station, ci_comparison_per_station)
return_levels_per_station |
Tibble with columns |
ci_comparison_per_station |
Tibble with columns |
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
Returns gust factor analysis results per station, capped at 500 extreme events to keep JSON under 1MB.
generate_api_gust_factors(gust_analysis)
gust_analysis |
List from |
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
Creates a JSON-serialisable list describing all available API endpoints.
Used to generate index.json at the API root.
generate_api_index(
  base_url = "https://johngavin.github.io/irishbuoys/api/v1/",
  endpoints = NULL
)
base_url |
Character, base URL for the API
(default: |
endpoints |
Named list of endpoint metadata. Each element should have
|
A list suitable for jsonlite::toJSON().
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
## Not run:
idx <- generate_api_index()
jsonlite::toJSON(idx, pretty = TRUE, auto_unbox = TRUE)
## End(Not run)
Queries DuckDB for the most recent n observations per station.
Returns a tibble suitable for JSON serialisation.
generate_api_latest(db_path = "inst/extdata/irish_buoys.duckdb", n = 1L)
db_path |
Character, path to the DuckDB database file
(default: |
n |
Integer, number of most recent observations per station to return (default: 1L) |
A tibble with n rows per station, ordered by station and time
(most recent first).
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
## Not run:
latest <- generate_api_latest(n = 1)
latest_5 <- generate_api_latest(n = 5)
## End(Not run)
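The "most recent n observations per station" selection can be illustrated without a database. The following is a minimal base-R sketch, not the package's implementation (which queries DuckDB); the helper name `latest_per_station()` and the toy data are assumptions made purely for illustration.

```r
# Toy observations; column names mirror those used elsewhere in this manual.
obs <- data.frame(
  station_id  = rep(c("M2", "M3"), each = 4),
  time        = rep(as.POSIXct("2024-01-01 00:00", tz = "UTC") + 3600 * 0:3, 2),
  wave_height = c(2.1, 2.4, 2.2, 2.8, 3.0, 3.3, 3.1, 2.9)
)

# Hypothetical helper: keep the n newest rows for each station.
latest_per_station <- function(data, n = 1L) {
  # Sort by station, then newest first within each station.
  ordered <- data[order(data$station_id, -as.numeric(data$time)), ]
  # Rank rows within each station (1 = most recent).
  rank_in_station <- ave(seq_len(nrow(ordered)), ordered$station_id,
                         FUN = seq_along)
  out <- ordered[rank_in_station <= n, ]
  rownames(out) <- NULL
  out
}

latest   <- latest_per_station(obs, n = 1L)
latest_2 <- latest_per_station(obs, n = 2L)
```

The result is ordered by station and then by time with the most recent row first, matching the ordering described for the returned tibble.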
Returns statistical methods documentation: thresholds, formulas, references. Pure function with no upstream target dependency.
generate_api_methods()
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
methods <- generate_api_methods()
names(methods)
Returns data provenance constants: ERDDAP URL, dataset ID, update frequency, license, and citation.
generate_api_sources(update_frequency = NULL)
update_frequency |
Character, human-readable update schedule.
If NULL (default), uses "Every 6 hours (0:00, 6:00, 12:00, 18:00 UTC)".
Typically supplied dynamically from the |
A list with _meta and data fields suitable for
jsonlite::toJSON().
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends(),
run_api()
## Not run:
src <- generate_api_sources()
jsonlite::toJSON(src, pretty = TRUE, auto_unbox = TRUE)
## End(Not run)
Returns cross-station correlation matrices for wave height, wind speed, and Hmax.
generate_api_spatial(pair_wave, pair_wind, pair_hmax)
pair_wave |
Data frame from |
pair_wind |
Data frame from |
pair_hmax |
Data frame from |
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_status(),
generate_api_trends(),
run_api()
## Not run:
dm <- data.frame(station1 = "M2", station2 = "M3", distance_km = 150)
cr <- data.frame(station1 = "M2", station2 = "M3", correlation = 0.85)
ed <- data.frame(station1 = "M2", station2 = "M3", chi = 0.3)
generate_api_spatial(dm, cr, ed)
## End(Not run)
Returns per-station operational status including record counts
and date ranges. Reuses dashboard_stats target output.
generate_api_status(dashboard_stats)
dashboard_stats |
List, output from the |
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_trends(),
run_api()
Returns Mann-Kendall trend tests per station/variable and overall annual trend statistics for wave height and wind speed.
generate_api_trends(mann_kendall_per_station, annual_trends_wave, annual_trends_wind)
mann_kendall_per_station |
Named list of per-station Mann-Kendall
results. Each element is a station, containing named sub-elements
for each variable (e.g. |
annual_trends_wave |
List with |
annual_trends_wind |
Same structure as |
A list with _meta and data fields.
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
run_api()
Creates pointblank validation reports and saves them to the docs directory for inclusion in the pkgdown/GitHub Pages website.
generate_validation_reports(analysis_data, rogue_events, output_dir = "docs/articles")
analysis_data |
The analysis_data tibble to validate |
rogue_events |
The rogue_wave_events tibble to validate |
output_dir |
Directory to save reports (default: "docs/articles") |
A list with paths to generated reports
Compares recent data against historical averages to identify trends and anomalies. Optionally includes data ingestion statistics.
generate_weekly_summary(db_path = "inst/extdata/irish_buoys.duckdb", lookback_days = 7, qc_filter = NULL, update_result = NULL)
db_path |
Path to DuckDB database |
lookback_days |
Number of days to analyze (default: 7) |
qc_filter |
QC flag filter: 1 = good only, 0 = include unverified, NULL = no filter |
update_result |
Optional result from incremental_update() containing ingestion stats |
List containing summary statistics and comparisons
This function returns a comprehensive data dictionary for all variables available in the Irish Weather Buoy Network dataset. Each entry includes the variable name, units, data type, description, and typical range.
get_data_dictionary()
A data frame containing the complete data dictionary with columns:
variable: Variable name as used in the dataset
category: Category (dimension, meteorological, oceanographic, quality)
units: Measurement units
data_type: R data type
description: Detailed description of the variable
typical_range: Typical or valid range of values
dict <- get_data_dictionary()
print(dict)
Returns summary statistics about the current state of the database.
get_database_stats(db_path = "inst/extdata/irish_buoys.duckdb")
db_path |
Path to DuckDB database file |
List with database statistics
## Not run:
stats <- get_database_stats()
print(stats)
## End(Not run)
Queries the ERDDAP server to find the most recent data timestamp available for the Irish Weather Buoy Network.
get_latest_timestamp(station = NULL)
station |
Optional station ID to check specific buoy |
POSIXct timestamp of most recent data
## Not run:
latest <- get_latest_timestamp()
latest_m3 <- get_latest_timestamp("M3")
## End(Not run)
Returns a data frame with station metadata including coordinates and depths.
get_station_info()
Data frame with columns: station_id, location, lat, lon, depth_m, distance_km
get_station_info()
Returns a data frame with information about all available weather buoy stations.
get_stations()
Data frame with station metadata
## Not run:
stations <- get_stations()
## End(Not run)
Returns extended documentation for specific variables including scientific context, calculation methods, and usage notes.
get_variable_docs(variable = NULL)
variable |
Character string specifying the variable name |
List containing detailed documentation
doc <- get_variable_docs("WaveHeight")
Calculates the great-circle distance between two points using the Haversine formula.
haversine_distance(lat1, lon1, lat2, lon2)
lat1, lon1 |
Coordinates of first point (degrees) |
lat2, lon2 |
Coordinates of second point (degrees) |
Distance in kilometers
# Distance from M6 to M2
haversine_distance(53.07, -15.93, 51.22, -9.99)
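For readers who want to see the formula itself, here is a minimal sketch of a Haversine great-circle distance in base R. It is not the package's source: the function name `haversine_sketch()` is hypothetical and the mean Earth radius of 6371 km is an assumed constant that the package may or may not use.

```r
# Mean Earth radius in km (assumed value; the package may use another constant).
earth_radius_km <- 6371

# Hypothetical re-implementation of the Haversine formula; inputs in degrees.
haversine_sketch <- function(lat1, lon1, lat2, lon2,
                             radius_km = earth_radius_km) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing `a` marginally above 1.
  2 * radius_km * asin(sqrt(pmin(1, a)))
}

# Same coordinates as the example above.
d <- haversine_sketch(53.07, -15.93, 51.22, -9.99)
```

Because every operation is vectorised, the same function can be applied to whole columns of coordinates at once.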
Converts RMS wave height to significant wave height using the theoretical relationship for Rayleigh-distributed waves.
hs_from_rms(h_rms)
h_rms |
RMS wave height in meters |
For Rayleigh-distributed waves: Hs = H_rms * sqrt(8) ~ 2.83 * H_rms
Significant wave height in meters
h_rms <- 1.5
hs <- hs_from_rms(h_rms) # Returns ~4.24 m
Returns a DBI connection to an ephemeral DuckDB instance.
DuckDB 0.10+ supports hf://datasets/... natively.
httpfs is loaded as a fallback for non-HF HTTPS URLs.
ib_hf_connect()
DBI connection object
Other huggingface:
ib_hf_online(),
ib_hf_url()
## Not run:
con <- ib_hf_connect()
dplyr::tbl(con, ib_hf_url()) |> dplyr::glimpse()
DBI::dbDisconnect(con)
## End(Not run)
Returns TRUE if the HF API responds within 5 seconds.
Used by tests and examples to fall back to local sample data.
ib_hf_online()
Logical
Other huggingface:
ib_hf_connect(),
ib_hf_url()
ib_hf_online()
DuckDB 0.10+ supports hf://datasets/... natively, so no httpfs extension is
needed; this is 34% faster than resolve/main/ URLs.
ib_hf_url(filename = "buoy_data.parquet")
filename |
Parquet filename (default: |
hf://datasets/{repo}/{filename} URL string
Other huggingface:
ib_hf_connect(),
ib_hf_online()
ib_hf_url()
ib_hf_url("stations.json")
Downloads new data since the last update and appends it to the database. Designed to be run on a schedule (e.g., daily or weekly via cron/GitHub Actions).
incremental_update(db_path = "inst/extdata/irish_buoys.duckdb", lookback_hours = 48)
db_path |
Path to DuckDB database file |
lookback_hours |
Number of hours to look back for safety (default: 48). The overlap ensures that data is not missed because of delays in ERDDAP updates. |
List with update statistics
## Not run:
# Perform incremental update
result <- incremental_update()
# Check what was updated
print(result$summary)
## End(Not run)
Efficiently append new data to Parquet files. Only writes new partitions or updates existing ones.
incremental_update_parquet(new_data, data_path = "inst/extdata/parquet")
new_data |
New data to append |
data_path |
Base path for Parquet files |
Uses Parquet files as storage backend with DuckDB as query engine. This provides excellent compression (5-10x) while maintaining query performance.
The architecture:
Raw data stored in partitioned Parquet files (by year/month)
DuckDB used as query engine (reads Parquet directly)
Optional: DuckDB database for metadata and indexes only
Initialize Parquet Storage Structure
init_parquet_storage(data_path = "inst/extdata/parquet", db_path = "inst/extdata/metadata.duckdb")
data_path |
Base path for Parquet files |
db_path |
Optional path for metadata database |
Downloads and loads a larger set of historical data into the database. Use this for initial setup or to rebuild the database.
initialize_database(db_path = "inst/extdata/irish_buoys.duckdb", start_date = Sys.Date() - 365, end_date = Sys.Date(), chunk_days = 365)
db_path |
Path to DuckDB database file |
start_date |
Start date for historical data (default: 1 year ago) |
end_date |
End date for historical data (default: today) |
chunk_days |
Number of days to download at once (default: 365) |
Total number of records loaded
## Not run:
# Initialize with data from 2023-01-01 to today
records <- initialize_database(start_date = "2023-01-01")
## End(Not run)
Wrapper for ggplotly that applies the standard irishbuoys theme. Useful when converting ggplot2 plots to plotly.
irishbuoys_ggplotly(gg, title = NULL, ...)
gg |
A ggplot2 object |
title |
Optional title to override ggplot title |
... |
Additional arguments passed to plotly::ggplotly() |
A styled plotly object
## Not run:
p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point()
irishbuoys_ggplotly(p)
## End(Not run)
Applies consistent dark styling to all plotly plots in the irishbuoys package. Uses black background with white grid lines to match the Quarto cosmo dashboard theme. Bottom-positioned horizontal legend with dark hoverlabels.
irishbuoys_layout(p, title = NULL, ...)
p |
A plotly object |
title |
Optional title string |
... |
Additional arguments passed to plotly::layout() |
A styled plotly object
## Not run:
library(plotly)
p <- plot_ly(data = mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")
p |> irishbuoys_layout(title = "Weight vs MPG")
## End(Not run)
Comprehensive summary of joint dependencies across all stations.
joint_analysis_summary(data, variable = "wave_height")
data |
Data frame with buoy data |
variable |
Variable to analyze (default: "wave_height") |
List containing all joint analysis results
## Not run:
data <- data.frame(
  time = rep(seq(as.POSIXct("2024-01-01"), by = "hour", length.out = 100), 2),
  station_id = rep(c("M2", "M3"), each = 100),
  wave_height = c(rnorm(100, 3, 1), rnorm(100, 2.5, 0.8))
)
joint_analysis_summary(data)
## End(Not run)
Vectorized conversion from wind speed in knots to the Beaufort scale (0-12).
knots_to_beaufort(wind_speed_kn)
wind_speed_kn |
Numeric vector of wind speeds in knots. |
Integer vector of Beaufort numbers (0-12).
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
p_hmax_exceedance(),
send_storm_alert(),
summarise_forecast_rogue_risk()
knots_to_beaufort(c(0, 5, 20, 34, 48, 64))
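A vectorised knots-to-Beaufort lookup can be sketched with `findInterval()`. The cut points below are the conventional lower bounds in knots for forces 1 to 12; they are an assumption here, since the package's exact boundaries are not stated in this manual, and `knots_to_beaufort_sketch()` is a hypothetical name.

```r
# Lower bounds, in knots, of Beaufort forces 1 to 12 (conventional table;
# assumed here, the package's exact cut points may differ).
beaufort_lower_kn <- c(1, 4, 7, 11, 17, 22, 28, 34, 41, 48, 56, 64)

# Hypothetical sketch of a vectorised conversion.
knots_to_beaufort_sketch <- function(wind_speed_kn) {
  # findInterval() counts how many lower bounds each speed has reached,
  # which is exactly the Beaufort number (0 when below 1 knot).
  as.integer(findInterval(wind_speed_kn, beaufort_lower_kn))
}

bf <- knots_to_beaufort_sketch(c(0, 5, 20, 34, 48, 64))
```

Under these cut points, 41 knots is the start of force 9, which is consistent with the 41-knot default threshold mentioned for the storm alert.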
Loads buoy data from a data frame into the DuckDB database. Handles duplicates by using ON CONFLICT DO NOTHING.
load_to_duckdb(data, con, update_metadata = TRUE)
data |
Data frame containing buoy data |
con |
DBI connection object |
update_metadata |
Logical, whether to update station metadata (default: TRUE) |
Number of rows inserted
## Not run:
# Download and load data
data <- download_buoy_data(start_date = "2024-01-01")
con <- connect_duckdb()
rows_added <- load_to_duckdb(data, con)
DBI::dbDisconnect(con)
## End(Not run)
Performs a non-parametric Mann-Kendall trend test on a time series variable.
Uses Kendall's tau via stats::cor.test(method = "kendall").
mann_kendall_test(data, variable = "wave_height", time_col = "time")
data |
Data frame with time and value columns |
variable |
Name of the variable (default: "wave_height") |
time_col |
Name of the time column (default: "time") |
List with tau, p_value, and trend_direction ("increasing", "decreasing", or "no trend").
## Not run:
data <- data.frame(
  time = seq(as.POSIXct("2020-01-01"), by = "day", length.out = 365),
  wave_height = seq(2, 3, length.out = 365) + rnorm(365, 0, 0.2)
)
mann_kendall_test(data)
## End(Not run)
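The description says the test relies on Kendall's tau from `stats::cor.test(method = "kendall")`. The sketch below shows one way such a wrapper could look. It is an illustration only: the name `mann_kendall_sketch()`, the `alpha = 0.05` significance cut-off, and the rule used to label the trend direction are all assumptions, not the package's documented behaviour.

```r
# Hypothetical wrapper around stats::cor.test(method = "kendall").
mann_kendall_sketch <- function(data, variable = "wave_height",
                                time_col = "time", alpha = 0.05) {
  # Drop rows where either the time or the value is missing.
  keep <- stats::complete.cases(data[, c(time_col, variable)])
  x <- as.numeric(data[[time_col]][keep])  # time as seconds since epoch
  y <- data[[variable]][keep]

  test <- stats::cor.test(x, y, method = "kendall")
  tau <- unname(test$estimate)

  # Assumed labelling rule: non-significant results are "no trend".
  direction <- if (test$p.value >= alpha) {
    "no trend"
  } else if (tau > 0) {
    "increasing"
  } else {
    "decreasing"
  }
  list(tau = tau, p_value = test$p.value, trend_direction = direction)
}

set.seed(42)
data <- data.frame(
  time = seq(as.POSIXct("2020-01-01", tz = "UTC"), by = "day",
             length.out = 365),
  wave_height = seq(2, 3, length.out = 365) + rnorm(365, 0, 0.2)
)
res <- mann_kendall_sketch(data)
```

Correlating the value against time with Kendall's tau is equivalent in spirit to the Mann-Kendall statistic, because both count concordant and discordant pairs in time order.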
Maps a confidence multiplier to a short human-readable status label and a suggested colour for dashboard badges.
obs_status_label(confidence)
confidence |
Numeric vector of confidence values in |
List with label (character) and color (character hex), both the
same length as confidence.
Other obs-confidence:
compute_obs_confidence(),
widen_ci()
obs_status_label(c(1, 0.7, 0.4, 0.15))
Computes P(H_max > h | H_s, T_z, D) for a stationary sea state of significant wave height H_s, mean zero-crossing period T_z, lasting duration D, using the Forristall (1978) Weibull short-term distribution for individual wave heights.
Forristall (1978) gives P(H > h | H_s) = exp(-(h / (alpha * H_s))^beta) with alpha = 0.681 and beta = 2.126 (calibrated on Gulf of Mexico storm data). For N independent waves in the window, P(H_max <= h) = (1 - P(H > h))^N, so P(H_max > h) = 1 - (1 - exp(-(h/(alpha*H_s))^beta))^N, with N = D / T_z.
Reference: Forristall, G. Z. (1978). On the statistical distribution of wave heights in a storm. Journal of Geophysical Research, 83(C5), 2353-2358.
p_hmax_exceedance(h, hs, tz, duration_s = 3600, alpha = 0.681, beta = 2.126)
h |
Numeric vector of wave heights to evaluate (m). |
hs |
Numeric significant wave height (m), length 1 or length(h). |
tz |
Numeric mean zero-crossing period (s), length 1 or length(h). |
duration_s |
Numeric window duration in seconds (default 3600 = 1 hour). |
alpha |
Forristall scale parameter (default 0.681). |
beta |
Forristall shape parameter (default 2.126). |
Numeric vector of P(H_max > h) values in [0, 1]. Returns NA where
hs <= 0, tz <= 0, or any input is NA.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
send_storm_alert(),
summarise_forecast_rogue_risk()
# Probability of a 20 m wave during a 1-hour window with Hs = 10 m, Tz = 9 s
p_hmax_exceedance(20, hs = 10, tz = 9, duration_s = 3600)
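The formula given above, P(H_max > h) = 1 - (1 - exp(-(h/(alpha*H_s))^beta))^N with N = D / T_z, can be written out directly. This is a sketch under the stated formula rather than the package's code; `p_hmax_sketch()` is a hypothetical name, and the use of `log1p()`/`expm1()` is a numerical-stability choice of this example (algebraically identical to the formula), not something the manual specifies.

```r
# Hypothetical direct implementation of the Forristall-based formula above.
p_hmax_sketch <- function(h, hs, tz, duration_s = 3600,
                          alpha = 0.681, beta = 2.126) {
  n_waves  <- duration_s / tz                    # N = D / T_z
  p_single <- exp(-(h / (alpha * hs))^beta)      # P(H > h | H_s)
  # 1 - (1 - p_single)^N, evaluated stably for very small p_single.
  p <- -expm1(n_waves * log1p(-p_single))
  # Return NA where hs <= 0 or tz <= 0, as described for the return value.
  valid <- rep_len(hs > 0 & tz > 0, length(p))
  p[is.na(valid) | !valid] <- NA_real_
  p
}

# Same inputs as the example above: 20 m wave, Hs = 10 m, Tz = 9 s, 1 hour.
p20 <- p_hmax_sketch(20, hs = 10, tz = 9, duration_s = 3600)
# Exceedance probabilities for several thresholds at once.
p_curve <- p_hmax_sketch(c(10, 15, 20, 25), hs = 10, tz = 9)
```

With these inputs N = 3600 / 9 = 400 waves, so even a per-wave exceedance probability of order 1e-5 accumulates to a window probability of a few percent.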
Uses one station to predict another at the optimal lag. Particularly useful for M6 (offshore) predicting coastal stations.
predict_station_lagged(data, predictor_station, target_station, variable = "wave_height", lag_hours = NULL)
data |
Data frame with columns: time, station_id, and the variable |
predictor_station |
Station to use as predictor (e.g., "M6") |
target_station |
Station to predict (e.g., "M2") |
variable |
Variable to predict (default: "wave_height") |
lag_hours |
Lag in hours (positive = predictor leads target) |
List with:
model: lm object
r_squared: R-squared of prediction
rmse: Root mean squared error
predictions: data frame with actual and predicted values
## Not run:
data <- data.frame(
  time = rep(seq(as.POSIXct("2024-01-01"), by = "hour", length.out = 200), 2),
  station_id = rep(c("M6", "M2"), each = 200),
  wave_height = c(rnorm(200, 3, 1), rnorm(200, 2.5, 0.8))
)
predict_station_lagged(data, "M6", "M2", lag_hours = 6)
## End(Not run)
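To make the idea of a lagged station-to-station prediction concrete, the following base-R sketch shifts the predictor station's timestamps forward by a fixed lag, joins on time, and fits a linear model. It is a simplified illustration: `fit_lagged()` is a hypothetical helper, the data are synthetic (the target is constructed to follow the predictor by 6 hours), and no search for the optimal lag is attempted.

```r
set.seed(1)
times <- seq(as.POSIXct("2024-01-01", tz = "UTC"), by = "hour",
             length.out = 200)
lag_hours <- 6
offshore <- 3 + sin(seq_along(times) / 10) + rnorm(200, 0, 0.1)
# Synthetic target: the offshore signal delayed by 6 hours, damped, plus noise.
coastal <- 0.8 * c(rep(NA, lag_hours), head(offshore, -lag_hours)) +
  rnorm(200, 0, 0.1)
data <- data.frame(
  time = rep(times, 2),
  station_id = rep(c("M6", "M2"), each = 200),
  wave_height = c(offshore, coastal)
)

# Hypothetical helper: regress the target on the predictor observed
# `lag_hours` earlier.
fit_lagged <- function(data, predictor_station, target_station,
                       variable = "wave_height", lag_hours) {
  pred <- data[data$station_id == predictor_station, c("time", variable)]
  targ <- data[data$station_id == target_station, c("time", variable)]
  names(pred)[2] <- "predictor"
  names(targ)[2] <- "target"
  # Positive lag: the predictor leads, so move its timestamps forward.
  pred$time <- pred$time + lag_hours * 3600
  paired <- merge(pred, targ, by = "time")
  paired <- paired[stats::complete.cases(paired), ]
  model <- stats::lm(target ~ predictor, data = paired)
  fitted_vals <- unname(stats::predict(model, newdata = paired))
  list(
    model = model,
    r_squared = summary(model)$r.squared,
    rmse = sqrt(mean((paired$target - fitted_vals)^2)),
    predictions = data.frame(time = paired$time, actual = paired$target,
                             predicted = fitted_vals)
  )
}

fit <- fit_lagged(data, "M6", "M2", lag_hours = lag_hours)
```

The returned list mirrors the elements listed above (model, r_squared, rmse, predictions); the column names inside `predictions` are this example's choice.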
Predicts wave height for new observations.
predict_wave_height(model_result, new_data)
model_result |
Result from train_wave_model |
new_data |
Data frame with predictor values |
Numeric vector of predicted wave heights
Creates lagged features and derived variables for wave height prediction.
prepare_wave_features(data, lags = 1:3)
data |
Data frame with buoy observations |
lags |
Integer vector of lag periods in hours (default: 1:3) |
Data frame with additional lagged and derived features
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, qc_filter = FALSE)
features <- prepare_wave_features(data)
DBI::dbDisconnect(con)
## End(Not run)
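As an illustration of what "lagged features" means here, this base-R sketch adds one column per lag to an hourly, single-station series. The helper name `add_lag_features()` and the `<variable>_lag<k>` column naming are assumptions for this example; the package's actual feature names and derived variables are not listed in this manual.

```r
# Hypothetical helper: add lagged copies of one variable, assuming the rows
# form a regular hourly series for a single station.
add_lag_features <- function(data, variable = "wave_height", lags = 1:3) {
  data <- data[order(data$time), ]
  n <- nrow(data)
  for (k in lags) {
    # Shift the series down by k rows, padding the start with NA.
    lagged <- c(rep(NA_real_, min(k, n)),
                utils::head(data[[variable]], max(n - k, 0)))
    data[[paste0(variable, "_lag", k)]] <- lagged
  }
  data
}

obs <- data.frame(
  time = as.POSIXct("2024-01-01 00:00", tz = "UTC") + 3600 * 0:5,
  wave_height = as.numeric(1:6)
)
features <- add_lag_features(obs, lags = 1:3)
```

The first k rows of each lag column are NA, so a model trained on these features would typically drop or impute those rows first.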
Flexible querying of buoy data with various filtering options. Uses dplyr verbs translated to SQL for efficient DuckDB execution.
query_buoy_data(con, stations = NULL, start_date = NULL, end_date = NULL, variables = NULL, qc_filter = TRUE)
con |
DBI connection object |
stations |
Character vector of station IDs (default: all) |
start_date |
Start date for query |
end_date |
End date for query |
variables |
Character vector of variables to return |
qc_filter |
Logical, filter for good quality data only (default: TRUE) |
Data frame with query results
## Not run:
con <- connect_duckdb()
# Get recent M3 wave data
waves <- query_buoy_data(
  con,
  stations = "M3",
  variables = c("time", "wave_height", "wave_period"),
  start_date = Sys.Date() - 7
)
## End(Not run)
DuckDB can query Parquet files directly without importing. This provides excellent performance with minimal memory usage.
query_parquet(query = NULL, data_path = "inst/extdata/parquet/by_year_month", stations = NULL, date_range = NULL)
query |
SQL query or NULL for interactive connection |
data_path |
Path to Parquet files |
stations |
Filter for specific stations |
date_range |
Date range as c(start_date, end_date) |
## Not run:
# Query recent data
df <- query_parquet(
  "SELECT * FROM buoy_data WHERE wave_height > 5",
  date_range = c(Sys.Date() - 30, Sys.Date())
)
## End(Not run)
Reads a project's prediction JSONL file and reconciles outcomes.
When multiple records share the same prediction_id, the latest
non-null outcome wins (allows appending outcome updates).
read_predictions(project_slug = NULL)
project_slug |
Character project slug. If NULL, reads all files
in |
Tibble of predictions with one row per unique prediction_id
## Not run:
preds <- read_predictions("my-project-slug")
head(preds)
## End(Not run)
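The reconciliation rule ("the latest non-null outcome wins") can be sketched on an in-memory table. The column names `recorded_at` and `outcome`, the helper `reconcile_outcomes()`, and the choice to keep the other fields of the most recent record are all assumptions for illustration; the actual JSONL schema is not described in this manual.

```r
# Toy prediction log with appended outcome updates (hypothetical schema).
records <- data.frame(
  prediction_id = c("p1", "p1", "p2", "p2", "p3", "p4", "p4"),
  recorded_at = as.POSIXct(
    c("2024-01-01", "2024-01-05", "2024-01-02", "2024-01-06",
      "2024-01-03", "2024-01-01", "2024-01-04"),
    tz = "UTC"
  ),
  outcome = c(NA, "hit", "miss", NA, NA, "miss", "hit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per prediction_id, carrying the most recent
# non-missing outcome (or NA when no outcome has been recorded yet).
reconcile_outcomes <- function(records) {
  records <- records[order(records$recorded_at), ]
  pieces <- lapply(split(records, records$prediction_id), function(grp) {
    resolved <- grp$outcome[!is.na(grp$outcome)]
    out <- grp[nrow(grp), , drop = FALSE]  # keep the latest record's fields
    out$outcome <- if (length(resolved) > 0) {
      resolved[length(resolved)]
    } else {
      NA_character_
    }
    out
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

preds <- reconcile_outcomes(records)
```

Here `p2` keeps its earlier "miss" because the later record has a null outcome, while `p4` takes the later "hit".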
Generates a formatted summary report of rogue wave analysis.
rogue_wave_report(con, days = 30)
con |
DBI connection to DuckDB database |
days |
Number of days to analyze (default: 30) |
Character string with formatted report
Starts a plumber API server that serves pre-computed JSON files
from docs/api/v1/. Requires the plumber package.
run_api(port = 8080, host = "0.0.0.0")
port |
Integer port number (default: 8080) |
host |
Character host address (default: "0.0.0.0") |
Invisibly returns the plumber router (runs until interrupted).
Other api:
api_plumber,
api_static,
create_api_router(),
generate_api_decomposition(),
generate_api_extremes(),
generate_api_gust_factors(),
generate_api_index(),
generate_api_latest(),
generate_api_methods(),
generate_api_sources(),
generate_api_spatial(),
generate_api_status(),
generate_api_trends()
## Not run:
run_api()
# API available at http://localhost:8080
# Swagger docs at http://localhost:8080/__docs__/
## End(Not run)
Save Data to Parquet with Optimal Compression
save_to_parquet(data, data_path = "inst/extdata/parquet", partition_by = "year_month", compression = "zstd")
data |
Data frame to save |
data_path |
Base path for Parquet files |
partition_by |
How to partition: "year_month", "station", or "both" |
compression |
Compression algorithm: "snappy", "gzip", "zstd", "lz4" |
Main orchestrator: fetches forecasts, detects storms, and sends an email alert if strong gale winds (Beaufort 9+) are forecast. If no storms are detected, no email is sent. Uses the same Gmail SMTP pattern as the weekly email report.
send_storm_alert(threshold_knots = NULL, recipient = Sys.getenv("GMAIL_USERNAME"), sender = Sys.getenv("GMAIL_USERNAME"), dry_run = FALSE)
threshold_knots |
Numeric threshold in knots (default NULL, uses env var or 41). |
recipient |
Email recipient (default from |
sender |
Email sender (default from |
dry_run |
Logical; if TRUE, saves HTML preview to tempdir instead of sending. |
List with: status ("sent", "no_storms", "preview", "error"), n_storms, stations_affected, preview_file (if dry_run), error (if failed).
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
summarise_forecast_rogue_risk()
## Not run:
# Check with very high threshold (likely no storms)
send_storm_alert(threshold_knots = 999)
# Dry run with low threshold (likely produces alert)
send_storm_alert(threshold_knots = 20, dry_run = TRUE)
## End(Not run)
Creates a matrix of distances between all station pairs.
station_distance_matrix(station_info = NULL)
station_info |
Data frame from get_station_info() or NULL to use default |
Named matrix of distances in km
station_distance_matrix()
Given an Open-Meteo marine forecast tibble, computes per-station summaries:
the peak forecast hour, peak H_s, peak P(H_max > 20 m), peak P(H_max > 25 m).
Uses p_hmax_exceedance() applied independently per forecast hour, then
takes the maximum across the forecast horizon.
This is a forecast-derived risk surrogate; it should always be presented alongside the deterministic-NWP caveat (lead-time skill drops sharply after day 2-3, no ensemble spread).
summarise_forecast_rogue_risk(marine_forecasts, thresholds = c(10, 15, 20, 25), duration_s = 3600)
marine_forecasts |
Tibble from |
thresholds |
Numeric vector of H_max thresholds in metres
(default |
duration_s |
Window duration in seconds for each forecast hour (default 3600). |
Tibble with one row per station: station_id, peak_time, peak_hs_m,
peak_period_s, p_hmax_gt_10, p_hmax_gt_15, p_hmax_gt_20, p_hmax_gt_25,
n_forecast_hours. Empty tibble if marine_forecasts is empty.
Other storm-alert:
beaufort_to_description(),
create_storm_alert_email(),
detect_storm_events(),
fetch_all_forecasts(),
fetch_all_marine_forecasts(),
fetch_met_eireann_warnings(),
fetch_open_meteo_forecast(),
fetch_open_meteo_marine(),
knots_to_beaufort(),
p_hmax_exceedance(),
send_storm_alert()
Tests whether rogue wave events at one station are followed by rogue events at another station within a time window consistent with wave propagation. Uses a permutation test: the null hypothesis is that rogue events at the second station are uniformly distributed over time (no clustering with the first station).
For each station pair, the theoretical propagation lag is estimated as
distance_km / propagation_speed_kmh (default 30 km/h for deep-water
swell group velocity). Co-occurrence is counted when a station-2 rogue event
falls within [lag - tolerance, lag + tolerance] hours of a station-1 event.
test_rogue_propagation(data, rogue_threshold = 2, min_wave_height = 2, station_pairs = NULL, propagation_speed_kmh = 30, n_permutations = 500, station_info = NULL)
data |
Data frame with columns: |
rogue_threshold |
Hmax/Hs ratio threshold for rogue classification (default: 2.0). |
min_wave_height |
Minimum significant wave height in metres for a qualifying observation (default: 2.0). |
station_pairs |
Optional list of character vectors, each of length 2,
specifying directed pairs |
propagation_speed_kmh |
Assumed deep-water group velocity in km/h (default: 30). |
n_permutations |
Number of permutations for the test (default: 500). |
station_info |
Optional data frame from |
List with:
Data frame with columns: station1, station2,
distance_km, theoretical_lag_hrs, n_rogue_s1, n_rogue_s2,
co_occurrence_count, co_occurrence_rate, marginal_rate,
perm_mean_rate, p_value, h3_significant (logical), h3_verdict.
Data frame of all detected rogue wave events.
Total number of rogue events across all stations.
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, variables = c("time", "station_id", "wave_height", "hmax"))
result <- test_rogue_propagation(data)
result$h3_table
DBI::dbDisconnect(con)
## End(Not run)
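The lag and co-occurrence definitions in the description can be made concrete with a small sketch. It covers only the counting step, not the permutation test. The helper names, the 1-hour tolerance, and the choice to count station-1 events that have at least one matching station-2 event are assumptions of this example; the package's tolerance and counting convention are not stated here.

```r
# Theoretical propagation lag in hours: distance / speed.
theoretical_lag_hrs <- function(distance_km, propagation_speed_kmh = 30) {
  distance_km / propagation_speed_kmh
}

# Hypothetical helper: number of station-1 events followed by at least one
# station-2 event inside [lag - tolerance, lag + tolerance] hours.
count_co_occurrences <- function(t1, t2, lag_hrs, tolerance_hrs) {
  # delay[i, j] = hours from station-1 event i to station-2 event j.
  delay <- outer(as.numeric(t1), as.numeric(t2),
                 function(a, b) (b - a) / 3600)
  in_window <- delay >= lag_hrs - tolerance_hrs &
    delay <= lag_hrs + tolerance_hrs
  sum(rowSums(in_window) > 0)
}

base_time <- as.POSIXct("2024-01-01 00:00", tz = "UTC")
rogue_s1 <- base_time + 3600 * c(0, 24, 48)     # station-1 rogue events
rogue_s2 <- base_time + 3600 * c(5, 30.5, 100)  # station-2 rogue events

lag_hrs <- theoretical_lag_hrs(distance_km = 150)  # 150 km at 30 km/h
n_co <- count_co_occurrences(rogue_s1, rogue_s2, lag_hrs, tolerance_hrs = 1)
```

A permutation test would then recompute this count many times after shuffling the station-2 event times and compare the observed count against that null distribution.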
Trains a Random Forest model using ranger to predict wave height.
train_wave_model(data, target = "wave_height", predictors = NULL, train_fraction = 0.7, seed = 42, ...)
data |
Data frame with prepared features (from prepare_wave_features) |
target |
Target variable name (default: "wave_height") |
predictors |
Character vector of predictor names (default: NULL uses standard set) |
train_fraction |
Fraction of data for training (default: 0.7) |
seed |
Random seed for reproducibility (default: 42) |
... |
Additional arguments passed to ranger::ranger |
List with model, train/test indices, and feature importance
## Not run:
con <- connect_duckdb()
data <- query_buoy_data(con, qc_filter = FALSE)
features <- prepare_wave_features(data)
model_result <- train_wave_model(features)
DBI::dbDisconnect(con)
## End(Not run)
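As a rough illustration of how `train_fraction` and `seed` could interact, the sketch below builds a seeded train/test split on synthetic data and, only if ranger is installed, fits a Random Forest on the training rows. The predictor names (`wind_speed`, `pressure`) and the synthetic relationship are assumptions made for the example; this does not reproduce the internals of `train_wave_model()`.

```r
# Synthetic stand-in for the output of prepare_wave_features();
# the predictor names are illustrative assumptions.
set.seed(1)
n <- 200
features <- data.frame(
  wind_speed = runif(n, 0, 25),
  pressure = rnorm(n, 1013, 10)
)
features$wave_height <- 0.5 + 0.15 * features$wind_speed + rnorm(n, 0, 0.3)

# Seeded split: 70% of rows for training, the rest held out for testing
train_fraction <- 0.7
set.seed(42)
train_idx <- sample(seq_len(n), size = floor(train_fraction * n))
test_idx <- setdiff(seq_len(n), train_idx)

# Fit only when ranger is available, so the sketch runs without it
if (requireNamespace("ranger", quietly = TRUE)) {
  fit <- ranger::ranger(
    wave_height ~ wind_speed + pressure,
    data = features[train_idx, ],
    importance = "impurity",
    seed = 42
  )
  pred <- predict(fit, data = features[test_idx, ])$predictions
  rmse <- sqrt(mean((pred - features$wave_height[test_idx])^2))
  print(fit$variable.importance)
  print(rmse)
}
```

Fixing the seed before `sample()` makes the split repeatable, which is what allows stored train/test indices to be compared across runs.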
Generates a formatted summary of trend analysis results.
trend_summary_report(seasonal_means, annual_trends, anomalies = NULL)
seasonal_means |
Result from calculate_seasonal_means |
annual_trends |
Result from calculate_annual_trends |
anomalies |
Result from detect_anomalies (optional) |
Character string with formatted report
## Not run:
data <- data.frame(
  time = seq(as.POSIXct("2015-01-01"), by = "hour", length.out = 5000),
  wave_height = 2 + sin(seq(0, 40, length.out = 5000)) + rnorm(5000, 0, 0.3)
)
seasonal <- calculate_seasonal_means(data)
annual <- calculate_annual_trends(data)
trend_summary_report(seasonal, annual)
## End(Not run)
Performs comprehensive validation of the analysis_data target using pointblank's interrogation framework. Checks for:
Minimum row count
Required columns exist
No NULL values in key columns
Value ranges for physical measurements
Valid station IDs
validate_buoy_data(
  data,
  target_name = "analysis_data",
  min_rows = 100,
  report_path = NULL
)
data |
A data frame or tibble to validate |
target_name |
Name of the target for error messages (default: "analysis_data") |
min_rows |
Minimum expected rows (default: 100) |
report_path |
Optional path to save HTML validation report |
The original data if validation passes; otherwise aborts with an error
## Not run:
# Basic validation
validated_data <- validate_buoy_data(my_data)
# With custom settings and report
validated_data <- validate_buoy_data(
  my_data,
  target_name = "custom_target",
  min_rows = 1000,
  report_path = "validation_report.html"
)
## End(Not run)
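The five checks listed above can be approximated in plain base R, which may help when reading a failed validation. This sketch deliberately swaps pointblank for simple conditionals; `check_buoy_data()` is a hypothetical helper, and the required columns, the station set, and the 0 to 30 m wave height range are all assumptions rather than the package's actual rules.

```r
# Hypothetical base R stand-in for the pointblank checks; the column names,
# station IDs, and physical range below are assumptions.
check_buoy_data <- function(data,
                            min_rows = 100,
                            required = c("time", "station_id", "wave_height"),
                            valid_stations = c("M2", "M3"),
                            wave_height_range = c(0, 30)) {
  problems <- character(0)
  # 1. Minimum row count
  if (nrow(data) < min_rows) {
    problems <- c(problems, sprintf("fewer than %d rows", min_rows))
  }
  # 2. Required columns exist
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing columns:", toString(missing_cols)))
  } else {
    # 3. No missing values in key columns
    if (any(vapply(data[required], anyNA, logical(1)))) {
      problems <- c(problems, "NA values in key columns")
    }
    # 4. Value range for a physical measurement
    out_of_range <- data$wave_height < wave_height_range[1] |
      data$wave_height > wave_height_range[2]
    if (any(out_of_range, na.rm = TRUE)) {
      problems <- c(problems, "wave_height outside plausible range")
    }
    # 5. Valid station IDs
    if (!all(data$station_id %in% valid_stations)) {
      problems <- c(problems, "unknown station IDs")
    }
  }
  if (length(problems) > 0) {
    stop("Validation failed: ", paste(problems, collapse = "; "), call. = FALSE)
  }
  data  # returned unchanged when every check passes
}

good <- data.frame(
  time = as.POSIXct("2024-01-01", tz = "UTC") + 3600 * 0:2,
  station_id = c("M2", "M3", "M2"),
  wave_height = c(1.2, 2.8, 3.1)
)
checked <- check_buoy_data(good, min_rows = 3)
```

Returning the input unchanged on success mirrors the documented return value, which lets the validator sit inside a pipeline without altering the data.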
Checks that the latest observation timestamps in ingestion_stats are within an acceptable window of the current time.
validate_email_freshness(ingestion_stats, max_stale_hours = 96)
ingestion_stats |
Tibble with |
max_stale_hours |
Maximum acceptable age of data in hours (default: 96) |
ingestion_stats (invisibly), or aborts if ALL stations are stale
stats <- tibble::tibble(
  station_id = c("M2", "M3"),
  latest = Sys.time() - c(1, 2) * 3600
)
validate_email_freshness(stats)
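A minimal sketch of the freshness rule, assuming the `station_id` and `latest` columns used in the example above. `check_freshness()` and its `now` argument are hypothetical, and the warning for partially stale data is an assumption; the documentation only states that the function aborts when all stations are stale.

```r
# Hypothetical helper, not the package's implementation.
check_freshness <- function(ingestion_stats, max_stale_hours = 96,
                            now = Sys.time()) {
  # Age of each station's latest observation, in hours
  age_hours <- as.numeric(difftime(now, ingestion_stats$latest, units = "hours"))
  stale <- age_hours > max_stale_hours
  if (all(stale)) {
    stop("All stations are stale (older than ", max_stale_hours, " hours)",
         call. = FALSE)
  }
  if (any(stale)) {
    # Assumed behaviour: partial staleness only warns
    warning("Stale stations: ", toString(ingestion_stats$station_id[stale]),
            call. = FALSE)
  }
  invisible(ingestion_stats)
}

ref_time <- as.POSIXct("2024-06-01 12:00:00", tz = "UTC")
stats <- data.frame(
  station_id = c("M2", "M3"),
  latest = ref_time - c(1, 2) * 3600  # 1 and 2 hours old
)
result <- check_freshness(stats, now = ref_time)
```

Passing `now` explicitly keeps the check deterministic, which is convenient when testing against fixed timestamps.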
Validates rogue wave detection results with specific checks for the rogue_ratio column and event characteristics.
validate_rogue_events(
  data,
  target_name = "rogue_wave_events",
  min_rows = 1,
  report_path = NULL
)
data |
A data frame of rogue wave events |
target_name |
Name of the target for error messages |
min_rows |
Minimum expected rows (default: 1) |
report_path |
Optional path to save HTML validation report |
The original data if validation passes
Returns a data frame of acronyms and definitions used in wave measurement.
wave_glossary()
Data frame with columns: acronym, term, definition, unit
glossary <- wave_glossary()
print(glossary)
Functions for building and using a Random Forest model to predict significant wave height from meteorological variables.
Creates a formatted summary report of the wave height prediction model.
wave_model_report(model_result, eval_result)
model_result |
Result from train_wave_model |
eval_result |
Result from evaluate_wave_model |
Character string with formatted report
Returns a comprehensive markdown document explaining wave measurement science, suitable for inclusion in vignettes.
wave_science_documentation()
Character string with markdown-formatted documentation
docs <- wave_science_documentation()
names(docs)
Inflates the half-width of an existing CI by 1 / confidence. Useful when
the underlying point estimate is from observations and you want the
displayed band to grow as the data ages, without refitting the model.
This is a heuristic display device, not a proper Bayesian update. It preserves the point estimate and only stretches the band, symmetrically about that estimate.
widen_ci(point, lower, upper, confidence)
point |
Numeric vector of point estimates. |
lower |
Numeric vector of original CI lower bounds. |
upper |
Numeric vector of original CI upper bounds. |
confidence |
Numeric multiplier in |
List with lower and upper numeric vectors, widened symmetrically
about point.
Other obs-confidence:
compute_obs_confidence(),
obs_status_label()
widen_ci(point = 10, lower = 8, upper = 12, confidence = 0.5)
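The arithmetic behind the example can be sketched as follows. `widen_ci_sketch()` is a hypothetical re-implementation under two assumptions: that `confidence` lies in (0, 1], and that the half-width is taken as half the original interval width and then re-centred on `point`.

```r
# Hypothetical re-implementation for illustration only.
widen_ci_sketch <- function(point, lower, upper, confidence) {
  # Assumed valid range for the multiplier
  stopifnot(all(confidence > 0 & confidence <= 1))
  # Original half-width, inflated by 1 / confidence
  half_width <- (upper - lower) / 2 / confidence
  list(lower = point - half_width, upper = point + half_width)
}

widened <- widen_ci_sketch(point = 10, lower = 8, upper = 12, confidence = 0.5)
widened$lower  # 6
widened$upper  # 14
```

Here the original half-width of 2 is divided by 0.5 to give 4, so the band grows from [8, 12] to [6, 14] while the point estimate stays at 10. A `confidence` of 1 leaves the interval unchanged.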