
Sample Size Planning for Sequential ANOVAs
Meike Snijder-Steinhilber
2026-02-09
Source: vignettes/plan_sample_size.Rmd

Why Sample Size Planning Matters
Sample size planning for sequential tests differs fundamentally from fixed-design studies. In sequential ANOVA, the final sample size is determined by the data itself and remains unknown beforehand – data collection continues until either the upper or lower decision boundary is reached.
The challenge: While this data-driven stopping rule is very efficient, it creates practical difficulties. Resource planning requires knowing whether you might need 100 observations or 1,000. Budget constraints, time limitations, and logistical considerations all demand some advance estimate of required resources.
The solution: Although the exact final sample size
cannot be known in advance, simulation-based planning bridges the gap
between statistical theory and practical constraints. The
sprtt package provides the plan_sample_size()
function, which generates HTML reports summarizing simulation results
for sequential ANOVAs. Researchers can obtain guidance on:
- The typical amount of data required to reach a decision (\(N_{\text{median}}\))
- The upper limit of resources needed to achieve a specified decision rate (\(N_{\text{max}}\))
Resource Constraints and Decision Rates
While the decision boundaries of the sequential ANOVA control Type I (\(\alpha\)) and Type II (\(\beta\)) errors in the long run, and consequently maintain the desired power (\(1-\beta\)), a new consideration emerges when researchers face resource constraints.
When the maximum affordable sample size is reached before a decision boundary is crossed, this results in a non-decision. Importantly, the non-decision rate depends on the maximum sample size a researcher can collect. This introduces a new metric: the decision rate (the chance to reach a decision) given resource limitations.
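To make the decision rate and the planning metrics \(N_{\text{median}}\) and \(N_{\text{max}}\) concrete, here is a toy calculation on a made-up vector of stopping sample sizes (the N at which each of ten hypothetical sequential ANOVA runs crossed a boundary). The numbers are purely illustrative and are not taken from the sprtt simulation database.

```r
# Made-up stopping sample sizes from ten hypothetical sequential ANOVA runs
n_stop <- c(24, 31, 38, 42, 47, 55, 63, 78, 102, 150)

# Resource limit: the largest sample size the researcher can afford
n_affordable <- 85

# Decision rate: share of runs that cross a boundary before the limit
mean(n_stop <= n_affordable)
#> [1] 0.8

# Typical sample size needed to reach a decision (N_median)
median(n_stop)
#> [1] 51

# Sample size needed to achieve an 80% decision rate (N_max):
# the 80th percentile of the stopping distribution
quantile(n_stop, probs = 0.80)
```

With a larger affordable sample size the decision rate rises; with a smaller one, more runs end as non-decisions.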
While non-decisions are undesirable, they represent a crucial conceptual distinction from accepting the null hypothesis. SPRTs like the sequential ANOVA differentiate between stopping data collection to accept the null hypothesis and the case where more evidence is required to make a decision.
The plan_sample_size() Function
The plan_sample_size() function generates interactive
HTML reports for sample size planning based on a large simulation
database. Reports include recommended maximum sample sizes, expected
sample sizes, power curves, and comparisons to traditional ANOVA
designs.
Pre-computed Simulation Database
To make sample size planning fast and accessible, sprtt includes access to extensive simulation results. These simulations were conducted by:
- Generating thousands of datasets for each combination of parameters
- Running sequential ANOVAs on each dataset
- Recording when each test stopped
- Aggregating these results into the key summary statistics that guide sample size planning (a simplified sketch of this workflow follows below)
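The sketch below illustrates this logic for a single parameter combination. It is a simplified illustration under stated assumptions, not the code that produced the database: the data-generating helper, the starting sample size, the one-observation-per-group batching, and the way the decision is read from the seq_anova() result are assumptions that should be checked against the sprtt documentation before reuse.

```r
library(sprtt)

set.seed(123)

f_expected <- 0.25   # effect size under H1 (Cohen's f)
n_max      <- 200    # per-group cap so that every simulated run terminates

# Hypothetical data generator: three group means spaced so that Cohen's f
# equals f_expected when the error SD is 1
simulate_batch <- function(n_per_group, f) {
  d <- f * sqrt(3 / 2)
  data.frame(
    group = factor(rep(1:3, each = n_per_group)),
    y     = rnorm(3 * n_per_group, mean = rep(c(-d, 0, d), each = n_per_group))
  )
}

# One simulated sequential ANOVA: add one observation per group at a time
# and stop once a decision boundary is crossed or the resource cap is hit
run_once <- function() {
  n   <- 10                       # observations per group in the first batch
  dat <- simulate_batch(n, f_expected)
  repeat {
    fit <- seq_anova(y ~ group, f = f_expected, data = dat,
                     alpha = 0.05, power = 0.90)
    # Assumption: an undecided test is labelled "continue sampling" in the
    # decision slot; check ?seq_anova for the actual interface
    decided <- fit@decision != "continue sampling"
    if (decided || n >= n_max) {
      return(c(n_per_group = n, decided = decided))
    }
    n   <- n + 1
    dat <- rbind(dat, simulate_batch(1, f_expected))
  }
}

# Repeat many times and aggregate the stopping points
results <- t(replicate(500, run_once()))
median(results[, "n_per_group"])   # typical per-group N at stopping
mean(results[, "decided"])         # decision rate given the cap n_max
```

Repeating this for many runs per parameter combination yields the stopping-time distributions from which \(N_{\text{median}}\), \(N_{\text{max}}\), and the decision rates in the reports are derived.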
This simulation database is stored externally to keep the package
installation size small. The data are downloaded automatically on first
use of plan_sample_size() and cached locally for future
sessions.
Getting Started
Your First Sample Size Report
Let’s walk through a practical example. Imagine you’re planning a study to compare three groups. You want to detect medium-sized effects (Cohen’s f = 0.25) or larger, and you’re working with some specific constraints.
First, you’ll set your alpha level to 0.05, the conventional threshold that limits Type I errors so that decisions to reject the null hypothesis can be trusted. You also want high statistical power (\(1-\beta = 0.90\)), which limits Type II errors so that decisions to accept the null hypothesis can be trusted. However, given your limited resources, you’re willing to accept a 20% non-decision rate.
This means that 80% of the time you’ll reach a decision to accept one of the two hypotheses. Critically, whether that decision is to reject \(H_0\) (favoring \(H_1\)) or to accept \(H_0\), you can trust it: you’ve limited false acceptances of \(H_1\) to 5% and false acceptances of \(H_0\) to 10% in the long run.
Now let’s see how to generate a sample size planning report for this scenario:
plan_sample_size(
  f_expected    = 0.25, # Expected effect size
  k_groups      = 3,    # Number of groups
  power         = 0.90, # Desired power
  decision_rate = 0.80  # Desired percentage of decisions
)

When you run this code for the first time, several things happen:
- Data download: The simulation database (~70 MB) is downloaded from GitHub and saved to your local cache directory. This is a one-time operation.
- Report generation: An HTML report is created in your temporary directory.
- Browser launch: The report automatically opens in your default web browser (if running interactively).
The initial download typically takes a couple of seconds; generating the report itself takes only a few seconds more.
Function Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `f_expected` | numeric | required | Expected standardized effect size (Cohen’s f). Must be one of: 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, or 0.40. |
| `k_groups` | integer | required | Number of groups to compare. Must be 2, 3, or 4. |
| `power` | numeric | 0.95 | Desired statistical power. Must be 0.80, 0.90, or 0.95. |
| `output_dir` | character | `tempdir()` | Directory where the HTML report will be saved. |
| `output_file` | character | `"sprtt-report-sample-size-planning.html"` | Filename for the generated report. |
| `open` | logical | `interactive()` | Whether to open the report in your browser after generation. Set to `FALSE` for batch processing. |
| `overwrite` | logical | `FALSE` | Whether to overwrite an existing file with the same name without prompting. |
Input Validation
The function validates all inputs before generating the report. If you specify a parameter value that doesn’t exist in the simulation database, you’ll receive an informative error message listing the available options. For example:
# This will produce an error:
plan_sample_size(f_expected = 0.22, k_groups = 3)
#> Error: `f_expected` = 0.22 is not available.
#> Please choose one of 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, or 0.4

Practical Use Cases
Case 1: Comparing Different Effect Sizes
The expected effect size has a large impact on required sample size. Here’s how to generate reports for different scenarios:
# Report 1: smaller expected effect
plan_sample_size(f_expected = 0.15, k_groups = 3, power = 0.95)

# Report 2: larger expected effect
plan_sample_size(f_expected = 0.35, k_groups = 3, power = 0.95)

Case 2: Saving Reports to a Specific Location
By default, reports are saved to a temporary directory. For reports you want to keep, specify a custom location:
plan_sample_size(
  f_expected  = 0.25,
  k_groups    = 4,
  output_dir  = "~/Documents/research/sample_size_planning",
  output_file = "study1_anova_power.html",
  open        = TRUE
)

This is particularly useful when preparing documentation for grant applications, pre-registrations, or manuscript supplementary materials.
Case 3: Preparing Multiple Scenarios
When preparing grant applications or pre-registrations, you might want to explore multiple scenarios (e.g., different effect size assumptions):
# Define scenarios to compare
scenarios <- data.frame(
effect = c(0.15, 0.20, 0.25),
label = c("conservative", "expected", "optimistic")
)
# Generate reports for each scenario
for (i in 1:(nrow(scenarios))) {
plan_sample_size(
f_expected = scenarios$effect[i],
k_groups = 3,
power = 0.90,
output_dir = "sample_size_reports",
output_file = sprintf("power_analysis_%s.html", scenarios$label[i]),
open = FALSE, # Don't open each one
overwrite = TRUE
)
}
message("Generated ", nrow(scenarios), " sample size reports")This approach creates a set of reports that document your planning across different assumptions.
Managing the Simulation Data
Downloading Data Explicitly
While plan_sample_size() downloads data automatically
when needed, you can also download it explicitly:
# Download simulation data manually
download_sample_size_data()

This is useful if you want to:
- Pre-download data on a fast internet connection before traveling
- Verify the download completed successfully
- Troubleshoot download issues
To force a re-download (for example, after a package update with new simulation data):
download_sample_size_data(force = TRUE)

Checking Cache Status
To see whether data are cached and how much disk space they occupy:
This displays:
- The cache directory location on your system
- Whether simulation data are currently cached
- The file size (approximately 15 MB when cached)
Clearing the Cache
If you need to free up disk space or suspect corrupted data, you can clear the cache:
The data will be re-downloaded automatically the next time you run
plan_sample_size().
Working with Simulation Data Directly
Advanced users may want to access the raw simulation data for custom analyses or visualizations. You can load the data directly into your R session:
# Load the complete simulation database
df_all <- load_sample_size_data()

This data frame contains all simulation results and can be filtered, summarized, or visualized using standard R tools.
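For example, you could pull out only the rows that match your planned scenario and summarize them. The column names used below (f_expected, k_groups, power) are assumptions for illustration; inspect the data frame first to see the actual structure of the database.

```r
# Inspect the structure of the simulation database
str(df_all)

# Assumption: columns named f_expected, k_groups, and power exist;
# adjust the filter to the column names reported by str()
df_scenario <- subset(df_all, f_expected == 0.25 & k_groups == 3 & power == 0.90)

# Summarize the planned scenario with standard R tools
summary(df_scenario)
```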