Statistical Analysis

Analyze experiment outcomes with rigorous statistical methods using the ExperimentFramework.Science package. Get publication-ready results with effect sizes, confidence intervals, and multiple comparison corrections.

Overview

The Science package provides:

Statistical Tests: t-test, chi-square, Mann-Whitney U, ANOVA
Effect Size Calculators: Cohen's d, odds ratio, relative risk
Multiple Comparison Corrections: Bonferroni, Holm-Bonferroni, Benjamini-Hochberg
Reporters: Markdown and JSON report generation

Installation

dotnet add package ExperimentFramework.Science

Requires MathNet.Numerics (included as a dependency).

Quick Start

using ExperimentFramework.Data;
using ExperimentFramework.Science;

var builder = WebApplication.CreateBuilder(args);

// Register services
builder.Services.AddExperimentDataCollection();
builder.Services.AddExperimentScience();

var app = builder.Build();

Statistical Tests

Choosing the Right Test

| Data Type | Groups | Test | Use Case |
|-----------|--------|------|----------|
| Continuous | 2 independent | Two-sample t-test | Comparing means (revenue, scores) |
| Continuous | 2 paired | Paired t-test | Before/after comparisons |
| Binary | 2 | Chi-square test | Comparing proportions (conversion rates) |
| Continuous | 2 (non-normal) | Mann-Whitney U | Non-parametric, rank-based comparison of distributions |
| Continuous | 3+ | One-way ANOVA | Comparing multiple groups |

Two-Sample t-Test (Welch's)

Compare means between two independent groups:

using ExperimentFramework.Science.Statistics;

var control = new double[] { 10.2, 12.5, 9.8, 11.3, 10.9 };
var treatment = new double[] { 14.1, 15.3, 13.7, 14.8, 15.0 };

var result = TwoSampleTTest.Instance.Perform(control, treatment, alpha: 0.05);

Console.WriteLine($"Test: {result.TestName}");
Console.WriteLine($"t-statistic: {result.TestStatistic:F3}");
Console.WriteLine($"p-value: {result.PValue:F4}");
Console.WriteLine($"Significant: {result.IsSignificant}");
Console.WriteLine($"95% CI: [{result.ConfidenceIntervalLower:F2}, {result.ConfidenceIntervalUpper:F2}]");
Console.WriteLine($"Effect: {result.PointEstimate:F2}");

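// Approximate output for the data above (difference taken as treatment - control;
// the sign of the t-statistic depends on the order of subtraction):
// Test: Welch's Two-Sample t-Test
// t-statistic: 6.559
// p-value: 0.0004
// Significant: True
// 95% CI: [2.32, 4.96]
// Effect: 3.64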
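
As a hand check on the example above, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be worked out directly from the two arrays. This is a sketch of the textbook formulas, taking the difference as treatment minus control; the sign convention used by `TwoSampleTTest` is an assumption here.

```latex
\begin{aligned}
\bar{x}_C &= 10.94, \quad s_C^2 = 1.103, \quad n_C = 5 \\
\bar{x}_T &= 14.58, \quad s_T^2 = 0.437, \quad n_T = 5 \\
t &= \frac{\bar{x}_T - \bar{x}_C}{\sqrt{s_C^2/n_C + s_T^2/n_T}}
   = \frac{3.64}{\sqrt{0.2206 + 0.0874}} \approx 6.56 \\
\nu &= \frac{\left(s_C^2/n_C + s_T^2/n_T\right)^2}
            {\dfrac{(s_C^2/n_C)^2}{n_C - 1} + \dfrac{(s_T^2/n_T)^2}{n_T - 1}}
   = \frac{0.308^2}{\dfrac{0.2206^2}{4} + \dfrac{0.0874^2}{4}} \approx 6.74
\end{aligned}
```

Unlike the pooled-variance t-test, the degrees of freedom are not simply $n_C + n_T - 2 = 8$; they shrink toward the smaller, noisier group.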

One-Sided Tests

// Test if treatment is greater than control
var greaterResult = TwoSampleTTest.Instance.Perform(
    control, treatment,
    alpha: 0.05,
    alternativeType: AlternativeHypothesisType.Greater);

// Test if treatment is less than control
// (a separate variable name, so both calls compile in the same scope)
var lessResult = TwoSampleTTest.Instance.Perform(
    control, treatment,
    alpha: 0.05,
    alternativeType: AlternativeHypothesisType.Less);

Chi-Square Test

Compare proportions for binary outcomes:

using ExperimentFramework.Science.Statistics;

// Binary data: 1.0 = success, 0.0 = failure
var control = new double[] { 1, 0, 0, 1, 0, 0, 1, 0, 0, 0 };  // 30% success
var treatment = new double[] { 1, 1, 0, 1, 1, 0, 1, 1, 0, 1 }; // 70% success

var result = ChiSquareTest.Instance.Perform(control, treatment, alpha: 0.05);

Console.WriteLine($"Chi-square: {result.TestStatistic:F3}");
Console.WriteLine($"p-value: {result.PValue:F4}");
Console.WriteLine($"Difference in proportions: {result.PointEstimate:P1}");

// Access detailed results
var details = result.Details;
Console.WriteLine($"Control rate: {details["control_proportion"]:P1}");
Console.WriteLine($"Treatment rate: {details["treatment_proportion"]:P1}");
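
For the 2×2 table implied by the arrays above (3 of 10 successes in control, 7 of 10 in treatment), the Pearson statistic can be computed by hand. The sketch below uses the uncorrected statistic; whether `ChiSquareTest` applies a continuity correction is not stated in this document, so treat that as an assumption.

```latex
\begin{aligned}
E_{ij} &= \frac{(\text{row}_i\ \text{total})(\text{column}_j\ \text{total})}{N}
        = \frac{10 \times 10}{20} = 5 \quad \text{for every cell} \\
\chi^2 &= \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
        = \frac{(3-5)^2}{5} + \frac{(7-5)^2}{5} + \frac{(7-5)^2}{5} + \frac{(3-5)^2}{5} = 3.2 \\
p &= P\left(\chi^2_1 > 3.2\right) \approx 0.074
\end{aligned}
```

With only ten observations per group, a 40-point gap in success rate still falls short of significance at α = 0.05, and the expected counts sit right at the usual minimum of 5.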

Paired t-Test

Compare paired observations (same subjects, before/after):

using ExperimentFramework.Science.Statistics;

// Each index represents the same subject
var before = new double[] { 100, 105, 98, 102, 110 };
var after = new double[] { 95, 98, 92, 97, 103 };

var result = PairedTTest.Instance.Perform(before, after, alpha: 0.05);

Console.WriteLine($"Mean difference: {result.PointEstimate:F2}");
Console.WriteLine($"p-value: {result.PValue:F4}");
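
The paired test reduces to a one-sample t-test on the per-subject differences. A worked sketch for the data above, taking each difference as before minus after (the library's direction of subtraction is an assumption):

```latex
\begin{aligned}
d_i &= \text{before}_i - \text{after}_i = (5,\ 7,\ 6,\ 5,\ 7) \\
\bar{d} &= 6, \qquad
s_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{4}{4}} = 1 \\
t &= \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{6}{1/\sqrt{5}} \approx 13.42, \qquad
\text{df} = n - 1 = 4
\end{aligned}
```

The statistic is large because the differences are very consistent across subjects, even though the raw before and after values overlap.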

Mann-Whitney U Test

Non-parametric alternative when data isn't normally distributed:

using ExperimentFramework.Science.Statistics;

var control = new double[] { 1, 2, 3, 100, 5 };     // Contains outlier
var treatment = new double[] { 10, 12, 15, 11, 14 };

var result = MannWhitneyUTest.Instance.Perform(control, treatment, alpha: 0.05);

Console.WriteLine($"U-statistic: {result.TestStatistic:F1}");
Console.WriteLine($"p-value: {result.PValue:F4}");
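
The U statistic depends only on ranks in the combined sample, which is what makes it robust to the outlier. A worked sketch for the data above; which of the two U values the library reports is not stated here, so both are shown:

```latex
\begin{aligned}
\text{control ranks} &: 1,\ 2,\ 3,\ 10,\ 4 \quad\Rightarrow\quad R_C = 20 \\
\text{treatment ranks} &: 5,\ 7,\ 9,\ 6,\ 8 \quad\Rightarrow\quad R_T = 35 \\
U_C &= R_C - \frac{n_C(n_C + 1)}{2} = 20 - 15 = 5 \\
U_T &= R_T - \frac{n_T(n_T + 1)}{2} = 35 - 15 = 20 \\
U_C + U_T &= n_C\, n_T = 25
\end{aligned}
```

The value 100 contributes only its rank (10), so it has no more influence than any other value larger than every treatment observation.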

One-Way ANOVA

Compare three or more groups:

using ExperimentFramework.Science.Statistics;

var groups = new Dictionary<string, IReadOnlyList<double>>
{
    ["control"] = new double[] { 10, 12, 11, 9, 10 },
    ["variant-a"] = new double[] { 14, 15, 13, 14, 15 },
    ["variant-b"] = new double[] { 18, 17, 19, 18, 20 }
};

var result = OneWayAnova.Instance.Perform(groups, alpha: 0.05);

Console.WriteLine($"F-statistic: {result.TestStatistic:F2}");
Console.WriteLine($"p-value: {result.PValue:F4}");
Console.WriteLine($"Significant difference between groups: {result.IsSignificant}");

// Access group means from details
var details = result.Details;
foreach (var (group, mean) in (Dictionary<string, double>)details["group_means"])
{
    Console.WriteLine($"  {group}: {mean:F2}");
}
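
The F-statistic is the ratio of between-group to within-group mean squares. A worked sketch for the three groups above, using the standard one-way decomposition with $k = 3$ groups and $N = 15$ observations:

```latex
\begin{aligned}
\bar{x}_{\text{control}} &= 10.4, \quad \bar{x}_{\text{variant-a}} = 14.2, \quad
\bar{x}_{\text{variant-b}} = 18.4, \quad \bar{x} = 14.33 \\
SS_{\text{between}} &= \sum_g n_g (\bar{x}_g - \bar{x})^2 \approx 160.13, \qquad
df_{\text{between}} = k - 1 = 2 \\
SS_{\text{within}} &= \sum_g \sum_i (x_{gi} - \bar{x}_g)^2 = 5.2 + 2.8 + 5.2 = 13.2, \qquad
df_{\text{within}} = N - k = 12 \\
F &= \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{within}} / df_{\text{within}}}
   = \frac{80.07}{1.10} \approx 72.79
\end{aligned}
```

A significant F only says that at least one group mean differs; identifying which pairs differ requires follow-up comparisons with a multiple comparison correction.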

Effect Size

Effect size quantifies the magnitude of differences independent of sample size.

Cohen's d

Standard effect size for continuous data:

using ExperimentFramework.Science.EffectSize;

var control = new double[] { 100, 102, 98, 101, 99 };
var treatment = new double[] { 110, 112, 108, 111, 109 };

var effect = CohensD.Instance.Calculate(control, treatment);

Console.WriteLine($"Cohen's d: {effect.Value:F2}");
Console.WriteLine($"Magnitude: {effect.Magnitude}");
Console.WriteLine($"95% CI: [{effect.ConfidenceIntervalLower:F2}, {effect.ConfidenceIntervalUpper:F2}]");

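// Output (d computed from the arrays above using the pooled standard deviation):
// Cohen's d: 6.32
// Magnitude: Large
// 95% CI: [lower, upper] (bounds depend on the interval method the calculator uses)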
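
Cohen's d divides the mean difference by the pooled standard deviation. A worked sketch for the arrays above, assuming the calculator pools the two sample variances (each computed with an $n - 1$ denominator, giving $s_C^2 = s_T^2 = 2.5$):

```latex
\begin{aligned}
s_p &= \sqrt{\frac{(n_C - 1)\,s_C^2 + (n_T - 1)\,s_T^2}{n_C + n_T - 2}}
     = \sqrt{\frac{4(2.5) + 4(2.5)}{8}} \approx 1.58 \\
d &= \frac{\bar{x}_T - \bar{x}_C}{s_p} = \frac{110 - 100}{1.58} \approx 6.32
\end{aligned}
```

A value this large reflects the toy data: the two groups do not overlap at all. Effects in real experiments are usually far smaller.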

Effect size interpretation:

| Cohen's d | Magnitude | Interpretation |
|-----------|-----------|----------------|
| < 0.2 | Negligible | Trivial difference |
| 0.2 - 0.5 | Small | Minor difference |
| 0.5 - 0.8 | Medium | Moderate difference |
| > 0.8 | Large | Substantial difference |

Odds Ratio

For binary outcomes (comparing odds of success):

using ExperimentFramework.Science.EffectSize;

// Control: 20 successes out of 100
// Treatment: 35 successes out of 100
var effect = OddsRatio.Instance.Calculate(
    controlSuccesses: 20, controlTotal: 100,
    treatmentSuccesses: 35, treatmentTotal: 100);

Console.WriteLine($"Odds Ratio: {effect.Value:F2}");
Console.WriteLine($"95% CI: [{effect.ConfidenceIntervalLower:F2}, {effect.ConfidenceIntervalUpper:F2}]");

// Output:
// Odds Ratio: 2.15
// 95% CI: [1.14, 4.07]
//
// Interpretation: the odds of success under treatment are 2.15 times the odds under control
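
The point estimate follows directly from the counts: odds are successes divided by failures, and the odds ratio divides the treatment odds by the control odds.

```latex
\text{OR} = \frac{\text{treatment odds}}{\text{control odds}}
          = \frac{35 / 65}{20 / 80}
          = \frac{0.538}{0.250} \approx 2.15
```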

Odds ratio interpretation:

| Value | Interpretation |
|-------|----------------|
| 1.0 | No difference |
| > 1.0 | Treatment increases odds |
| < 1.0 | Treatment decreases odds |

Relative Risk

For binary outcomes (risk ratio):

using ExperimentFramework.Science.EffectSize;

var effect = RelativeRisk.Instance.Calculate(
    controlSuccesses: 20, controlTotal: 100,
    treatmentSuccesses: 35, treatmentTotal: 100);

Console.WriteLine($"Relative Risk: {effect.Value:F2}");

// Output:
// Relative Risk: 1.75
// Interpretation: Treatment has 75% higher success rate
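
Relative risk compares success rates rather than odds, so the denominators are the group totals:

```latex
\text{RR} = \frac{\text{treatment rate}}{\text{control rate}}
          = \frac{35 / 100}{20 / 100}
          = \frac{0.35}{0.20} = 1.75
```

For the same counts the odds ratio is 2.15, which is larger than the relative risk because the outcome is not rare. The two only approximate each other when success rates are small, so an odds ratio should not be read as a ratio of rates.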

Multiple Comparison Corrections

When running multiple tests, apply a correction to control either the family-wise error rate (Bonferroni, Holm-Bonferroni) or the false discovery rate (Benjamini-Hochberg).

Bonferroni Correction

Most conservative - controls family-wise error rate:

using ExperimentFramework.Science.Corrections;

var pValues = new double[] { 0.01, 0.03, 0.04 };

// Adjust p-values (multiply by number of tests)
var adjusted = BonferroniCorrection.Instance.AdjustPValues(pValues);
// [0.03, 0.09, 0.12]

// Or determine significance directly
var significant = BonferroniCorrection.Instance.DetermineSignificance(pValues, alpha: 0.05);
// [true, false, false] - only first is significant at adjusted threshold

Holm-Bonferroni (Step-Down)

Less conservative than Bonferroni, more power:

using ExperimentFramework.Science.Corrections;

var pValues = new double[] { 0.01, 0.03, 0.04 };

var adjusted = HolmBonferroniCorrection.Instance.AdjustPValues(pValues);
var significant = HolmBonferroniCorrection.Instance.DetermineSignificance(pValues, alpha: 0.05);
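
For reference, the textbook Holm step-down adjustment sorts the p-values in ascending order, multiplies the j-th smallest by $m - j + 1$, and enforces monotonicity with a running maximum. Worked for the three p-values above (a sketch of the standard procedure; that the library returns exactly these values is an assumption):

```latex
\begin{aligned}
\tilde{p}_{(i)} &= \max_{j \le i}\ \min\left(1,\ (m - j + 1)\, p_{(j)}\right), \qquad m = 3 \\
\tilde{p}_{(1)} &= 3 \times 0.01 = 0.03 \\
\tilde{p}_{(2)} &= \max(0.03,\ 2 \times 0.03) = 0.06 \\
\tilde{p}_{(3)} &= \max(0.06,\ 1 \times 0.04) = 0.06
\end{aligned}
```

At α = 0.05 only the first hypothesis is rejected. Holm's adjusted values (0.03, 0.06, 0.06) are never larger than Bonferroni's (0.03, 0.09, 0.12), which is where the extra power comes from.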

Benjamini-Hochberg (FDR)

Controls false discovery rate - recommended for exploratory analysis:

using ExperimentFramework.Science.Corrections;

var pValues = new double[] { 0.01, 0.03, 0.04 };

var adjusted = BenjaminiHochbergCorrection.Instance.AdjustPValues(pValues);
var significant = BenjaminiHochbergCorrection.Instance.DetermineSignificance(pValues, alpha: 0.05);

Console.WriteLine($"Correction: {BenjaminiHochbergCorrection.Instance.Name}");
Console.WriteLine($"Controls for: {BenjaminiHochbergCorrection.Instance.ControlsFor}");
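
For reference, the textbook Benjamini-Hochberg step-up adjustment sorts the p-values in ascending order, scales the j-th smallest by $m / j$, and enforces monotonicity with a running minimum taken from the largest p-value downward. Worked for the three p-values above (a sketch of the standard procedure; that the library returns exactly these values is an assumption):

```latex
\begin{aligned}
\tilde{p}_{(i)} &= \min_{j \ge i}\ \min\left(1,\ \frac{m}{j}\, p_{(j)}\right), \qquad m = 3 \\
\tilde{p}_{(3)} &= \tfrac{3}{3} \times 0.04 = 0.04 \\
\tilde{p}_{(2)} &= \min\left(0.04,\ \tfrac{3}{2} \times 0.03\right) = 0.04 \\
\tilde{p}_{(1)} &= \min\left(0.04,\ \tfrac{3}{1} \times 0.01\right) = 0.03
\end{aligned}
```

All three adjusted values fall below α = 0.05, so all three hypotheses are rejected, compared with only the first under Bonferroni. The price is a weaker guarantee: the expected share of false positives among rejections is bounded, not the chance of any false positive at all.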

Correction Comparison

| Method | Controls For | Power | Use Case |
|--------|--------------|-------|----------|
| Bonferroni | Family-wise error | Lowest | Critical decisions, few tests |
| Holm-Bonferroni | Family-wise error | Medium | Confirmatory analysis |
| Benjamini-Hochberg | False discovery rate | Highest | Exploratory analysis |

Experiment Analyzer

Analyze complete experiments with the analyzer service:

using ExperimentFramework.Science.Analysis;

public class ExperimentService
{
    private readonly IExperimentAnalyzer _analyzer;

    public ExperimentService(IExperimentAnalyzer analyzer)
    {
        _analyzer = analyzer;
    }

    public async Task<ExperimentReport> AnalyzeCheckoutExperiment()
    {
        return await _analyzer.AnalyzeAsync("checkout-v2", new AnalysisOptions
        {
            Alpha = 0.05,
            TargetPower = 0.80,
            ApplyMultipleComparisonCorrection = true,
            CorrectionMethod = MultipleComparisonMethod.BenjaminiHochberg
        });
    }
}

Report Generation

Markdown Reports

using ExperimentFramework.Science.Reporting;

public class ReportService
{
    private readonly IExperimentReporter _reporter;

    public ReportService(IExperimentReporter reporter)
    {
        _reporter = reporter;
    }

    public async Task<string> GenerateReport(ExperimentReport report)
    {
        return await _reporter.GenerateAsync(report);
    }
}

Example output (the figures are illustrative):

# Experiment Report: checkout-v2

## Summary
- **Status**: Completed
- **Duration**: 14 days
- **Total Subjects**: 10,000

## Results

### Primary Endpoint: purchase_completed

| Metric | Control | Streamlined | Difference |
|--------|---------|-------------|------------|
| Conversion Rate | 29.0% | 37.4% | +8.4pp |
| Sample Size | 5,000 | 5,000 | - |

**Statistical Test**: Chi-Square Test for Independence
- Chi-square: 76.23
- p-value: < 0.0001
- 95% CI: [6.1%, 10.7%]
- **Result**: Statistically significant

**Effect Size**:
- Odds Ratio: 1.47 [1.31, 1.65]
- Relative Risk: 1.29 [1.19, 1.40]

## Conclusion

The streamlined checkout shows a statistically significant improvement
in conversion rate compared to control (37.4% vs 29.0%, p < 0.0001).

JSON Reports

var jsonReporter = serviceProvider.GetRequiredService<JsonReporter>();
var json = await jsonReporter.GenerateAsync(report);

// Returns structured JSON for integration with dashboards

Dependency Injection

services.AddExperimentScience();

This registers:

| Service | Implementation |
|---------|----------------|
| `IStatisticalTest` | `TwoSampleTTest` |
| `IPairedStatisticalTest` | `PairedTTest` |
| `IMultiGroupStatisticalTest` | `OneWayAnova` |
| `IEffectSizeCalculator` | `CohensD` |
| `IBinaryEffectSizeCalculator` | `OddsRatio` |
| `IPowerAnalyzer` | `PowerAnalyzer` |
| `IMultipleComparisonCorrection` | `BenjaminiHochbergCorrection` |
| `IExperimentAnalyzer` | `ExperimentAnalyzer` |
| `IExperimentReporter` | `MarkdownReporter` |

Best Practices

1. Define Hypotheses Before Analysis

Specify your hypothesis before looking at data:

// Pre-register hypothesis
var hypothesis = new HypothesisDefinition
{
    Name = "Checkout Optimization",
    NullHypothesis = "No difference in conversion between variants",
    AlternativeHypothesis = "Streamlined checkout improves conversion",
    Type = HypothesisType.Superiority,
    PrimaryEndpoint = new Endpoint
    {
        Name = "purchase_completed",
        OutcomeType = OutcomeType.Binary,
        HigherIsBetter = true
    },
    ExpectedEffectSize = 0.05,
    SuccessCriteria = new SuccessCriteria
    {
        Alpha = 0.05,
        Power = 0.80,
        MinimumSampleSize = 1000
    }
};

2. Check Assumptions

Verify test assumptions before interpreting results:

// For t-test: check sample size
if (controlData.Count < 30 || treatmentData.Count < 30)
{
    // Consider Mann-Whitney U instead, or verify normality
    result = MannWhitneyUTest.Instance.Perform(controlData, treatmentData);
}
else
{
    result = TwoSampleTTest.Instance.Perform(controlData, treatmentData);
}

// For chi-square: check expected frequencies
// (details is the Details dictionary of a ChiSquareTest result)
var minExpected = Math.Min(
    (double)details["expected_control_success"],
    (double)details["expected_treatment_success"]);

if (minExpected < 5)
{
    Console.WriteLine("Warning: Expected frequency < 5, consider Fisher's exact test");
}

3. Report Effect Sizes

Always report effect sizes alongside p-values:

var testResult = TwoSampleTTest.Instance.Perform(control, treatment);
var effectSize = CohensD.Instance.Calculate(control, treatment);

Console.WriteLine($"Mean difference: {testResult.PointEstimate:F2}");
Console.WriteLine($"p-value: {testResult.PValue:F4}");
Console.WriteLine($"Effect size (d): {effectSize.Value:F2} ({effectSize.Magnitude})");
Console.WriteLine($"95% CI: [{testResult.ConfidenceIntervalLower:F2}, {testResult.ConfidenceIntervalUpper:F2}]");

4. Apply Multiple Comparison Corrections

When testing multiple hypotheses:

var pValues = results.Select(r => r.PValue).ToArray();
var correctedSignificance = BenjaminiHochbergCorrection.Instance
    .DetermineSignificance(pValues, alpha: 0.05);

for (int i = 0; i < results.Count; i++)
{
    Console.WriteLine($"{results[i].TestName}: " +
        $"p={pValues[i]:F4}, significant={correctedSignificance[i]}");
}

5. Use Appropriate Sample Sizes

See Power Analysis for calculating required sample sizes.

Common Pitfalls

Peeking at Results

Don't stop an experiment early when you see significance:

// Bad - stopping early inflates false positive rate
if (result.IsSignificant)
{
    StopExperiment(); // DON'T DO THIS
}

// Good - run to predetermined sample size
if (currentSamples >= requiredSampleSize)
{
    var result = AnalyzeExperiment();
}

P-Hacking

Don't test multiple metrics until you find significance:

// Bad - testing many metrics inflates false positives
foreach (var metric in allMetrics)
{
    var result = Test(metric);
    if (result.IsSignificant)
    {
        Report(result); // Cherry-picking
    }
}

// Good - pre-specify primary endpoint, correct for multiple tests
var primaryResult = Test(primaryMetric);
var secondaryPValues = secondaryMetrics.Select(m => Test(m).PValue).ToArray();
var corrected = BenjaminiHochbergCorrection.Instance.DetermineSignificance(secondaryPValues, 0.05);

Ignoring Effect Size

A significant result doesn't mean a meaningful difference:

// Statistically significant but practically meaningless
// p = 0.01, but effect size = 0.05 (negligible)
if (result.IsSignificant && effectSize.Magnitude == EffectSizeMagnitude.Negligible)
{
    Console.WriteLine("Warning: Statistically significant but trivial effect");
}
