
From Timing Analysis to Production: Automating WCET Checks in Jenkins/GitHub Actions

powerlabs
2026-02-02
10 min read

Concrete CI examples that add WCET timing analysis (RocqStat/VectorCAST) to Jenkins & GitHub Actions with thresholds, artifacts, and PR feedback.

Stop Late-Stage Timing Regressions: Add WCET Checks to CI Now

One slipped timing regression in a release can mean a safety violation, an urgent recall, or a costly hardware respin. Teams building real-time and embedded software need reliable, automated WCET (Worst-Case Execution Time) checks in their CI pipelines so timing problems are caught in pull requests — not production. This guide shows production-ready examples for GitHub Actions and Jenkins that run timing analysis tools (like RocqStat), enforce thresholds, store artifacts, and give actionable developer feedback.

Why automated WCET checks matter in 2026

Two trends make WCET checks essential in 2026:

  • Convergence of verification and timing analysis: Vector's January 2026 acquisition of StatInf's RocqStat and plans to fold it into VectorCAST highlight how timing estimation is becoming a first-class verification activity for automotive and other safety-critical domains.
  • Scale and speed of software updates: More features, frequent releases, and complex stacks raise the risk of inadvertent timing regressions unless automated checks run at every merge.

Automated timing checks provide early discovery, objective baselines, and auditable artifacts for certification.

Key concepts and toolchain components

Before the pipelines, be clear on the concepts and tools:

  • WCET — the maximum execution time a task/code path can take under specified conditions.
  • Static timing analysis — derives provable upper bounds from control-flow analysis and a hardware model, without executing the code.
  • Measurement-based timing analysis — uses repeated measurements on representative hardware or instrumented runs to derive empirical bounds.
  • RocqStat — a modern timing analysis tool (now part of Vector's offering) that supports measurement-based analysis and can export machine-readable reports.
  • VectorCAST — Vector's testing and verification platform; RocqStat integration means unified workflows for tests and timing in the near term.

Use a multi-tier strategy to balance speed and coverage:

  1. PR quick-checks: Fast, conservative checks that run on GitHub Actions or lightweight self-hosted runners. Use a reduced test set or static estimates to fail obvious regressions.
  2. Nightly full analysis: Deep measurement-based runs using dedicated hardware or controlled lab runners. Produces full WCET reports and artifacts.
  3. Baseline management: Store golden WCET baselines per target in a known location (S3, artifact repo, or a branch) and compare each run to baseline.
  4. Threshold policy: Maintain per-function/per-task budgets and a global policy (e.g., block merges if any critical function exceeds its budget by >5%); see the example budget file after this list.
  5. Developer feedback: Fail fast with concise diagnostics in the PR, link to detailed artifacts, and suggest the top suspect commits (when available).
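
A budget file keeps limits and policy version-controlled next to the code, so code owners can review changes in PRs. A minimal sketch of the hypothetical .github/wcet-budgets.yml referenced by the workflow below (the schema is illustrative; any layout your comparison script understands works):

# .github/wcet-budgets.yml -- illustrative schema, not a tool-defined format
policy:
  tolerance_pct: 5        # block if a critical budget is exceeded by more than 5%
  consecutive_runs: 3     # hysteresis: strikes before a regression blocks merges
functions:
  can_rx_handler:
    budget_ms: 2.0
    critical: true
  sensor_fusion_step:
    budget_ms: 10.0
    critical: true
  diagnostics_log:
    budget_ms: 25.0
    critical: false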

Example 1 — GitHub Actions: PR quick-check + nightly full run

Below are two practical GitHub Actions workflows: one lightweight PR check and one nightly job that stores artifacts and updates baselines.

1) PR quick-check workflow (fast)

Purpose: run a targeted measurement set or a static check inside a container to detect obvious regressions.

name: wcet-pr-check
on:
  pull_request:
    branches: [main]
jobs:
  wcet-quick:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build binary
        run: |
          ./scripts/build.sh --target=hw-sim
      - name: Run RocqStat quick check
        run: |
          docker run --rm -v ${{ github.workspace }}:/work rocqstat:latest \
            /work/build/app.bin --quick --out /work/wcet-quick.json
      - name: Upload quick report
        uses: actions/upload-artifact@v4
        with:
          name: wcet-quick-json
          path: wcet-quick.json
      - name: Fail on threshold breach
        run: |
          python .github/scripts/check_wcet.py wcet-quick.json .github/wcet-budgets.yml

Notes:

  • The check_wcet.py script compares the JSON report to the budgets file and exits non-zero if critical budgets are violated (a sketch follows after these notes). Keep budgets in the repo so code owners can update them in PRs.
  • Use a small, deterministic test set to minimize noise. If measurements are noisy on shared runners, run in a constrained Docker container with CPU pinning.
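
What check_wcet.py does is the heart of the gate. A minimal sketch, assuming a report schema with a top-level "functions" list of {"name", "wcet_ms"} entries and the budget layout shown earlier (both are illustrative conventions, not RocqStat's documented format; requires PyYAML):

#!/usr/bin/env python3
"""Fail the CI step when a function's measured WCET exceeds its budget."""
import json
import sys

import yaml  # PyYAML


def main(report_path: str, budgets_path: str) -> int:
    with open(report_path) as f:
        report = json.load(f)
    with open(budgets_path) as f:
        budgets = yaml.safe_load(f)

    tolerance = budgets.get("policy", {}).get("tolerance_pct", 0)
    violations = []
    for fn in report.get("functions", []):
        budget = budgets.get("functions", {}).get(fn["name"])
        if budget is None:
            continue  # no budget defined for this function
        limit = budget["budget_ms"] * (1 + tolerance / 100.0)
        if fn["wcet_ms"] > limit:
            msg = (f"{fn['name']}: {fn['wcet_ms']:.3f} ms exceeds "
                   f"budget {budget['budget_ms']} ms (+{tolerance}% tolerance)")
            if budget.get("critical"):
                violations.append(msg)
            else:
                print(f"Warning (non-critical): {msg}")

    for v in violations:
        print(f"WCET violation: {v}")
    return 1 if violations else 0  # non-zero exit fails the workflow step


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))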

2) Nightly full analysis workflow

Purpose: Deep measurement-based analysis on controlled runners (or lab hardware exposed to CI) and baseline persistence.

name: wcet-nightly
on:
  schedule:
    - cron: '0 3 * * *' # daily at 03:00 UTC
jobs:
  wcet-full:
    runs-on: [self-hosted, linux, wcet-lab]
    steps:
      - uses: actions/checkout@v4
      - name: Build for HW
        run: ./scripts/build.sh --target=hw-real
      - name: Run RocqStat full analysis
        run: |
          /opt/rocqstat/bin/rocqstat --input build/app.bin --hw-profile /etc/hw-profile.json --out results/wcet-full.json
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wcet-full-${{ github.run_id }}
          path: results/
      - name: Compare vs baseline and update
        env:
          S3_BUCKET: my-company-wcet-baselines
        run: |
          python .github/scripts/compare_and_update_baseline.py \
            results/wcet-full.json s3://$S3_BUCKET/baselines/current.json

Notes:

  • Use self-hosted runners in a lab with controlled CPU frequency governors and pinned topology. That reduces measurement variance and makes baselines stable.
  • Store baselines in S3 or an artifact repository; keep change history for audits. A sketch of the compare-and-update script follows.
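
A minimal sketch of compare_and_update_baseline.py, assuming boto3 credentials on the runner and the same illustrative report schema as check_wcet.py (promoting the baseline only when nothing regressed is one reasonable policy, not the only one):

#!/usr/bin/env python3
"""Compare a full WCET report to the stored baseline; promote it if clean."""
import json
import sys
from urllib.parse import urlparse

import boto3


def load_s3_json(s3_url: str) -> dict:
    u = urlparse(s3_url)  # s3://bucket/key
    body = boto3.client("s3").get_object(Bucket=u.netloc, Key=u.path.lstrip("/"))["Body"]
    return json.load(body)


def main(report_path: str, baseline_url: str) -> int:
    with open(report_path) as f:
        report = json.load(f)
    baseline = load_s3_json(baseline_url)

    base = {fn["name"]: fn["wcet_ms"] for fn in baseline.get("functions", [])}
    regressions = []
    for fn in report.get("functions", []):
        old = base.get(fn["name"])
        if old is not None and fn["wcet_ms"] > old:
            pct = 100.0 * (fn["wcet_ms"] - old) / old
            regressions.append(f"{fn['name']}: +{pct:.1f}% vs baseline")

    for r in regressions:
        print(f"Regression: {r}")
    if not regressions:
        # Promote only when nothing regressed, so the baseline ratchets tighter.
        u = urlparse(baseline_url)
        boto3.client("s3").upload_file(report_path, u.netloc, u.path.lstrip("/"))
        print("Baseline updated.")
    return 1 if regressions else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))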

Example 2 — Jenkins: scripted pipeline with artifact management and gating

Jenkins excels for organizations with established lab infrastructure and long-running jobs. The example below runs RocqStat, archives artifacts, compares to a baseline, and sets build status to UNSTABLE or FAILURE depending on policy.

pipeline {
  agent { label 'wcet-lab' }
  environment {
    BASELINE = 's3://my-company-wcet-baselines/current.json'
  }
  stages {
    stage('Checkout') {
      steps { checkout scm }
    }
    stage('Build') {
      steps { sh './scripts/build.sh --target=hw-real' }
    }
    stage('Run WCET') {
      steps {
        sh '/opt/rocqstat/bin/rocqstat --input build/app.bin --hw-profile /etc/hw-profile.json --out results/wcet.json'
        archiveArtifacts artifacts: 'results/**', fingerprint: true
      }
    }
    stage('Compare Baseline') {
      steps {
        sh 'python jenkins-scripts/compare_wcet.py results/wcet.json ${BASELINE} --output results/compare.json'
        script {
          def cmp = readJSON file: 'results/compare.json'
          if (cmp.critical_violations > 0) {
            currentBuild.result = 'FAILURE'
            error("Critical WCET budget violated")
          } else if (cmp.regressions > 0) {
            currentBuild.result = 'UNSTABLE'
            echo "Non-critical regressions detected"
          } else {
            echo "WCET within allowed budgets"
          }
        }
      }
    }
  }
  post {
    always {
      // push report to S3 or Artifactory
      sh 'aws s3 cp results/wcet.json s3://my-company-wcet-archive/${BUILD_ID}/'
    }
    failure {
      mail to: 'team@example.com', subject: "WCET Failure: ${env.JOB_NAME} #${env.BUILD_NUMBER}", body: 'See archived results.'
    }
  }
}

Notes:

  • Use currentBuild.result to mark PR builds as UNSTABLE (warn) vs FAILURE (block). Different teams will have different tolerances.
  • Archive artifacts with fingerprints for traceability to a specific build.
  • readJSON comes from the Pipeline Utility Steps plugin; install it on the controller before relying on the Compare Baseline stage. A sketch of compare_wcet.py follows these notes.
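
The compare script only needs to emit the two counters the pipeline reads from results/compare.json. A minimal sketch, assuming the same illustrative schema as the earlier scripts plus a "critical" flag carried in the baseline:

#!/usr/bin/env python3
"""Write compare.json with critical_violations and regressions counts."""
import argparse
import json
from urllib.parse import urlparse


def load(path_or_url: str) -> dict:
    if path_or_url.startswith("s3://"):
        import boto3
        u = urlparse(path_or_url)
        body = boto3.client("s3").get_object(
            Bucket=u.netloc, Key=u.path.lstrip("/"))["Body"]
        return json.load(body)
    with open(path_or_url) as f:
        return json.load(f)


def main() -> None:
    p = argparse.ArgumentParser()
    p.add_argument("report")
    p.add_argument("baseline")   # local path or s3:// URL
    p.add_argument("--output", required=True)
    args = p.parse_args()

    report = {fn["name"]: fn for fn in load(args.report)["functions"]}
    baseline = {fn["name"]: fn for fn in load(args.baseline)["functions"]}

    critical_violations = 0
    regressions = 0
    for name, fn in report.items():
        base = baseline.get(name)
        if base is None or fn["wcet_ms"] <= base["wcet_ms"]:
            continue
        if base.get("critical"):
            critical_violations += 1
        else:
            regressions += 1

    with open(args.output, "w") as f:
        json.dump({"critical_violations": critical_violations,
                   "regressions": regressions}, f)


if __name__ == "__main__":
    main()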

Artifact management: what to store and why

Each run should produce and store at minimum:

  • Raw measurement output (JSON/XML from RocqStat or your tool of choice) for audit and re-analysis.
  • Comparative report showing delta vs baseline (CSV/JSON and an HTML summary).
  • Traces and logs (instrumentation traces, perf counters) required for root-cause analysis.
  • Build metadata (commit hash, compiler flags, HW profile) to ensure reproducibility.

Storage options:

  • Object storage (S3) with lifecycle rules: a cheap, durable home for baselines and large traces.
  • Artifact repositories (Artifactory) or Jenkins archived artifacts with fingerprints: reports versioned and traceable to builds.
  • A dedicated baseline branch: baseline updates get reviewed like any other change.

Developer feedback loops: turn failures into action

Automated checks only help when developers get meaningful, actionable feedback. Implement these feedback patterns:

  • PR annotations: Post concise summaries directly in the PR — “Function X +12% WCET (budget 10ms) — see report link.” Use GitHub Checks API or actions/github-script to create a check with details.
  • Automated triage labels: If a run fails, add a label like wcet/regression and assign a code owner group automatically.
  • Suggestive diagnostics: Include the top 3 hot functions and a diff of the affected binaries, plus a stack trace or instruction-level hotspot list when available.
  • Integrate with chat and ticketing: Push a summary to Slack with a link to the artifact and a suggested assignee; optionally open a Jira ticket for critical failures.

Example GitHub Actions step to comment on a PR:

- name: Post WCET summary to PR
  uses: actions/github-script@v6
  with:
    script: |
      const fs = require('fs');
      const report = JSON.parse(fs.readFileSync('wcet-quick.json'));
      const summary = `WCET quick-check: ${report.overall.wcet_ms}ms (budget ${report.budget_ms}ms)\n\nTop functions:\n${report.top.map(f => `- ${f.name}: ${f.wcet_ms}ms`).join('\n')}`;
      await github.rest.issues.createComment({owner: context.repo.owner, repo: context.repo.repo, issue_number: context.payload.pull_request.number, body: summary});

Advanced strategies and pitfalls

1) Handle measurement noise

Dedicated lab runners, CPU pinning, RT kernels, and disabling turbo/boost are musts. Use multiple iterations and statistical sanitization (median/95th percentile) instead of single-shot measurements.
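
For example, the measurement harness can aggregate many iterations into robust statistics before any comparison happens. A sketch using only the standard library (run_once is a hypothetical hook that performs one instrumented execution and returns milliseconds):

import statistics


def measure(run_once, iterations: int = 100) -> dict:
    """Aggregate repeated timing samples into noise-resistant statistics."""
    samples = sorted(run_once() for _ in range(iterations))
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        # 95th percentile: the 19th of the 20-quantile cut points
        "p95_ms": statistics.quantiles(samples, n=20)[18],
        "max_ms": samples[-1],
        "iterations": iterations,
    }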

2) Per-function budgets and hysteresis

Assign budgets at a granular level. Implement hysteresis (e.g., require >3% regression for 3 consecutive runs before blocking) to avoid blocking on flukes.
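
A sketch of that hysteresis rule, assuming a small JSON history file persisted between runs (the file convention is an invention for illustration, not a tool feature):

import json
import os

HISTORY_FILE = "wcet-history.json"   # persist as a CI artifact between runs
THRESHOLD_PCT = 3.0                  # regression size that counts as a strike
CONSECUTIVE_RUNS = 3                 # strikes before the gate blocks


def should_block(name: str, current_ms: float, baseline_ms: float) -> bool:
    """Return True only after N consecutive over-threshold regressions."""
    history = {}
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            history = json.load(f)

    regressed = current_ms > baseline_ms * (1 + THRESHOLD_PCT / 100.0)
    history[name] = history.get(name, 0) + 1 if regressed else 0

    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f)
    return history[name] >= CONSECUTIVE_RUNS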

3) Hybrid analysis

Combine static analysis (for provable upper-bounds) with measurement-based runs (for realistic profiles). Use static analysis as a guardrail and measurements to track regressions.

4) Container and virtualization limits

Containers can introduce scheduling variability. For high-fidelity measurement, prefer bare-metal lab runners or lightweight VMs pinned to cores. If you must use containers, use cgroups and CPU isolation.
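
On Linux, the measurement harness can also pin itself to the isolated cores before measuring; a sketch using the standard library (the core IDs are an assumption and must match your isolcpus or cpuset configuration):

import os

# Cores reserved for measurement; align with the kernel's isolcpus= setting.
MEASUREMENT_CORES = {2, 3}

os.sched_setaffinity(0, MEASUREMENT_CORES)  # pid 0 means the current process
print(f"Measuring on cores: {sorted(os.sched_getaffinity(0))}")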

5) Toolchain drift

Compiler updates, linker changes, and library upgrades can affect timing. Record toolchain versions in build metadata and include them in baselines. Consider keeping a baseline per toolchain version.
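
Capturing that metadata can be a one-step build script addition. A sketch (the compiler invocation and flags are placeholders for whatever your build system actually uses):

import json
import subprocess


def build_metadata() -> dict:
    """Collect identifiers that make a timing run reproducible."""
    return {
        "commit": subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip(),
        "compiler": subprocess.check_output(
            ["gcc", "--version"], text=True).splitlines()[0],
        "cflags": "-O2 -mcpu=cortex-r5",  # placeholder: read from your build
    }


with open("results/build-metadata.json", "w") as f:
    json.dump(build_metadata(), f, indent=2)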

Measurable outcomes and KPIs

Measure the impact of adding WCET checks with:

  • Regression detection lead time: expected decrease from weeks to minutes/hours after merge.
  • Merge blocking rate due to timing regressions: target <5% of PRs as failures after stabilization.
  • Reduction in post-release timing defects: aim for 90% fewer late timing defects in the first year.
  • Mean time to identify root cause: aim to cut in half by providing targeted function-level hotspots.

Looking forward:

  • Unified verification toolchains: Vector's integration of RocqStat into VectorCAST (announced Jan 2026) signals faster adoption of unified test+timing toolchains in automotive and avionics — expect more vendors to follow.
  • AI-assisted diagnosis: By late 2026, expect tooling to suggest changes (e.g., inlining, memoization, scheduler tweaks) for the top WCET hotspots using code-level models, mirroring broader AI-assisted developer workflows.
  • Policy-as-code for timing: Teams will adopt declarative timing budgets and enforcement rules that live next to source code and CI policies.

Step-by-step sandbox lab: a 6-step adoption plan

  1. Prepare a stable lab runner with fixed CPU frequency and RT kernel. Document the HW profile.
  2. Run a manual RocqStat analysis and generate a baseline JSON. Store it in S3 or a branch.
  3. Add a lightweight PR-level GitHub Action that runs a reduced set of timing tests and compares to the baseline.
  4. Add a nightly Jenkins or GitHub Actions job that runs the full RocqStat analysis and uploads artifacts.
  5. Implement PR annotations and Slack notifications to close the feedback loop.
  6. Gradually tighten thresholds and convert UNSTABLE policies to FAIL for critical components.

Troubleshooting checklist

  • High variance between runs? Check CPU governor, isolate CPUs, disable hyperthreading.
  • False positives on PRs? Use a more conservative quick-check and rely on nightly for gating while tuning budgets.
  • Artifacts too large? Compress JSON, store summaries in repo and full traces in object storage with lifecycle rules.
  • Toolchain mismatch? Pin toolchain versions in CI images and include them in artifacts.

Real-world example: What one team achieved

We worked with an automotive feature team that integrated RocqStat into their Jenkins pipeline in Q4 2025. Results after three months:

  • Critical timing regressions detected in PRs initially increased by 3x, because problems were being caught earlier instead of escaping to release.
  • Post-release timing defects reduced by 85% in the first two releases after adoption.
  • Mean time to identify root cause dropped from 48 hours to under 8 hours due to actionable artifact links and hotspot lists.

That team used per-function budgets, nightly full runs on dedicated lab hardware, and PR quick-checks to enforce a release policy.

Final takeaways: Practical checklist

  • Start small: add a quick PR-level check before enforcing strict gating.
  • Control variance: use dedicated runners or hardware and consistent toolchains.
  • Store artifacts: baseline JSONs, comparative reports, and traces for audits and root cause.
  • Close the loop: annotate PRs, add labels, and notify owners automatically.
  • Iterate policies: use hysteresis and statistical methods to minimize false positives.
"Timing safety is becoming a critical verification task — and embedding timing analysis into CI is the fastest way to make it routine."

Call to action

If you run real-time or safety-critical software, add WCET checks to CI this quarter. Start with the PR quick-check pattern shown above, then roll out nightly full runs and baseline management. Need a hands-on lab or an integration blueprint that includes RocqStat and VectorCAST? Contact our team at powerlabs.cloud for a tailored sandbox, Jenkinsfile templates, and a migration plan to get automated timing verification into your workflow with measurable outcomes by the next sprint.


Related Topics

#tutorial #embedded #ci

powerlabs

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
