Test Run Statuses

Complete reference for test run lifecycle statuses, transitions, failure causes, and diagnostic steps for each state in CallMeter.

Every test run in CallMeter progresses through a defined set of statuses from creation to completion. Understanding these statuses helps you monitor test execution, diagnose failures, and build automation around test run lifecycle events.

Test run statuses appear on the test run detail page and in the test list view, and are returned by the API. Status transitions happen automatically as the platform processes each phase of test execution.

Status Reference

PENDING

The test run has been created but has not yet entered the execution queue. The platform is performing initial validation and preparing the run for execution.

  • Terminal: No
  • Triggered by: User clicks "Run" on a test, or the API receives a POST /tests/:id/run request
  • What happens during this state:
    • The platform validates the test configuration (registrars exist, SIP accounts are available, media files are present)
    • Endpoint allocation is calculated based on group sizes and worker availability
    • Billing eligibility is checked (sufficient credits or plan allowance)
  • What happens next: If validation passes, the run transitions to QUEUED. If validation fails (no workers available, billing limit reached), the run transitions to CANNOT_RUN_FOR_NOW.
  • Duration: Under 2 seconds in normal conditions. Longer durations may indicate platform queue congestion.

PENDING and Validation

The PENDING state performs pre-flight checks before committing resources. If your test run moves to CANNOT_RUN_FOR_NOW immediately after PENDING, the issue is typically worker availability or billing limits rather than a configuration error.

QUEUED

The test run has passed validation and is in the execution queue waiting for worker capacity to be allocated. The platform is assigning endpoints to available workers.

  • Terminal: No
  • Triggered by: Successful validation during the PENDING phase
  • What happens during this state:
    • The platform identifies online workers that match the test's region or worker selection criteria
    • Endpoints are distributed across available workers based on capacity
    • Workers receive their endpoint assignments and prepare to execute
  • What happens next: Once workers confirm they are ready, the run transitions to RUNNING. If workers cannot be allocated (all busy, disconnected, or insufficient capacity), the run transitions to CANNOT_RUN_FOR_NOW.
  • Duration: Typically 1 to 5 seconds. Longer durations indicate that workers are busy with other tests. If you have user-owned workers, verify they are connected and in ONLINE status.

RUNNING

The test run is actively executing. Workers are registering SIP endpoints, placing calls, exchanging media, and reporting real-time metrics to the platform.

  • Terminal: No
  • Triggered by: Workers confirm readiness and begin executing their assigned endpoints
  • What happens during this state:
    • Endpoints progress through their phase lifecycle: INITIALIZING, REGISTERED, CALLING, RINGING, NEGOTIATING, INCALL, DISCONNECTING, CLOSED (see Endpoint Statuses)
    • Real-time metrics flow from workers to the platform at one-second intervals
    • The test run detail page updates in real time with endpoint phase counts and live metric charts
    • The buildup timer staggers endpoint registration and call initiation to avoid SIP burst load
  • What happens next: When all endpoints complete their lifecycle (reach CLOSED phase), the run transitions to COMPLETED. If a critical error occurs (all workers disconnect, unrecoverable infrastructure failure), the run transitions to FAILED.
  • Duration: Matches the configured test duration plus registration buildup time and teardown time. A test configured for 60 seconds of call duration with 10 seconds of buildup will have a RUNNING phase of approximately 70 to 80 seconds total.

Real-Time Monitoring

During the RUNNING state, the test run detail page provides a live dashboard showing endpoint phase distribution, aggregate metrics, and per-endpoint drill-down. Use this to identify issues in real time rather than waiting for the test to complete.

COMPLETED

The test run has finished successfully. All endpoints have completed their lifecycle and reached CLOSED phase. Metrics and SIP traces are fully available for analysis.

  • Terminal: Yes
  • Triggered by: All endpoints reach CLOSED phase (whether through successful call completion or individual endpoint failures)
  • What is available:
    • Full metric time series for every endpoint
    • SIP message traces for every endpoint
    • Aggregate statistics (ASR, NER, average MOS, etc.)
    • Endpoint phase and outcome distribution summary
    • Call timing metrics (PDD, setup time, ring time)
  • Important: COMPLETED means the test run finished, not that all calls succeeded. A test where 80% of endpoints have a REGISTRATION_FAILED outcome will still show COMPLETED status. Check the endpoint outcome distribution and quality metrics to assess the actual result.

COMPLETED Does Not Mean All Calls Succeeded

A COMPLETED test run means the execution lifecycle finished normally. Individual endpoints may have outcomes like REGISTRATION_FAILED, NEGOTIATION_FAILED, or CALL_FAILED. Always review the endpoint outcome distribution and quality metrics after a COMPLETED run. See Analyzing Results.
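Automation that gates on run status should therefore also inspect per-endpoint outcomes. A minimal sketch, assuming the failure outcome names listed above and a hypothetical list of per-endpoint outcome strings fetched from the run's results (the "SUCCESS" outcome name and the `run_passed` helper are illustrative, not part of the documented API):

```python
# Outcome names taken from this page; treat these as endpoint-level failures.
FAILURE_OUTCOMES = {"REGISTRATION_FAILED", "NEGOTIATION_FAILED", "CALL_FAILED"}

def run_passed(endpoint_outcomes, max_failure_ratio: float = 0.05) -> bool:
    """Treat a COMPLETED run as passing only if the share of failed
    endpoints stays at or below max_failure_ratio."""
    if not endpoint_outcomes:
        return False  # no endpoints reported: nothing to assert success on
    failures = sum(1 for o in endpoint_outcomes if o in FAILURE_OUTCOMES)
    return failures / len(endpoint_outcomes) <= max_failure_ratio
```

A 5% threshold is just a placeholder; pick one that matches your ASR/NER expectations for the system under test.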

FAILED

The test run encountered a critical error that prevented it from completing normally. This is distinct from individual endpoint failures (which result in a COMPLETED run with failed endpoints).

  • Terminal: Yes
  • Triggered by:
    • All assigned workers disconnected during execution
    • An unrecoverable platform error occurred
    • The test run exceeded the maximum execution time safety limit
  • What is available: Partial metrics and SIP traces for endpoints that were active before the failure. The failure reason is displayed on the test run detail page.
  • Common causes:
| Cause | Indicator | Resolution |
| --- | --- | --- |
| Worker disconnect | All endpoints suddenly CLOSED, partial results | Check worker connectivity, container health, resource limits. See Worker Statuses. |
| Network interruption | Workers went offline mid-test | Verify network path between workers and the platform gateway |
| Worker resource exhaustion | Worker container OOM-killed | Increase worker container memory limits or reduce endpoints per worker |
| Platform timeout | Test exceeded maximum allowed execution time | Review test configuration for reasonable duration values |

Troubleshooting FAILED runs:

  1. Check partial results. Even though the run failed, endpoints that were active before the failure have metrics and SIP traces. These partial results often reveal what went wrong.
  2. Review the failure reason. The test run detail page displays the specific failure reason. Common messages include "All workers disconnected," "Execution timeout exceeded," and "Unrecoverable platform error."
  3. Inspect worker status. Navigate to the Workers page and check whether the workers assigned to this run are still online. If they went offline during the run, the worker logs and connection history may reveal the cause (resource exhaustion, network disruption, container restart).
  4. Compare with previous runs. If the same test configuration succeeded previously, diff the conditions: worker availability, concurrent load, network path changes.
  5. Check endpoint distribution. If the failure occurred mid-run, look at which endpoints completed and which were interrupted. A pattern (e.g., all endpoints on a specific worker failed) narrows the root cause.

CANNOT_RUN_FOR_NOW

The platform determined that the test cannot be executed at this time due to resource or billing constraints. No endpoints were created and no SIP traffic was generated.

  • Terminal: Yes
  • Triggered by:
    • No online workers available in the selected region or worker pool
    • All available workers are fully allocated to other running tests
    • The test's endpoint count exceeds the plan's concurrent endpoint limit
    • The organization's billing credits are exhausted or the subscription is inactive
    • Worker capacity is insufficient for the requested endpoint count
  • Resolution:
| Cause | Resolution |
| --- | --- |
| No online workers | Check the Workers page. Ensure cloud or user-owned workers are connected and in ONLINE status. See Worker Statuses. |
| Workers busy | Wait for currently running tests to complete, then retry |
| Plan limit exceeded | Reduce the endpoint count in your test, or upgrade your plan. See Plans and Pricing. |
| Billing limit reached | Add credits to your account, or wait for the next billing cycle. See Credits and Overages. |
| Insufficient worker capacity | Add more workers or reduce test size. Consider deploying user-owned workers for additional capacity. See Deploying Your Own Workers. |

Troubleshooting CANNOT_RUN_FOR_NOW:

  1. Check worker availability first. This is the most common cause. Go to the Workers page and verify at least one worker is ONLINE in the required region.
  2. Check for running tests. Workers that are busy executing other test runs cannot accept new work. Wait for active runs to finish or add more workers.
  3. Review billing status. If your organization has no active subscription, no remaining credits, or has exceeded the plan's concurrent endpoint limit, runs are blocked. The platform displays the specific billing reason on the test run detail page.
  4. Reduce test size. If your test requests more endpoints than your available workers can handle, reduce the endpoint count per group or add more workers.
  5. Retry timing. CANNOT_RUN_FOR_NOW is not a permanent failure -- it indicates a temporary resource constraint. Once the constraint is resolved (workers free up, credits added, plan upgraded), re-running the test will succeed.
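Because CANNOT_RUN_FOR_NOW is temporary, a scripted retry with a backoff is a reasonable pattern. A sketch under stated assumptions: `start_run` and `wait_for_result` are hypothetical wrappers you would write around POST /tests/:id/run and run-status polling; they are not part of the documented API.

```python
import time

def run_with_retry(start_run, wait_for_result, max_attempts: int = 3,
                   backoff_s: float = 60.0):
    """Retry a test run while it ends in CANNOT_RUN_FOR_NOW.

    start_run: callable submitting a new run and returning its run id.
    wait_for_result: callable taking a run id and returning its terminal status.
    Returns the last (run_id, status) pair.
    """
    for attempt in range(1, max_attempts + 1):
        run_id = start_run()
        status = wait_for_result(run_id)
        if status != "CANNOT_RUN_FOR_NOW":
            return run_id, status
        if attempt < max_attempts:
            # Temporary resource constraint: give workers/credits time to free up.
            time.sleep(backoff_s)
    return run_id, status
```

Each attempt creates a new, independent run with its own ID, consistent with the manual-retry behavior described below.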

Status Transitions

The test run status transitions follow a strict forward-only path. No backward transitions are possible -- a run never returns to PENDING or QUEUED once it has moved forward.

PENDING ──> QUEUED ──> RUNNING ──> COMPLETED
   |           |          |
   |           |          └──> FAILED
   |           |
   |           └──> CANNOT_RUN_FOR_NOW
   |
   └──> CANNOT_RUN_FOR_NOW
| Path | Transition Sequence | Trigger |
| --- | --- | --- |
| Happy path | PENDING → QUEUED → RUNNING → COMPLETED | All endpoints complete lifecycle |
| Execution failure | PENDING → QUEUED → RUNNING → FAILED | Critical error during execution |
| Resource blocked early | PENDING → CANNOT_RUN_FOR_NOW | Validation fails (no workers, billing) |
| Resource blocked late | PENDING → QUEUED → CANNOT_RUN_FOR_NOW | Workers unavailable after queuing |

COMPLETED, FAILED, and CANNOT_RUN_FOR_NOW are terminal statuses. Once a run reaches any of these, its status never changes again.
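The forward-only rules above can be encoded as a small transition map, which is handy for validating status sequences in client-side tooling. A sketch; the status names come from this page, while the map and helper are illustrative:

```python
# Forward-only transition map for CallMeter test run statuses.
TRANSITIONS = {
    "PENDING": {"QUEUED", "CANNOT_RUN_FOR_NOW"},
    "QUEUED": {"RUNNING", "CANNOT_RUN_FOR_NOW"},
    "RUNNING": {"COMPLETED", "FAILED"},
    # Terminal statuses have no outgoing transitions.
    "COMPLETED": set(),
    "FAILED": set(),
    "CANNOT_RUN_FOR_NOW": set(),
}

TERMINAL = {status for status, nxt in TRANSITIONS.items() if not nxt}

def is_valid_transition(src: str, dst: str) -> bool:
    """Return True if src -> dst is a legal forward transition."""
    return dst in TRANSITIONS.get(src, set())
```

Deriving the terminal set from the map keeps the two definitions from drifting apart.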

Status Timeline

The test run detail page includes a status timeline showing when each transition occurred. This is useful for diagnosing issues:

  • Long PENDING to QUEUED: Platform validation taking longer than expected. Check for configuration issues.
  • Long QUEUED to RUNNING: Workers are slow to accept assignments. May indicate worker overload or network latency between workers and the platform.
  • Short RUNNING: If the RUNNING duration is much shorter than the configured test duration, endpoints may be failing early. Check the endpoint outcome distribution.
  • Expected RUNNING duration: Approximately equal to buildup_time + call_duration + teardown_time.

API Integration

When automating tests via the API, use the test run status to determine when to fetch results:

  1. Start a run: POST /tests/:id/run
  2. Poll the run status: GET /runs/:id and check the status field
  3. When status is COMPLETED or FAILED, fetch results: GET /runs/:id/metrics
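The polling step above can be sketched as a small helper. To keep it transport-agnostic, it takes a `get_status` callable rather than an HTTP client; how you implement that callable (auth, base URL, the exact shape of the GET /runs/:id response) is an assumption, not documented here.

```python
import time

# Terminal statuses as defined on this page.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANNOT_RUN_FOR_NOW"}

def wait_for_run(get_status, interval_s: float = 2.0,
                 timeout_s: float = 600.0) -> str:
    """Poll until the run reaches a terminal status, then return it.

    get_status: callable returning the current status string, e.g. a thin
    wrapper around GET /runs/:id that extracts the status field.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError("run did not reach a terminal status in time")
```

Note that CANNOT_RUN_FOR_NOW is also terminal, so the loop treats it as a stopping condition alongside COMPLETED and FAILED.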

For webhook-based workflows, configure webhooks to receive notifications on status transitions rather than polling.

Retry Behavior and Recovery

Manual Retries

Test runs do not automatically retry on failure. When a run ends in FAILED or CANNOT_RUN_FOR_NOW, you must manually start a new run after resolving the underlying issue. Each retry creates a new, independent test run with its own ID, metrics, and traces.

To retry from the UI, navigate to the test and click "Run" again. Via the API, submit a new POST /tests/:id/run request. The new run goes through the full lifecycle starting from PENDING.

Probe Automatic Recovery

Probes differ from ad-hoc tests in that they run on a schedule (every 5, 15, 30, or 60 minutes). If a probe execution ends in FAILED or CANNOT_RUN_FOR_NOW, the probe does not stop its schedule. The next scheduled execution will attempt to run normally. This provides automatic recovery: if a transient issue (worker temporarily offline, brief network interruption) caused a probe failure, the next scheduled execution will succeed without manual intervention.

The probe's health state reflects the most recent completed execution. A single FAILED execution sets the probe to UNKNOWN until the next successful execution re-evaluates the health thresholds.

Preventing Repeated Failures

If the same test or probe fails repeatedly, the pattern of failure statuses helps identify systemic issues:

| Pattern | Likely Cause | Resolution |
| --- | --- | --- |
| Repeated CANNOT_RUN_FOR_NOW | Persistent worker shortage or billing issue | Add workers, upgrade plan, or add credits |
| Repeated FAILED with worker disconnect | Worker infrastructure instability | Check worker container health, resource limits, and network stability |
| Alternating COMPLETED and FAILED | Intermittent issue (network, worker load) | Investigate timing correlation -- does the failure occur at specific times or load levels? |
| COMPLETED but with many error outcomes | SIP infrastructure issue, not platform issue | Focus on endpoint-level diagnostics rather than run-level. See Endpoint Statuses. |

Probes and Run Statuses

Probe executions follow the same status lifecycle as test runs (PENDING, QUEUED, RUNNING, COMPLETED, FAILED). Each probe execution produces a run that can be inspected individually. The probe's health state (HEALTHY, DEGRADED, UNHEALTHY, UNKNOWN) is determined by evaluating the completed run's metrics against configured thresholds. See Probe Health States.
