Common Test Failures

Step-by-step diagnostic guide for the most frequent causes of failed test runs in CallMeter, including registration failures, call setup errors, media issues, and worker problems.

When a test run ends with a FAILED or CANNOT_RUN_FOR_NOW status, or when a significant number of endpoints fail within a COMPLETED run, use this guide to diagnose the root cause and resolve the issue. This page covers the most common failure patterns, organized by the phase in which they occur.

Decision Tree: Where to Start

Before diving into specific failure types, identify the failure phase:

Did the test run start at all? If the status is CANNOT_RUN_FOR_NOW, the issue is resource allocation. Go to Test Stuck in CANNOT_RUN_FOR_NOW.
Did the run fail immediately after starting? If the status went to FAILED within seconds, the issue is likely worker connectivity. Go to Worker Disconnection During Test.
Did endpoints fail to register? Check the endpoint outcome summary. If most endpoints have a REGISTRATION_FAILED outcome, go to Registration Failures.
Did calls fail to connect? If most endpoints registered but ended with CALL_FAILED, TIMEOUT, or NEGOTIATION_FAILED outcomes, go to Call Setup Failures.
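Did calls connect but metrics show problems? If endpoints reached the INCALL phase but quality is poor, go to Poor Quality Metrics instead. If media is missing entirely or flows in one direction only, see Media Failures (During Active Calls) on this page.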
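
The decision tree above can be sketched as a small triage function. This is an illustration only, not part of CallMeter: the thresholds FAST_FAIL_SECONDS and MAJORITY are assumptions standing in for "within seconds" and "most endpoints", the SUCCESS outcome name in the usage note is hypothetical, and the inputs are assumed to be values you read off the test run detail page.

```python
from collections import Counter

FAST_FAIL_SECONDS = 10  # assumption: what "within seconds" means
MAJORITY = 0.5          # assumption: what "most endpoints" means

CALL_SETUP_OUTCOMES = ("CALL_FAILED", "TIMEOUT", "NEGOTIATION_FAILED")


def first_section(run_status, seconds_to_failure, outcomes):
    """Return the name of the guide section to read first.

    run_status: run status string, e.g. "FAILED" or "COMPLETED".
    seconds_to_failure: seconds between start and FAILED, or None.
    outcomes: list with one outcome string per endpoint.
    """
    if run_status == "CANNOT_RUN_FOR_NOW":
        return "Test Stuck in CANNOT_RUN_FOR_NOW"
    failed_fast = (
        run_status == "FAILED"
        and seconds_to_failure is not None
        and seconds_to_failure <= FAST_FAIL_SECONDS
    )
    if failed_fast:
        return "Worker Disconnection During Test"
    counts = Counter(outcomes)
    total = len(outcomes)
    if total and counts["REGISTRATION_FAILED"] / total > MAJORITY:
        return "Registration Failures"
    call_setup = sum(counts[name] for name in CALL_SETUP_OUTCOMES)
    if total and call_setup / total > MAJORITY:
        return "Call Setup Failures"
    return "Poor Quality Metrics"
```

For example, a COMPLETED run in which 8 of 10 endpoints ended with REGISTRATION_FAILED and 2 with a (hypothetical) SUCCESS outcome is routed to Registration Failures.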

All Endpoints Fail vs Some Endpoints Fail

The scope of the failure is the single most useful diagnostic signal. Before reading any SIP traces, answer this question first.

All Endpoints Failed

When every endpoint in the test fails, the problem is almost always systemic:

Network or firewall: The worker cannot reach the registrar at all. No SIP messages arrive. Check connectivity from the worker's network to the registrar's IP and port.
Credential error: All endpoints share the same SIP account pool, and every account has incorrect credentials. Verify with a single-endpoint test first.
Wrong transport: The test is configured for UDP but the registrar only accepts TLS (or vice versa). The registrar never sees the SIP traffic.
DNS failure: The registrar hostname does not resolve from the worker's network. The worker cannot determine where to send REGISTER requests.
Worker-level failure: The worker itself crashed or ran out of resources before any endpoint could complete. Check the worker status on the Workers page.

Some Endpoints Failed

When a subset of endpoints fails while others succeed, the problem is typically capacity or configuration-related:

Rate limiting: The registrar throttles registration requests above a certain rate. The first endpoints succeed, later ones receive 408 or 503. Increase the buildup time.
SIP account shortage: The test has more endpoints than SIP accounts. Endpoints without assigned accounts fail immediately. Check the account pool size.
Per-group issue: One group's registrar is misconfigured while other groups succeed. Filter endpoint results by group to isolate the problem.
Worker capacity: One worker is overloaded while others are fine. Compare failure rates across workers on the test run detail page.
Intermittent network: A flaky network path causes random timeouts. Failures are scattered across endpoints with no clear pattern.
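
If you export or copy endpoint results into a script, the scope question can be answered mechanically. The sketch below assumes a hypothetical record shape (a dict with worker, group, and failed keys); CallMeter's actual export format may differ. It reports whether all, some, or no endpoints failed, and computes the failure rate per worker or per group so that a single bad worker or group stands out.

```python
from collections import defaultdict


def failure_scope(results):
    """Classify a run as 'none', 'some', or 'all' endpoints failed.

    results: list of dicts with 'worker', 'group', 'failed' keys
    (a hypothetical shape chosen for this example).
    """
    results = list(results)
    failed = sum(1 for r in results if r["failed"])
    if failed == 0:
        return "none"
    return "all" if failed == len(results) else "some"


def failure_rate_by(results, key):
    """Return {value of key: fraction of endpoints that failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in results:
        totals[r[key]] += 1
        failures[r[key]] += int(r["failed"])
    return {k: failures[k] / totals[k] for k in totals}
```

A rate of 1.0 for one worker and 0.0 for the others points at worker capacity or connectivity. An even spread across workers and groups is more consistent with rate limiting or an intermittent network path.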

Test Stuck in CANNOT_RUN_FOR_NOW

The platform could not allocate workers to execute the test. No SIP traffic was generated.

Symptom

The test run shows status CANNOT_RUN_FOR_NOW immediately after you click Run.

Possible Causes and Resolutions

No online workers available

Navigate to the Workers page in your project
Check that at least one worker shows ONLINE status
If all workers are OFFLINE, verify that your cloud workers are active or that your user-owned workers are running
If workers show ERROR status, see Worker Statuses for troubleshooting

Workers are busy with other tests

Check if another test run is currently RUNNING
Wait for the current test to complete, then retry
If you frequently need concurrent tests, add more workers or deploy additional user-owned workers

Endpoint count exceeds plan limit

Check your plan's concurrent endpoint limit under Settings > Billing
Reduce the total endpoint count in your test configuration
Upgrade your plan for higher limits. See Plans and Pricing.

Billing credits exhausted

Check your credit balance under Settings > Billing
Add credits or wait for the next billing cycle
See Credits and Overages for details

Region mismatch

Verify that your test groups specify a region where workers are available
If you are using region-based allocation, check that ONLINE workers exist in the selected region
Consider switching to a region with available capacity or deploying a user-owned worker in the target region

Registration Failures

A large number of endpoints have a REGISTRATION_FAILED outcome. They never reached the call phase.

Symptom

The test run completed (or failed), and the endpoint outcome summary shows a high percentage of endpoints with REGISTRATION_FAILED outcome (phase: CLOSED).

Step-by-Step Diagnosis

Step 1: Check the SIP response code

Open a few failed endpoints and examine the SIP message trace. Look for the REGISTER request and the server's response. The response code tells you what went wrong:

Response Code	Meaning	Go To
401 Unauthorized	Credentials rejected	SIP Registration Errors - 401
403 Forbidden	Access denied	SIP Registration Errors - 403
404 Not Found	Domain or user not found	SIP Registration Errors - 404
408 Request Timeout	Server overloaded or unreachable	Step 2 below
No response	Network issue	Step 3 below
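
When reading a trace, keep in mind that a 401 followed by a second REGISTER and a 200 OK is the normal digest authentication challenge, not a failure. What matters is the last final (non-1xx) response the endpoint received. A minimal sketch of that check, assuming you have copied one endpoint's SIP messages into a list of raw strings in chronological order (the input shape is an assumption for this example):

```python
import re

# A SIP response starts with "SIP/2.0 <3-digit code> <reason>".
STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3})\b")


def status_code(message):
    """Return the status code of a SIP response, or None for requests."""
    lines = message.splitlines() or [""]
    match = STATUS_LINE.match(lines[0].strip())
    return int(match.group(1)) if match else None


def last_final_response(trace):
    """Return the last non-1xx status code in a trace, or None.

    None corresponds to the "No response" row of the table.
    """
    codes = [c for c in map(status_code, trace) if c is not None and c >= 200]
    return codes[-1] if codes else None
```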

Step 2: Check for server overload (408)

If endpoints are receiving 408 responses, the registrar may be overwhelmed by the burst of registration requests:

Increase the buildup time in your test configuration. This staggers registrations over a longer period.
Reduce the number of concurrent endpoints in the test
Verify the SIP registrar's capacity for concurrent registrations
Check if the registrar has rate limiting enabled. If so, adjust the buildup time to stay within the rate limit.

Step 3: Check for network issues (no response)

If endpoints show no response to the REGISTER request:

DNS resolution: Verify the registrar domain resolves correctly from the worker's network. An incorrect domain or DNS misconfiguration will prevent the worker from reaching the registrar.
Firewall rules: Ensure the worker can reach the registrar on the configured transport port (UDP 5060, TCP 5060, or TLS 5061). For user-owned workers, check both the worker host's firewall and any network firewalls between the worker and the registrar.
Transport protocol: Verify the test's transport protocol matches the registrar's configuration. If the registrar only accepts TLS connections, a test configured for UDP will fail silently.
Registrar availability: Confirm the SIP registrar is operational. Try a manual SIP OPTIONS ping from the worker's network if possible.
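
A manual OPTIONS ping can be approximated with a few lines of standard-library Python run from the worker's network. This is a rough sketch under stated assumptions: it covers UDP only, the "probe" user name is made up, and some registrars ignore or reject unauthenticated OPTIONS, so a timeout here is a hint rather than proof. A DNS failure surfaces as socket.gaierror, which distinguishes it from a silent firewall drop.

```python
import socket
import uuid


def build_options(registrar, local_ip, local_port, user="probe"):
    """Build a minimal SIP OPTIONS request as bytes."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic cookie prefix
    lines = [
        f"OPTIONS sip:{registrar} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{registrar}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def options_ping(registrar, port=5060, timeout=3.0):
    """Send one OPTIONS over UDP; return the reply's first line or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() resolves the hostname; DNS failure raises socket.gaierror.
        sock.connect((registrar, port))
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(registrar, local_ip, local_port))
        try:
            reply = sock.recv(4096)
        except OSError:  # timeout or ICMP port unreachable
            return None
    return reply.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

Any reply at all, even an error status such as 403 or 405, shows that the signaling path is open in both directions.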

Step 4: Check SIP account availability

Verify that the registrar has enough SIP accounts configured for the number of endpoints in your test
SIP account pooling assigns accounts to endpoints automatically. If there are fewer accounts than endpoints, some endpoints will not have credentials.
Check that SIP accounts are not locked or disabled on the registrar side

Registration Burst Load

A common cause of mass registration failure is SIP burst load. When a test with 500 endpoints starts, all 500 may attempt to REGISTER within a very short window. Many SIP registrars and SBCs enforce rate limits and will reject these bursts or let them time out. Always configure an appropriate buildup time (start with 10 to 30 seconds) and increase it if you see 408 or 503 responses during registration.
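
If you know the registrar's rate limit, a lower bound for the buildup time follows from simple arithmetic. The sketch below assumes that registrations are spread evenly across the buildup period, and that each endpoint sends two REGISTER requests (the unauthenticated attempt plus the authenticated retry that digest authentication requires). Both are assumptions to adjust for your setup.

```python
import math


def min_buildup_seconds(endpoints, max_registers_per_second,
                        requests_per_endpoint=2):
    """Smallest buildup time that keeps REGISTER traffic under the limit.

    requests_per_endpoint defaults to 2 on the assumption that each
    registration is challenged once (401) and then retried with credentials.
    """
    if max_registers_per_second <= 0:
        raise ValueError("rate limit must be positive")
    total_requests = endpoints * requests_per_endpoint
    return math.ceil(total_requests / max_registers_per_second)
```

With 500 endpoints and a limit of 20 REGISTER requests per second, this gives 50 seconds, so a 10 to 30 second buildup would still exceed the limit.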

Call Setup Failures

Endpoints registered successfully but calls did not connect. Endpoints ended with CALL_FAILED, TIMEOUT, or NEGOTIATION_FAILED outcomes.

CALL_FAILED / TIMEOUT (Caller Side)

The caller sent an INVITE but never received a 200 OK, or the operation timed out.

Step 1: Check the SIP response code

Response Code	Meaning	Resolution
404 Not Found	Callee address unknown	Verify the callee's SIP URI is correct. In cross-group tests, ensure the callee's registrar knows the dialed address.
480 Temporarily Unavailable	Callee offline	Check that callee endpoints registered successfully before callers began dialing. Increase buildup time.
486 Busy Here	Callee already in a call	The callee is busy. In bidirectional tests, ensure the endpoint pairing does not result in double-booking.
488 Not Acceptable Here	Codec mismatch	Review codec configuration. See Supported Codecs.
No response	Routing issue	Check SIP routing between caller and callee registrars. Verify that the INVITE is being routed to the correct destination.

Step 2: Check cross-group targeting

In bidirectional tests with multiple groups, verify that caller groups are correctly targeting callee groups. A misconfigured targeting rule will result in callers dialing addresses that do not match any callee.

Step 3: Check timing

If callees register after callers start dialing, callers will reach an unregistered address. Ensure the buildup configuration allows callee endpoints to complete registration before callers begin the call phase.

TIMEOUT (Callee Side)

The callee registered and waited but no incoming INVITE arrived within the timeout period.

Check that the caller group targeting this callee group is functional. If callers have REGISTRATION_FAILED outcomes, they cannot dial.
Check that the SIP routing between the caller's registrar and the callee's registrar is correct.
Verify that the callee's wait timeout is long enough for callers to complete their buildup and dial.

NEGOTIATION_FAILED

The SDP offer/answer exchange failed. The endpoints could not agree on media parameters.

Open a failed endpoint and check the SDP offer in the INVITE and the SDP answer in the response
Compare the offered codecs with the answerer's supported codecs
Ensure at least one common audio codec exists between the two sides
If the target system only supports specific codecs, configure your test to offer those codecs
Check for SDP manipulation by intermediate proxies (SBCs, B2BUAs) that may strip codecs from the offer
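
The codec comparison can be done by eye, but for long offers a short script is less error-prone. The following sketch extracts the audio codecs from the first m=audio section of an SDP body and intersects them with the set the target system supports. The static payload table lists only a few common entries from RFC 3551, and telephone-event is excluded because it carries DTMF rather than audio. The supported set is something you supply from the target system's documentation.

```python
# A few static payload types from RFC 3551 (not exhaustive).
STATIC_PAYLOAD_TYPES = {
    0: "PCMU/8000", 3: "GSM/8000", 8: "PCMA/8000",
    9: "G722/8000", 18: "G729/8000",
}


def audio_codecs(sdp):
    """Return the codecs ('NAME/RATE', upper-cased) of the first audio section."""
    payload_types, rtpmap = [], {}
    in_audio = seen_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            in_audio = line.startswith("m=audio") and not seen_audio
            if in_audio:
                seen_audio = True
                # m=audio <port> <proto> <pt> <pt> ...
                payload_types = [int(pt) for pt in line.split()[3:]]
        elif in_audio and line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[int(pt)] = encoding.strip().upper()
    names = {
        rtpmap.get(pt) or STATIC_PAYLOAD_TYPES.get(pt, f"PT{pt}")
        for pt in payload_types
    }
    return {n for n in names if not n.startswith("TELEPHONE-EVENT")}


def common_codecs(offer_sdp, supported):
    """Codecs present in the offer and in the supported set."""
    return audio_codecs(offer_sdp) & {s.upper() for s in supported}
```

An empty result from common_codecs means negotiation cannot succeed until the test offers at least one codec from the supported set.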

Worker Disconnection During Test

The test run shows FAILED, and endpoints show a sudden transition to CLOSED phase with OSERROR outcome.

Symptom

All or most endpoints on a specific worker abruptly moved to CLOSED phase. The test run may have continued on other workers or failed entirely if only one worker was assigned.

Diagnosis

Check worker status: Navigate to the Workers page. Is the worker now showing OFFLINE or ERROR status?
Resource exhaustion: The worker container may have been killed by the operating system (OOM). Check the host system's kernel logs for OOM kill events.
Network failure: The worker's network connection to the platform gateway may have dropped. Check network connectivity from the worker's host.
Container restart: If the container orchestration system restarted the worker, it will reconnect as a new session. Check container lifecycle logs.

Resolution

For OOM issues: Increase the worker container's memory limit or reduce the endpoint count per worker
For network issues: Ensure stable connectivity between the worker and the CallMeter gateway
For reliability: Deploy multiple workers so that a single worker failure does not fail the entire test run

Media Failures (During Active Calls)

Calls connected (endpoints reached the INCALL phase) but media quality is severely degraded or media is not flowing.

No RTP Traffic

Firewall blocking UDP: RTP uses UDP on a range of ports. Ensure the worker and the remote endpoint can exchange UDP packets on the negotiated ports.
NAT issues: If the worker is behind NAT, the SDP may contain a private IP address that the remote endpoint cannot reach. Consider deploying the worker with a public IP or using a STUN/TURN server.
SDP IP mismatch: Check the SDP for the correct media IP address. Intermediate proxies may have modified the SDP incorrectly.

One-Way Audio/Video

Asymmetric NAT: One direction works but the other does not. This is a classic NAT traversal problem.
Firewall rules: Ensure bidirectional UDP traffic is allowed, not just outbound.
Media relay: Some SBCs relay media. Check if the relay is functioning correctly.

Codec Mismatch Post-Negotiation

In rare cases, SDP negotiation succeeds but the actual media uses a different encoding than expected:

Check the SDP answer for the negotiated codec
Verify the RTP payload type in the media stream matches the negotiated payload type
Check for re-INVITEs that may have changed the codec mid-call
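
To verify the payload type on the wire, capture a few RTP packets and read the second byte of the fixed header: the low seven bits are the payload type and the top bit is the marker flag (RFC 3550). A minimal sketch, assuming you already have the raw UDP payload of one packet as bytes:

```python
def rtp_payload_type(packet):
    """Return the payload type (0-127) from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    version = packet[0] >> 6
    if version != 2:
        raise ValueError(f"not RTP version 2 (got {version})")
    return packet[1] & 0x7F  # mask off the marker bit
```

Compare the returned number with the payload type listed on the m= line of the SDP answer. A value that does not appear there suggests that a device in the path is transcoding or that a re-INVITE changed the codec.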

Firewall and NAT Troubleshooting

Firewalls and NAT are responsible for more VoIP failures than any other single factor. SIP and RTP use different ports, different protocols, and sometimes different IP addresses, all of which must be permitted through every firewall and NAT device in the path.

SIP Signaling Blocked

If no SIP response arrives at all (connection timeout), the firewall is likely blocking the signaling port:

Identify the transport and port: UDP/TCP 5060 (standard SIP) or TCP 5061 (SIP over TLS)
Check outbound rules on the worker's network: The worker must be able to send to the registrar's IP on the SIP port
Check inbound rules on the registrar's network: The registrar must accept connections from the worker's public IP
Check stateful inspection: Some firewalls require SIP ALG (Application Layer Gateway) to properly track SIP transactions. Others perform better with SIP ALG disabled. If you see intermittent failures, try toggling SIP ALG.

RTP Media Blocked

If SIP signaling works (endpoints register, calls connect) but there is no audio or video:

RTP port range: RTP uses UDP on a dynamic port range (typically 10000-20000). The firewall must allow UDP traffic on this entire range between the worker and the remote media endpoint.
Symmetric NAT: If the worker is behind symmetric NAT, the public port assigned to outbound RTP packets is unpredictable. The remote endpoint's response packets may be sent to a different port than the firewall expects, and be dropped. Use a STUN/TURN server to resolve this.
Conntrack timeouts: Linux firewalls using conntrack may expire UDP "connections" after 30 seconds of inactivity. If the media flow pauses briefly (e.g., during silence suppression), the conntrack entry expires and subsequent packets are dropped. Set the conntrack UDP timeout to at least 120 seconds.
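
On a Linux host with the nf_conntrack module loaded, the current UDP timeouts are exposed under /proc/sys/net/netfilter. The sketch below reads the two standard sysctls (nf_conntrack_udp_timeout for unreplied flows and nf_conntrack_udp_timeout_stream for flows that have seen traffic in both directions) and reports which fall below a chosen minimum. The 120-second minimum mirrors the recommendation above; the files are absent when conntrack is not in use.

```python
from pathlib import Path

SYSCTL_NAMES = ("nf_conntrack_udp_timeout", "nf_conntrack_udp_timeout_stream")


def conntrack_udp_timeouts(proc_dir="/proc/sys/net/netfilter"):
    """Return {sysctl name: seconds} for the UDP timeouts that exist."""
    result = {}
    for name in SYSCTL_NAMES:
        path = Path(proc_dir) / name
        if path.is_file():
            result[name] = int(path.read_text().strip())
    return result


def too_short(timeouts, minimum=120):
    """Return the names of timeouts below the minimum, sorted."""
    return sorted(name for name, value in timeouts.items() if value < minimum)
```

Running too_short(conntrack_udp_timeouts()) on the firewall host returns an empty list when both values meet the minimum or when conntrack is not loaded.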

NAT Traversal Failures

NAT creates a mismatch between the worker's private IP (in the SDP body) and its public IP (seen by the registrar). If the remote endpoint sends media to the private IP, it will never arrive.

Check the SDP body in the SIP trace. If the c= line contains a private IP address (10.x.x.x, 172.16-31.x.x, 192.168.x.x), the remote endpoint cannot route media to it.
STUN: Configure a STUN server so the worker discovers its public IP and includes it in the SDP.
TURN: If STUN is insufficient (symmetric NAT), use a TURN relay server. TURN relays all media through a publicly reachable server, so neither side needs to reach the other's private address. The cost is added latency.
SIP registrar-side fix: Some registrars and SBCs can rewrite the SDP to use the source IP observed at the registrar. Check whether your registrar has a "NAT fix" or "SDP rewrite" feature.
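
The private-address check on the c= line is easy to automate. This sketch scans an SDP body for IPv4 connection lines and returns any address inside the three RFC 1918 ranges listed above. It deliberately ignores IPv6 and host names in the c= line, which is a simplification.

```python
import ipaddress

RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def private_media_addresses(sdp):
    """Return the RFC 1918 addresses found on 'c=IN IP4' lines."""
    found = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line.startswith("c=IN IP4 "):
            continue
        address = line.split()[2].split("/")[0]  # drop any multicast TTL suffix
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # host name instead of a literal address
        if any(ip in network for network in RFC1918):
            found.append(address)
    return found
```

A non-empty result for the SDP your worker sent means the remote endpoint was told to send media to an unroutable address, unless an SBC or registrar in the path rewrites it.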

SIP ALG Can Cause Problems

Many consumer and enterprise routers include SIP ALG (Application Layer Gateway) that attempts to rewrite SIP messages to fix NAT issues. In practice, SIP ALG frequently corrupts SIP headers, breaks authentication, or modifies SDP incorrectly. If you experience bizarre registration or call failures that defy explanation, try disabling SIP ALG on the router.

Worker Issues

Worker issues can manifest as test-level failures (the entire run fails) or endpoint-level failures (some endpoints on a specific worker fail).

Worker Cannot Connect to Platform

If a user-owned worker shows OFFLINE on the Workers page and never transitions to ONLINE:

Outbound connectivity: The worker container must be able to reach the CallMeter platform gateway over the internet. Check that outbound HTTPS is not blocked by a proxy or firewall.
Token validity: Verify the cmw_ worker token is correct and has not been revoked. Re-copy the token from the Workers page and update the Docker container environment variable.
DNS resolution: The worker must resolve the CallMeter gateway hostname. Check DNS configuration inside the container.
Proxy configuration: If the worker's network requires an HTTP proxy for outbound connections, configure the proxy settings in the Docker environment.

Worker Out of Memory (OOM)

Each endpoint consumes memory for SIP state, RTP buffers, and media processing. If the worker container does not have enough memory for the allocated endpoint count:

Symptom: All endpoints on one worker abruptly move to CLOSED phase mid-test. The worker itself may restart.
Check host logs: The kernel OOM killer logs the killed process. Look for OOM events in dmesg or /var/log/kern.log on the Docker host.
Resolution: Increase the container memory limit (--memory flag in Docker), or reduce the number of endpoints assigned per worker. As a guideline, budget approximately 50-100 MB per endpoint for audio-only tests and 100-200 MB per endpoint when video is enabled.
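
The per-endpoint guideline translates directly into a sizing estimate. The helper below applies the ranges quoted above (50-100 MB per audio endpoint, 100-200 MB with video) and ignores any fixed overhead of the worker process itself, which is a simplifying assumption.

```python
def memory_budget_mb(endpoints, video=False):
    """Return (low, high) memory estimate in MB for a number of endpoints."""
    low, high = (100, 200) if video else (50, 100)
    return endpoints * low, endpoints * high


def max_endpoints(memory_limit_mb, video=False):
    """Endpoints that fit in the limit using the upper per-endpoint figure."""
    per_endpoint_high = 200 if video else 100
    return memory_limit_mb // per_endpoint_high
```

For example, 40 audio-only endpoints need roughly 2000 to 4000 MB, and a container limited to 8192 MB fits about 81 audio-only endpoints or 40 video endpoints on the conservative figure.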

Worker CPU Saturation

Media encoding and decoding are CPU-intensive, especially for video. If the worker runs out of CPU:

Symptom: Quality metrics degrade progressively during the test. Jitter and PLC events increase as CPU contention delays packet processing.
Check: Monitor CPU usage on the Docker host during a test. If CPU consistently hits 100%, the worker is saturated.
Resolution: Reduce the endpoint count per worker, deploy additional workers to share the load, or increase the host CPU allocation for the container.

Checklist: Before Reporting a Bug

Before contacting CallMeter support, verify the following:

The registrar domain resolves from the worker's network
SIP account credentials are correct (test with a single endpoint first)
The transport protocol matches the registrar's configuration
Firewall allows SIP signaling (UDP/TCP 5060 or TLS 5061) from the worker
Firewall allows RTP media (UDP port range) bidirectionally
The buildup time is sufficient for the number of endpoints
The worker has adequate resources (CPU, memory) for the endpoint count
At least one codec in the test configuration is supported by the target system

Related Pages

SIP Registration Errors -- Detailed registration failure diagnosis
Poor Quality Metrics -- Quality issues without failures
SIP Response Codes -- Full response code reference
Endpoint Statuses -- Endpoint lifecycle
Test Run Statuses -- Run lifecycle
Worker Statuses -- Worker connection states
