Common Test Failures
Step-by-step diagnostic guide for the most frequent causes of failed test runs in CallMeter, including registration failures, call setup errors, media issues, and worker problems.
When a test run ends with a FAILED or CANNOT_RUN_FOR_NOW status, or when a significant number of endpoints fail within a COMPLETED run, use this guide to diagnose the root cause and resolve the issue. This page covers the most common failure patterns, organized by the phase in which they occur.
Decision Tree: Where to Start
Before diving into specific failure types, identify the failure phase:
- Did the test run start at all? If the status is CANNOT_RUN_FOR_NOW, the issue is resource allocation. Go to Test Stuck in CANNOT_RUN_FOR_NOW.
- Did the run fail immediately after starting? If the status went to FAILED within seconds, the issue is likely worker connectivity. Go to Worker Disconnection During Test.
- Did endpoints fail to register? Check the endpoint outcome summary. If most endpoints have a REGISTRATION_FAILED outcome, go to Registration Failures.
- Did calls fail to connect? If most endpoints registered but ended with CALL_FAILED, TIMEOUT, or NEGOTIATION_FAILED outcomes, go to Call Setup Failures.
- Did calls connect but metrics show problems? If endpoints reached the INCALL phase but quality is poor, go to Poor Quality Metrics instead.
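The decision tree above can be sketched as a small function. This is only an illustration: the inputs are values you would copy by hand from the run detail page, the 10-second "failed immediately" threshold and the "more than half" rule for "most endpoints" are assumptions, and the `SUCCESS` outcome name used in the example is hypothetical.

```python
from collections import Counter

FAST_FAIL_SECONDS = 10  # assumption: what "within seconds" means in practice


def first_section_to_read(run_status, seconds_to_fail, outcomes):
    """Return the section of this guide to read first.

    run_status: run status string, e.g. "FAILED" or "COMPLETED".
    seconds_to_fail: seconds between start and failure, or None.
    outcomes: list of endpoint outcome strings (hypothetical hand-copied data).
    """
    if run_status == "CANNOT_RUN_FOR_NOW":
        return "Test Stuck in CANNOT_RUN_FOR_NOW"
    if (run_status == "FAILED" and seconds_to_fail is not None
            and seconds_to_fail < FAST_FAIL_SECONDS):
        return "Worker Disconnection During Test"

    counts = Counter(outcomes)
    total = len(outcomes)
    # "Most endpoints" is interpreted here as more than half (an assumption).
    if total and counts["REGISTRATION_FAILED"] > total / 2:
        return "Registration Failures"
    call_failures = (counts["CALL_FAILED"] + counts["TIMEOUT"]
                     + counts["NEGOTIATION_FAILED"])
    if total and call_failures > total / 2:
        return "Call Setup Failures"
    return "Poor Quality Metrics"


print(first_section_to_read(
    "COMPLETED", None, ["REGISTRATION_FAILED"] * 8 + ["SUCCESS"] * 2))
```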
All Endpoints Fail vs Some Endpoints Fail
The scope of the failure is the single most useful diagnostic signal. Before reading any SIP traces, determine whether every endpoint failed or only a subset did.
All Endpoints Failed
When every endpoint in the test fails, the problem is almost always systemic:
- Network or firewall: The worker cannot reach the registrar at all. No SIP messages arrive. Check connectivity from the worker's network to the registrar's IP and port.
- Credential error: All endpoints share the same SIP account pool, and every account has incorrect credentials. Verify with a single-endpoint test first.
- Wrong transport: The test is configured for UDP but the registrar only accepts TLS (or vice versa). The registrar never sees the SIP traffic.
- DNS failure: The registrar hostname does not resolve from the worker's network. The worker cannot determine where to send REGISTER requests.
- Worker-level failure: The worker itself crashed or ran out of resources before any endpoint could complete. Check the worker status on the Workers page.
Some Endpoints Failed
When a subset of endpoints fails while others succeed, the problem is typically capacity or configuration-related:
- Rate limiting: The registrar throttles registration requests above a certain rate. The first endpoints succeed, later ones receive 408 or 503. Increase the buildup time.
- SIP account shortage: The test has more endpoints than SIP accounts. Endpoints without assigned accounts fail immediately. Check the account pool size.
- Per-group issue: One group's registrar is misconfigured while other groups succeed. Filter endpoint results by group to isolate the problem.
- Worker capacity: One worker is overloaded while others are fine. Compare failure rates across workers on the test run detail page.
- Intermittent network: A flaky network path causes random timeouts. Failures are scattered across endpoints with no clear pattern.
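If you have copied the endpoint results into a list, a short script can show both the failure scope and whether failures cluster on one worker or group. The sketch below assumes a hypothetical row format (`worker`, `group`, and `failed` keys); it does not use any CallMeter export format or API.

```python
from collections import defaultdict


def failure_scope(results):
    """Classify the run as 'none', 'some', or 'all' endpoints failed."""
    failed = sum(1 for row in results if row["failed"])
    if failed == 0:
        return "none"
    return "all" if failed == len(results) else "some"


def failure_rate_by(results, key):
    """Return {value of key: fraction of endpoints with that value that failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in results:
        totals[row[key]] += 1
        if row["failed"]:
            failures[row[key]] += 1
    return {value: failures[value] / totals[value] for value in totals}


# Hypothetical data: every failure sits on worker "w1".
rows = [
    {"worker": "w1", "group": "callers", "failed": True},
    {"worker": "w1", "group": "callees", "failed": True},
    {"worker": "w2", "group": "callers", "failed": False},
    {"worker": "w2", "group": "callees", "failed": False},
]
print(failure_scope(rows))
print(failure_rate_by(rows, "worker"))
print(failure_rate_by(rows, "group"))
```

A rate near 1.0 for one worker and near 0.0 for the others points at worker capacity; an even spread across workers and groups is more consistent with rate limiting or an intermittent network.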
Test Stuck in CANNOT_RUN_FOR_NOW
The platform could not allocate workers to execute the test. No SIP traffic was generated.
Symptom
The test run shows status CANNOT_RUN_FOR_NOW immediately after you click Run.
Possible Causes and Resolutions
No online workers available
- Navigate to the Workers page in your project
- Check that at least one worker shows ONLINE status
- If all workers are OFFLINE, verify that your cloud workers are active or that your user-owned workers are running
- If workers show ERROR status, see Worker Statuses for troubleshooting
Workers are busy with other tests
- Check if another test run is currently RUNNING
- Wait for the current test to complete, then retry
- If you frequently need concurrent tests, add more workers or deploy additional user-owned workers
Endpoint count exceeds plan limit
- Check your plan's concurrent endpoint limit in Settings and then Billing
- Reduce the total endpoint count in your test configuration
- Upgrade your plan for higher limits. See Plans and Pricing.
Billing credits exhausted
- Check your credit balance in Settings and then Billing
- Add credits or wait for the next billing cycle
- See Credits and Overages for details
Region mismatch
- Verify that your test groups specify a region where workers are available
- If you are using region-based allocation, check that ONLINE workers exist in the selected region
- Consider switching to a region with available capacity or deploying a user-owned worker in the target region
Registration Failures
A large number of endpoints have a REGISTRATION_FAILED outcome. They never reached the call phase.
Symptom
The test run completed (or failed), and the endpoint outcome summary shows a high percentage of endpoints with REGISTRATION_FAILED outcome (phase: CLOSED).
Step-by-Step Diagnosis
Step 1: Check the SIP response code
Open a few failed endpoints and examine the SIP message trace. Look for the REGISTER request and the server's response. The response code tells you what went wrong:
| Response Code | Meaning | Go To |
|---|---|---|
| 401 Unauthorized | Credentials rejected | SIP Registration Errors - 401 |
| 403 Forbidden | Access denied | SIP Registration Errors - 403 |
| 404 Not Found | Domain or user not found | SIP Registration Errors - 404 |
| 408 Request Timeout | Server overloaded or unreachable | Step 2 below |
| 503 Service Unavailable | Server overloaded or rate limiting | Step 2 below |
| No response | Network issue | Step 3 below |
Step 2: Check for server overload (408 or 503)
If endpoints are receiving 408 or 503 responses, the registrar may be overwhelmed by the burst of registration requests:
- Increase the buildup time in your test configuration. This staggers registrations over a longer period.
- Reduce the number of concurrent endpoints in the test
- Verify the SIP registrar's capacity for concurrent registrations
- Check if the registrar has rate limiting enabled. If so, adjust the buildup time to stay within the rate limit.
Step 3: Check for network issues (no response)
If endpoints show no response to the REGISTER request:
- DNS resolution: Verify the registrar domain resolves correctly from the worker's network. An incorrect domain or DNS misconfiguration will prevent the worker from reaching the registrar.
- Firewall rules: Ensure the worker can reach the registrar on the configured transport port (UDP 5060, TCP 5060, or TLS 5061). For user-owned workers, check both the worker host's firewall and any network firewalls between the worker and the registrar.
- Transport protocol: Verify the test's transport protocol matches the registrar's configuration. If the registrar only accepts TLS connections, a test configured for UDP will fail silently.
- Registrar availability: Confirm the SIP registrar is operational. Try a manual SIP OPTIONS ping from the worker's network if possible.
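A manual SIP OPTIONS ping can be approximated with a few lines of Python run from the worker's network. This is a minimal sketch, not a full SIP client: it sends one OPTIONS request over UDP on IPv4 with the headers RFC 3261 requires and reports the first line of whatever comes back. The `ping` user name and the `registrar.example.invalid` host are placeholders.

```python
import socket
import uuid


def build_options(target, local_ip, local_port, user="ping"):
    """Build a minimal SIP OPTIONS request as bytes (target is 'host' or 'host:port')."""
    lines = [
        f"OPTIONS sip:{target} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch=z9hG4bK{uuid.uuid4().hex};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",  # the two empty items produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def options_ping(host, port=5060, timeout=3.0):
    """Return the SIP status line of the reply, or None if nothing usable came back."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.connect((host, port))  # on UDP this only fixes the peer address
            local_ip, local_port = sock.getsockname()
            sock.send(build_options(f"{host}:{port}", local_ip, local_port))
            reply = sock.recv(4096)
    except OSError:  # timeout, DNS failure, or ICMP port unreachable
        return None
    return reply.split(b"\r\n", 1)[0].decode("ascii", "replace")


if __name__ == "__main__":
    print(options_ping("registrar.example.invalid"))
```

Any SIP status line, even an error such as 403 or 405, shows that the registrar is reachable over UDP. `None` is consistent with a DNS, firewall, or transport problem, although some servers deliberately ignore unauthenticated OPTIONS requests.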
Step 4: Check SIP account availability
- Verify that the registrar has enough SIP accounts configured for the number of endpoints in your test
- SIP account pooling assigns accounts to endpoints automatically. If there are fewer accounts than endpoints, some endpoints will not have credentials.
- Check that SIP accounts are not locked or disabled on the registrar side
Registration Burst Load
A common cause of mass registration failure is SIP burst load. When a test with 500 endpoints starts, all 500 may attempt to REGISTER within a very short window. Many SIP registrars and SBCs have rate limits that will reject or time out these bursts. Always configure an appropriate buildup time (start with 10 to 30 seconds) and increase it if you see 408 or 503 responses during registration.
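As a rough way to size the buildup time, divide the number of REGISTER requests by the rate the registrar tolerates. The sketch below assumes endpoints are spread evenly across the buildup window and that each registration costs two REGISTER requests (the initial request answered with a 401 challenge, then the authenticated retry), which is typical for digest authentication. The rate limit of 20 requests per second is an example value, not a CallMeter or registrar default.

```python
import math


def min_buildup_seconds(endpoints, max_registers_per_second,
                        requests_per_registration=2):
    """Smallest whole-second buildup that keeps REGISTER traffic under the limit."""
    total_requests = endpoints * requests_per_registration
    return math.ceil(total_requests / max_registers_per_second)


# 500 endpoints against a registrar that tolerates 20 REGISTER requests per second.
print(min_buildup_seconds(500, 20))
```

Treat the result as a lower bound and add headroom, since retransmissions and uneven scheduling push the real request rate above the average.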
Call Setup Failures
Endpoints registered successfully but calls did not connect. Endpoints ended with CALL_FAILED, TIMEOUT, or NEGOTIATION_FAILED outcomes.
CALL_FAILED / TIMEOUT (Caller Side)
The caller sent an INVITE but never received a 200 OK, or the operation timed out.
Step 1: Check the SIP response code
| Response Code | Meaning | Resolution |
|---|---|---|
| 404 Not Found | Callee address unknown | Verify the callee's SIP URI is correct. In cross-group tests, ensure the callee's registrar knows the dialed address. |
| 480 Temporarily Unavailable | Callee offline | Check that callee endpoints registered successfully before callers began dialing. Increase buildup time. |
| 486 Busy Here | Callee already in a call | The callee is busy. In bidirectional tests, ensure the endpoint pairing does not result in double-booking. |
| 488 Not Acceptable Here | Codec mismatch | Review codec configuration. See Supported Codecs. |
| No response | Routing issue | Check SIP routing between caller and callee registrars. Verify that the INVITE is being routed to the correct destination. |
Step 2: Check cross-group targeting
In bidirectional tests with multiple groups, verify that caller groups are correctly targeting callee groups. A misconfigured targeting rule will result in callers dialing addresses that do not match any callee.
Step 3: Check timing
If callees register after callers start dialing, callers will reach an unregistered address. Ensure the buildup configuration allows callee endpoints to complete registration before callers begin the call phase.
TIMEOUT (Callee Side)
The callee registered and waited but no incoming INVITE arrived within the timeout period.
- Check that the caller group targeting this callee group is functional. If callers have REGISTRATION_FAILED outcomes, they cannot dial.
- Check that the SIP routing between the caller's registrar and the callee's registrar is correct.
- Verify that the callee's wait timeout is long enough for callers to complete their buildup and dial.
NEGOTIATION_FAILED
The SDP offer/answer exchange failed. The endpoints could not agree on media parameters.
- Open a failed endpoint and check the SDP offer in the INVITE and the SDP answer in the response
- Compare the offered codecs with the answerer's supported codecs
- Ensure at least one common audio codec exists between the two sides
- If the target system only supports specific codecs, configure your test to offer those codecs
- Check for SDP manipulation by intermediate proxies (SBCs, B2BUAs) that may strip codecs from the offer
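The codec comparison above can be done by eye, but for long SDP bodies a small parser helps. The following is a sketch under simplifying assumptions: it looks only at `m=audio` sections, resolves dynamic payload types through `a=rtpmap` lines, and knows just a handful of static payload types. The two SDP bodies are invented examples.

```python
# A few static payload types from RFC 3551; an SDP may omit rtpmap lines for these.
STATIC_PAYLOAD_TYPES = {
    "0": "PCMU/8000", "3": "GSM/8000", "8": "PCMA/8000",
    "9": "G722/8000", "18": "G729/8000",
}


def audio_codecs(sdp):
    """Return the set of audio encodings (upper case, e.g. 'PCMA/8000') in an SDP body."""
    payload_types = []
    rtpmap = {}
    in_audio = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            in_audio = bool(fields) and fields[0] == "audio"
            if in_audio:
                payload_types.extend(fields[3:])  # media, port, proto, then formats
        elif in_audio and line.startswith("a=rtpmap:"):
            parts = line[len("a=rtpmap:"):].split(None, 1)
            if len(parts) == 2:
                rtpmap[parts[0]] = parts[1].upper()
    names = set()
    for pt in payload_types:
        name = rtpmap.get(pt) or STATIC_PAYLOAD_TYPES.get(pt)
        if name:
            names.add(name)
    return names


offer = "v=0\r\nm=audio 40000 RTP/AVP 0 8 111\r\na=rtpmap:111 opus/48000/2\r\n"
answer = "v=0\r\nm=audio 50000 RTP/AVP 8\r\na=rtpmap:8 PCMA/8000\r\n"
print(audio_codecs(offer) & audio_codecs(answer))
```

An empty intersection means the two sides share no audio codec. Note that entries such as `telephone-event` also appear in the result; they are not audio codecs and should be ignored when judging the overlap.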
Worker Disconnection During Test
The test run shows FAILED, and endpoints show a sudden transition to CLOSED phase with OSERROR outcome.
Symptom
All or most endpoints on a specific worker abruptly moved to CLOSED phase. The test run may have continued on other workers or failed entirely if only one worker was assigned.
Diagnosis
- Check worker status: Navigate to the Workers page. Is the worker now showing OFFLINE or ERROR status?
- Resource exhaustion: The worker container may have been killed by the operating system (OOM). Check the host system's kernel logs for OOM kill events.
- Network failure: The worker's network connection to the platform gateway may have dropped. Check network connectivity from the worker's host.
- Container restart: If the container orchestration system restarted the worker, it will reconnect as a new session. Check container lifecycle logs.
Resolution
- For OOM issues: Increase the worker container's memory limit or reduce the endpoint count per worker
- For network issues: Ensure stable connectivity between the worker and the CallMeter gateway
- For reliability: Deploy multiple workers so that a single worker failure does not fail the entire test run
Media Failures (During Active Calls)
Calls connected (endpoints reached the INCALL phase) but media quality is severely degraded or media is not flowing.
No RTP Traffic
- Firewall blocking UDP: RTP uses UDP on a range of ports. Ensure the worker and the remote endpoint can exchange UDP packets on the negotiated ports.
- NAT issues: If the worker is behind NAT, the SDP may contain a private IP address that the remote endpoint cannot reach. Consider deploying the worker with a public IP or using a STUN/TURN server.
- SDP IP mismatch: Check the SDP for the correct media IP address. Intermediate proxies may have modified the SDP incorrectly.
One-Way Audio/Video
- Asymmetric NAT: One direction works but the other does not. This is a classic NAT traversal problem.
- Firewall rules: Ensure bidirectional UDP traffic is allowed, not just outbound.
- Media relay: Some SBCs relay media. Check if the relay is functioning correctly.
Codec Mismatch Post-Negotiation
In rare cases, SDP negotiation succeeds but the actual media uses a different encoding than expected:
- Check the SDP answer for the negotiated codec
- Verify the RTP payload type in the media stream matches the negotiated payload type
- Check for re-INVITEs that may have changed the codec mid-call
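To compare the payload type on the wire with the negotiated one, read it from the RTP header: it is the low 7 bits of the second byte (RFC 3550). The helper below is a minimal sketch that assumes you already have the raw UDP payload of a captured packet, for example exported from a packet capture; the sample bytes are made up.

```python
def rtp_payload_type(packet):
    """Return the RTP payload type (0-127) from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("too short to be an RTP packet")
    version = packet[0] >> 6  # top two bits of the first byte
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return packet[1] & 0x7F  # mask out the marker bit


# Hypothetical packet: version 2, marker bit set, payload type 8 (PCMA).
sample = bytes([0x80, 0x88]) + bytes(10)
print(rtp_payload_type(sample))
```

If RTCP is multiplexed on the same port, its packets show up here as values 72 to 76 and should be skipped rather than treated as a codec mismatch.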
Firewall and NAT Troubleshooting
Firewalls and NAT are among the most common causes of VoIP failures. SIP and RTP use different ports, different protocols, and sometimes different IP addresses, all of which must be permitted through every firewall and NAT device in the path.
SIP Signaling Blocked
If no SIP response arrives at all (connection timeout), the firewall is likely blocking the signaling port:
- Identify the transport and port: UDP/TCP 5060 (standard SIP) or TCP 5061 (SIP over TLS)
- Check outbound rules on the worker's network: The worker must be able to send to the registrar's IP on the SIP port
- Check inbound rules on the registrar's network: The registrar must accept connections from the worker's public IP
- Check stateful inspection: Some firewalls require SIP ALG (Application Layer Gateway) to properly track SIP transactions. Others perform better with SIP ALG disabled. If you see intermittent failures, try toggling SIP ALG.
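For TCP and TLS transports, the outbound checks above can be approximated from the worker's host with the standard library. This sketch resolves the registrar name and attempts a plain TCP connection; the host name is a placeholder. It cannot verify UDP 5060, because UDP has no handshake to observe.

```python
import socket


def resolve(hostname, port):
    """Return the sorted unique IP addresses for hostname, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


if __name__ == "__main__":
    registrar = "registrar.example.invalid"  # placeholder host name
    print("addresses:", resolve(registrar, 5061))
    print("TLS port reachable:", tcp_port_open(registrar, 5061))
```

A successful connection only shows that the port is open on the path. It does not prove that the TLS handshake or SIP authentication will succeed.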
RTP Media Blocked
If SIP signaling works (endpoints register, calls connect) but there is no audio or video:
- RTP port range: RTP uses UDP on a dynamic port range (typically 10000-20000). The firewall must allow UDP traffic on this entire range between the worker and the remote media endpoint.
- Symmetric NAT: If the worker is behind symmetric NAT, the public port assigned to outbound RTP packets is unpredictable. The remote endpoint's response packets may be sent to a different port than the NAT mapping expects, and be dropped. STUN alone usually cannot resolve this, because the mapping differs per destination; use a TURN relay instead.
- Conntrack timeouts: Linux firewalls using conntrack may expire UDP "connections" after 30 seconds of inactivity. If the media flow pauses briefly (e.g., during silence suppression), the conntrack entry expires and subsequent packets are dropped. Set the conntrack UDP timeout (the `net.netfilter.nf_conntrack_udp_timeout` sysctl) to at least 120 seconds.
NAT Traversal Failures
NAT creates a mismatch between the worker's private IP (in the SDP body) and its public IP (seen by the registrar). If the remote endpoint sends media to the private IP, it will never arrive.
- Check the SDP body in the SIP trace. If the `c=` line contains a private IP address (10.x.x.x, 172.16-31.x.x, 192.168.x.x), the remote endpoint cannot route media to it.
- STUN: Configure a STUN server so the worker discovers its public IP and includes it in the SDP.
- TURN: If STUN is insufficient (symmetric NAT), use a TURN relay server. TURN relays all media through a public server, bypassing NAT entirely at the cost of added latency.
- SIP registrar-side fix: Some registrars and SBCs can rewrite the SDP to use the source IP observed at the registrar. Check whether your registrar has a "NAT fix" or "SDP rewrite" feature.
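The private-address check on the SDP connection line can be automated. The sketch below scans an SDP body for IPv4 `c=` lines and reports any address in the three RFC 1918 ranges listed above; the SDP text is an invented example.

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def private_connection_addresses(sdp):
    """Return the private IPv4 addresses found in c= lines of an SDP body."""
    found = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("c=IN IP4 "):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        text = fields[2].split("/")[0]  # drop an optional multicast TTL suffix
        try:
            address = ipaddress.IPv4Address(text)
        except ValueError:
            continue  # the connection address is a host name, not an IP
        if any(address in network for network in RFC1918_NETWORKS):
            found.append(text)
    return found


sdp = (
    "v=0\r\n"
    "c=IN IP4 192.168.1.20\r\n"
    "m=audio 4000 RTP/AVP 0\r\n"
    "m=video 4002 RTP/AVP 96\r\n"
    "c=IN IP4 203.0.113.5\r\n"
)
print(private_connection_addresses(sdp))
```

A non-empty result in an SDP that crosses the public internet suggests that the NAT fixes listed above (STUN, TURN, or registrar-side rewriting) are not in effect.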
SIP ALG Can Cause Problems
Many consumer and enterprise routers include SIP ALG (Application Layer Gateway) that attempts to rewrite SIP messages to fix NAT issues. In practice, SIP ALG frequently corrupts SIP headers, breaks authentication, or modifies SDP incorrectly. If you experience bizarre registration or call failures that defy explanation, try disabling SIP ALG on the router.
Worker-Related Failures
Worker issues can manifest as test-level failures (the entire run fails) or endpoint-level failures (some endpoints on a specific worker fail).
Worker Cannot Connect to Platform
If a user-owned worker shows OFFLINE on the Workers page and never transitions to ONLINE:
- Outbound connectivity: The worker container must be able to reach the CallMeter platform gateway over the internet. Check that outbound HTTPS is not blocked by a proxy or firewall.
- Token validity: Verify the `cmw_worker` token is correct and has not been revoked. Re-copy the token from the Workers page and update the Docker container environment variable.
- DNS resolution: The worker must resolve the CallMeter gateway hostname. Check DNS configuration inside the container.
- Proxy configuration: If the worker's network requires an HTTP proxy for outbound connections, configure the proxy settings in the Docker environment.
Worker Out of Memory (OOM)
Each endpoint consumes memory for SIP state, RTP buffers, and media processing. If the worker container does not have enough memory for the allocated endpoint count:
- Symptom: All endpoints on one worker abruptly move to CLOSED phase mid-test. The worker itself may restart.
- Check host logs: The kernel OOM killer logs the killed process. Look for OOM events in `dmesg` or `/var/log/kern.log` on the Docker host.
- Resolution: Increase the container memory limit (`--memory` flag in Docker), or reduce the number of endpoints assigned per worker. As a guideline, budget approximately 50-100 MB per endpoint for audio-only tests and 100-200 MB per endpoint when video is enabled.
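The per-endpoint guideline translates into a simple range for the container memory limit. This sketch multiplies the endpoint count by the figures quoted above; it does not include the worker's own baseline memory, which is not stated here and should be added on top.

```python
def worker_memory_budget_mb(endpoints, video=False):
    """Return (low, high) memory estimates in MB for the given endpoint count."""
    # Per-endpoint guideline: 50-100 MB audio-only, 100-200 MB with video.
    low_per_endpoint, high_per_endpoint = (100, 200) if video else (50, 100)
    return endpoints * low_per_endpoint, endpoints * high_per_endpoint


# 200 audio-only endpoints on one worker.
print(worker_memory_budget_mb(200))
```

Sizing the `--memory` limit toward the upper end of the range leaves room for peaks; if the host cannot provide that much, lower the endpoint count per worker instead.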
Worker CPU Saturation
Media encoding and decoding are CPU-intensive, especially for video. If the worker runs out of CPU:
- Symptom: Quality metrics degrade progressively during the test. Jitter and packet loss concealment (PLC) events increase as CPU contention delays packet processing.
- Check: Monitor CPU usage on the Docker host during a test. If CPU consistently hits 100%, the worker is saturated.
- Resolution: Reduce the endpoint count per worker, deploy additional workers to share the load, or increase the host CPU allocation for the container.
Checklist: Before Reporting a Bug
Before contacting CallMeter support, verify the following:
- The registrar domain resolves from the worker's network
- SIP account credentials are correct (test with a single endpoint first)
- The transport protocol matches the registrar's configuration
- Firewall allows SIP signaling (UDP/TCP 5060 or TLS 5061) from the worker
- Firewall allows RTP media (UDP port range) bidirectionally
- The buildup time is sufficient for the number of endpoints
- The worker has adequate resources (CPU, memory) for the endpoint count
- At least one codec in the test configuration is supported by the target system
Related Pages
- SIP Registration Errors -- Detailed registration failure diagnosis
- Poor Quality Metrics -- Quality issues without failures
- SIP Response Codes -- Full response code reference
- Endpoint Statuses -- Endpoint lifecycle
- Test Run Statuses -- Run lifecycle
- Worker Statuses -- Worker connection states