Analyzing Results
Interpret test run metrics, drill into per-endpoint data, use filters, read time-series charts, compare runs, and identify quality patterns.
After a test run completes, CallMeter provides comprehensive analytics at multiple levels -- aggregate overview, per-group breakdown, per-endpoint detail, and individual SIP message traces. This guide explains how to navigate the results, interpret the data, and identify actionable patterns.
Accessing Test Run Results
1. Open a test from your project's Tests page
2. Click on a specific test run from the run history list
3. The results page opens with the Overview tab selected
Each test run has its own independent results. Running the same test multiple times creates separate result sets that you can compare.
Test Run Overview
The Overview tab provides a high-level summary of the entire test run:
Key Aggregate Metrics
| Metric | What It Shows |
|---|---|
| Call Success Rate | Percentage of endpoints that completed their calls without errors |
| Average MOS | Mean Opinion Score across all endpoints -- the primary quality indicator |
| Average Jitter | Mean inter-packet delay variation across all endpoints |
| Average Packet Loss | Mean packet loss percentage across all endpoints |
| Average RTT | Mean round-trip time across all endpoints |
| Setup Time | Average time from INVITE to call establishment |
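As a rough illustration of how these aggregates relate to the per-endpoint data, the sketch below computes a success rate and average MOS from a list of endpoint results. The dict layout and field names are assumptions for the example, not CallMeter's actual export schema.

```python
from statistics import mean

# Hypothetical per-endpoint results; field names are illustrative only.
endpoints = [
    {"outcome": "SUCCESS", "mos": 4.2, "jitter_ms": 12.0, "loss_pct": 0.1},
    {"outcome": "SUCCESS", "mos": 3.8, "jitter_ms": 25.0, "loss_pct": 0.8},
    {"outcome": "CALL_FAILED", "mos": None, "jitter_ms": None, "loss_pct": None},
]

# Quality averages only make sense over endpoints that actually completed.
ok = [e for e in endpoints if e["outcome"] in ("SUCCESS", "COMPLETED_WITH_WARNINGS")]
success_rate = 100.0 * len(ok) / len(endpoints)
avg_mos = mean(e["mos"] for e in ok)

print(f"Call Success Rate: {success_rate:.1f}%")  # Call Success Rate: 66.7%
print(f"Average MOS: {avg_mos:.2f}")              # Average MOS: 4.00
```

Note that failed endpoints are excluded from the quality averages, which is why checking the success rate first matters: a high average MOS over a small surviving population can be misleading.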
Outcome Distribution
A breakdown showing how many endpoints ended with each outcome:
- SUCCESS -- call completed normally with no errors
- COMPLETED_WITH_WARNINGS -- call completed but non-fatal issues detected
- REGISTRATION_FAILED -- SIP registration was rejected or timed out
- CALL_FAILED -- call setup failed (callee unreachable, routing error)
- NEGOTIATION_FAILED -- codec or media negotiation failed
- TIMEOUT -- operation timed out (INVITE timeout, receiver wait timeout)
- OSERROR -- infrastructure-level error
- TEARDOWN_ERROR -- call teardown failed
This distribution is the first thing to check. If a significant percentage of endpoints have error outcomes, investigate those failures before analyzing quality metrics.
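Tallying outcomes from exported results can be sketched with a few lines of Python; the outcome values match the list above, while the sample data itself is invented for the example.

```python
from collections import Counter

# Illustrative endpoint outcomes from a finished run (not real export data).
outcomes = [
    "SUCCESS", "SUCCESS", "COMPLETED_WITH_WARNINGS",
    "REGISTRATION_FAILED", "SUCCESS", "TIMEOUT",
]

distribution = Counter(outcomes)
for outcome, count in distribution.most_common():
    print(f"{outcome}: {count}")

# Anything other than SUCCESS / COMPLETED_WITH_WARNINGS is an error outcome.
errors = sum(c for o, c in distribution.items()
             if o not in ("SUCCESS", "COMPLETED_WITH_WARNINGS"))
print(f"Error outcomes: {errors} of {len(outcomes)}")  # Error outcomes: 2 of 6
```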
Timeline
A timeline visualization showing key events across the test duration:
- Registration starts and completions
- Call setup events
- Call completion events
- Any failures plotted at the time they occurred
The timeline helps identify whether failures were concentrated at a specific time (e.g., a burst at the beginning suggesting capacity issues) or distributed throughout the test.
Filtering Results
Use filters to narrow down the results view:
Filter by Group
When running multi-group tests, filter by group to compare quality between different configurations. This is essential for A/B testing scenarios where you need to isolate results from each group.
Filter by Outcome
Focus on specific endpoint results:
- All -- show every endpoint regardless of outcome
- SUCCESS / COMPLETED_WITH_WARNINGS -- only endpoints that finished successfully (useful for quality analysis)
- Error outcomes -- only endpoints with error outcomes like REGISTRATION_FAILED, CALL_FAILED, TIMEOUT, etc. (useful for troubleshooting)
Filter by Direction
Every metric is collected per-direction:
| Direction | What It Measures |
|---|---|
| Send | Quality of the outbound media stream -- what this endpoint transmits to the other side |
| Recv | Quality of the inbound media stream -- what this endpoint receives from the other side |
Filtering by direction is critical for diagnosing asymmetric quality issues. For example, if send quality is good but receive quality is poor, the problem likely lies on the remote side or in the network path from remote to local.
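A quick way to surface such asymmetries in exported data is to flag endpoints whose send MOS is healthy but whose receive MOS is poor. The nested `send`/`recv` shape and the 3.5/3.0 cutoffs below are illustrative assumptions, not a fixed schema.

```python
# Assumed export shape: per-endpoint metrics split by direction.
endpoints = [
    {"id": "ep-1", "send": {"mos": 4.1}, "recv": {"mos": 4.0}},
    {"id": "ep-2", "send": {"mos": 4.2}, "recv": {"mos": 2.7}},
]

# Good outbound quality + poor inbound quality = asymmetric issue.
asymmetric = [
    e["id"] for e in endpoints
    if e["send"]["mos"] >= 3.5 and e["recv"]["mos"] < 3.0
]
print(asymmetric)  # ['ep-2'] -- investigate the remote side or the inbound path
```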
Per-Endpoint Detail
Click on any endpoint in the results list to open its detailed view. This is where the deepest analysis happens.
Metric Charts
Time-series charts display every collected metric over the duration of the call. Metrics are organized into categories:
Quality Metrics:
| Metric | Unit | What to Look For |
|---|---|---|
| MOS | Score (1-5) | Above 3.5 = good, 3.0-3.5 = acceptable, below 3.0 = poor |
| R-Factor | Score (0-120) | Above 80 = good, 60-80 = acceptable, below 60 = poor |
| Jitter | ms | Below 30ms = good, 30-50ms = marginal, above 50ms = problematic |
| RTT | ms | Below 150ms = good, 150-300ms = acceptable, above 300ms = conversational issues |
| Packet Loss | % | Below 1% = good, 1-3% = marginal, above 3% = noticeable degradation |
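The thresholds in the table above can be encoded as small classifier functions for scripted triage; this is a generic sketch using those cutoffs, not a built-in CallMeter feature.

```python
def rate_mos(mos: float) -> str:
    """Bucket a MOS value using the thresholds from the table above."""
    if mos > 3.5:
        return "good"
    if mos >= 3.0:
        return "acceptable"
    return "poor"

def rate_jitter(jitter_ms: float) -> str:
    """Bucket a jitter measurement (ms) using the same table."""
    if jitter_ms < 30:
        return "good"
    if jitter_ms <= 50:
        return "marginal"
    return "problematic"

print(rate_mos(4.2), rate_jitter(45))  # good marginal
```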
Network Metrics:
| Metric | Unit | Significance |
|---|---|---|
| Bitrate | kbps | Should match expected codec bitrate |
| Packets Sent/Received | count/s | Should be consistent throughout the call |
| Duplicate Packets | count | Non-zero indicates network issues |
| Out-of-Order Packets | count | Non-zero indicates routing instability |
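As a quick sanity check on the Bitrate metric, you can compare the measured rate against the codec's nominal payload bitrate. A minimal sketch, where the 10% tolerance is an arbitrary choice for the example:

```python
# Nominal payload bitrates (kbps) for common fixed-rate codecs.
NOMINAL_KBPS = {"PCMU": 64, "PCMA": 64, "G722": 64}

def bitrate_ok(codec: str, measured_kbps: float, tolerance: float = 0.10) -> bool:
    """True if the measured bitrate is within tolerance of the codec's nominal rate."""
    expected = NOMINAL_KBPS[codec]
    return abs(measured_kbps - expected) <= expected * tolerance

print(bitrate_ok("PCMU", 63.2))  # True
print(bitrate_ok("PCMU", 31.0))  # False -- stream likely degraded or throttled
```

Variable-bitrate codecs such as Opus need a range check against the negotiated target rather than a single nominal value.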
Jitter Buffer Metrics:
| Metric | Significance |
|---|---|
| Buffer Delay | How much delay the jitter buffer adds to smooth out packet timing |
| Late Packets | Packets arriving after the jitter buffer play-out deadline |
| RTX Requests | Retransmission requests (NACK) sent for lost packets |
Audio Metrics:
| Metric | Significance |
|---|---|
| PLC Events | Packet Loss Concealment activations -- the codec is generating synthetic audio to fill gaps |
| Audio Level | The volume level of the audio stream. Flat zero means silence (possible muting or media issue) |
| DTMF | DTMF digits sent and received during the call (RFC 4733 RTP or SIP INFO) |
Video Metrics (when video is enabled):
| Metric | Significance |
|---|---|
| Frame Rate | Should match configured FPS. Drops indicate congestion or encoding issues |
| Freeze Events | Video freezes caused by missing frames |
| Decode Errors | Frames that could not be decoded |
Reading Time-Series Charts
Each chart shows metric values plotted over the call duration. Key patterns to recognize:
- Flat line at expected value -- healthy, stable metric (e.g., steady MOS around 4.0)
- Gradual degradation -- the metric worsens over time, suggesting resource exhaustion (buffer overflow, CPU overload)
- Sudden spike -- a brief disturbance in an otherwise stable metric (network congestion burst, brief packet loss)
- Step change -- metric shifts to a new level and stays there (route change, codec renegotiation)
- Periodic pattern -- regular oscillations suggest a systematic issue (e.g., competing traffic, keep-alive interference)
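Gradual degradation in particular is easy to detect programmatically by fitting a least-squares slope to exported samples. This is a generic analysis sketch, not a CallMeter feature; the -0.05 threshold is an arbitrary example value.

```python
def slope(values):
    """Least-squares slope per sample; clearly negative = metric degrading over time."""
    n = len(values)
    mx = (n - 1) / 2
    my = sum(values) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(values))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

# MOS samples over a call: starts healthy, degrades steadily.
mos_series = [4.2, 4.1, 4.0, 3.8, 3.5, 3.2, 3.0, 2.8]
if slope(mos_series) < -0.05:
    print("gradual degradation -- suspect resource exhaustion")
```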
SIP Message Trace
The SIP trace tab shows the complete signaling exchange for the endpoint:
- REGISTER request and response (200 OK or error)
- INVITE request (for callers) or incoming INVITE (for callees)
- 100 Trying -- provisional response
- 180 Ringing -- the callee is alerting
- 200 OK -- call answered, includes SDP with negotiated codec
- ACK -- confirms call establishment
- BYE -- call termination
- 200 OK (final) -- BYE acknowledged
The SIP trace is invaluable for diagnosing:
- Registration failures (check the SIP response code: 401, 403, 404, 408)
- Call setup failures (check INVITE responses: 486 Busy, 503 Service Unavailable)
- Codec negotiation issues (compare SDP offer and answer for compatible codecs)
- Unexpected call teardowns (look for unexpected BYE messages or error responses)
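When working with an exported trace outside the UI, error responses can be pulled out with a simple regex over the status lines. The trace text below is a shortened, invented excerpt, not real CallMeter output.

```python
import re

# Illustrative SIP trace excerpt; real traces come from the SIP trace tab.
trace = """\
REGISTER sip:example.com SIP/2.0
SIP/2.0 200 OK
INVITE sip:bob@example.com SIP/2.0
SIP/2.0 100 Trying
SIP/2.0 486 Busy Here
"""

# Collect responses with a status code of 400 or above.
errors = [
    (int(m.group(1)), m.group(2))
    for m in re.finditer(r"^SIP/2\.0 (\d{3}) (.+)$", trace, re.MULTILINE)
    if int(m.group(1)) >= 400
]
print(errors)  # [(486, 'Busy Here')]
```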
Comparing Runs
Run the same test multiple times to build a picture of your SIP infrastructure's performance over time:
- Each run produces independent results with a unique run ID and timestamp
- Open any past run from the test's run history list
- Compare aggregate metrics across runs to identify trends or regressions
- Look for consistency -- if MOS varies widely between runs, investigate network conditions
Comparison tips:
- Run tests at the same time of day for fair comparison
- Keep the test configuration identical between runs
- Document any infrastructure changes between runs (firmware updates, route changes, new capacity)
- Use multi-group tests to compare configurations within a single run for the fairest comparison
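Run-over-run comparison can also be scripted against exported aggregates; the sketch below flags MOS regressions against a baseline run. The run IDs, data, and the 0.2 threshold are all illustrative assumptions.

```python
# Hypothetical aggregate results for three runs of the same test.
runs = [
    {"run_id": "run-1", "avg_mos": 4.1},
    {"run_id": "run-2", "avg_mos": 4.0},
    {"run_id": "run-3", "avg_mos": 3.6},
]

# Treat the first run as the baseline and flag meaningful drops.
baseline = runs[0]["avg_mos"]
for run in runs[1:]:
    drop = baseline - run["avg_mos"]
    if drop > 0.2:
        print(f"{run['run_id']}: MOS regression of {drop:.1f} vs baseline")
```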
Key Metrics to Check First
When reviewing results, prioritize these metrics in order:
- Call Success Rate -- if endpoints are failing, fix that before analyzing quality
- MOS -- the single most important quality indicator, combining multiple factors into one score
- Packet Loss -- the most common cause of poor call quality
- Jitter -- the second most common quality issue, especially on congested networks
- RTT -- important for conversational quality and geographic distribution
- Setup Time -- how quickly calls are established, important for user experience
Spotting Common Patterns
| Pattern | Likely Cause | Investigation |
|---|---|---|
| High packet loss, low MOS | Network congestion or undersized links | Check bitrate vs. available bandwidth |
| High jitter, acceptable loss | Queuing delays on the network path | Check router/switch queue depths and QoS configuration |
| High RTT, everything else fine | Geographic distance or routing inefficiency | Run tests from closer regions or check routing |
| Good send quality, poor recv quality | Problem on the remote side or asymmetric path | Check the other endpoint's send metrics |
| All endpoints REGISTRATION_FAILED | Wrong registrar configuration | Verify domain, port, transport, credentials |
| First N endpoints succeed, rest fail | Registrar capacity limit | Check max registration count on your PBX |
| Quality degrades over time | Resource exhaustion on PBX | Check PBX CPU, memory, and session limits |
Exporting Data
For deeper analysis in external tools, you can export test run data. The results dashboard provides export options for metric data that can be imported into spreadsheets or analytics tools.
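If you are post-processing exported data before handing it to a spreadsheet, Python's `csv` module is enough to reshape it. The column names below mirror the assumed export shape used in earlier sketches, not a fixed CallMeter schema.

```python
import csv
import io

# Per-endpoint metrics to flatten into CSV; empty strings for failed endpoints.
rows = [
    {"endpoint": "ep-1", "outcome": "SUCCESS", "mos": 4.2, "jitter_ms": 12.0},
    {"endpoint": "ep-2", "outcome": "TIMEOUT", "mos": "", "jitter_ms": ""},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["endpoint", "outcome", "mos", "jitter_ms"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Writing to a `StringIO` buffer keeps the example self-contained; in practice you would write to a file opened with `newline=""`.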
Next Steps
- Endpoint Statuses -- Complete endpoint lifecycle reference
- Test Run Statuses -- Full status transition details
- Common Test Failures -- Diagnose specific failure patterns
- Poor Quality Metrics -- Troubleshoot low MOS, high jitter, etc.
Running a Test
Execute a SIP load test, monitor real-time progress, understand run statuses, stop tests early, and manage concurrent test limits.
Creating a Probe
Set up continuous SIP monitoring with scheduled calls, threshold-based health evaluation, and automated alerting for your VoIP infrastructure.