Analyzing Results
Interpret test run metrics, drill into per-endpoint data, use filters, read time-series charts, compare runs, and identify quality patterns.
After a test run completes, CallMeter provides comprehensive analytics at multiple levels -- aggregate overview, per-group breakdown, per-endpoint detail, and individual SIP message traces. This guide explains how to navigate the results, interpret the data, and identify actionable patterns.
Accessing Test Run Results
1. Open a test from your project's Tests page
2. Click on a specific test run from the run history list
3. The results page opens with the Overview tab selected
Each test run has its own independent results. Running the same test multiple times creates separate result sets that you can compare.
Test Run Overview
The Overview tab provides a high-level summary of the entire test run:
Key Aggregate Metrics
| Metric | What It Shows |
|---|---|
| Call Success Rate | Percentage of endpoints that completed their calls without errors |
| Average MOS | Mean Opinion Score across all endpoints -- the primary quality indicator |
| Average Jitter | Mean inter-packet delay variation across all endpoints |
| Average Packet Loss | Mean packet loss percentage across all endpoints |
| Average RTT | Mean round-trip time across all endpoints |
| Setup Time | Average time from INVITE to call establishment |
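As a rough illustration of how these aggregates relate to the per-endpoint data, the sketch below computes a success rate and average MOS from a list of endpoint results. The dict layout and field names are assumptions for the example, not CallMeter's actual export schema.

```python
from statistics import mean

# Hypothetical per-endpoint results; field names are illustrative only.
endpoints = [
    {"outcome": "SUCCESS", "mos": 4.2, "jitter_ms": 12.0, "loss_pct": 0.1},
    {"outcome": "SUCCESS", "mos": 3.8, "jitter_ms": 25.0, "loss_pct": 0.8},
    {"outcome": "CALL_FAILED", "mos": None, "jitter_ms": None, "loss_pct": None},
]

# Quality averages only make sense over endpoints that actually completed.
ok = [e for e in endpoints if e["outcome"] in ("SUCCESS", "COMPLETED_WITH_WARNINGS")]
success_rate = 100.0 * len(ok) / len(endpoints)
avg_mos = mean(e["mos"] for e in ok)

print(f"Call Success Rate: {success_rate:.1f}%")  # Call Success Rate: 66.7%
print(f"Average MOS: {avg_mos:.2f}")              # Average MOS: 4.00
```

Note that failed endpoints are excluded from the quality averages, which is why checking the success rate first matters: a high average MOS over a small surviving population can be misleading.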
Outcome Distribution
A breakdown showing how many endpoints ended with each outcome:
- SUCCESS -- call completed normally with no errors
- COMPLETED_WITH_WARNINGS -- call completed but non-fatal issues detected
- REGISTRATION_FAILED -- SIP registration was rejected or timed out
- CALL_FAILED -- call setup failed (callee unreachable, routing error)
- NEGOTIATION_FAILED -- codec or media negotiation failed
- TIMEOUT -- operation timed out (INVITE timeout, receiver wait timeout)
- OSERROR -- infrastructure-level error
- TEARDOWN_ERROR -- call teardown failed
This distribution is the first thing to check. If a significant percentage of endpoints have error outcomes, investigate those failures before analyzing quality metrics.
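Tallying outcomes from exported results can be sketched with a few lines of Python; the outcome values match the list above, while the sample data itself is invented for the example.

```python
from collections import Counter

# Illustrative endpoint outcomes from a finished run (not real export data).
outcomes = [
    "SUCCESS", "SUCCESS", "COMPLETED_WITH_WARNINGS",
    "REGISTRATION_FAILED", "SUCCESS", "TIMEOUT",
]

distribution = Counter(outcomes)
for outcome, count in distribution.most_common():
    print(f"{outcome}: {count}")

# Anything other than SUCCESS / COMPLETED_WITH_WARNINGS is an error outcome.
errors = sum(c for o, c in distribution.items()
             if o not in ("SUCCESS", "COMPLETED_WITH_WARNINGS"))
print(f"Error outcomes: {errors} of {len(outcomes)}")  # Error outcomes: 2 of 6
```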
Timeline
A timeline visualization showing key events across the test duration:
- Registration starts and completions
- Call setup events
- Call completion events
- Any failures plotted at the time they occurred
The timeline helps identify whether failures were concentrated at a specific time (e.g., a burst at the beginning suggesting capacity issues) or distributed throughout the test.
Filtering Results
Use filters to narrow down the results view:
Filter by Group
When running multi-group tests, filter by group to compare quality between different configurations. This is essential for A/B testing scenarios where you need to isolate results from each group.
Filter by Outcome
Focus on specific endpoint results:
- All -- show every endpoint regardless of outcome
- SUCCESS / COMPLETED_WITH_WARNINGS -- only endpoints that finished successfully (useful for quality analysis)
- Error outcomes -- only endpoints with error outcomes like REGISTRATION_FAILED, CALL_FAILED, TIMEOUT, etc. (useful for troubleshooting)
Filter by Direction
Every metric is collected per-direction:
| Direction | What It Measures |
|---|---|
| Send | Quality of the outbound media stream -- what this endpoint transmits to the other side |
| Recv | Quality of the inbound media stream -- what this endpoint receives from the other side |
Filtering by direction is critical for diagnosing asymmetric quality issues. For example, if send quality is good but receive quality is poor, the problem likely lies on the remote side or in the network path from remote to local.
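A quick way to surface such asymmetries in exported data is to flag endpoints whose send MOS is healthy but whose receive MOS is poor. The nested `send`/`recv` shape and the 3.5/3.0 cutoffs below are illustrative assumptions, not a fixed schema.

```python
# Assumed export shape: per-endpoint metrics split by direction.
endpoints = [
    {"id": "ep-1", "send": {"mos": 4.1}, "recv": {"mos": 4.0}},
    {"id": "ep-2", "send": {"mos": 4.2}, "recv": {"mos": 2.7}},
]

# Good outbound quality + poor inbound quality = asymmetric issue.
asymmetric = [
    e["id"] for e in endpoints
    if e["send"]["mos"] >= 3.5 and e["recv"]["mos"] < 3.0
]
print(asymmetric)  # ['ep-2'] -- investigate the remote side or the inbound path
```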
Per-Endpoint Detail
Click on any endpoint in the results list to open its detailed view. This is where the deepest analysis happens.
Metric Charts
Time-series charts display every collected metric over the duration of the call. Metrics are organized into categories:
Quality Metrics:
| Metric | Unit | What to Look For |
|---|---|---|
| MOS | Score (1-5) | Above 3.5 = good, 3.0-3.5 = acceptable, below 3.0 = poor |
| R-Factor | Score (0-120) | Above 80 = good, 60-80 = acceptable, below 60 = poor |
| Jitter | ms | Below 30ms = good, 30-50ms = marginal, above 50ms = problematic |
| RTT | ms | Below 150ms = good, 150-300ms = acceptable, above 300ms = conversational issues |
| Packet Loss | % | Below 1% = good, 1-3% = marginal, above 3% = noticeable degradation |
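The thresholds in the table above can be encoded as small classifier functions for scripted triage; this is a generic sketch using those cutoffs, not a built-in CallMeter feature.

```python
def rate_mos(mos: float) -> str:
    """Bucket a MOS value using the thresholds from the table above."""
    if mos > 3.5:
        return "good"
    if mos >= 3.0:
        return "acceptable"
    return "poor"

def rate_jitter(jitter_ms: float) -> str:
    """Bucket a jitter measurement (ms) using the same table."""
    if jitter_ms < 30:
        return "good"
    if jitter_ms <= 50:
        return "marginal"
    return "problematic"

print(rate_mos(4.2), rate_jitter(45))  # good marginal
```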
Network Metrics:
| Metric | Unit | Significance |
|---|---|---|
| Bitrate | kbps | Should match expected codec bitrate |
| Packets Sent/Received | count/s | Should be consistent throughout the call |
| Duplicate Packets | count | Non-zero indicates network issues |
| Out-of-Order Packets | count | Non-zero indicates routing instability |
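As a quick sanity check on the Bitrate metric, you can compare the measured rate against the codec's nominal payload bitrate. A minimal sketch, where the 10% tolerance is an arbitrary choice for the example:

```python
# Nominal payload bitrates (kbps) for common fixed-rate codecs.
NOMINAL_KBPS = {"PCMU": 64, "PCMA": 64, "G722": 64}

def bitrate_ok(codec: str, measured_kbps: float, tolerance: float = 0.10) -> bool:
    """True if the measured bitrate is within tolerance of the codec's nominal rate."""
    expected = NOMINAL_KBPS[codec]
    return abs(measured_kbps - expected) <= expected * tolerance

print(bitrate_ok("PCMU", 63.2))  # True
print(bitrate_ok("PCMU", 31.0))  # False -- stream likely degraded or throttled
```

Variable-bitrate codecs such as Opus need a range check against the negotiated target rather than a single nominal value.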
Jitter Buffer Metrics:
| Metric | Significance |
|---|---|
| Buffer Delay | How much delay the jitter buffer adds to smooth out packet timing |
| Late Packets | Packets arriving after the jitter buffer play-out deadline |
| RTX Requests | Retransmission requests (NACK) sent for lost packets |
Audio Metrics:
| Metric | Significance |
|---|---|
| PLC Events | Packet Loss Concealment activations -- the codec is generating synthetic audio to fill gaps |
| Audio Level | The volume level of the audio stream. Flat zero means silence (possible muting or media issue) |
| DTMF | DTMF digits sent and received during the call (RFC 4733 RTP or SIP INFO) |
Video Metrics (when video is enabled):
| Metric | Significance |
|---|---|
| Frame Rate | Should match configured FPS. Drops indicate congestion or encoding issues |
| Freeze Events | Video freezes caused by missing frames |
| Decode Errors | Frames that could not be decoded |
Reading Time-Series Charts
Each chart shows metric values plotted over the call duration. Key patterns to recognize:
- Flat line at expected value -- healthy, stable metric (e.g., steady MOS around 4.0)
- Gradual degradation -- the metric worsens over time, suggesting resource exhaustion (buffer overflow, CPU overload)
- Sudden spike -- a brief disturbance in an otherwise stable metric (network congestion burst, brief packet loss)
- Step change -- metric shifts to a new level and stays there (route change, codec renegotiation)
- Periodic pattern -- regular oscillations suggest a systematic issue (e.g., competing traffic, keep-alive interference)
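Gradual degradation in particular is easy to detect programmatically by fitting a least-squares slope to exported samples. This is a generic analysis sketch, not a CallMeter feature; the -0.05 threshold is an arbitrary example value.

```python
def slope(values):
    """Least-squares slope per sample; clearly negative = metric degrading over time."""
    n = len(values)
    mx = (n - 1) / 2
    my = sum(values) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(values))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

# MOS samples over a call: starts healthy, degrades steadily.
mos_series = [4.2, 4.1, 4.0, 3.8, 3.5, 3.2, 3.0, 2.8]
if slope(mos_series) < -0.05:
    print("gradual degradation -- suspect resource exhaustion")
```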
SIP Message Trace
The SIP trace tab shows the complete signaling exchange for the endpoint:
- REGISTER request and response (200 OK or error)
- INVITE request (for callers) or incoming INVITE (for callees)
- 100 Trying -- provisional response
- 180 Ringing -- the callee is alerting
- 200 OK -- call answered, includes SDP with negotiated codec
- ACK -- confirms call establishment
- BYE -- call termination
- 200 OK (final) -- BYE acknowledged
The SIP trace is invaluable for diagnosing:
- Registration failures (check the SIP response code: 401, 403, 404, 408)
- Call setup failures (check INVITE responses: 486 Busy, 503 Service Unavailable)
- Codec negotiation issues (compare SDP offer and answer for compatible codecs)
- Unexpected call teardowns (look for unexpected BYE messages or error responses)
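When working with an exported trace outside the UI, error responses can be pulled out with a simple regex over the status lines. The trace text below is a shortened, invented excerpt, not real CallMeter output.

```python
import re

# Illustrative SIP trace excerpt; real traces come from the SIP trace tab.
trace = """\
REGISTER sip:example.com SIP/2.0
SIP/2.0 200 OK
INVITE sip:bob@example.com SIP/2.0
SIP/2.0 100 Trying
SIP/2.0 486 Busy Here
"""

# Collect responses with a status code of 400 or above.
errors = [
    (int(m.group(1)), m.group(2))
    for m in re.finditer(r"^SIP/2\.0 (\d{3}) (.+)$", trace, re.MULTILINE)
    if int(m.group(1)) >= 400
]
print(errors)  # [(486, 'Busy Here')]
```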
Comparing Runs
Run the same test multiple times to build a picture of your SIP infrastructure's performance over time:
- Each run produces independent results with a unique run ID and timestamp
- Open any past run from the test's run history list
- Compare aggregate metrics across runs to identify trends or regressions
- Look for consistency -- if MOS varies widely between runs, investigate network conditions
Comparison tips:
- Run tests at the same time of day for fair comparison
- Keep the test configuration identical between runs
- Document any infrastructure changes between runs (firmware updates, route changes, new capacity)
- Use multi-group tests to compare configurations within a single run for the fairest comparison
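Run-over-run comparison can also be scripted against exported aggregates; the sketch below flags MOS regressions against a baseline run. The run IDs, data, and the 0.2 threshold are all illustrative assumptions.

```python
# Hypothetical aggregate results for three runs of the same test.
runs = [
    {"run_id": "run-1", "avg_mos": 4.1},
    {"run_id": "run-2", "avg_mos": 4.0},
    {"run_id": "run-3", "avg_mos": 3.6},
]

# Treat the first run as the baseline and flag meaningful drops.
baseline = runs[0]["avg_mos"]
for run in runs[1:]:
    drop = baseline - run["avg_mos"]
    if drop > 0.2:
        print(f"{run['run_id']}: MOS regression of {drop:.1f} vs baseline")
```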
Key Metrics to Check First
When reviewing results, prioritize these metrics in order:
- Call Success Rate -- if endpoints are failing, fix that before analyzing quality
- MOS -- the single most important quality indicator, combining multiple factors into one score
- Packet Loss -- the most common cause of poor call quality
- Jitter -- the second most common quality issue, especially on congested networks
- RTT -- important for conversational quality and geographic distribution
- Setup Time -- how quickly calls are established, important for user experience
Spotting Common Patterns
| Pattern | Likely Cause | Investigation |
|---|---|---|
| High packet loss, low MOS | Network congestion or undersized links | Check bitrate vs. available bandwidth |
| High jitter, acceptable loss | Queuing delays on the network path | Check router/switch queue depths and QoS configuration |
| High RTT, everything else fine | Geographic distance or routing inefficiency | Run tests from closer regions or check routing |
| Good send quality, poor recv quality | Problem on the remote side or asymmetric path | Check the other endpoint's send metrics |
| All endpoints REGISTRATION_FAILED | Wrong registrar configuration | Verify domain, port, transport, credentials |
| First N endpoints succeed, rest fail | Registrar capacity limit | Check max registration count on your PBX |
| Quality degrades over time | Resource exhaustion on PBX | Check PBX CPU, memory, and session limits |
Exporting Data
For deeper analysis in external tools, you can export test run data. The results dashboard provides export options for metric data that can be imported into spreadsheets or analytics tools.
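If you are post-processing exported data before handing it to a spreadsheet, Python's `csv` module is enough to reshape it. The column names below mirror the assumed export shape used in earlier sketches, not a fixed CallMeter schema.

```python
import csv
import io

# Per-endpoint metrics to flatten into CSV; empty strings for failed endpoints.
rows = [
    {"endpoint": "ep-1", "outcome": "SUCCESS", "mos": 4.2, "jitter_ms": 12.0},
    {"endpoint": "ep-2", "outcome": "TIMEOUT", "mos": "", "jitter_ms": ""},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["endpoint", "outcome", "mos", "jitter_ms"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Writing to a `StringIO` buffer keeps the example self-contained; in practice you would write to a file opened with `newline=""`.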
Next Steps
- Endpoint Statuses -- Complete endpoint lifecycle reference
- Test Run Statuses -- Full status transition details
- Common Test Failures -- Diagnose specific failure patterns
- Poor Quality Metrics -- Troubleshoot low MOS, high jitter, etc.
Running a Test
Execute a SIP load test, monitor real-time progress, understand run statuses, stop tests early, and manage concurrent test limits.
Creating a Probe
Set up continuous SIP monitoring with scheduled calls, threshold-based health evaluation, and automated alerting for your VoIP infrastructure.