CallMeter Docs

Analyzing Results

Interpret test run metrics, drill into per-endpoint data, use filters, read time-series charts, compare runs, and identify quality patterns.

After a test run completes, CallMeter provides comprehensive analytics at multiple levels -- aggregate overview, per-group breakdown, per-endpoint detail, and individual SIP message traces. This guide explains how to navigate the results, interpret the data, and identify actionable patterns.

Accessing Test Run Results

  1. Open a test from your project's Tests page
  2. Click on a specific test run from the run history list
  3. The results page opens with the Overview tab selected

Each test run has its own independent results. Running the same test multiple times creates separate result sets that you can compare.

Test Run Overview

The overview tab provides a high-level summary of the entire test run:

Key Aggregate Metrics

  • Call Success Rate -- percentage of endpoints that completed their calls without errors
  • Average MOS -- Mean Opinion Score across all endpoints; the primary quality indicator
  • Average Jitter -- mean inter-packet delay variation across all endpoints
  • Average Packet Loss -- mean packet loss percentage across all endpoints
  • Average RTT -- mean round-trip time across all endpoints
  • Setup Time -- average time from INVITE to call establishment

Outcome Distribution

A breakdown showing how many endpoints ended with each outcome:

  • SUCCESS -- call completed normally with no errors
  • COMPLETED_WITH_WARNINGS -- call completed but non-fatal issues detected
  • REGISTRATION_FAILED -- SIP registration was rejected or timed out
  • CALL_FAILED -- call setup failed (callee unreachable, routing error)
  • NEGOTIATION_FAILED -- codec or media negotiation failed
  • TIMEOUT -- operation timed out (INVITE timeout, receiver wait timeout)
  • OSERROR -- infrastructure-level error
  • TEARDOWN_ERROR -- call teardown failed

This distribution is the first thing to check. If a significant percentage of endpoints have error outcomes, investigate those failures before analyzing quality metrics.
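This first check is easy to script against exported results. The shape of `results` below (a list of per-endpoint dicts with an `outcome` key) is an assumption for illustration, not CallMeter's actual export format:

```python
from collections import Counter

def outcome_distribution(results):
    """Tally endpoint outcomes and compute the share that errored.

    `results` is a hypothetical list of per-endpoint dicts with an
    "outcome" key -- adapt the key name to your real export.
    """
    counts = Counter(r["outcome"] for r in results)
    total = sum(counts.values())
    # SUCCESS and COMPLETED_WITH_WARNINGS both count as "finished".
    finished = counts.get("SUCCESS", 0) + counts.get("COMPLETED_WITH_WARNINGS", 0)
    error_rate = 1 - finished / total
    return counts, error_rate

endpoints = [{"outcome": "SUCCESS"}] * 8 + [{"outcome": "CALL_FAILED"}] * 2
counts, error_rate = outcome_distribution(endpoints)
```

If `error_rate` is significant, jump to the error-outcome endpoints before reading any quality charts.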

Timeline

A timeline visualization showing key events across the test duration:

  • Registration starts and completions
  • Call setup events
  • Call completion events
  • Any failures plotted at the time they occurred

The timeline helps identify whether failures were concentrated at a specific time (e.g., a burst at the beginning suggesting capacity issues) or distributed throughout the test.
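The "burst at the beginning" pattern can be checked numerically. A minimal sketch, assuming you have failure timestamps (seconds from test start) from an export:

```python
def failure_burst_at_start(failure_times, test_duration, window_frac=0.1):
    """Return the share of failures that fall in the first `window_frac`
    of the test.  A large share points at a startup problem (capacity,
    registration storm) rather than random network trouble.

    `failure_times` are seconds from test start -- a hypothetical shape.
    """
    if not failure_times:
        return 0.0
    window = test_duration * window_frac
    early = sum(1 for t in failure_times if t <= window)
    return early / len(failure_times)

# 4 of 5 failures land in the first 10% of a 600 s test
share = failure_burst_at_start([2.0, 5.5, 12.0, 40.0, 480.0], test_duration=600)
```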

Filtering Results

Use filters to narrow down the results view:

Filter by Group

When running multi-group tests, filter by group to compare quality between different configurations. This is essential for A/B testing scenarios where you need to isolate results from each group.

Filter by Outcome

Focus on specific endpoint results:

  • All -- show every endpoint regardless of outcome
  • SUCCESS / COMPLETED_WITH_WARNINGS -- only endpoints that finished successfully (useful for quality analysis)
  • Error outcomes -- only endpoints that failed (REGISTRATION_FAILED, CALL_FAILED, TIMEOUT, etc.), useful for troubleshooting

Filter by Direction

Every metric is collected per-direction:

  • Send -- quality of the outbound media stream: what this endpoint transmits to the other side
  • Recv -- quality of the inbound media stream: what this endpoint receives from the other side

Filtering by direction is critical for diagnosing asymmetric quality issues. For example, if send quality is good but receive quality is poor, the problem likely lies on the remote side or in the network path from remote to local.
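The asymmetry check can be automated once per-direction MOS samples are exported. A sketch with hypothetical sample lists and an illustrative 0.5-point threshold:

```python
def direction_asymmetry(send_mos, recv_mos, threshold=0.5):
    """Compare per-direction average MOS for one endpoint.

    A gap larger than `threshold` suggests the impairment sits on one
    side of the path.  Sample lists and threshold are illustrative.
    """
    avg_send = sum(send_mos) / len(send_mos)
    avg_recv = sum(recv_mos) / len(recv_mos)
    if avg_send - avg_recv > threshold:
        return "recv-impaired"   # remote side, or the remote-to-local path
    if avg_recv - avg_send > threshold:
        return "send-impaired"   # local side, or the local-to-remote path
    return "symmetric"

verdict = direction_asymmetry([4.2, 4.1, 4.3], [2.9, 3.0, 2.8])
```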

Per-Endpoint Detail

Click on any endpoint in the results list to open its detailed view. This is where the deepest analysis happens.

Metric Charts

Time-series charts display every collected metric over the duration of the call. Metrics are organized into categories:

Quality Metrics:

  • MOS (score, 1-5) -- above 3.5 = good, 3.0-3.5 = acceptable, below 3.0 = poor
  • R-Factor (score, 0-120) -- above 80 = good, 60-80 = acceptable, below 60 = poor
  • Jitter (ms) -- below 30 ms = good, 30-50 ms = marginal, above 50 ms = problematic
  • RTT (ms) -- below 150 ms = good, 150-300 ms = acceptable, above 300 ms = conversational issues
  • Packet Loss (%) -- below 1% = good, 1-3% = marginal, above 3% = noticeable degradation
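The quality bands above can be expressed as a small classifier for batch analysis. The band edges mirror the documented rules of thumb; treat them as guidance, not hard limits:

```python
def rate_quality(mos, jitter_ms, loss_pct):
    """Map raw metrics onto the good/marginal/poor bands listed above.

    Thresholds follow the documented rules of thumb for MOS, jitter,
    and packet loss; add R-Factor and RTT the same way if needed.
    """
    if mos > 3.5:
        mos_band = "good"
    elif mos >= 3.0:
        mos_band = "acceptable"
    else:
        mos_band = "poor"
    jitter_band = "good" if jitter_ms < 30 else "marginal" if jitter_ms <= 50 else "problematic"
    loss_band = "good" if loss_pct < 1 else "marginal" if loss_pct <= 3 else "degraded"
    return {"mos": mos_band, "jitter": jitter_band, "loss": loss_band}

ratings = rate_quality(mos=4.1, jitter_ms=42, loss_pct=0.4)
```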

Network Metrics:

  • Bitrate (kbps) -- should match the expected codec bitrate
  • Packets Sent/Received (count/s) -- should be consistent throughout the call
  • Duplicate Packets (count) -- non-zero indicates network issues
  • Out-of-Order Packets (count) -- non-zero indicates routing instability

Jitter Buffer Metrics:

  • Buffer Delay -- how much delay the jitter buffer adds to smooth out packet timing
  • Late Packets -- packets arriving after the jitter buffer's play-out deadline
  • RTX Requests -- retransmission requests (NACK) sent for lost packets

Audio Metrics:

  • PLC Events -- Packet Loss Concealment activations: the codec is generating synthetic audio to fill gaps
  • Audio Level -- the volume of the audio stream; a flat zero means silence (possible muting or media issue)
  • DTMF -- DTMF digits sent and received during the call (RFC 4733 RTP events or SIP INFO)

Video Metrics (when video is enabled):

  • Frame Rate -- should match the configured FPS; drops indicate congestion or encoding issues
  • Freeze Events -- video freezes caused by missing frames
  • Decode Errors -- frames that could not be decoded

Reading Time-Series Charts

Each chart shows metric values plotted over the call duration. Key patterns to recognize:

  • Flat line at expected value -- healthy, stable metric (e.g., steady MOS around 4.0)
  • Gradual degradation -- metric worsening over time suggests resource exhaustion (buffer overflow, CPU overload)
  • Sudden spike -- a brief disturbance in an otherwise stable metric (network congestion burst, brief packet loss)
  • Step change -- metric shifts to a new level and stays there (route change, codec renegotiation)
  • Periodic pattern -- regular oscillations suggest a systematic issue (e.g., competing traffic, keep-alive interference)
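The "gradual degradation" pattern is the easiest to detect programmatically: fit a least-squares line to the samples and look at the slope. A minimal sketch over equally spaced samples:

```python
def trend_slope(samples):
    """Least-squares slope of a metric over equally spaced samples.

    A clearly negative slope on MOS (or a positive one on jitter) over
    the call duration is the gradual-degradation pattern.
    """
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# MOS sliding from 4.2 down to 3.0 over the call -> negative slope
slope = trend_slope([4.2, 4.0, 3.8, 3.6, 3.4, 3.2, 3.0])
```

Per-sample slope is scale-free here; multiply by the sampling rate to get change per second.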

SIP Message Trace

The SIP trace tab shows the complete signaling exchange for the endpoint:

  1. REGISTER request and response (200 OK or error)
  2. INVITE request (for callers) or incoming INVITE (for callees)
  3. 100 Trying -- provisional response
  4. 180 Ringing -- the callee is alerting
  5. 200 OK -- call answered, includes SDP with negotiated codec
  6. ACK -- confirms call establishment
  7. BYE -- call termination
  8. 200 OK (final) -- BYE acknowledged

The SIP trace is invaluable for diagnosing:

  • Registration failures (check the SIP response code: 401, 403, 404, 408)
  • Call setup failures (check INVITE responses: 486 Busy, 503 Service Unavailable)
  • Codec negotiation issues (compare SDP offer and answer for compatible codecs)
  • Unexpected call teardowns (look for unexpected BYE messages or error responses)
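For the codec-negotiation case, comparing the SDP offer and answer means intersecting their `a=rtpmap` codec lists. A sketch with abbreviated, hypothetical SDP bodies (real ones carry more lines):

```python
def sdp_codecs(sdp_text):
    """Extract codec names from the a=rtpmap lines of an SDP body."""
    codecs = set()
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload> <name>/<clock>[/<channels>]
            codecs.add(line.split(" ", 1)[1].split("/")[0].upper())
    return codecs

offer = "m=audio 4000 RTP/AVP 0 8\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000"
answer = "m=audio 5000 RTP/AVP 8\r\na=rtpmap:8 PCMA/8000"
common = sdp_codecs(offer) & sdp_codecs(answer)
```

An empty intersection is the classic signature behind a NEGOTIATION_FAILED outcome.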

Comparing Runs

Run the same test multiple times to build a picture of your SIP infrastructure's performance over time:

  • Each run produces independent results with a unique run ID and timestamp
  • Open any past run from the test's run history list
  • Compare aggregate metrics across runs to identify trends or regressions
  • Look for consistency -- if MOS varies widely between runs, investigate network conditions

Comparison tips:

  • Run tests at the same time of day for fair comparison
  • Keep the test configuration identical between runs
  • Document any infrastructure changes between runs (firmware updates, route changes, new capacity)
  • Use multi-group tests to compare configurations within a single run for the fairest comparison
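The "look for consistency" advice can be quantified as a coefficient of variation over per-run average MOS. A sketch, assuming you have already collected each run's aggregate MOS from its results page or export:

```python
from statistics import mean, stdev

def mos_consistency(run_avg_mos):
    """Coefficient of variation of average MOS across runs of one test.

    A small value (say, under 5%) means quality is stable run to run;
    a larger spread points at time-varying network conditions.  The 5%
    cutoff is an illustrative rule of thumb, not a CallMeter constant.
    """
    return stdev(run_avg_mos) / mean(run_avg_mos)

cv = mos_consistency([4.1, 4.0, 4.2, 3.9, 4.1])
```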

Key Metrics to Check First

When reviewing results, prioritize these metrics in order:

  1. Call Success Rate -- if endpoints are failing, fix that before analyzing quality
  2. MOS -- the single most important quality indicator, combining multiple factors into one score
  3. Packet Loss -- the most common cause of poor call quality
  4. Jitter -- the second most common quality issue, especially on congested networks
  5. RTT -- important for conversational quality and geographic distribution
  6. Setup Time -- how quickly calls are established, important for user experience
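The checklist above walks naturally into a first-failing-check triage script. The `run` dict shape and all thresholds below are illustrative assumptions, not CallMeter's export schema:

```python
def triage(run):
    """Walk the priority checklist in order; return the first problem found.

    `run` is a hypothetical dict of aggregate metrics for one test run.
    Thresholds are illustrative rules of thumb.
    """
    checks = [
        ("success_rate", lambda v: v < 0.95, "fix failing endpoints first"),
        ("avg_mos", lambda v: v < 3.5, "investigate overall quality"),
        ("avg_loss_pct", lambda v: v > 1.0, "investigate packet loss"),
        ("avg_jitter_ms", lambda v: v > 30.0, "investigate jitter"),
        ("avg_rtt_ms", lambda v: v > 150.0, "investigate latency and routing"),
    ]
    for key, is_bad, advice in checks:
        if is_bad(run[key]):
            return f"{key}: {advice}"
    return "all clear"

verdict = triage({"success_rate": 0.99, "avg_mos": 3.2,
                  "avg_loss_pct": 2.5, "avg_jitter_ms": 12, "avg_rtt_ms": 80})
```

Because the checks run in priority order, a low MOS is reported before the packet loss that may be causing it; fix top-listed issues first.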

Spotting Common Patterns

  • High packet loss, low MOS -- likely network congestion or undersized links; check bitrate vs. available bandwidth
  • High jitter, acceptable loss -- likely queuing delays on the network path; check router/switch queue depths and QoS configuration
  • High RTT, everything else fine -- likely geographic distance or routing inefficiency; run tests from closer regions or check routing
  • Good send quality, poor recv quality -- likely a problem on the remote side or an asymmetric path; check the other endpoint's send metrics
  • All endpoints REGISTRATION_FAILED -- likely wrong registrar configuration; verify domain, port, transport, and credentials
  • First N endpoints succeed, rest fail -- likely a registrar capacity limit; check the maximum registration count on your PBX
  • Quality degrades over time -- likely resource exhaustion on the PBX; check PBX CPU, memory, and session limits

Exporting Data

For deeper analysis in external tools, you can export test run data. The results dashboard provides export options for metric data that can be imported into spreadsheets or analytics tools.
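Once exported, the data loads into any analysis stack with a few lines. The column names below are a hypothetical CSV shape for illustration; substitute the headers from your actual export:

```python
import csv
import io

# Hypothetical export: one row per endpoint with aggregate columns.
raw = """endpoint,outcome,avg_mos,avg_jitter_ms
ep-001,SUCCESS,4.1,12.0
ep-002,SUCCESS,3.8,28.5
ep-003,CALL_FAILED,,
"""

rows = list(csv.DictReader(io.StringIO(raw)))
# Quality metrics are only meaningful for endpoints that finished.
ok = [r for r in rows if r["outcome"] == "SUCCESS"]
avg_mos = sum(float(r["avg_mos"]) for r in ok) / len(ok)
```

Filtering out failed endpoints before averaging mirrors the outcome filter in the UI: error rows carry empty quality columns and would otherwise skew or break the aggregation.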
