CallMeter Docs

Probe Health States

How CallMeter evaluates probe health, the four health states, threshold evaluation logic, transition rules, webhook triggers, and best practices for threshold configuration.

Probes are scheduled monitoring tests that run continuously to assess the health of your SIP infrastructure. After each probe execution, CallMeter evaluates the collected metrics against your configured thresholds and assigns a health state. This page documents the four health states, the evaluation logic, and best practices for threshold configuration.

The Four Health States

HEALTHY

All metrics from the most recent probe execution fall within acceptable ranges. The SIP infrastructure under test is performing as expected.

  • Visual indicator: Green
  • What it means: Every configured threshold metric is within its "good" range. MOS is above the warning threshold. Jitter, RTT, and packet loss are below their warning thresholds.
  • Action required: None. The probe will continue executing at its configured interval.
  • Webhook behavior: If the probe was previously DEGRADED or UNHEALTHY, a recovery webhook is sent on the transition to HEALTHY.

DEGRADED

One or more metrics from the most recent probe execution have breached the warning threshold but remain within the critical threshold. Quality is below optimal but has not reached an emergency level.

  • Visual indicator: Yellow/Orange
  • What it means: The SIP infrastructure is showing signs of stress. For example, jitter may have risen above 30 ms (warning) but remains below 80 ms (critical), or MOS may have dropped below 3.8 (warning) but remains above 3.0 (critical).
  • Action required: Investigate the contributing metrics. DEGRADED is an early warning that may precede a full outage.
  • Webhook behavior: A webhook is sent when the probe transitions from HEALTHY to DEGRADED or from UNHEALTHY to DEGRADED.
  • Common causes:
    • Network congestion causing moderate jitter or packet loss increases
    • SIP server load approaching capacity
    • DNS resolution delays increasing RTT
    • Background maintenance on network equipment

UNHEALTHY

One or more metrics from the most recent probe execution have breached the critical threshold. The SIP infrastructure has a significant quality or availability problem.

  • Visual indicator: Red
  • What it means: At least one metric is in the critical range. For example, packet loss may exceed 5%, MOS may be below 3.0, or the call may have failed entirely.
  • Action required: Immediate investigation. The SIP infrastructure is experiencing a service-impacting issue.
  • Webhook behavior: A webhook is sent when the probe transitions to UNHEALTHY from any other state.
  • Common causes:
    • Network outage or severe congestion
    • SIP server down or unreachable
    • Authentication failure (credentials changed, account locked)
    • Firewall rule change blocking SIP or RTP traffic
    • Total call failure (SIP registration or call setup failed)

UNKNOWN

The probe has insufficient data to determine health, or an evaluation error occurred. This is the initial state for new probes and the fallback state when something prevents health evaluation.

  • Visual indicator: Gray
  • What it means: One of several situations:
    • The probe has never executed (newly created)
    • The most recent execution failed before metrics could be collected
    • The probe configuration is incomplete (no thresholds configured)
    • An internal evaluation error prevented health assessment
  • Action required: Check the probe configuration and ensure it has executed at least once. If the probe has executed, check the run detail for errors.
  • Webhook behavior: A webhook is sent when a previously known state transitions to UNKNOWN.

Threshold Evaluation Logic

CallMeter evaluates probe health immediately after each probe execution completes. The evaluation follows a strict process.

Step 1: Collect Final Metrics

After the probe's call completes, the platform retrieves the aggregate metric values from the execution. These are the average values over the call duration for time-series metrics (jitter, RTT, packet loss, MOS) and the final values for cumulative metrics.
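The aggregation described above can be sketched as follows. This is a minimal illustration, not CallMeter's implementation: the metric names, the `aggregate_metrics` function, and the `packets_sent` cumulative metric are all hypothetical.

```python
from statistics import mean

# Hypothetical metric names; the platform's real identifiers may differ.
TIME_SERIES_METRICS = {"jitter_ms", "rtt_ms", "packet_loss_pct", "mos"}


def aggregate_metrics(samples: dict[str, list[float]]) -> dict[str, float]:
    """Reduce the samples from one probe call to a single value per metric."""
    result = {}
    for name, values in samples.items():
        if not values:
            continue  # nothing was collected for this metric
        if name in TIME_SERIES_METRICS:
            result[name] = mean(values)  # average over the call duration
        else:
            result[name] = values[-1]  # cumulative metric: keep the final value
    return result


samples = {
    "jitter_ms": [10.0, 20.0, 30.0],
    "mos": [4.0, 4.25],
    "packets_sent": [100.0, 250.0, 400.0],  # hypothetical cumulative counter
}
final = aggregate_metrics(samples)
```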

Step 2: Evaluate Each Threshold

Each metric with a configured threshold is compared against its warning and critical values. CallMeter handles two types of metrics:

Lower-is-better metrics (jitter, RTT, packet loss):

  • Value below the warning threshold: this metric is Healthy
  • Value at or above the warning threshold but below the critical threshold: this metric is Degraded
  • Value at or above the critical threshold: this metric is Unhealthy

Higher-is-better metrics (MOS):

  • Value above the warning threshold: this metric is Healthy
  • Value at or below the warning threshold but above the critical threshold: this metric is Degraded
  • Value at or below the critical threshold: this metric is Unhealthy

Inverted Threshold Logic

For metrics like MOS where higher is better, the warning threshold is a higher number than the critical threshold. For example, a MOS warning of 3.8 and critical of 3.0 means: above 3.8 is healthy, above 3.0 and up to 3.8 is degraded, and 3.0 or below is unhealthy. This is the opposite direction from jitter or loss thresholds.
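The comparison rules for both metric types can be sketched as a single function. The `Health` enum, the `evaluate_metric` name, and the `higher_is_better` flag are illustrative assumptions; only the boundary rules come from this page.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    UNHEALTHY = "UNHEALTHY"


def evaluate_metric(value: float, warning: float, critical: float,
                    higher_is_better: bool = False) -> Health:
    """Classify one metric value against its warning and critical thresholds."""
    if higher_is_better:
        # MOS-style metric: at or below a threshold counts as a breach.
        if value <= critical:
            return Health.UNHEALTHY
        if value <= warning:
            return Health.DEGRADED
        return Health.HEALTHY
    # Jitter, RTT, packet loss: at or above a threshold counts as a breach.
    if value >= critical:
        return Health.UNHEALTHY
    if value >= warning:
        return Health.DEGRADED
    return Health.HEALTHY


jitter_state = evaluate_metric(45.0, warning=30.0, critical=80.0)
mos_state = evaluate_metric(3.8, warning=3.8, critical=3.0, higher_is_better=True)
```

Note that a value exactly on a threshold is treated as a breach in both directions, so a MOS of exactly 3.8 with a warning threshold of 3.8 is classified as degraded.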

Step 3: Determine Overall Health

The probe's overall health state is determined by the worst individual metric evaluation. If any single metric is UNHEALTHY, the probe is UNHEALTHY. If no metric is UNHEALTHY but any metric is DEGRADED, the probe is DEGRADED. Only if all evaluated metrics are within healthy ranges does the probe report HEALTHY.

This "worst metric wins" approach ensures that a single degraded dimension is never hidden by otherwise healthy metrics.
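A minimal sketch of the "worst metric wins" rule follows. The `overall_health` function and the severity ranking are assumptions made for illustration. Returning UNKNOWN for an empty input reflects the statement elsewhere on this page that a probe with no thresholds configured is UNKNOWN.

```python
# Severity rank for per-metric results; a higher number is worse.
SEVERITY = {"HEALTHY": 0, "DEGRADED": 1, "UNHEALTHY": 2}


def overall_health(metric_states: dict[str, str]) -> str:
    """Return the worst per-metric state, or UNKNOWN if nothing was evaluated."""
    if not metric_states:
        return "UNKNOWN"
    return max(metric_states.values(), key=SEVERITY.__getitem__)


state = overall_health({
    "mos": "HEALTHY",
    "jitter_ms": "DEGRADED",
    "packet_loss_pct": "HEALTHY",
})
```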

Step 4: Check for State Transition

The newly determined health state is compared against the probe's previous state. If the state has changed, the transition is recorded, the probe's current status is updated, and any configured webhooks are triggered.
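The transition check can be sketched as below. The `Probe` class, its fields, and the `notify` callback are hypothetical stand-ins for the platform's internal state and webhook dispatch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Probe:
    name: str
    status: str = "UNKNOWN"  # new probes start in UNKNOWN
    history: list = field(default_factory=list)


def apply_evaluation(probe: Probe, new_status: str,
                     notify: Callable[[str, str, str], None]) -> bool:
    """Record a transition and notify only when the state actually changed."""
    if new_status == probe.status:
        return False  # same state as before: nothing to record or send
    previous = probe.status
    probe.history.append((previous, new_status))
    probe.status = new_status
    notify(probe.name, previous, new_status)
    return True


sent = []
probe = Probe(name="sip-edge-1")  # hypothetical probe name
for result in ["HEALTHY", "HEALTHY", "UNHEALTHY"]:
    apply_evaluation(probe, result, lambda *args: sent.append(args))
```

In this run, the second HEALTHY result produces no notification because the state did not change.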

Status Transitions and Webhooks

Any transition between health states triggers configured webhooks. The possible transitions are:

| From | To | Severity | Meaning |
| --- | --- | --- | --- |
| HEALTHY | DEGRADED | Warning | Quality declining, early warning |
| HEALTHY | UNHEALTHY | Critical | Sudden quality failure |
| DEGRADED | UNHEALTHY | Critical | Quality continuing to decline |
| DEGRADED | HEALTHY | Recovery | Quality restored from warning state |
| UNHEALTHY | DEGRADED | Partial recovery | Critical issue partially resolved |
| UNHEALTHY | HEALTHY | Full recovery | Critical issue fully resolved |
| Any | UNKNOWN | Data issue | Cannot evaluate health |
| UNKNOWN | Any | Resolution | Health evaluation restored |

HEALTHY to UNHEALTHY Is Possible

A probe can jump directly from HEALTHY to UNHEALTHY in a single execution. This happens when a sudden, severe issue occurs (for example, the SIP server goes down or a firewall blocks all traffic). There is no requirement to pass through DEGRADED first.
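The transition table can be expressed as a lookup, for example to label alerts on the receiving side. The `transition_severity` function is a hypothetical helper; the severity labels are taken from the table.

```python
# Severity labels for transitions between the three evaluated states.
_SEVERITY_BY_TRANSITION = {
    ("HEALTHY", "DEGRADED"): "Warning",
    ("HEALTHY", "UNHEALTHY"): "Critical",
    ("DEGRADED", "UNHEALTHY"): "Critical",
    ("DEGRADED", "HEALTHY"): "Recovery",
    ("UNHEALTHY", "DEGRADED"): "Partial recovery",
    ("UNHEALTHY", "HEALTHY"): "Full recovery",
}


def transition_severity(previous: str, new: str):
    """Return the severity label for a transition, or None if nothing changed."""
    if previous == new:
        return None
    if new == "UNKNOWN":
        return "Data issue"  # any state -> UNKNOWN
    if previous == "UNKNOWN":
        return "Resolution"  # UNKNOWN -> any state
    return _SEVERITY_BY_TRANSITION[(previous, new)]


label = transition_severity("HEALTHY", "UNHEALTHY")
```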

Webhook Payload

When a status transition occurs, CallMeter sends an HTTP POST to the configured webhook URL with a JSON payload containing the probe ID, probe name, previous status, new status, timestamp, and the metric values that triggered the transition. See Webhooks for the full payload format and security details.
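As a rough sketch of the receiving side, the handler below parses a transition payload and picks a routing action. Every field name here (`probe_id`, `probe_name`, `previous_status`, `new_status`, `timestamp`, `metrics`) is an assumption based on the list of contents above, not the documented schema, and the routing actions are purely illustrative. Consult the Webhooks page for the actual format.

```python
import json

# Hypothetical field names; check the Webhooks page for the real schema.
REQUIRED_FIELDS = ("probe_id", "probe_name", "previous_status",
                   "new_status", "timestamp", "metrics")


def handle_webhook(body: str) -> str:
    """Parse a transition payload and return an illustrative routing action."""
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"payload missing fields: {missing}")
    new_status = payload["new_status"]
    if new_status == "UNHEALTHY":
        return "page"  # service-impacting: alert on-call
    if new_status in ("DEGRADED", "UNKNOWN"):
        return "ticket"  # early warning or data issue: investigate
    return "resolve"  # HEALTHY: recovery


# Example body with made-up values, for illustration only.
example_body = json.dumps({
    "probe_id": "probe-123",
    "probe_name": "sip-edge-1",
    "previous_status": "HEALTHY",
    "new_status": "UNHEALTHY",
    "timestamp": "2024-01-01T00:00:00Z",
    "metrics": {"packet_loss_pct": 7.5},
})
decision = handle_webhook(example_body)
```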

Consecutive Failure Behavior

Each probe execution independently evaluates health. If a probe alternates between HEALTHY and DEGRADED across successive executions, each transition triggers a webhook. This can produce noise for borderline thresholds.

To reduce noise from flapping thresholds:

  • Increase threshold margins: Set the warning threshold farther from your normal operating values, and keep the warning and critical thresholds well apart, so that minor fluctuations do not cause transitions
  • Adjust probe interval: A longer interval (30 or 60 minutes) reduces the frequency of evaluations and therefore the frequency of potential transitions
  • Use appropriate metric windows: The evaluation uses average metric values over the call duration, which naturally smooths short-term spikes
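To see how a borderline threshold produces flapping, the sketch below counts state transitions across a series of executions. The jitter values are made up for illustration and `count_transitions` is a hypothetical helper for a lower-is-better metric.

```python
def count_transitions(values, warning, critical):
    """Count state changes across successive executions (lower-is-better metric)."""
    states = []
    for value in values:
        if value >= critical:
            states.append("UNHEALTHY")
        elif value >= warning:
            states.append("DEGRADED")
        else:
            states.append("HEALTHY")
    # Each change between neighboring executions would trigger a webhook.
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Made-up average jitter (ms) from six executions hovering around 30 ms.
jitter_runs = [28.0, 31.0, 29.0, 32.0, 27.0, 30.5]

tight = count_transitions(jitter_runs, warning=30.0, critical=80.0)
loose = count_transitions(jitter_runs, warning=40.0, critical=80.0)
```

With the warning threshold sitting at the normal operating value, every execution changes state; moving it to 40 ms removes the transitions for the same data.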

Status Pages

Probe health states power CallMeter's public status pages. When you enable a status page for a probe, the current health state is displayed publicly:

| Health State | Status Page Display |
| --- | --- |
| HEALTHY | Operational (green) |
| DEGRADED | Degraded Performance (yellow) |
| UNHEALTHY | Major Outage (red) |
| UNKNOWN | Under Maintenance (gray) |

Status pages update automatically on each health state transition. See Status Pages for configuration details.

Threshold Configuration Best Practices

Start with Industry Baselines

If you are unsure what thresholds to configure, start with industry standard baselines and adjust based on your environment:

| Metric | Warning Threshold | Critical Threshold | Notes |
| --- | --- | --- | --- |
| MOS | 3.8 | 3.0 | MOS below 3.0 indicates poor quality |
| Jitter | 30 ms | 80 ms | Above 80 ms significantly impacts audio |
| Packet Loss | 1% | 5% | Above 5% makes conversation difficult |
| RTT | 150 ms | 300 ms | Above 300 ms causes noticeable delay |
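The baselines can be captured as a configuration structure with a sanity check on threshold ordering. The dictionary layout, key names, and `validate_thresholds` function are assumptions for illustration, not CallMeter's configuration format.

```python
# Baseline values from the table; the key names and layout are hypothetical.
BASELINES = {
    "mos": {"warning": 3.8, "critical": 3.0, "higher_is_better": True},
    "jitter_ms": {"warning": 30, "critical": 80, "higher_is_better": False},
    "packet_loss_pct": {"warning": 1, "critical": 5, "higher_is_better": False},
    "rtt_ms": {"warning": 150, "critical": 300, "higher_is_better": False},
}


def validate_thresholds(thresholds: dict) -> list:
    """Return the names of metrics whose warning/critical order is wrong."""
    errors = []
    for name, t in thresholds.items():
        if t["higher_is_better"]:
            ok = t["warning"] > t["critical"]  # e.g. MOS: 3.8 > 3.0
        else:
            ok = t["warning"] < t["critical"]  # e.g. jitter: 30 < 80
        if not ok:
            errors.append(name)
    return errors


problems = validate_thresholds(BASELINES)
```

A check like this catches the most common mistake with inverted metrics: entering MOS thresholds in the same order as jitter or loss thresholds.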

Calibrate to Your Baseline

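Run several probe executions without thresholds to establish your environment's normal metric ranges. For lower-is-better metrics (jitter, RTT, packet loss), set warning thresholds at 1.5 to 2 times your normal values and critical thresholds at 3 to 4 times your normal values. These multipliers do not apply to MOS, where higher is better; set MOS thresholds below your normal value instead. This approach catches genuine degradation without alerting on normal variation.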
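A worked sketch of the multiplier rule for a lower-is-better metric follows. Using the median as the "normal value" is an assumption, as are the `calibrate` function and the sample data; the default factors are the low ends of the ranges given above.

```python
from statistics import median


def calibrate(samples, warning_factor=1.5, critical_factor=3.0):
    """Derive thresholds from baseline runs of a lower-is-better metric."""
    baseline = median(samples)  # assumption: the median stands in for "normal"
    return {
        "baseline": baseline,
        "warning": baseline * warning_factor,
        "critical": baseline * critical_factor,
    }


# Made-up average jitter (ms) from five baseline executions.
jitter_samples = [8.0, 10.0, 12.0, 9.0, 11.0]
thresholds = calibrate(jitter_samples)
```

For this sample data, a normal jitter of 10 ms yields a warning threshold of 15 ms and a critical threshold of 30 ms.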

Separate Thresholds by Route

Different SIP routes have different baseline quality characteristics. A probe monitoring a local data center path will have lower normal jitter than a probe monitoring an international route. Configure thresholds per probe based on the expected quality of each monitored path.

Avoid Over-Monitoring

Configuring thresholds on too many metrics can cause false alerts because the "worst metric wins" evaluation becomes more sensitive with more metrics. Focus thresholds on the 3 to 5 metrics most relevant to your use case:

  • Voice quality monitoring: MOS, jitter, packet loss
  • Network path monitoring: RTT, packet loss, jitter
  • Capacity monitoring: Registration success, call setup time

Review and Adjust

Revisit threshold configuration monthly. As your SIP infrastructure evolves (new routes, capacity changes, codec updates), your baseline quality will shift. Adjust thresholds to match the new reality.
