Speech Activity (VAD)
Understand the Voice Activity Detection speech ratio metric — how CallMeter measures the percentage of active speech in audio streams, and what it reveals about DTX and conversation patterns.
Speech Activity measures the percentage of time that contains active speech in an audio stream, as determined by Voice Activity Detection (VAD). A ratio of 100% means continuous speech with no pauses. A ratio of 30% means the speaker is talking about a third of the time, with the remaining 70% being silence or background noise.
Think of it as an answer to "how much of this call is actual talking?" In a normal phone conversation, people speak roughly 40-60% of the time due to natural pauses, listening, and turn-taking.
How It Works
CallMeter runs Voice Activity Detection on the decoded audio signal. VAD classifies each audio frame as either "speech" or "non-speech" based on energy, spectral characteristics, and temporal patterns. The speech activity ratio is the percentage of frames classified as speech within each measurement interval.
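As a rough illustration of the frame counting described above, the sketch below classifies frames with a single energy threshold and reports the percentage marked as speech. It is not CallMeter's detector, which also weighs spectral and temporal features. The function names, the 160-sample frame size (20 ms at 8 kHz), and the -40 dBFS threshold are assumptions chosen for the example.

```python
import math


def frame_rms_dbfs(frame):
    """RMS level of one frame of float samples in [-1.0, 1.0], in dBFS."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite level
    return 10.0 * math.log10(mean_square)


def speech_activity_ratio(samples, frame_size=160, threshold_dbfs=-40.0):
    """Percentage of full frames whose RMS level exceeds the threshold.

    Energy-only classification; frame_size and threshold_dbfs are
    illustrative assumptions, not CallMeter settings.
    """
    frames = [
        samples[i:i + frame_size]
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    if not frames:
        return 0.0
    speech_frames = sum(1 for f in frames if frame_rms_dbfs(f) > threshold_dbfs)
    return 100.0 * speech_frames / len(frames)
```

Feeding this three frames of a tone followed by seven frames of digital silence yields a ratio of 30%. An energy-only detector like this one would also count loud background noise as speech, which is one reason real detectors use more than energy.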
This ratio is closely tied to DTX (Discontinuous Transmission). When the sending codec's own VAD detects silence, a DTX-enabled encoder stops sending full audio packets and instead sends occasional silence descriptors or nothing at all. This can save significant bandwidth, an important consideration for large-scale VoIP deployments.
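To get a feel for the bandwidth effect, a back-of-the-envelope estimate can weight the codec's speech bitrate by the speech activity ratio. The sketch below assumes a simplified model in which the stream runs at one rate during speech and another (possibly zero) during silence. The function name and both bitrate parameters are hypothetical inputs, not values reported by CallMeter.

```python
def estimated_dtx_bitrate_kbps(speech_ratio_pct, speech_kbps, silence_kbps=0.0):
    """Average bitrate under DTX for a given speech activity ratio.

    Simplified two-state model: speech_kbps while talking, silence_kbps
    (silence descriptors, or 0 for nothing at all) the rest of the time.
    """
    if not 0.0 <= speech_ratio_pct <= 100.0:
        raise ValueError("speech_ratio_pct must be between 0 and 100")
    active = speech_ratio_pct / 100.0
    return active * speech_kbps + (1.0 - active) * silence_kbps


# Illustrative numbers: 50% speech activity on a 64 kbps stream,
# with nothing sent during silence.
print(estimated_dtx_bitrate_kbps(50.0, 64.0))  # 32.0
```

If the measured send bitrate sits well above such an estimate while speech activity is low, that is a hint that DTX is not suppressing silence.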
No fixed thresholds
Speech activity has no good/warning/critical thresholds because the expected ratio depends entirely on the call scenario. A one-way announcement has near-100% speech. An interactive call typically shows 30-60%. What matters is whether the ratio matches your expectations.
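Because the expected ratio is scenario-specific, one way to automate the check is to define your own range per scenario and compare against it. This is a minimal sketch: the scenario names and the dictionary are hypothetical, the 30-60% range comes from the text above, and the 90% lower bound for an announcement is an assumed reading of "near-100%".

```python
# Hypothetical per-scenario expectations, in percent (min, max).
EXPECTED_RANGES = {
    "announcement": (90.0, 100.0),  # assumed lower bound for "near-100%"
    "interactive": (30.0, 60.0),
}


def matches_expectation(speech_ratio_pct, scenario, ranges=EXPECTED_RANGES):
    """True if the measured ratio falls inside the range set for the scenario."""
    low, high = ranges[scenario]
    return low <= speech_ratio_pct <= high
```

A measured 45% passes for an interactive call but fails for an announcement, which is the kind of mismatch worth investigating.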
Why It Matters
Speech activity reveals several important aspects of call behavior:
- DTX effectiveness — A low speech ratio during an active call means there is silence for DTX to suppress, so bandwidth should drop accordingly. If speech ratio is low but bandwidth stays high, DTX may be disabled or not working.
- Test scenario validation — Confirms that audio media files are playing correctly. If a call that should contain speech shows 0% activity, suspect a media problem.
- Conversation pattern analysis — In bidirectional calls, comparing speech activity across endpoints reveals whether both sides are participating.
- Codec behavior — Some codecs handle the transition between speech and silence differently, and speech activity helps correlate these transitions with other quality metrics.
Common Causes of Unexpected Speech Activity
| Cause | Explanation |
|---|---|
| 0% activity on expected speech | Media file not playing, muted endpoint, or audio path broken |
| 100% continuous activity | Noise being classified as speech, or VAD disabled |
| Very low activity on interactive call | One-sided conversation, endpoint not transmitting |
| Wildly fluctuating activity | VAD instability, borderline noise levels confusing the detector |
| Activity mismatch between directions | One endpoint sending, other not receiving (media path issue) |
How to Fix It
- Verify media playback — If speech activity is 0% during a test, confirm that the audio source (media file or microphone) is actually producing output.
- Check DTX configuration — If speech activity is low but you see constant packet flow, DTX may be disabled. Enable it to realize bandwidth savings during silence.
- Validate noise environment — If VAD reports near-100% activity when the caller is not speaking, background noise may be triggering false speech detection.
- Compare send and receive — If the sender shows speech activity but the receiver does not, something in the media path is blocking or corrupting the audio.
- Correlate with audio levels — Cross-reference with Audio Signal Level to confirm that detected speech actually has meaningful audio content.
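The checks above can be sketched as a small triage function that takes the speech activity measured on the sending and receiving side plus the observed send bitrate. All names and cutoff values here (what counts as "near 0%", "near 100%", "low activity", or "high bitrate") are assumptions for illustration. CallMeter defines no such thresholds for this metric.

```python
def diagnose(sender_pct, receiver_pct, send_kbps, speech_kbps,
             near_zero=1.0, near_full=99.0,
             low_activity=30.0, high_bitrate_fraction=0.9):
    """Return a list of follow-up checks suggested by the measurements.

    sender_pct / receiver_pct: speech activity ratio on each side (percent).
    send_kbps: observed send bitrate; speech_kbps: codec rate during speech.
    All cutoffs are illustrative assumptions.
    """
    findings = []
    if sender_pct <= near_zero:
        findings.append("verify media playback")
    elif receiver_pct <= near_zero:
        # Sender has speech but receiver sees none.
        findings.append("check media path")
    if sender_pct >= near_full:
        findings.append("validate noise environment")
    if sender_pct < low_activity and send_kbps >= high_bitrate_fraction * speech_kbps:
        # Mostly silence, yet the stream runs at close to full rate.
        findings.append("check DTX configuration")
    return findings
```

For example, a sender at 20% activity that still transmits at the full codec rate is flagged for a DTX configuration check, while a sender at 45% with a receiver at 0% points to the media path.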
Related Metrics
- Audio Signal Level — Volume of speech during active segments; complements VAD ratio
- Audio Noise Level — Noise during non-speech segments; high noise can confuse VAD
- Comfort Noise Rate — Comfort noise insertions during silence, related to DTX behavior
- Send Bitrate — Bandwidth usage; correlate with speech activity to verify DTX