Forum: d0p3 BBS

How to properly use py-webrtcvad?

From marc nicole@3:633/280.2 to All on Thu Jan 23 08:54:12 2025

Hi,

I am getting audio from my mic using PyAudio as follows:

self.stream = audio.open(format=self.FORMAT,
                         channels=self.CHANNELS,
                         rate=self.RATE,
                         input=True,
                         frames_per_buffer=self.FRAMES_PER_BUFFER,
                         input_device_index=1)

Then I read the data as follows:

for i in range(0, int(self.RATE / self.FRAMES_PER_BUFFER *
                      self.RECORD_SECONDS)):
    data = self.stream.read(4800)

On the other hand, I am using py-webrtcvad as follows:

self.vad = webrtcvad.Vad()

and I want to call *is_speech*() on the audio data from PyAudio,
but I get the error:

return _webrtcvad.process(self._vad, sample_rate, buf, length)

Error: Error while processing frame

no matter how I change the input data format (WAV via
speech_recognition's *get_wav_data*(), numpy, ...).

Any suggestions (using Python 2.x)?
Thanks.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: ---:- FTN<->UseNet Gate -:--- (3:633/280.2@fidonet)

From Stefan Ram@3:633/280.2 to All on Sun Jan 26 21:37:40 2025

marc nicole <mk1853387@gmail.com> wrote or quoted:

> return _webrtcvad.process(self._vad, sample_rate, buf, length)
>
> Error: Error while processing frame

(I was not able to check the following tips myself!
So, please read them as a mere wild guess!)

That error you're running into is possibly because the
audio format webrtcvad wants isn't jibing with what you're
feeding it. Let me break it down for you:

WebRTC VAD is picky about its audio, like a foodie at a farmers
market:

- It wants 16-bit mono PCM, nothing fancy

- Sample rates have got to be 8000, 16000, 32000, or 48000 Hz

- Frame durations must be 10, 20, or 30 ms, like clockwork
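To put numbers on those rules, here's a quick sketch (the helper
names are made up, not part of py-webrtcvad) that works out how
long a raw 16-bit mono buffer is and whether it fits. The original
self.stream.read(4800) hands over 4800 samples, which is 600, 300,
150, or 100 ms at the four legal rates, so it flunks at every one
of them:

```python
# Hypothetical helpers (not part of py-webrtcvad): check a raw buffer
# against the VAD's documented rate and frame-length constraints.
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_DURATIONS_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # 16-bit PCM, mono


def frame_duration_ms(frame, sample_rate):
    """Duration in milliseconds of a 16-bit mono PCM buffer."""
    n_samples = len(frame) // BYTES_PER_SAMPLE
    return 1000.0 * n_samples / sample_rate


def is_valid_vad_frame(frame, sample_rate):
    """True if the buffer length and rate match what webrtcvad accepts."""
    if sample_rate not in VALID_RATES:
        return False
    if len(frame) % BYTES_PER_SAMPLE != 0:
        return False  # odd byte count: not whole 16-bit samples
    n_samples = len(frame) // BYTES_PER_SAMPLE
    # Integer check avoids float rounding: samples * 1000 == ms * rate
    return any(n_samples * 1000 == ms * sample_rate
               for ms in VALID_DURATIONS_MS)


# 4800 samples of silence, like stream.read(4800) on a mono stream
big_chunk = b"\x00\x00" * 4800
for rate in VALID_RATES:
    # prints valid=False for all four rates
    print("%5d Hz: %6.1f ms, valid=%s" % (
        rate, frame_duration_ms(big_chunk, rate),
        is_valid_vad_frame(big_chunk, rate)))
```

A 480-sample buffer at 16000 Hz, on the other hand, comes out to
exactly 30 ms and passes.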

Tweak your PyAudio setup like you're fine-tuning a classic car:

self.FORMAT = pyaudio.paInt16
self.CHANNELS = 1
self.RATE = 16000
self.FRAMES_PER_BUFFER = 480 # 30 ms at 16000 Hz, smooth as a SoCal highway
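That 480 isn't magic: samples per frame = sample rate * duration
in ms / 1000, so 16000 * 30 / 1000 = 480. A tiny sketch
(samples_per_frame is a made-up name) that spits out every legal
FRAMES_PER_BUFFER value, in case you switch rates:

```python
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_DURATIONS_MS = (10, 20, 30)


def samples_per_frame(sample_rate, frame_ms):
    """Samples (PyAudio 'frames') in one VAD frame of frame_ms."""
    return sample_rate * frame_ms // 1000


# Legal FRAMES_PER_BUFFER values for each rate, at 10/20/30 ms;
# e.g. the 16000 Hz line prints "16000 Hz: [160, 320, 480]"
for rate in VALID_RATES:
    sizes = [samples_per_frame(rate, ms) for ms in VALID_DURATIONS_MS]
    print("%d Hz: %s" % (rate, sizes))
```

Keep in mind PyAudio counts frames (one sample per channel), while
is_speech() gets bytes, so at 16-bit mono those 480 frames arrive
as 960 bytes.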

Give your audio reading loop a makeover:

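# Multiply before dividing so Python 2 integer division
# doesn't truncate 16000 / 480 down to 33 too early
n_chunks = int(self.RATE * self.RECORD_SECONDS // self.FRAMES_PER_BUFFER)
for i in range(n_chunks):
    data = self.stream.read(self.FRAMES_PER_BUFFER)
    is_speech = self.vad.is_speech(data, self.RATE)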
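If you'd rather stick with reading bigger chunks, like the 4800
samples from the original post, you can chop each chunk into 30 ms
frames before calling is_speech(). Here's a sketch assuming 16-bit
mono PCM; split_into_frames is a made-up name, though
py-webrtcvad's example.py does something along these lines with
its frame generator. Any tail that doesn't fill a whole frame comes
back as leftover, to be glued onto the next read:

```python
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def split_into_frames(pcm, sample_rate, frame_ms=30):
    """Split raw 16-bit mono PCM into whole VAD frames.

    Returns (frames, leftover): frames is a list of byte strings,
    each exactly frame_ms long; leftover is the tail that didn't
    fill a whole frame (prepend it to the next chunk you read).
    """
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    n_full = len(pcm) // frame_bytes
    frames = [pcm[i * frame_bytes:(i + 1) * frame_bytes]
              for i in range(n_full)]
    leftover = pcm[n_full * frame_bytes:]
    return frames, leftover


# A 4800-sample chunk at 16000 Hz is 300 ms: exactly ten 30 ms frames
chunk = b"\x00\x00" * 4800
frames, leftover = split_into_frames(chunk, 16000)
# prints: 10 frames of 960 bytes, 0 bytes left over
print("%d frames of %d bytes, %d bytes left over"
      % (len(frames), len(frames[0]), len(leftover)))

# Hypothetical use inside the reading loop (needs a real stream/VAD):
# pending = b""
# data = pending + self.stream.read(4800)
# frames, pending = split_into_frames(data, self.RATE)
# for frame in frames:
#     speech = self.vad.is_speech(frame, self.RATE)
```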

Make sure your audio data is on point:

import numpy as np

# Turn that audio data into a numpy array, like magic
audio_array = np.frombuffer(data, dtype=np.int16)

# If it's not mono, make it mono by keeping only the first
# channel of the interleaved samples - no stereo allowed at this party
if self.CHANNELS > 1:
    audio_array = audio_array[::self.CHANNELS]

# Back to bytes it goes
audio_bytes = audio_array.tobytes()

is_speech = self.vad.is_speech(audio_bytes, self.RATE)
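That slice audio_array[::self.CHANNELS] keeps just the first
channel. If you'd rather average the channels, or skip numpy
entirely, a stdlib-only sketch with struct could look like this
(downmix_to_mono is a made-up name; it assumes native-endian 16-bit
interleaved samples, which is what paInt16 delivers):

```python
import struct


def downmix_to_mono(pcm, channels):
    """Average interleaved 16-bit PCM channels into one mono channel.

    Assumes native byte order, matching PyAudio's paInt16 and np.int16.
    """
    if channels == 1:
        return pcm
    n_samples = len(pcm) // 2
    samples = struct.unpack("=%dh" % n_samples, pcm[:n_samples * 2])
    mono = []
    # Step through whole sample groups (one value per channel)
    for i in range(0, n_samples - n_samples % channels, channels):
        group = samples[i:i + channels]
        # Floor division keeps the average inside the int16 range
        mono.append(sum(group) // channels)
    return struct.pack("=%dh" % len(mono), *mono)


stereo = struct.pack("=4h", 100, 300, -200, -400)  # two L/R pairs
print(struct.unpack("=2h", downmix_to_mono(stereo, 2)))  # prints (200, -300)
```

Either way, the mono buffer you hand to is_speech() still has to be
10, 20, or 30 ms long.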

Crank up that VAD aggressiveness:

self.vad = webrtcvad.Vad(3) # 3 is as aggressive as LA traffic

(Just remember to adjust your sample rate and frame duration
to fit your needs.)

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Stefan Ram (3:633/280.2@fidonet)
