Subject: Re: Tutorial: Windows/Android privacy de-googled STT optimized for speed
On Wed, 5/6/2026 10:37 PM, Maria Sophia wrote:
Maria Sophia wrote:
a. Low end Android => use HeliBoard + WhisperIME STT
b. High end Android => try the all-in-one Futo Keyboard
Testing takes time...
I'm having trouble with the tiny models in a noisy environment,
with the transcription taking too long or not working at all.
It seems the AGC (automatic gain control) on the mic is letting too much noise through.
First, I confirmed via adb logcat that the tiny model is the one actually running,
dictating "testing, testing, 123" as the test phrase.
Since at home you never need to touch the phone itself, I did this from Windows:
adb shell logcat -c
adb shell "logcat -d -v tag WhisperEngineJava:D *:S"
--------- beginning of main
D/WhisperEngineJava: Model is loaded.../storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper-tiny.en.tflite
D/WhisperEngineJava: Filters and Vocab are loaded.../storage/emulated/0/Android/data/org.woheller69.whisper/files/filters_vocab_en.bin
D/WhisperEngineJava: Model is loaded.../storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper-tiny.en.tflite
D/WhisperEngineJava: Filters and Vocab are loaded.../storage/emulated/0/Android/data/org.woheller69.whisper/files/filters_vocab_en.bin
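The clear/dictate/dump cycle above (it is used again below) can be wrapped in a small helper, together with a filter that pulls out which model file was loaded. This is a minimal sketch, assuming a POSIX shell on the Windows side (Git Bash or WSL) and adb on PATH; the function names whisper_log and loaded_models and the ADB override variable are hypothetical. The logcat options are the ones already used: -c clears the buffer, -d dumps it and exits, -v tag prints only priority and tag, and the filter WhisperEngineJava:D *:S keeps that tag at Debug and above while silencing every other tag.

```shell
#!/bin/sh
# Assumes a POSIX shell (e.g. Git Bash or WSL) and adb on PATH.
# ADB can be overridden, e.g. ADB=echo for a dry run (hypothetical convenience).
ADB="${ADB:-adb}"

# whisper_log: clear the log buffer, wait while you dictate, then dump only
# the WhisperEngineJava lines (Debug and above); every other tag is silenced.
whisper_log() {
  "$ADB" shell logcat -c
  printf 'Dictate now, then press Enter... ' >&2
  read -r _
  "$ADB" shell "logcat -d -v tag WhisperEngineJava:D *:S"
}

# loaded_models: read logcat text on stdin and print the file name of each
# model reported by a "Model is loaded..." line, once per distinct model.
loaded_models() {
  tr -d '\r' \
    | sed -n 's|.*Model is loaded\.\.\..*/||p' \
    | sort -u
}

# Typical use:
#   whisper_log | loaded_models
```

With the log shown above, loaded_models would print whisper-tiny.en.tflite once, even though the app logged the load twice.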
Here is where the spectrogram seemed too big for such a small sentence:
D/WhisperEngineJava: Calculating Mel spectrogram...
D/WhisperEngineJava: Mel spectrogram is calculated...!
D/WhisperEngineJava: output_len: 449
So, to try to lower the mic sensitivity on the Samsung A32-5G, I ran:
adb shell settings put global call_noise_reduction 1
adb reboot
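It is worth reading the value back after the reboot to confirm the change stuck. A minimal sketch, assuming adb on PATH; check_setting and the ADB override are hypothetical names, while settings get and wait-for-device are standard adb/Android shell commands. As far as I know, call_noise_reduction is a Samsung-specific global key, and its name suggests it targets in-call audio, so it may not change what a keyboard app's recorder receives.

```shell
#!/bin/sh
# Assumes adb on PATH; set ADB=... to substitute another command.
ADB="${ADB:-adb}"

# check_setting NAMESPACE KEY EXPECTED
# Reads a value back with "settings get" and reports whether it matches.
check_setting() {
  # adb shell output can carry CRLF line endings, so strip the CR
  actual=$("$ADB" shell settings get "$1" "$2" | tr -d '\r')
  if [ "$actual" = "$3" ]; then
    echo "OK: $1/$2 = $actual"
  else
    echo "MISMATCH: $1/$2 = $actual (expected $3)"
    return 1
  fi
}

# After "adb reboot", wait for the phone to come back before checking:
#   adb wait-for-device
#   check_setting global call_noise_reduction 1
```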
Then I re-ran "testing, testing, 123":
adb shell logcat -c
adb shell "logcat -d -v tag WhisperEngineJava:D *:S"
--------- beginning of main
D/WhisperEngineJava: Model is loaded.../storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper-tiny.en.tflite
D/WhisperEngineJava: Filters and Vocab are loaded.../storage/emulated/0/Android/data/org.woheller69.whisper/files/filters_vocab_en.bin
D/WhisperEngineJava: Calculating Mel spectrogram...
D/WhisperEngineJava: Mel spectrogram is calculated...!
D/WhisperEngineJava: output_len: 449
D/WhisperEngineJava: Skipping token: 50257, word: [_SOT_]
D/WhisperEngineJava: Detected language code: en
D/WhisperEngineJava: Skipping token: 50259, word: [_extra_token_50259]
D/WhisperEngineJava: It is Transcription...
D/WhisperEngineJava: Skipping token: 50359, word: [_extra_token_50359]
D/WhisperEngineJava: Skipping token: 50363, word: [_BEG_]
D/WhisperEngineJava: Skipping token: 50413, word: [_TT_50]
D/WhisperEngineJava: Skipping token: 50513, word: [_TT_150]
D/WhisperEngineJava: Inference is executed...!
Drat. It's still 449.
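A back-of-the-envelope check hints at why the number did not move. Whisper's reference front end pads or trims every clip to 30 seconds of 16 kHz audio and computes an 80-bin log-mel spectrogram with a 160-sample hop, so the spectrogram is the same size for "testing, testing, 123" as for a long paragraph. If WhisperIME's TFLite models follow that convention (an assumption, not checked against its source), output_len: 449 is more plausibly a fixed output-buffer length, close to Whisper's 448-token decoder context, than a measure of the spectrogram, and a mic setting would not be expected to change it.

```shell
#!/bin/sh
# Fixed input size of Whisper's standard front end (OpenAI reference values).
SAMPLE_RATE=16000   # audio is resampled to 16 kHz
CHUNK_SECONDS=30    # every clip is padded or trimmed to 30 s
HOP_LENGTH=160      # 10 ms between spectrogram frames
N_MELS=80           # mel bins used by tiny.en and base.en

n_samples=$(( SAMPLE_RATE * CHUNK_SECONDS ))   # 480000
n_frames=$(( n_samples / HOP_LENGTH ))          # 3000
echo "mel spectrogram: ${N_MELS} x ${n_frames}, for any utterance up to ${CHUNK_SECONDS} s"
```

This prints "mel spectrogram: 80 x 3000, for any utterance up to 30 s".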
If that doesn't work in noisy environments, I'll have to bump up
to the next-sized model, which is base (Whisper's sizes run tiny, base, small, medium, large).
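Since the original complaint was transcription taking too long, it helps to time each run before and after swapping models. logcat's -v epoch format modifier prints timestamps as seconds since 1970; this sketch assumes that timestamp lands in the first whitespace-separated column (check on your device). The function name inference_seconds is hypothetical, and the two marker strings are taken from the log lines shown above.

```shell
#!/bin/sh
# inference_seconds: read WhisperEngineJava logcat lines captured with
# "-v epoch" on stdin and print the seconds from the first
# "Calculating Mel spectrogram" line to the next "Inference is executed" line.
# Assumes the epoch timestamp is the first whitespace-separated field.
inference_seconds() {
  tr -d '\r' | awk '
    /Calculating Mel spectrogram/ && start == "" { start = $1 }
    /Inference is executed/ && start != "" { printf "%.3f\n", $1 - start; exit }
  '
}

# Typical use after clearing the log and dictating once:
#   adb shell "logcat -d -v epoch WhisperEngineJava:D *:S" | inference_seconds
```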
adb push whisper-base.en.tflite /storage/emulated/0/Android/data/org.woheller69.whisper/files/
adb shell "cp /storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper-base.en.tflite /storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper.tflite"
adb shell "cp /storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper-base.en.tflite /storage/emulated/0/Android/data/org.woheller69.whisper/files/whisper-tiny.en.tflite"
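The three commands above can be wrapped in one function so the same steps work for any model file. The function name swap_model and the ADB override are hypothetical conveniences; the directory and the two target file names come straight from the commands above. Copying the base model over whisper-tiny.en.tflite presumably works because the app loads the model by that file name, as the log shows.

```shell
#!/bin/sh
# Assumes adb on PATH; ADB=echo turns this into a dry run.
ADB="${ADB:-adb}"
# WhisperIME's app-specific files directory, as seen in the logcat output above
FILES_DIR=/storage/emulated/0/Android/data/org.woheller69.whisper/files

# swap_model LOCAL_MODEL: push a model file, then copy it over the two file
# names used in the steps above (whisper.tflite and whisper-tiny.en.tflite).
# This overwrites the tiny model on the phone; keep a copy to switch back.
swap_model() {
  model=$(basename "$1")
  "$ADB" push "$1" "$FILES_DIR/" || return 1
  for target in whisper.tflite whisper-tiny.en.tflite; do
    "$ADB" shell "cp $FILES_DIR/$model $FILES_DIR/$target" || return 1
  done
}

# Typical use from the folder holding the downloaded model:
#   swap_model whisper-base.en.tflite
```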
It missed the word "It's" in the picture.
[Picture] dsnote-ubu2504.gif
https://imgur.com/a/9VxuCCa
https://postimg.cc/CRrHVQXP
That's "dsnote" in Ubuntu using a Whisper model.
I read the text of the lines above, and the model
missed the "It's" on the recorded attempt. A
previous attempt was OK.
The microphone was a Blue Yeti, which doesn't have AGC.
And the level wasn't all that high either, maybe
-24 dBFS or so. I recorded the microphone in
Audacity first, and found I had to hold the mic two inches
from my face to get a signal.
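Reading that level as dBFS (what Audacity's meters show, which is an assumption about the figure quoted), the usual conversion to linear amplitude, 10^(dB/20), shows just how quiet the signal was. The helper name db_to_amplitude is hypothetical.

```shell
#!/bin/sh
# db_to_amplitude DB: convert a level in dB relative to full scale (dBFS)
# into a linear fraction of full-scale amplitude, using 10^(dB/20).
db_to_amplitude() {
  awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (db / 20) }'
}

# db_to_amplitude -24   -> 0.063  (about 6% of full scale)
# db_to_amplitude -3    -> 0.708  (the "3 dB down" point of a response spec)
```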
While the spec for the microphone claims a 20-20000Hz
response (which would be 3dB down at the ends),
it is clearly a "voice" microphone and it
cuts off the high frequencies. That's one of the reasons
the fans in the room didn't get picked up. So as far as
being a "live" mic, it's a bit of a "dull potato" as
mics go. But it does seem to give a decent result.
And when you "blast" the four lines above at the model,
then stop and wait for the conversion, it must have taken
at least 10-15 seconds to do the amount of text in the picture.
It "feels" slightly better, if you feed it a sentence at a time.
Feed it just a few words. It seems happier that way. Dragon
Naturally Speaking has nothing to worry about :-)
Paul
--- PyGate Linux v1.5.14
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)