AI
Audio
Web
HuggingFace
NanoVoice
(Inspired by PicoVoice.ai) Uses the latest release of Google DeepMind's FunctionGemma model to build a compact, self-contained Speech-to-Intent library.
I am so pumped to see this really working :)
About the project
Inspired by PicoVoice.ai Rhino, NanoVoice is built to validate a similar idea. It is a compact, self-contained speech recognition library that allows you to embed powerful natural language understanding into your web apps. No internet connection required. No API keys. Low latency.
It has been a transformative year for the Gemma family of models. FunctionGemma is a specialized version of Google's Gemma 3 270M model tuned for function calling.
Speech-to-Intent means converting a voice command into structured information, such as a JSON object naming the intent and its parameters, that an app can use to perform an action.
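To make the idea concrete, here is a minimal sketch of how an app might consume such a JSON object. The `Intent` shape, the intent names (`setLight`, `setTimer`) and the `dispatch` helper are all hypothetical and only illustrate the pattern. They are not NanoVoice's actual API or FunctionGemma's output format.

```typescript
// Hypothetical shape of a recognized intent (an assumption for illustration).
interface Intent {
  name: string;
  slots: Record<string, string>;
}

type Handler = (slots: Record<string, string>) => string;

// Hypothetical handlers, keyed by intent name.
const handlers = new Map<string, Handler>([
  ["setLight", (slots) => `Turning ${slots.state} the ${slots.room} light`],
  ["setTimer", (slots) => `Timer set for ${slots.minutes} minutes`],
]);

// Parse a JSON string into an Intent; return null if it is malformed.
function parseIntent(raw: string): Intent | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const { name, slots } = data as { name?: unknown; slots?: unknown };
    if (typeof name !== "string") return null;
    if (typeof slots !== "object" || slots === null || Array.isArray(slots)) return null;
    const clean: Record<string, string> = {};
    for (const [key, value] of Object.entries(slots as Record<string, unknown>)) {
      clean[key] = String(value);
    }
    return { name, slots: clean };
  } catch {
    return null; // JSON.parse threw: not valid JSON
  }
}

// Route a raw JSON intent to its handler and return a status message.
function dispatch(raw: string): string {
  const intent = parseIntent(raw);
  if (!intent) return "Sorry, I did not understand that.";
  const handler = handlers.get(intent.name);
  return handler ? handler(intent.slots) : `No handler for intent "${intent.name}"`;
}
```

With this sketch, a spoken "turn on the kitchen light" that was recognized as `{"name":"setLight","slots":{"room":"kitchen","state":"on"}}` would be routed to the `setLight` handler, while malformed output falls back to a polite error message.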
I was not sure if this was a good idea, but I decided to build it anyway. I wanted to see if I could build a configurable, offline speech recognition library that works the same way as PicoVoice Rhino. It was not easy, and it is not the same as PicoVoice Rhino, but it works.
My full respect to PicoVoice for building a great service. It is very expensive ($6000 per year), but it's worth it.
Key Features
- ✓ Works Offline: Perfect for kiosks, subways, or anywhere with spotty internet. Your app keeps working without a connection.
- ✓ Low Latency: No round trip to a server. Intent recognition runs directly on the user's device.
- ✓ Hybrid Speech Engine: Switches between Online Speed (Web Speech API) and Offline Privacy (Whisper). Best of both worlds.
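The hybrid engine switch described above could be sketched roughly as follows. This is an illustration under stated assumptions, not NanoVoice's actual logic: the `EngineContext` fields, the `preferPrivacy` setting and the selection rules are hypothetical. It also assumes the browser's Web Speech API recognizer needs a network connection, as is generally the case in Chrome.

```typescript
type Engine = "web-speech" | "whisper";

// Hypothetical inputs to the engine decision.
interface EngineContext {
  online: boolean;             // e.g. navigator.onLine in a browser
  webSpeechAvailable: boolean; // whether a SpeechRecognition constructor exists
  preferPrivacy: boolean;      // hypothetical user setting: keep audio on-device
}

// Pick the online recognizer only when it is usable and allowed;
// otherwise fall back to on-device Whisper.
function chooseEngine(ctx: EngineContext): Engine {
  if (ctx.preferPrivacy) return "whisper";
  if (ctx.online && ctx.webSpeechAvailable) return "web-speech";
  return "whisper";
}

// Build the context from browser globals; outside a browser, where these
// globals are missing, both checks are false and Whisper is chosen.
function detectContext(preferPrivacy: boolean): EngineContext {
  const g = globalThis as any;
  return {
    online: g.navigator?.onLine === true,
    webSpeechAvailable:
      typeof g.SpeechRecognition === "function" ||
      typeof g.webkitSpeechRecognition === "function",
    preferPrivacy,
  };
}
```

Keeping `chooseEngine` a pure function makes the policy easy to test and to change, for example to prefer Whisper on metered connections.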
Tech Stack
Next.js, React, NodeJS, Postgres, Google Gemma, Transformers.js, HuggingFace, Whisper, Google FunctionGemma, Chrome Speech API