Antigravity Apps
AI · Audio · Web · HuggingFace

NanoVoice

(Inspired by PicoVoice.ai) Uses the latest release of Google DeepMind's FunctionGemma model to build a compact, self-contained Speech-to-Intent library.

I am so pumped to see this really working :)

About the project

Inspired by PicoVoice.ai Rhino, NanoVoice is built to validate a similar idea. It is a compact, self-contained speech recognition library that allows you to embed powerful natural language understanding into your web apps. No internet connection required. No API keys. Low latency.

It has been a transformative year for the Gemma family of models. FunctionGemma is a specialized version of Google's Gemma 3 270M model tuned for function calling.

Speech-to-Intent means converting voice commands into structured information that an app can act on. For example, the spoken command "turn off the kitchen lights" can become a JSON object that names the intent and its parameters, and the app then uses that object to perform the action.
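Here is a minimal sketch of the structured side of Speech-to-Intent. It assumes the model's function call has already been decoded into a JSON string of the form `{"name": ..., "args": {...}}`. The intent names, slot names, `IntentSchema`, and `parseIntent` are hypothetical and chosen only for illustration. They are not NanoVoice's actual API, and they are not FunctionGemma's raw output format.

```typescript
// Hypothetical intent schema: each intent lists its slots and the values
// each slot may take (similar in spirit to a Rhino "context").
type IntentSchema = Record<string, Record<string, readonly string[]>>;

const schema: IntentSchema = {
  setLights: { room: ["kitchen", "bedroom", "living room"], state: ["on", "off"] },
  playMusic: { genre: ["jazz", "rock", "classical"] },
};

// Structured result handed to the app.
interface Intent {
  intent: string;
  slots: Record<string, string>;
}

// Parse an (assumed) JSON function call and validate it against the schema.
// Returns null for malformed JSON, unknown intents, or out-of-schema slot
// values, so the app can answer "didn't understand" instead of acting on noise.
function parseIntent(raw: string, schema: IntentSchema): Intent | null {
  let call: unknown;
  try {
    call = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof call !== "object" || call === null) return null;
  const { name, args } = call as { name?: unknown; args?: unknown };
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (typeof name !== "string" || !Object.prototype.hasOwnProperty.call(schema, name)) return null;
  if (typeof args !== "object" || args === null) return null;

  const allowed = schema[name];
  if (allowed === undefined) return null;
  const slots: Record<string, string> = {};
  for (const [slot, values] of Object.entries(allowed)) {
    const value = (args as Record<string, unknown>)[slot];
    if (typeof value !== "string" || !values.includes(value)) return null;
    slots[slot] = value;
  }
  return { intent: name, slots };
}

// What a decoded call for "turn off the kitchen lights" might look like.
const result = parseIntent('{"name":"setLights","args":{"room":"kitchen","state":"off"}}', schema);
console.log(JSON.stringify(result));
// {"intent":"setLights","slots":{"room":"kitchen","state":"off"}}
```

Validating against a closed list of slot values keeps a small model's occasional off-schema output from triggering an action.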

I was not sure this was a good idea, but I decided to build it anyway. I wanted to see whether I could build a configurable, offline speech recognition library that works like PicoVoice Rhino. It was not easy, and it is not on par with PicoVoice Rhino, but it works.

Full respect to PicoVoice for building a great service. It is expensive ($6,000 per year), but it is worth it.

Key Features

  • Works Offline: Perfect for kiosks, subways, or anywhere with spotty internet. Your app keeps working without a connection.
  • Low Latency: No round-trip to a server. Intent recognition runs locally on the user's device.
  • Hybrid Speech Engine: Switch intelligently between Online Speed (Web Speech API) and Offline Privacy (Whisper). Best of both worlds.
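The Hybrid Speech Engine switch can be sketched as a small selection function. This is an assumption about how such switching might work, not NanoVoice's actual logic. `Engine`, `EngineEnv`, and `chooseEngine` are hypothetical names. In a browser, `online` would typically come from `navigator.onLine`, and `hasWebSpeech` from checking for `SpeechRecognition` or `webkitSpeechRecognition` on `window`. They are passed in here so the decision can be tested outside a browser.

```typescript
type Engine = "web-speech" | "whisper";

// Hypothetical inputs to the engine decision.
interface EngineEnv {
  online: boolean;        // e.g. navigator.onLine in a browser
  hasWebSpeech: boolean;  // SpeechRecognition / webkitSpeechRecognition present
  preferPrivacy: boolean; // user asked to keep audio on the device
  whisperReady: boolean;  // local Whisper model downloaded and loaded
}

// Prefer the fast online recognizer when it is usable and privacy was not
// requested; otherwise use local Whisper. Never falls back to the online
// engine when privacy is requested. Returns null if no engine can run yet.
function chooseEngine(env: EngineEnv): Engine | null {
  if (!env.preferPrivacy && env.online && env.hasWebSpeech) return "web-speech";
  if (env.whisperReady) return "whisper";
  return null;
}

console.log(chooseEngine({ online: true, hasWebSpeech: true, preferPrivacy: false, whisperReady: true }));
// web-speech
console.log(chooseEngine({ online: false, hasWebSpeech: true, preferPrivacy: false, whisperReady: true }));
// whisper
```

The privacy flag is treated as authoritative because Chrome's Web Speech API implementation typically sends audio to a server for recognition. Returning `null` lets the UI show a "loading model" state rather than silently sending audio online.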

Tech Stack

Next.js, React, NodeJS, Postgres, Google Gemma, Transformers.js, HuggingFace, Whisper, Google FunctionGemma, Chrome Speech API