Hyprvoice: Voice-Powered Typing for the Modern Linux Desktop
Hyprvoice is a lightweight, Wayland-native daemon that transforms your speech into instant text input across any application without the overhead of heavy software or legacy X11 hacks.
Voice-to-text on the Linux desktop has always been a second-class citizen.
If you want to type with your voice, you usually have to choose between two equally frustrating paths:
- The Bloated Path: A heavy Electron app that eats a gig of RAM just to listen to your microphone.
- The Legacy Path: A fragile chain of X11 scripts that break the second you move to a modern Wayland compositor.
I was tired of the friction. I wanted something that felt invisible: a tool that didn't live in a window, but in my workflow. So I built Hyprvoice.
The Invisible Workflow
The best tools are the ones you forget exist.
When I’m in a deep-work loop, the last thing I want to do is grab my mouse to click a "Record" button. With Hyprvoice, I tap Super+R, say "create a new React component called UserProfile," tap Super+R again, and watch the code appear at my cursor.
That's the core philosophy: Voice is an input pipeline, not an application.
Why Wayland Native?
Most existing solutions rely on XTest or other legacy protocols. On a modern stack (like Hyprland or GNOME on Wayland), these require XWayland translation layers that are prone to latency and weird focus bugs.
Hyprvoice is built from the ground up for the modern Linux stack:
- Go-powered Daemon: A lightweight control plane that stays resident in memory with almost zero overhead.
- PipeWire Integration: Direct, low-latency audio capture without the PulseAudio baggage.
- Direct Injection: It hooks into the native Wayland input flow using wtype and ydotool (a minimal sketch of that call follows below).
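To make that concrete, here is a minimal sketch of what a direct-injection call can look like: shelling out to wtype with a context-bound command. The package and function names are illustrative, not Hyprvoice's actual internals.
package injection

import (
	"context"
	"fmt"
	"os/exec"
)

// injectWithWtype types text into the currently focused Wayland window by
// invoking wtype. The context bounds how long we wait before giving up.
func injectWithWtype(ctx context.Context, text string) error {
	cmd := exec.CommandContext(ctx, "wtype", text)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("wtype failed: %w: %s", err, out)
	}
	return nil
}
ydotool takes a different route (a uinput-backed daemon), which is exactly why having both available in a fallback chain is useful.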
Under the Hood: The Secret Sauce
The project is structured around a state-machine pipeline. This ensures that the audio frames are handled efficiently without leaking memory or getting stuck in a recording loop.
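The actual states aren't spelled out in this post, so treat the following as a rough sketch of the idea rather than Hyprvoice's real types: a small set of phases plus a transition function that rejects illegal toggles, so a stray keypress can never start a second recording on top of the first.
package pipeline

import "fmt"

// The phases a toggle moves the pipeline through.
type State int

const (
	Idle State = iota
	Recording
	Transcribing
	Injecting
)

// NextOnToggle maps a toggle event onto the next phase and rejects
// toggles that arrive while audio is still being transcribed or injected.
func NextOnToggle(s State) (State, error) {
	switch s {
	case Idle:
		return Recording, nil
	case Recording:
		return Transcribing, nil
	default:
		return s, fmt.Errorf("toggle ignored in state %d", s)
	}
}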
Because this entire pipeline fans out across goroutines, it needs to stay race-free and predictable. The secret sauce behind that calm? Go's context everywhere. Every long-running operation and external call carries a deadline or a cancellation signal.
Toggle again and the entire pipeline unwinds. A timeout fires and recording stops; sockets close; goroutines exit. On shutdown, WaitGroups drain, contexts propagate, and the system cleans up predictably. Here's the pattern you'll find all over the codebase:
// Bound the run by the configured recording timeout.
runCtx, cancel := context.WithTimeout(ctx, p.config.Recording.Timeout)
// Store the cancel func so a second toggle (or shutdown) can unwind the pipeline.
p.setCancel(cancel)
// Track the worker so shutdown can wait for it to drain.
p.wg.Add(1)
go p.run(runCtx)

Modular and Fully Configurable
Everything in the pipeline is exposed to the user.
You can swap transcription "brains" without a restart—switching from OpenAI to Groq for sub-second latency.
Beyond the backend, you can fine-tune audio sample rates, customize desktop notification messages for every state, and manage a priority-based injection fallback chain.
In the config below, for example, if direct typing via wtype or ydotool can't reach a specific application, the clipboard path quietly takes over and then restores your original clipboard contents in the background, keeping the system stable without you ever needing to touch the code (a rough sketch of that fallback loop follows the config).
[transcription]
provider = "groq-transcription" # Also supports "openai" and "groq-translation"
api_key = "gsk_..."
language = "en"
model = "whisper-large-v3-turbo"
[injection]
backends = ["ydotool", "wtype", "clipboard"]
ydotool_timeout = "5s"
wtype_timeout = "5s"
clipboard_timeout = "3s"

...and more.
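Here is that fallback loop as a minimal sketch, assuming a hypothetical Backend type rather than Hyprvoice's actual one: each backend gets its own timeout from the config, and the first one that succeeds wins.
package injection

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Backend is one way of getting text into the focused application,
// e.g. ydotool, wtype, or the clipboard path.
type Backend struct {
	Name    string
	Timeout time.Duration // ydotool_timeout, wtype_timeout, ...
	Inject  func(ctx context.Context, text string) error
}

// injectWithFallback walks the backends in priority order and stops at the
// first success; if every backend fails, all errors are reported together.
func injectWithFallback(ctx context.Context, backends []Backend, text string) error {
	var errs []error
	for _, b := range backends {
		bctx, cancel := context.WithTimeout(ctx, b.Timeout)
		err := b.Inject(bctx, text)
		cancel()
		if err == nil {
			return nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", b.Name, err))
	}
	return errors.Join(errs...)
}
A clipboard backend would also snapshot whatever is on the clipboard before pasting and restore it afterwards; that restore step is what makes the takeover invisible.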
Installation and Contributing
Hyprvoice is open-source and currently in Beta. If you're on Arch, you can install it directly from the AUR:
yay -S hyprvoice-bin

I'm looking for contributors to help with the whisper.cpp implementation and much more. If you want to help make voice typing on Linux a first-class experience, check out the repository.
