Handy – Free open source speech-to-text app

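## Handy: your private, offline speech-to-text tool

Handy is a free, open source, cross-platform desktop application for offline speech-to-text transcription. Built with Rust and React/TypeScript, it puts privacy first by processing audio locally: your voice *never* leaves your computer. Configure a keyboard shortcut, speak, and Handy pastes the transcribed text into any application. It offers a choice between Whisper models (with GPU acceleration) and the CPU-optimized Parakeet V3, balancing flexibility and performance. Handy is designed to be highly extensible and "forkable", and it encourages community contributions. It is under active development and has some known limitations, including Whisper crashes on certain systems and limited Wayland support on Linux (text input requires `wtype` or `dotool`). Manual model installation is available for users behind a proxy or on restricted networks. The roadmap focuses on debug logging, keyboard improvements, and a settings cleanup. Find detailed instructions and contribute on GitHub!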


A free, open source, and extensible speech-to-text application that works completely offline.

Handy is a cross-platform desktop application built with Tauri (Rust + React/TypeScript) that provides simple, privacy-focused speech transcription. Press a shortcut, speak, and have your words appear in any text field—all without sending your voice to the cloud.

Handy was created to fill the gap for a truly open source, extensible speech-to-text tool. As stated on handy.computer:

Free: Accessibility tooling belongs in everyone's hands, not behind a paywall
Open Source: Together we can build further. Extend Handy for yourself and contribute to something bigger
Private: Your voice stays on your computer. Get transcriptions without sending audio to the cloud
Simple: One tool, one job. Transcribe what you say and put it into a text box

Handy isn't trying to be the best speech-to-text app—it's trying to be the most forkable one.

Press a configurable keyboard shortcut to start/stop recording (or use push-to-talk mode)
Speak your words while the shortcut is active
Release, and Handy processes your speech with your selected model (Whisper or Parakeet)
Get your transcribed text pasted directly into whatever app you're using

The process is entirely local:

Silence is filtered using VAD (Voice Activity Detection) with Silero
Transcription uses your choice of models:
- Whisper models (Small/Medium/Turbo/Large) with GPU acceleration when available
- Parakeet V3 - CPU-optimized model with excellent performance and automatic language detection
Works on Windows, macOS, and Linux

Download the latest release from the releases page or the website
Install the application following platform-specific instructions
Launch Handy and grant necessary system permissions (microphone, accessibility)
Configure your preferred keyboard shortcuts in Settings
Start transcribing!

For detailed build instructions including platform-specific requirements, see BUILD.md.

Handy is built as a Tauri application combining:

Frontend: React + TypeScript with Tailwind CSS for the settings UI
Backend: Rust for system integration, audio processing, and ML inference
Core Libraries:
- whisper-rs: Local speech recognition with Whisper models
- transcription-rs: CPU-optimized speech recognition with Parakeet models
- cpal: Cross-platform audio I/O
- vad-rs: Voice Activity Detection
- rdev: Global keyboard shortcuts and system events
- rubato: Audio resampling

Handy includes an advanced debug mode for development and troubleshooting. Access it by pressing:

macOS: Cmd+Shift+D
Windows/Linux: Ctrl+Shift+D

Known Issues & Current Limitations

This project is actively being developed and has some known issues. We believe in transparency about the current state:

Major Issues (Help Wanted)

Whisper Model Crashes:

Whisper models crash on certain system configurations (Windows and Linux)
Does not affect all systems - issue is configuration-dependent
- If you experience crashes and are a developer, please help fix the issue and provide debug logs!

Wayland Support (Linux):

Limited support for Wayland display server
Requires wtype or dotool for text input to work correctly (see Linux Notes below for installation)

Text Input Tools:

For reliable text input on Linux, install the appropriate tool for your display server:

| Display Server | Recommended Tool | Install Command |
|---|---|---|
| X11 | `xdotool` | `sudo apt install xdotool` |
| Wayland | `wtype` | `sudo apt install wtype` |
| Both | `dotool` | `sudo apt install dotool` (requires `input` group) |

X11: Install xdotool for both direct typing and clipboard paste shortcuts
Wayland: Install wtype (preferred) or dotool for text input to work correctly
dotool setup: Requires adding your user to the `input` group with `sudo usermod -aG input $USER` (then log out and back in)

Without these tools, Handy falls back to enigo, which may have limited compatibility, especially on Wayland.
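To check which of these tools is available before launching Handy, a small POSIX shell helper can mirror the table above. This is a sketch, not part of Handy: the function name `pick_text_input_tool` is hypothetical, and it assumes your desktop sets `XDG_SESSION_TYPE` (most systemd-based sessions do). It only reports what is installed; it does not change how Handy chooses a tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Handy): report which text-input tool
# from the table above is installed for the current session type.
pick_text_input_tool() {
  # Most systemd-based desktops set this to "x11" or "wayland".
  case "${XDG_SESSION_TYPE:-unknown}" in
    wayland) candidates="wtype dotool" ;;
    x11)     candidates="xdotool dotool" ;;
    *)       candidates="wtype xdotool dotool" ;;
  esac
  for tool in $candidates; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Per the setup note above, dotool requires membership in the input group.
      if [ "$tool" = dotool ] && ! id -nG | tr ' ' '\n' | grep -qx input; then
        echo "dotool (warning: current user is not in the input group)"
      else
        echo "$tool"
      fi
      return 0
    fi
  done
  # No tool found: Handy falls back to enigo.
  echo "none"
}

pick_text_input_tool
```

If it prints `none`, install the tool recommended for your display server from the table.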

Other Notes:

The recording overlay is disabled by default on Linux (Overlay Position: None) because certain compositors treat it as the active window. When the overlay is visible it can steal focus, which prevents Handy from pasting back into the application that triggered transcription. If you enable the overlay anyway, be aware that clipboard-based pasting might fail or end up in the wrong window.
If you are having trouble with the app, running with the environment variable WEBKIT_DISABLE_DMABUF_RENDERER=1 may help
You can manage global shortcuts outside of Handy and still control the app via signals. Sending SIGUSR2 to the Handy process toggles recording on/off, which lets Wayland window managers or other hotkey daemons keep ownership of keybindings. Example (Sway):
```
bindsym $mod+o exec pkill -USR2 -n handy
```
pkill here simply delivers the signal—it does not terminate the process.
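Other hotkey daemons can use the same mechanism. The hypothetical `handy_toggle` function below is a sketch that finds the newest process named `handy` (as `pkill -n handy` does) and sends it `SIGUSR2` with `kill`, printing a message instead of failing silently when Handy is not running. It assumes the process name is `handy`, as in the Sway example.

```shell
#!/bin/sh
# Hypothetical helper: toggle Handy's recording via SIGUSR2, the same
# signal the Sway binding above sends with `pkill -USR2 -n handy`.
handy_toggle() {
  # -n picks the newest matching process, mirroring `pkill -n`.
  if ! pid=$(pgrep -n handy); then
    echo "Handy does not appear to be running" >&2
    return 1
  fi
  # SIGUSR2 toggles recording on/off; it does not terminate the process.
  kill -USR2 "$pid"
}
```

A hotkey daemon can then bind a key to a script that calls `handy_toggle`.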

macOS (both Intel and Apple Silicon)
x64 Windows
x64 Linux

System Requirements/Recommendations

The following are recommendations for running Handy on your own machine. If you don't meet the system requirements, the performance of the application may be degraded. We are working on improving the performance across all kinds of computers and hardware.

For Whisper Models:

macOS: M series Mac, Intel Mac
Windows: Intel, AMD, or NVIDIA GPU
Linux: Intel, AMD, or NVIDIA GPU

For Parakeet V3 Model:

CPU-only operation - runs on a wide variety of hardware
Minimum: Intel Skylake (6th gen) or equivalent AMD processors
Performance: ~5x real-time speed on mid-range hardware (tested on i5)
Automatic language detection - no manual language selection required
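As a rough reading of the "~5x real-time" figure, assuming the speedup stays roughly constant across clip lengths, transcription time scales with recording length:

$$
t_{\text{transcribe}} \approx \frac{t_{\text{audio}}}{5}, \qquad \text{e.g. } t_{\text{audio}} = 60\ \text{s} \;\Rightarrow\; t_{\text{transcribe}} \approx 12\ \text{s}
$$

Faster or slower CPUs than the tested i5 will change the divisor.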

Roadmap & Active Development

We're actively working on several features and improvements. Contributions and feedback are welcome!

Debug Logging:

Adding debug logging to a file to help diagnose issues

macOS Keyboard Improvements:

Support for Globe key as transcription trigger
A rewrite of global shortcut handling for macOS, and potentially other operating systems too

Opt-in Analytics:

Collect anonymous usage data to help improve Handy
Privacy-first approach with clear opt-in

Settings Refactoring:

Clean up and refactor the settings system, which is becoming bloated and messy
Implement better abstractions for settings management

Tauri Commands Cleanup:

Abstract and organize Tauri command patterns
Investigate tauri-specta for improved type safety and organization

Manual Model Installation (For Proxy Users or Network Restrictions)

If you're behind a proxy, firewall, or in a restricted network environment where Handy cannot download models automatically, you can manually download and install them. The URLs are publicly accessible from any browser.

Step 1: Find Your App Data Directory

Open Handy settings
Navigate to the About section
Copy the "App Data Directory" path shown there, or use the shortcuts:
- macOS: Cmd+Shift+D to open debug menu
- Windows/Linux: Ctrl+Shift+D to open debug menu

The typical paths are:

macOS: ~/Library/Application Support/com.pais.handy/
Windows: C:\Users\{username}\AppData\Roaming\com.pais.handy\
Linux: ~/.config/com.pais.handy/
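For scripting these steps on macOS or Linux, a hypothetical helper can print the typical path from the list above. It hard-codes those typical paths, so if the App Data Directory shown in Handy's About section differs, use that value instead. Windows is left to PowerShell and `$env:APPDATA`.

```shell
#!/bin/sh
# Hypothetical helper: print the typical Handy app data directory for
# macOS or Linux, using the paths listed above. The path shown in
# Handy's About section is authoritative if it differs.
handy_app_data_dir() {
  case "$(uname -s)" in
    Darwin) echo "$HOME/Library/Application Support/com.pais.handy" ;;
    Linux)  echo "$HOME/.config/com.pais.handy" ;;
    *)
      # Windows: use the path under $env:APPDATA from PowerShell instead.
      echo "unsupported OS: $(uname -s)" >&2
      return 1
      ;;
  esac
}

handy_app_data_dir
```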

Step 2: Create Models Directory

Inside your app data directory, create a models folder if it doesn't already exist:

```
# macOS
mkdir -p ~/Library/Application\ Support/com.pais.handy/models

# Linux
mkdir -p ~/.config/com.pais.handy/models

# Windows (PowerShell)
New-Item -ItemType Directory -Force -Path "$env:APPDATA\com.pais.handy\models"
```

Step 3: Download Model Files

Download the models you want from the links below:

Whisper Models (single .bin files):

Small (487 MB): https://blob.handy.computer/ggml-small.bin
Medium (492 MB): https://blob.handy.computer/whisper-medium-q4_1.bin
Turbo (1600 MB): https://blob.handy.computer/ggml-large-v3-turbo.bin
Large (1100 MB): https://blob.handy.computer/ggml-large-v3-q5_0.bin

Parakeet Models (compressed archives):

V2 (473 MB): https://blob.handy.computer/parakeet-v2-int8.tar.gz
V3 (478 MB): https://blob.handy.computer/parakeet-v3-int8.tar.gz
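If `curl` is available, the downloads can also be scripted. The sketch below is a hypothetical helper, not part of Handy: it saves one file from blob.handy.computer into a directory you pass in, writing to a temporary `.part` name first so an interrupted download is never left under the final model filename. curl reads the standard `https_proxy` environment variable, which may help behind a proxy; any other proxy configuration is outside what this sketch covers.

```shell
#!/bin/sh
# Hypothetical helper: download one file listed above into a target directory.
# Usage: download_handy_model TARGET_DIR FILE_NAME
download_handy_model() {
  if [ $# -ne 2 ]; then
    echo "usage: download_handy_model TARGET_DIR FILE_NAME" >&2
    return 2
  fi
  target_dir=$1 file=$2
  mkdir -p "$target_dir" || return 1
  # -f: fail on HTTP errors, -L: follow redirects, --retry: retry transient failures.
  # curl uses the https_proxy environment variable if it is set.
  if curl -fL --retry 3 -o "$target_dir/$file.part" "https://blob.handy.computer/$file"; then
    # Keep the exact file name from the URL; do not rename Whisper .bin files.
    mv "$target_dir/$file.part" "$target_dir/$file"
  else
    rm -f "$target_dir/$file.part"
    return 1
  fi
}
```

For example, on Linux: `download_handy_model "$HOME/.config/com.pais.handy/models" ggml-small.bin`. For the Parakeet `.tar.gz` archives, downloading to a working directory instead (for example `download_handy_model . parakeet-v3-int8.tar.gz`) keeps the archive out of the models folder until it has been extracted.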

Step 4: Install Models

For Whisper Models (.bin files):

Simply place the .bin file directly into the models directory:

```
{app_data_dir}/models/
├── ggml-small.bin
├── whisper-medium-q4_1.bin
├── ggml-large-v3-turbo.bin
└── ggml-large-v3-q5_0.bin
```

For Parakeet Models (.tar.gz archives):

Extract the .tar.gz file
Place the extracted directory into the models folder
The directory must be named exactly as follows:
- Parakeet V2: parakeet-tdt-0.6b-v2-int8
- Parakeet V3: parakeet-tdt-0.6b-v3-int8

Final structure should look like:

```
{app_data_dir}/models/
├── parakeet-tdt-0.6b-v2-int8/     (directory with model files inside)
│   ├── (model files)
│   └── (config files)
└── parakeet-tdt-0.6b-v3-int8/     (directory with model files inside)
    ├── (model files)
    └── (config files)
```
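Because the directory name must match exactly, a helper that extracts the archive and then renames the result can avoid mistakes. This is a hypothetical sketch that makes no assumption about the archive's internal top-level name: if the archive holds a single top-level directory, that directory is renamed to the name you pass; if files sit at the archive root, they are collected into a directory with that name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Handy): extract a Parakeet .tar.gz and
# install it under the exact directory name listed above.
# Usage: install_parakeet_archive ARCHIVE MODELS_DIR DIR_NAME
install_parakeet_archive() {
  if [ $# -ne 3 ]; then
    echo "usage: install_parakeet_archive ARCHIVE MODELS_DIR DIR_NAME" >&2
    return 2
  fi
  archive=$1 models_dir=$2 name=$3
  if [ -e "$models_dir/$name" ]; then
    echo "$models_dir/$name already exists; remove it first" >&2
    return 1
  fi
  mkdir -p "$models_dir" || return 1
  # Extract into a hidden scratch directory next to the target, so the
  # final step is a plain rename on the same filesystem.
  tmp=$(mktemp -d "$models_dir/.extract.XXXXXX") || return 1
  if ! tar -xzf "$archive" -C "$tmp"; then
    rm -rf "$tmp"
    return 1
  fi
  # Inspect the top level of what was extracted.
  set -- "$tmp"/*
  if [ $# -eq 1 ] && [ -d "$1" ]; then
    # One top-level directory: rename it to the required name.
    mv "$1" "$models_dir/$name" && rm -rf "$tmp"
  else
    # Files at the archive root: the scratch directory becomes the model directory.
    mv "$tmp" "$models_dir/$name"
  fi
}
```

For example, on macOS (quoted because of the space in the path): `install_parakeet_archive parakeet-v3-int8.tar.gz "$HOME/Library/Application Support/com.pais.handy/models" parakeet-tdt-0.6b-v3-int8`.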

Important Notes:

For Parakeet models, the extracted directory name must match exactly as shown above
Do not rename the .bin files for Whisper models—use the exact filenames from the download URLs
After placing the files, restart Handy to detect the new models

Step 5: Verify Installation

Restart Handy
Open Settings → Models
Your manually installed models should now appear as "Downloaded"
Select the model you want to use and test transcription
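Before restarting, you can also confirm that the files and directories carry the exact names listed in this guide. The hypothetical `check_handy_models` helper below only checks names; it cannot tell whether a file is complete or whether Handy will accept it, so the Settings → Models check remains the real test.

```shell
#!/bin/sh
# Hypothetical helper: report which model names from this guide are present.
# Whisper models are single files; Parakeet models are directories.
check_handy_models() {
  if [ $# -ne 1 ]; then
    echo "usage: check_handy_models MODELS_DIR" >&2
    return 2
  fi
  for f in ggml-small.bin whisper-medium-q4_1.bin ggml-large-v3-turbo.bin ggml-large-v3-q5_0.bin; do
    if [ -f "$1/$f" ]; then echo "found   $f"; else echo "missing $f"; fi
  done
  for d in parakeet-tdt-0.6b-v2-int8 parakeet-tdt-0.6b-v3-int8; do
    if [ -d "$1/$d" ]; then echo "found   $d/"; else echo "missing $d/"; fi
  done
}
```

Run it with your models directory, for example `check_handy_models "$HOME/.config/com.pais.handy/models"` on Linux.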

Check existing issues at github.com/cjpais/Handy/issues
Fork the repository and create a feature branch
Test thoroughly on your target platform
Submit a pull request with a clear description of your changes
Join the discussion - reach out at [email protected]

The goal is to create both a useful tool and a foundation for others to build upon—a well-patterned, simple codebase that serves the community.

MIT License - see LICENSE file for details.

Whisper by OpenAI for the speech recognition model
whisper.cpp and ggml for amazing cross-platform whisper inference/acceleration
Silero for great lightweight VAD
Tauri team for the excellent Rust-based app framework
Community contributors helping make Handy better

"Your search for the right speech-to-text tool can end here—not because Handy is perfect, but because you can make it perfect for you."