Privacy-first voice transcription for macOS
At a Glance
*With tiny models on Apple Silicon
How It Works
Hold your hotkey to record, release to transcribe. Text appears at your cursor as soon as transcription finishes. The entire pipeline runs on your Mac.
1. Press Option+Space to start recording from your microphone.
2. Audio is captured at 48kHz and resampled to 16kHz in real time.
3. A local Whisper model transcribes the audio using Metal GPU acceleration.
4. Fillers are removed, punctuation is cleaned up, and dictionary bias is applied.
5. Text is inserted at the cursor via the clipboard, and the original clipboard contents are restored.
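The last step, pasting through the clipboard and then putting the previous contents back, can be sketched roughly as follows. This is an illustration only: the `Clipboard` trait, `MemoryClipboard`, and `insert_via_clipboard` are hypothetical names, not SantiSotto's actual code, and a real app would back the trait with the system pasteboard and a synthesized paste keystroke.

```rust
/// Hypothetical clipboard abstraction (an assumption for this sketch).
pub trait Clipboard {
    fn get_text(&self) -> Option<String>;
    fn set_text(&mut self, text: &str);
    fn clear(&mut self);
}

/// In-memory stand-in so the sketch runs without touching the real pasteboard.
#[derive(Default)]
pub struct MemoryClipboard {
    contents: Option<String>,
}

impl Clipboard for MemoryClipboard {
    fn get_text(&self) -> Option<String> {
        self.contents.clone()
    }
    fn set_text(&mut self, text: &str) {
        self.contents = Some(text.to_string());
    }
    fn clear(&mut self) {
        self.contents = None;
    }
}

/// Saves the current clipboard, places `text` on it, runs `paste`
/// (for example a synthesized Cmd+V), then restores the saved contents.
pub fn insert_via_clipboard<C: Clipboard>(
    clipboard: &mut C,
    text: &str,
    paste: impl FnOnce(&C),
) {
    let saved = clipboard.get_text();
    clipboard.set_text(text);
    paste(&*clipboard);
    match saved {
        Some(previous) => clipboard.set_text(&previous),
        None => clipboard.clear(),
    }
}
```

Saving before writing and restoring after the paste is what keeps the user's own clipboard contents intact, whether or not the clipboard held anything beforehand.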
Features
Designed to be invisible until you need it. Runs in your menu bar, stays out of your way.
All audio processing happens on-device. No internet required for transcription. No data collection, no telemetry, no cloud APIs.
Uses Apple's Metal API for GPU-accelerated inference on Apple Silicon. Small models transcribe in under a second.
From 31MB tiny to 2.9GB large-v3. Full precision and quantized variants (Q5, Q8). English-only models for extra speed.
Every transcription saved locally. Browse, copy, or delete individual entries. Never lose text if a paste fails.
Change your recording shortcut to any modifier+key combo. Capture UI with live key detection. Two configurable shortcuts.
Automatically strips "um", "uh", "er", "you know", "basically" and more. Smart pattern matching avoids false positives.
Add names, technical terms, and jargon that Whisper might not recognize. Biases the model toward your vocabulary.
Multilingual models auto-detect language. Supports English, Spanish, French, German, Japanese, Arabic, and 93 more.
Runs as a macOS accessory — no dock icon, no window clutter. Launches at login. Always ready when you are.
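The filler-word removal described above can be approximated with whole-word matching, which is one simple way to avoid false positives such as the "um" inside "umbrella". The sketch below is an assumption about the general approach, not SantiSotto's actual matcher: `FILLERS` and `strip_fillers` are hypothetical names, only single-word fillers are handled, and punctuation cleanup is left to a separate pass.

```rust
/// Single-word fillers handled by this sketch (a hypothetical subset).
const FILLERS: [&str; 3] = ["um", "uh", "er"];

/// Drops tokens that are fillers once surrounding punctuation is ignored.
/// Matching whole tokens means "umbrella" is never mistaken for "um".
pub fn strip_fillers(text: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    for token in text.split_whitespace() {
        // Compare only the alphanumeric core, case-insensitively.
        let core = token
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        if FILLERS.iter().any(|filler| *filler == core.as_str()) {
            continue;
        }
        kept.push(token);
    }
    kept.join(" ")
}
```

Multi-word fillers such as "you know" need surrounding context to tell them apart from ordinary speech ("do you know the way?"), so they are deliberately left out of this sketch.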
AI Models
Larger models are more accurate but slower. Quantized variants (Q5, Q8) reduce size with minimal quality loss. English-only models are faster for English.
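| Model | Size (smallest quantized to full precision) |
|---|---|
| Tiny | 31 – 75 MB |
| Base | 57 – 142 MB |
| Small | 181 – 466 MB |
| Medium | 514 MB – 1.5 GB |
| Large | 547 MB – 2.9 GB |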
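To make the size trade-off concrete, here is a small sketch that picks the largest model tier whose smallest variant fits a given download budget. The size figures come from the ranges above; the tier names assume those ranges map onto the standard Whisper sizes (tiny through large), and `ModelTier`, `TIERS`, and `largest_tier_within` are hypothetical names used for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ModelTier {
    pub name: &'static str,
    /// Smallest (most heavily quantized) variant, in MB.
    pub min_mb: u32,
    /// Full-precision variant, in MB (GB figures rounded to 1000 MB per GB).
    pub max_mb: u32,
}

/// Size ranges from the list above; the names are an assumption.
pub const TIERS: [ModelTier; 5] = [
    ModelTier { name: "tiny", min_mb: 31, max_mb: 75 },
    ModelTier { name: "base", min_mb: 57, max_mb: 142 },
    ModelTier { name: "small", min_mb: 181, max_mb: 466 },
    ModelTier { name: "medium", min_mb: 514, max_mb: 1500 },
    ModelTier { name: "large", min_mb: 547, max_mb: 2900 },
];

/// Returns the largest tier that has at least one variant within `budget_mb`.
/// `TIERS` is ordered from smallest to largest, so the last match wins.
pub fn largest_tier_within(budget_mb: u32) -> Option<ModelTier> {
    TIERS
        .iter()
        .copied()
        .filter(|tier| tier.min_mb <= budget_mb)
        .last()
}
```

With a 200 MB budget this selects the small tier, since its quantized variant is 181 MB while the medium tier starts at 514 MB.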
Why SantiSotto
Most transcription tools send your audio to the cloud. SantiSotto keeps everything on your device.
| Feature | SantiSotto | Otter.ai | macOS Dictation | Whisper (CLI) |
|---|---|---|---|---|
| 100% Local Processing | ✓ Yes | ✗ Cloud | ● Hybrid | ✓ Yes |
| No Account Required | ✓ Yes | ✗ Required | ✓ Yes | ✓ Yes |
| Works Offline | ✓ Yes | ✗ No | ● Limited | ✓ Yes |
| GPU Accelerated | ✓ Metal | N/A (cloud) | ✓ Yes | ● Manual setup |
| Hotkey Recording | ✓ Customizable | ✗ No | ● Fixed key | ✗ No |
| Auto-Paste at Cursor | ✓ Yes | ✗ Copy only | ✓ Yes | ✗ File output |
| Model Selection | ✓ 30 models | N/A | ✗ Fixed | ✓ Manual |
| Transcription History | ✓ Local | ✓ Yes | ✗ No | ✗ No |
| Filler Word Removal | ✓ Automatic | ● Basic | ✗ No | ✗ No |
| Free | ✓ Open source | ✗ Subscription | ✓ Yes | ✓ Open source |
| GUI / Easy Setup | ✓ Native app | ✓ Web app | ✓ Built-in | ✗ Terminal only |
Under the Hood
A modern, performant architecture built with native technologies.
TypeScript + Vite for fast, type-safe UI development. Lucide icon library for consistent, beautiful iconography. Pure CSS with light and dark mode support.
Rust for memory-safe, high-performance audio processing and transcription. Tauri v2 framework for native macOS integration with minimal overhead (~5MB binary).
whisper-rs (whisper.cpp bindings) with Metal feature flag for Apple Silicon GPU acceleration. Greedy sampling for lowest latency. Auto language detection for multilingual models.
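Greedy sampling keeps latency low because each decoding step simply takes the single highest-scoring token, with no beam of alternatives to track. The sketch below illustrates the strategy in isolation; it is not whisper.cpp's decoder, and `argmax`, `greedy_decode`, and the `next_logits` callback are hypothetical stand-ins for a real model's forward pass.

```rust
/// Index of the highest-scoring entry, or `None` for an empty slice.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}

/// Greedy decoding: at each step ask the model for logits given the tokens
/// so far, keep the argmax, and stop at the end-of-text token or `max_len`.
pub fn greedy_decode(
    mut next_logits: impl FnMut(&[usize]) -> Vec<f32>,
    eot: usize,
    max_len: usize,
) -> Vec<usize> {
    let mut tokens = Vec::new();
    while tokens.len() < max_len {
        let logits = next_logits(&tokens);
        match argmax(&logits) {
            Some(token) if token != eot => tokens.push(token),
            _ => break,
        }
    }
    tokens
}
```

Beam search would instead keep several candidate sequences alive per step, which can improve accuracy at the cost of extra model evaluations.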
cpal for cross-platform audio capture. Real-time 48kHz to 16kHz downsampling via sinc-based resampling with rubato. Zero-copy buffer management with privacy-first memory clearing.
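Going from 48kHz to 16kHz is an exact 3:1 reduction. The sketch below shows that ratio with a deliberately simplified method, averaging each group of three samples; it is not the sinc-based resampling that rubato provides, which filters far more accurately. `downsample_by_3` is a hypothetical helper for illustration.

```rust
/// Simplified 3:1 downsampler: averages each run of three input samples.
/// This box-filter approach is a stand-in for a proper sinc resampler and
/// only illustrates the 48kHz to 16kHz ratio. Leftover samples that do not
/// fill a complete group of three are dropped.
pub fn downsample_by_3(input: &[f32]) -> Vec<f32> {
    input
        .chunks_exact(3)
        .map(|group| (group[0] + group[1] + group[2]) / 3.0)
        .collect()
}
```

One second of 48kHz mono audio (48,000 samples) becomes 16,000 samples, the rate Whisper models expect as input.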