Suvamsh Shivaprasad

Man Computer Symbiosis

Sun, 29 Mar 2026 00:00:00 GMT

Humans spent tens of thousands of years evolving language. We can convey complex ideas in seconds just by talking. Then 150 years ago we invented the keyboard because we didn't have the technology to turn speech directly into writing. So we built this intermediary, compressing everything down to ten fingers hunting for letters on a grid. It made sense at the time. But AI speech-to-text now runs on-device, low latency, high accuracy. We finally have the technology to go straight from voice to text. So why are we still typing?

I use push-to-talk transcription at work every day. Hold a key, talk, release, text appears. When I wanted the same on my personal laptop, every option was paid. $10/month, $30 one-time. The AI model doing the work is free and open source, the audio APIs are built into macOS, the compute runs on my own hardware. Why am I paying someone to wrap a free model in an Electron app?

Building Screamer

Screamer is a free, open-source app that turns your voice into text instantly. Hold a key, speak, release, done.

Mentioned the problem at dinner with friends, everyone shrugged with a "it is what it is" look on their face, so I went home and started building.

Under the Hood: What Makes It Fast

Screamer is Rust calling into whisper.cpp, so everything compiles down to a native binary with no runtime overhead. On Apple Silicon, inference runs on the GPU via Metal acceleration. No Electron, no Python, no server round-trips. Just native code talking directly to your hardware. Here's what that enables:

Two warm pipelines that never block each other. One state for live preview, one for final transcription. The live worker polls every 350ms with a non-blocking try_lock(), skips if busy. Final transcription uses its own state and goes straight to paste.

// Live preview: skip if busy
transcriber.try_transcribe(&padded_samples) // returns Ok(None) if locked

// Final: always succeeds, separate state
StateAccess::Borrowed(guard) => guard,

Inference tuned for short utterances. audio_ctx is sized to the actual utterance, not the full model window. Rounded to 64-unit GPU-friendly boundaries with hardware-specific floors (256 Apple Silicon, 384 Intel). Greedy decoding, single segment, no timestamps.

fn recommended_audio_ctx(&self, samples: &[f32]) -> i32 {
    let required = ceil_div(samples.len(), AUDIO_CTX_SAMPLES_PER_UNIT) as i32;
    round_up_to_multiple(required.max(self.config.adaptive_audio_ctx_min), 64)
        .min(self.ctx.n_audio_ctx())
}

params = FullParams::new(SamplingStrategy::Greedy { best_of: 1 });
params.set_no_context(true);
params.set_single_segment(true);

The model never sees audio it doesn't need. Screamer finds the actual speech region in 20ms RMS frames, trims everything else, drops clips under 0.3s. Buffer is pre-allocated and reused.

fn trimmed_speech_range(samples: &[f32]) -> Option<Range<usize>> {
    let (start, end) = speech_activity_bounds(samples)?;
    Some(start.saturating_sub(1600)..(end + 1600).min(samples.len()))
}

if trimmed_len < 4800 { return; } // 0.3s @ 16kHz

The Future is Voice

Every sci-fi movie or game you've watched or played, Iron Man, Halo, Her, Star Wars, nobody is typing. People talk to machines. That's always been the vision and yet here we are in 2026 on a keyboard layout from 1873.

We speak roughly 3x faster than we type. Typing was never the way we were meant to talk to machines, it was just the only option we had. It's not anymore. Speech-to-text runs locally, in real time, with accuracy that would have been science fiction five years ago. Think about self-driving cars. We didn't rebuild the roads, we adapted the machine to the infrastructure we already had. Language is the same. We don't need to adapt ourselves to machines through keyboards. We can use our language directly. The machine should meet us where we are.

The models are free. The tools are free. The code is open source. Maybe the app should be too.

Human Resources

Sun, 22 Mar 2026 00:00:00 GMT

For years, tech products got built by three roles: engineers, designers, and PMs. Each owned a bottleneck. It worked, but it was slow.

That's over. An engineer can spin up a decent UI without a design review. A PM can vibe-code a prototype in an afternoon. Nobody's writing 10-page PRDs to get alignment before a single line of code gets written. The pipeline has collapsed. Amazon's two-pizza team rule always made intuitive sense. Now it looks like the only way to operate.

This is shaking people into two camps. There are folks who grabbed onto these tools and are moving at a pace that feels unfair. Roadmaps scoped for a quarter are getting blown through in weeks. Then there are the skeptics, or people who just haven't found their footing yet. The data says the gap is widening in real time.

I'm seeing this play out around me at companies large and small. Roles are melding. The engineer who can design, the PM who can ship code, the designer who can think in systems. The lines that used to define who does what are blurring fast, and expectations are shifting to match.

The wild part is where the ceiling moved. We used to ask "can we build this?" Now we assume yes. Whatever you can dream up, describe, and sketch out, you can probably ship. The question that actually matters is: what should we even be building? What's worth imagining? That's the hard part, and no tool automates your way out of it.

The bottleneck isn't engineering complexity anymore. It's imagination.

I called this with friends back in December 2025 and turns out I was underselling it. The frustrating thing is I have no receipts. Hot takes that occasionally land, and by the time they do I've got nothing to point to. That's a big part of why I'm writing more now. To actually document this stuff as it happens.

Suvamsh Shivaprasad - Initial Post

Sat, 06 May 2017 00:00:00 GMT

Here's my personal website. I intend to showcase things I build or am building on here.