<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>Suvamsh Shivaprasad</title>
  <link>https://suvamsh.com/blog/</link>
  <description>Personal website of Suvamsh Shivaprasad.</description>
  <language>en-us</language>
  <lastBuildDate>Sun, 29 Mar 2026 00:00:00 GMT</lastBuildDate>
  <atom:link href="https://suvamsh.com/rss.xml" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom" />
<item>
  <title>Man Computer Symbiosis</title>
  <link>https://suvamsh.com/blog/man-computer-symbiosis/</link>
  <guid>https://suvamsh.com/blog/man-computer-symbiosis/</guid>
  <pubDate>Sun, 29 Mar 2026 00:00:00 GMT</pubDate>
  <description>Talking is the most natural interface we have.</description>
  <content:encoded><![CDATA[<p>Humans spent tens of thousands of years evolving language. We can convey complex ideas in seconds just by talking. Then 150 years ago we invented the keyboard because we didn't have the technology to turn speech directly into writing. So we built this intermediary, compressing everything down to ten fingers hunting for letters on a grid. It made sense at the time. But AI speech-to-text now runs on-device, low latency, high accuracy. We finally have the technology to <a href="https://en.wikipedia.org/wiki/Man%E2%80%93Computer_Symbiosis">go straight from voice to text</a>. So why are we still typing?</p>
<p><img src="https://suvamsh.com/images/jarvis.gif" alt=""></p>
<p>I use push-to-talk transcription at work every day. Hold a key, talk, release, text appears. When I wanted the same on my personal laptop, every option was paid. $10/month, $30 one-time. The AI model doing the work is free and open source, the audio APIs are built into macOS, the compute runs on my own hardware. Why am I paying someone to <a href="https://x.com/karpathy/status/1886192184808149383">wrap a free model in an Electron app</a>?</p>
<h2>Building Screamer</h2>
<p><a href="https://www.screamer.app/">Screamer</a> is a free, <a href="https://github.com/suvamsh/screamer">open-source</a> app that turns your voice into text instantly. Hold a key, speak, release, done.</p>
<p>I mentioned the problem at dinner with friends. Everyone shrugged with an "it is what it is" look on their faces, so I went home and started building.</p>
<p><img src="https://suvamsh.com/images/process.jpeg" alt="Five days from dinner to launch"></p>
<h3>Under the Hood: What Makes It Fast</h3>
<p>Screamer is Rust calling into whisper.cpp, so everything compiles down to a native binary with no runtime overhead. On Apple Silicon, inference runs on the GPU via Metal acceleration. No Electron, no Python, no server round-trips. Just native code talking directly to your hardware. Here's what that enables:</p>
<p><strong>Two warm pipelines that never block each other.</strong> One state for live preview, one for final transcription. The live worker polls every 350ms with a non-blocking <code>try_lock()</code>, skips if busy. Final transcription uses its own state and goes straight to paste.</p>
<pre><code class="hljs language-rust"><span class="hljs-comment">// Live preview: skip if busy</span>
transcriber.<span class="hljs-title function_ invoke__">try_transcribe</span>(&#x26;padded_samples) <span class="hljs-comment">// returns Ok(None) if locked</span>

<span class="hljs-comment">// Final: always succeeds, separate state</span>
StateAccess::<span class="hljs-title function_ invoke__">Borrowed</span>(guard) => guard,
</code></pre>
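<p>For illustration, here's a minimal, self-contained sketch of the skip-if-busy pattern using <code>std::sync::Mutex::try_lock</code>. <code>WhisperState</code> and this <code>try_transcribe</code> are stand-ins for the idea, not Screamer's actual types:</p>

```rust
use std::sync::{Arc, Mutex};

// Stand-in for the real whisper.cpp inference state.
struct WhisperState;

// Non-blocking attempt: if the state is busy, skip this preview tick
// instead of stalling the 350ms polling loop.
fn try_transcribe(state: &Arc<Mutex<WhisperState>>) -> Option<&'static str> {
    match state.try_lock() {
        Ok(_guard) => Some("partial text"), // got the state, run inference
        Err(_) => None,                     // busy: skip, try again next tick
    }
}

fn main() {
    let live_state = Arc::new(Mutex::new(WhisperState));
    // Lock is free: the live worker produces a preview.
    assert!(try_transcribe(&live_state).is_some());
    // Lock is held elsewhere: the worker skips instead of blocking.
    let _held = live_state.lock().unwrap();
    assert!(try_transcribe(&live_state).is_none());
}
```

The point of the second, dedicated state is that the final transcription never has to compete with the preview loop for this lock.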
<p><strong>Inference tuned for short utterances.</strong> <code>audio_ctx</code> is sized to the actual utterance, not the full model window, rounded up to GPU-friendly 64-unit boundaries with hardware-specific floors (256 on Apple Silicon, 384 on Intel). Greedy decoding, single segment, no timestamps.</p>
<pre><code class="hljs language-rust"><span class="hljs-keyword">fn</span> <span class="hljs-title function_">recommended_audio_ctx</span>(&#x26;<span class="hljs-keyword">self</span>, samples: &#x26;[<span class="hljs-type">f32</span>]) <span class="hljs-punctuation">-></span> <span class="hljs-type">i32</span> {
    <span class="hljs-keyword">let</span> <span class="hljs-variable">required</span> = <span class="hljs-title function_ invoke__">ceil_div</span>(samples.<span class="hljs-title function_ invoke__">len</span>(), AUDIO_CTX_SAMPLES_PER_UNIT) <span class="hljs-keyword">as</span> <span class="hljs-type">i32</span>;
    <span class="hljs-title function_ invoke__">round_up_to_multiple</span>(required.<span class="hljs-title function_ invoke__">max</span>(<span class="hljs-keyword">self</span>.config.adaptive_audio_ctx_min), <span class="hljs-number">64</span>)
        .<span class="hljs-title function_ invoke__">min</span>(<span class="hljs-keyword">self</span>.ctx.<span class="hljs-title function_ invoke__">n_audio_ctx</span>())
}

params = FullParams::<span class="hljs-title function_ invoke__">new</span>(SamplingStrategy::Greedy { best_of: <span class="hljs-number">1</span> });
params.<span class="hljs-title function_ invoke__">set_no_context</span>(<span class="hljs-literal">true</span>);
params.<span class="hljs-title function_ invoke__">set_single_segment</span>(<span class="hljs-literal">true</span>);
</code></pre>
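<p>The snippet above leans on two small helpers that aren't shown. Here's a standalone sketch of how they'd work; the names match the snippet, but the constants (320 samples per encoder unit, i.e. 20ms at 16kHz, and a model max of 1500) are my assumptions, not Screamer's exact code:</p>

```rust
// Assumed: one encoder unit covers 20ms of 16kHz audio.
const AUDIO_CTX_SAMPLES_PER_UNIT: usize = 320;

// Integer ceiling division: how many whole units cover `a` samples.
fn ceil_div(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

// Round up to the next multiple (e.g. a 64-unit GPU-friendly boundary).
fn round_up_to_multiple(value: i32, multiple: i32) -> i32 {
    ((value + multiple - 1) / multiple) * multiple
}

// Free-function version of the method above: clamp between the hardware
// floor and the model's full window.
fn recommended_audio_ctx(samples_len: usize, floor: i32, model_max: i32) -> i32 {
    let required = ceil_div(samples_len, AUDIO_CTX_SAMPLES_PER_UNIT) as i32;
    round_up_to_multiple(required.max(floor), 64).min(model_max)
}

fn main() {
    // A 2s clip (32,000 samples) needs 100 units, which gets lifted
    // to the Apple Silicon floor of 256 (already a multiple of 64).
    assert_eq!(recommended_audio_ctx(32_000, 256, 1500), 256);
    // A 12s clip needs 600 units, rounded up to 640.
    assert_eq!(recommended_audio_ctx(192_000, 256, 1500), 640);
}
```

The win is that a two-second utterance pays for a 256-unit encoder pass instead of the full 1500-unit window.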
<p><strong>The model never sees audio it doesn't need.</strong> Screamer finds the actual speech region in 20ms RMS frames, trims everything else, and drops clips under 0.3s. The buffer is pre-allocated and reused.</p>
<pre><code class="hljs language-rust"><span class="hljs-keyword">fn</span> <span class="hljs-title function_">trimmed_speech_range</span>(samples: &#x26;[<span class="hljs-type">f32</span>]) <span class="hljs-punctuation">-></span> <span class="hljs-type">Option</span>&#x3C;Range&#x3C;<span class="hljs-type">usize</span>>> {
    <span class="hljs-keyword">let</span> (start, end) = <span class="hljs-title function_ invoke__">speech_activity_bounds</span>(samples)?;
    <span class="hljs-title function_ invoke__">Some</span>(start.<span class="hljs-title function_ invoke__">saturating_sub</span>(<span class="hljs-number">1600</span>)..(end + <span class="hljs-number">1600</span>).<span class="hljs-title function_ invoke__">min</span>(samples.<span class="hljs-title function_ invoke__">len</span>()))
}

<span class="hljs-keyword">if</span> trimmed_len &#x3C; <span class="hljs-number">4800</span> { <span class="hljs-keyword">return</span>; } <span class="hljs-comment">// 0.3s @ 16kHz</span>
</code></pre>
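<p>For the curious, here's a hypothetical sketch of what an RMS-based <code>speech_activity_bounds</code> could look like, under stated assumptions: 16kHz mono <code>f32</code> samples, 20ms frames, and a fixed energy threshold I picked for illustration:</p>

```rust
const FRAME: usize = 320;        // 20ms at 16kHz
const RMS_THRESHOLD: f32 = 0.01; // assumed energy floor for "speech"

// Root-mean-square energy of one frame.
fn frame_rms(frame: &[f32]) -> f32 {
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

// Sample indices spanning the first through last frame whose RMS
// clears the threshold; None if the whole clip is silence.
fn speech_activity_bounds(samples: &[f32]) -> Option<(usize, usize)> {
    let loud: Vec<usize> = samples
        .chunks(FRAME)
        .enumerate()
        .filter(|(_, f)| frame_rms(f) > RMS_THRESHOLD)
        .map(|(i, _)| i)
        .collect();
    let first = *loud.first()?;
    let last = *loud.last()?;
    Some((first * FRAME, ((last + 1) * FRAME).min(samples.len())))
}

fn main() {
    // 1s of silence, 0.5s of tone, 0.5s of silence.
    let mut samples = vec![0.0f32; 16_000];
    samples.extend(std::iter::repeat(0.1f32).take(8_000));
    samples.extend(std::iter::repeat(0.0f32).take(8_000));
    let (start, end) = speech_activity_bounds(&samples).unwrap();
    assert_eq!((start, end), (16_000, 24_000));
}
```

The 1600-sample padding in the real snippet then keeps 0.1s of context on each side so the model doesn't clip word onsets.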
<div style="position: relative; padding-bottom: 64.86161251504213%; height: 0;"><iframe src="https://www.loom.com/embed/4bcc1f5310f74bc887f27967c47c3103" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>The Future is Voice</h2>
<p>Think of any sci-fi movie or game you've watched or played: Iron Man, Halo, Her, Star Wars. Nobody is typing. People talk to machines. That's always been the vision, and yet here we are in 2026 on a keyboard layout from 1873.</p>
<p>We speak roughly 3x faster than we type. Typing was never the way we were meant to talk to machines; it was just the only option we had. It's not anymore. Speech-to-text runs locally, in real time, with accuracy that would have been science fiction five years ago. Think about self-driving cars: we didn't rebuild the roads, we adapted the machine to the infrastructure we already had. Language is the same. We don't need to adapt ourselves to machines through keyboards. We can use our language directly. The machine should meet us where we are.</p>
<p><img src="https://suvamsh.com/images/c3po.gif" alt=""></p>
<p>The models are free. The tools are free. The code is open source. Maybe the app should be too.</p>]]></content:encoded>
</item>
<item>
  <title>Human Resources</title>
  <link>https://suvamsh.com/blog/human-resources/</link>
  <guid>https://suvamsh.com/blog/human-resources/</guid>
  <pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate>
  <description>The bottleneck isn&apos;t engineering complexity anymore. It&apos;s imagination.</description>
  <content:encoded><![CDATA[<p>For years, tech products got built by three roles: engineers, designers, and PMs. Each owned a bottleneck. It worked, but it was slow.</p>
<p>That's over. An engineer can spin up a decent UI without a design review. A PM can <a href="https://x.com/karpathy/status/1886192184808149383">vibe-code</a> a prototype in an afternoon. Nobody's writing <a href="https://www.news.aakashg.com/p/ai-prd">10-page PRDs</a> to get alignment before a single line of code gets written. The pipeline has collapsed. <a href="https://aws.amazon.com/executive-insights/content/amazon-two-pizza-team/">Amazon's two-pizza team rule</a> always made intuitive sense. Now it looks like the only way to operate.</p>
<p>This is shaking out into two camps. There are folks who grabbed onto these tools and are moving at a pace that feels unfair. Roadmaps scoped for a quarter are getting blown through in weeks. Then there are the skeptics, or people who just haven't found their footing yet. <a href="https://circleci.com/blog/five-takeaways-2026-software-delivery-report/">The data says the gap is widening in real time</a>.</p>
<p>I'm seeing this play out around me at companies large and small. <a href="https://medium.com/@david.bennell/product-management-in-2025-dc3f1e1b4319">Roles are melding</a>. The engineer who can design, the PM who can ship code, the designer who can think in systems. The lines that used to define who does what are blurring fast, and expectations are shifting to match.</p>
<p>The wild part is where the ceiling moved. We used to ask "can we build this?" Now we assume yes. Whatever you can dream up, describe, and sketch out, <a href="https://techcrunch.com/2025/03/06/a-quarter-of-startups-in-ycs-current-cohort-have-codebases-that-are-almost-entirely-ai-generated/">you can probably ship</a>. The question that actually matters is: what should we even be building? What's worth imagining? That's the hard part, and no tool automates your way out of it.</p>
<p>The bottleneck isn't engineering complexity anymore. It's imagination.</p>
<p><em>I called this with friends back in December 2025 and turns out I was underselling it. The frustrating thing is I have no receipts. Hot takes that occasionally land, and by the time they do I've got nothing to point to. That's a big part of why I'm writing more now. To actually document this stuff as it happens.</em></p>]]></content:encoded>
</item>
<item>
  <title>Suvamsh Shivaprasad - Initial Post</title>
  <link>https://suvamsh.com/blog/site-launched/</link>
  <guid>https://suvamsh.com/blog/site-launched/</guid>
  <pubDate>Sat, 06 May 2017 00:00:00 GMT</pubDate>
  <description>Here&apos;s my personal website. I intend to showcase things I build or am building on here.</description>
  <content:encoded><![CDATA[<p>Here's my personal website. I intend to showcase things I build or am building on here.</p>]]></content:encoded>
</item>
</channel>
</rss>