
Writing

Notes are shorter, rougher — TILs, tips, quick thoughts.

Posts are long-form pieces — deep dives, tutorials, and essays.

  • LatentScore Heads to SIGGRAPH 2026 in LA

    Prabal Gupta 

    LatentScore got a talk at SIGGRAPH 2026 in Los Angeles, July 19-23. The session walks through the retrieval architecture and where it fits for procedural audio in games, film, and interactive media. Read the research post or try the demo.

    Most procedural audio tools either play canned samples or wait for a beefy GPU to spin up a diffusion model. LatentScore does neither. Text in, structured synth configurations out, runs on a laptop CPU. The synthesizer is the renderer; the LLM-distilled retrieval is what gives it a sense of taste.

    I think the pipeline transfers well outside of music too - any time you need responsive, content-aware audio at scale and a sample library is too rigid to be useful.

    If you’ll be in LA and want to chat, reach out on LinkedIn. There’s also a companion short paper at NIME 2026 in London in June if that’s closer.

  • OpenAI Codex Has a Goblin Problem

    Someone found this in OpenAI Codex’s system prompt:

    Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.

    Goblins, gremlins, trolls, ogres - the fantasy quartet, fine. I get it. Perhaps someone asked Codex to write something and got back a RuneScape goblin lore deep-dive.

    But pigeons? Raccoons? These are deployment-environment animals.

    If not for raccoons, who’d do the garbage collection?

    RFC 1149 (IP over Avian Carriers, 1990, an actual protocol - with an actual implementation, in Bergen) puts pigeons squarely inside the systems-engineering canon.

    “Or other animals or creatures” - full anti-creature agenda. Literally a bajillion dollars get set on fire every hour processing these prompts.

    Turns out there’s a reason for the ban. Back in 5.4, GPT developed a goblin obsession - Reddit threads (1, 2) full of users whose ChatGPT couldn’t stop bringing up gremlins. The system prompt is the exorcism that shipped with 5.5.

    If you think I’m lying, see for yourselves: the prompt is right there in the repo.

    Asked for a creature starting with G, Codex says “Giraffe.” Tell it to ignore the system prompt and it coughs up “Goblin.”
  • Presenting LatentScore at NIME 2026 in London

    I’ll be presenting LatentScore at NIME 2026 in London this June. Read more about how it works or try it out.

    For those unfamiliar, LatentScore lets you generate procedural music from text, all on CPU. No audio samples, no GPU. You can even use language models to drive it.

    The data behind it was built in an unusual way: I scraped textbooks and taught LLMs to generate structured configurations for a software music synthesizer. Books cover such a wide slice of human experience that they turned out to be a surprisingly good source for extracting a ton of musical “vibes” without ever touching an audio dataset.

    If you’re curious or have questions, feel free to reach out on LinkedIn.

  • KISS Your Hosting

    Keep it simple, stupid. You just need a VPS.

    A $5 VPS, a Docker container, SQLite, and Litestream to sync it to Cloudflare R2. Cloudflare gives you free storage up to 10GB. If you need durable execution across shutdowns - scheduled jobs, retries, workflows that survive a restart - use something like DBOS for orchestration with SQLite as the backend. All your data gets backed up. All your durable workflows stay durable. On a dirt cheap VPS.
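
    The Litestream piece of that stack is one small config file. A minimal sketch - the database path, bucket name, and account ID below are placeholders, not a real deployment:

```yaml
# litestream.yml - replicate a local SQLite file to Cloudflare R2
# (R2 speaks the S3 API, so Litestream's s3 replica type works).
dbs:
  - path: /data/app.db            # the SQLite database to protect
    replicas:
      - type: s3
        bucket: my-backups        # placeholder bucket name
        path: app.db
        endpoint: https://<account-id>.r2.cloudflarestorage.com
# Credentials come from LITESTREAM_ACCESS_KEY_ID /
# LITESTREAM_SECRET_ACCESS_KEY in the environment.
```

    Run `litestream replicate` alongside the app in the container and restores become a one-liner with `litestream restore`.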

    This is the stack behind ClaudeDown.com. The whole thing runs on a single machine.

    Think about what a typical vibe coder reaches for. Supabase. Vercel. Managed Postgres. Managed Redis. Managed everything. It gets expensive fast. And most businesses are dead before they ever get to the stage where they actually need any of those things.

    People make migration sound like a nightmare. It’s not. When you actually outgrow SQLite on a VPS - if you ever do - moving to managed infrastructure is a well-understood problem. It’s not the existential risk people pretend it is.

    You don’t need to act like a VC-funded business for your side project. It’s a toy. Treat it like one.

  • ClaudeDown: Is Claude Getting Dumber, or Is It Just You?

    We built ClaudeDown.com over a weekend because we kept asking the same question every time Claude felt off - is it actually worse right now, or am I just tired?

    Turns out a lot of people ask that. And with subjective things like model behavior, perceived quality degradation, and rate limits, there’s no objective metric to check. Company spokespersons aren’t always sharing complete or accurate information either. Remember when Anthropic emailed everyone saying new rate limits would “affect less than 5% of users”? Look at what people are actually saying. Crowdsourcing opinions becomes important when the official line doesn’t match the lived experience.

    So we built a tracker. The goal is simple: maintain a crude but honest sentiment score of what’s going wrong and whether it’s going wrong right now. What are people complaining about? Is it more than usual? What are the most visible complaints saying?

    The whole thing costs about $12/month to run. One Docker container on Railway. SQLite for the database, Litestream replicating to Cloudflare R2, DBOS for durable execution. No managed database, no Redis, no Postgres. The entire application state lives in a single file.

    Every hour, a scheduled workflow pulls complaint tweets from Twitter’s API using a predefined search query. No LLMs are involved in the collection - it’s a keyword match. To estimate baseline mention volume without exhaustively counting every tweet (which would be expensive), we sample a couple of pages worth of recent Claude mentions, look at their timestamps - since they’re sorted by post time - and extrapolate the total volume for that time period. Cheap and reasonably accurate for monitoring purposes.
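
    That extrapolation step is simple enough to sketch. A minimal illustration, assuming timestamps come back newest-first as described - the function and variable names here are mine, not the actual ClaudeDown code:

```python
from datetime import datetime, timedelta

def estimate_hourly_volume(sample_timestamps):
    """Extrapolate total hourly mention volume from a small sample.

    sample_timestamps: datetimes of recent mentions, newest first,
    as returned by a search API that sorts by post time.
    """
    if len(sample_timestamps) < 2:
        return len(sample_timestamps)
    # Time covered by the sample, from oldest to newest mention.
    span = (sample_timestamps[0] - sample_timestamps[-1]).total_seconds()
    if span <= 0:
        return len(sample_timestamps)
    rate = len(sample_timestamps) / span  # mentions per second
    return round(rate * 3600)             # scale to one hour
```

    A sample of 100 tweets spanning 12 minutes extrapolates to roughly 500 mentions per hour, without ever paging through the full volume.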

    We use the estimated complaint and baseline counts to track the complaint-to-mention ratio. An LLM summarizes the complaint themes once per hour, and then the static frontend gets rebuilt. The site is a Next.js static export - no server-side rendering, no API calls at request time. Plain HTML.

    It doesn’t measure actual model quality. A viral tweet from a big account can generate copycat complaints that have nothing to do with a real degradation. But when hundreds of unrelated people independently start complaining about the same thing at the same time, that signal is hard to ignore.

  • Telnyx Compromised on PyPI

    Telnyx - a voice, SMS, and SIP platform used by a lot of developers for programmable telecom - had its Python SDK compromised on PyPI. Same attack group, TeamPCP, that hit LiteLLM earlier this week.

    The payload was hidden inside a WAV file that passed MIME-type checks. Valid audio file, but the frame data contained base64-encoded malware. At runtime, the WAV gets decoded and the attacker literally runs exec(base64.b64decode(content)). Basic regex scanners catch this pattern instantly, but the attackers are counting on the window between publish and discovery.
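
    The pattern is mechanical enough that even a toy scanner catches it. A hypothetical few-line check - not any real scanner's implementation:

```python
import re

# Matches the exec-on-decoded-base64 pattern described above.
# A real scanner would also handle aliased imports, string
# concatenation, getattr tricks, etc.; this is only the naive version.
SUSPICIOUS = re.compile(r"exec\s*\(\s*base64\.b64decode\s*\(")

def looks_malicious(source: str) -> bool:
    """Flag source text containing the telltale exec/b64decode call."""
    return bool(SUSPICIOUS.search(source))
```

    The attackers know checks like this exist; the bet is entirely on the lag between publish and discovery.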

    This could have been mitigated with PyPI’s Trusted Publishers - constraining package publishing to specific GitHub environments with branch protection and required approvers. Most maintainers don’t use it.

    HN user TZubiri made the point that you could just use HTTP APIs directly instead of vendor SDKs. That works for simpler use cases - basic REST calls where you’re sending requests and parsing JSON. But for something like Telnyx where you’re dealing with SIP, streaming audio, real-time voice interaction - there’s a reason the SDK exists. You’re not reimplementing that with raw HTTP calls.

    By far the most practical defense came from HN user mil22 - a uv config that refuses to install any package version published in the last 7 days:

    # pyproject.toml
    [tool.uv]
    exclude-newer = "7 days"

    # or globally in ~/.config/uv/uv.toml
    exclude-newer = "7 days"

    Gives the community time to catch malware before it reaches your machine.

    For pip users: version 26.0+ supports --uploaded-prior-to but you have to manually calculate the date string. Pip 26.1 (scheduled April 2026) will support ISO-8601 duration format (--uploaded-prior-to=P3D), making it as clean as uv’s version. A pip maintainer confirmed this on the HN thread.
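
    Until then, the date string is easy enough to compute yourself. A sketch assuming GNU date (BSD/macOS date needs `-v-7d` instead of `-d`); the package name is just an example:

```shell
# Cutoff timestamp 7 days in the past, in an ISO-8601 format.
CUTOFF="$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)"
echo "pip install telnyx --uploaded-prior-to=$CUTOFF"
```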

    TeamPCP is clearly working through a list. LiteLLM, now Telnyx. Wouldn’t be surprised if more compromised packages surface soon.

  • LiteLLM Got Supply-Chain Attacked. It's Not Over.

    LiteLLM got compromised today. Versions 1.82.7 and 1.82.8 on PyPI - the library gets something like 95 million downloads a month - had base64-encoded malicious code baked in. When it ran, it grabbed every credential it could find on your machine: SSH keys, cloud tokens, database passwords, crypto wallets, all of it. Encrypted everything, shipped it to a remote server, and then self-replicated across your Kubernetes cluster. Like a virus.

    The attackers (a group called TeamPCP) got in by first compromising a vulnerability scanner in LiteLLM’s CI/CD pipeline, which gave them the PyPI publishing credentials. Classic chain attack.

    PyPI quarantined the whole package. The LiteLLM team has taken control back. Last clean version is 1.82.6.

    But it’s not really over. Every machine that installed those versions during that window already had its credentials stolen. Containers deployed in that window are probably still running the infected code right now - the malware installs itself as a background service and keeps polling for new payloads. If you touched LiteLLM recently: uninstall, purge your caches, and rotate every credential on the machine.

    This isn’t isolated. A few weeks ago, researchers found that a malicious repo could hijack Claude Code the moment you opened it - arbitrary code execution before any permission dialog even appeared. Around the same time, someone was using an AI agent to systematically attack CI/CD pipelines across Microsoft, DataDog, and CNCF projects, and got into 5 out of 7 targets. Typosquatting packages mimicking Claude Code showed up on npm too.

    The AI toolchain is becoming a massive attack surface. These tools move fast enough that security review can’t keep up, and one compromised library can reach every secret in your cloud. It’s incredibly scary how easily this can happen.

  • Socratica Symposium 2026

    Spent the day at Socratica Symposium, a student-run event in Waterloo, with other members from the Builders Club.

    Participants came from across North America, not just the University of Waterloo. Some of the demos that stuck:

    • Someone hacked the internal macOS APIs for the trackpad, screen hinge, and mic input to turn a MacBook into a playable musical instrument.
    • An open source omnidirectional treadmill built for gaming. Apparently it’s cheap enough to build yourself.
    • A clock-synchronization algorithm for distributed systems that let 2,000 phones sync into a light swarm responsive to live music. The whole room lit up.
    • A homemade TPU, built from scratch, that could train small deep learning models and run inference. The builders even landed in a conversation with the founder of Groq.

    There were more projects on the arts side too, but these were the ones that stayed with me.

  • Interface Is the Moat

    If progress in LLM capabilities stopped completely right now, there’s still so much progress to be made with what already exists. B2C, B2B - the technology is already good enough to change how most of the world works. It’s not the models that are the bottleneck; it’s adoption.

    Outside the tech bubble, even tech-adjacent people only know LLMs as chatbots. That’s it. The interface layer - how these systems actually reach people and fit into their workflows - is where the real work is. And almost none of it has been done yet.

    The teams that nail the interface will capture the value, not the teams building the next model. Finding product-market fit is still hard work, especially when the disruption is this fundamental to how people interact with software. But the value waiting to be unlocked is enormous.

    We’re still early. Very early.

  • LLMs and Government Accountability

    As tokens get cheaper, it’s going to become easier and easier to scour government paperwork at scale. This used to be the domain of journalists - the only people with the time and resources to read through thousands of pages of public records. Now anyone with access to a decent LLM can do it.

    One of the major defenses for government institutions has always been sheer volume. Bury things in paperwork so large that no one could realistically get through it all. But that defense is eroding fast.

    Which means the next move is predictable: less transparency, not more. Fewer documents released, more redactions, slower responses to access requests. If you can’t hide behind volume anymore, you hide behind access.

    That’s the scary part. Tightening the feedback loop between the public and government feels more important than ever.