skip to content
Home Prabal
← Notes

ClaudeDown: Is Claude Getting Dumber, or Is It Just You?

We built ClaudeDown.com over a weekend because we kept asking the same question every time Claude felt off - is it actually worse right now, or am I just tired?

Turns out a lot of people ask that. And with subjective things like model behavior, perceived quality degradation, and rate limits, there’s no objective metric to check. Company spokespersons aren’t always sharing complete or accurate information either. Remember when Anthropic emailed everyone saying new rate limits would “affect less than 5% of users”? Look at what people are actually saying. Crowdsourcing opinions becomes important when the official line doesn’t match the lived experience.

So we built a tracker. The goal is simple: maintain a crude but honest sentiment score of what’s going wrong and whether it’s going wrong right now. What are people complaining about? Is it more than usual? What are the most visible complaints saying?

The whole thing costs about $12/month to run. One Docker container on Railway. SQLite for the database, Litestream replicating to Cloudflare R2, DBOS for durable execution. No managed database, no Redis, no Postgres. The entire application state lives in a single file.

Every hour, a scheduled workflow pulls complaint tweets from Twitter’s API using a predefined search query. No LLMs are involved in the collection - it’s a keyword match. To estimate baseline mention volume without exhaustively counting every tweet (which would be expensive), we sample a couple of pages worth of recent Claude mentions, look at their timestamps - since they’re sorted by post time - and extrapolate the total volume for that time period. Cheap and reasonably accurate for monitoring purposes.

We use the estimated complaint and baseline counts to track the complaint-to-mention ratio. An LLM summarizes the complaint themes once per hour, and then the static frontend gets rebuilt. The site is a Next.js static export - no server-side rendering, no API calls at request time. Plain HTML.

It doesn’t measure actual model quality. A viral tweet from a big account can generate copycat complaints that have nothing to do with a real degradation. But when hundreds of unrelated people independently start complaining about the same thing at the same time, that signal is hard to ignore.