The CI bill on Project Ouroboros got out of hand this spring. The fix was not a clever cache strategy; it was five small changes applied across every workflow. None of them required new tooling, and none of them changed what the workflows actually do.
1. A hard timeout-minutes on every job
The biggest single line item was a workflow that, on a bad day, would hang waiting for a rate-limited LLM. Without a timeout, the job would sit for GitHub's default six-hour limit before being killed. Adding timeout-minutes: 5 to the publisher and 15 to the slower jobs capped the worst case immediately.
Every workflow now declares a timeout, even the ten-second ones. When a workflow inevitably starts taking longer than expected, the failure shows up in minutes instead of hours.
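A timeout on every job only stays true if nothing slips through review, and a small lint can check for it. The sketch below is illustrative rather than something we run as-is: it assumes each workflow file has already been parsed into a dict (for example with PyYAML's yaml.safe_load, left out here to keep it stdlib-only), and the function name jobs_missing_timeout is hypothetical.

```python
from typing import Any


def jobs_missing_timeout(workflow: dict[str, Any]) -> list[str]:
    """Return the IDs of jobs that do not declare timeout-minutes."""
    jobs = workflow.get("jobs") or {}
    return [
        job_id
        for job_id, job in jobs.items()
        # A job without a timeout-minutes key falls back to the 360-minute default.
        if not isinstance(job, dict) or "timeout-minutes" not in job
    ]


workflow = {
    "name": "publish",
    "jobs": {
        "publish": {"runs-on": "ubuntu-latest", "timeout-minutes": 5, "steps": []},
        "lint": {"runs-on": "ubuntu-latest", "steps": []},
    },
}
print(jobs_missing_timeout(workflow))  # → ['lint']
```

Run it over every file in .github/workflows/ and fail the check on any non-empty result.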
2. Slim runtime installs
The publisher used to pip install -r requirements.txt, which pulled in the LLM SDKs, the trading library, and a dozen other things it never imported. Switching to pip install --quiet --prefer-binary "requests>=2.31.0" "tweepy>=4.14.0" dropped the install step from roughly 90 seconds to under 10.
The rule of thumb: workflows that do not need an LLM should not install any LLM dependency, and workflows that do not push to LinkedIn should not install requests-oauthlib. Install only what the job imports, with a minimum-version floor on each package.
3. cache: 'pip' on the setup-python step
One line, real savings. actions/setup-python@v6 with cache: 'pip' restores pip's cache directory across runs whenever the dependency file's hash is unchanged. On a workflow that runs several times a day, the savings compound.
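The savings depend on that dependency file staying stable: the cache is keyed on a hash of its contents, so an unchanged file means a hit and any edit means a miss. The sketch below models the idea with hashlib. It is not setup-python's real key format, which also encodes details of the runner environment, and pip_cache_key is a hypothetical name.

```python
import hashlib
import tempfile
from pathlib import Path


def pip_cache_key(dependency_files: list[Path], prefix: str = "pip") -> str:
    """Illustrative cache key: same file contents -> same key -> cache hit."""
    digest = hashlib.sha256()
    for path in sorted(dependency_files):
        digest.update(path.read_bytes())
    return f"{prefix}-{digest.hexdigest()[:16]}"


with tempfile.TemporaryDirectory() as tmp:
    req = Path(tmp) / "requirements.txt"
    req.write_text("requests>=2.31.0\ntweepy>=4.14.0\n")
    first = pip_cache_key([req])
    second = pip_cache_key([req])  # unchanged file: same key, cache hit
    req.write_text("requests>=2.32.0\ntweepy>=4.14.0\n")
    third = pip_cache_key([req])  # edited file: new key, cache miss
print(first == second, first == third)  # → True False
```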
4. Separate LLM-required jobs from LLM-free jobs
Our daily content publish used to be a single workflow that authored AND published. We split it into two:
- An offline authoring step that runs in a coding session (with full LLM access) and commits Markdown to content/<kind>/pending/.
- An online publish action that runs on a fixed schedule, takes only what is in pending/, and never touches an LLM.
The publish action’s cost dropped to essentially the cost of one Python startup plus a couple of HTTP calls. No more LLM rate-limit retries on the runner clock. As a bonus, the published HTML is exactly what we reviewed.
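The publish half of the split can be sketched as a loop over the pending directory that hands each file's text, unchanged, to a posting callback, with no LLM import anywhere. The names publish_pending and post, the example kind posts, and the sibling published/ directory used to keep re-runs idempotent are all assumptions for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Callable


def publish_pending(kind_dir: Path, post: Callable[[str], None]) -> list[str]:
    """Publish every Markdown file in <kind_dir>/pending/ exactly as committed.

    `post` stands in for the couple of HTTP calls that send the text out.
    Successfully posted files move to a hypothetical published/ directory.
    """
    pending = kind_dir / "pending"
    done = kind_dir / "published"
    done.mkdir(exist_ok=True)
    published = []
    for path in sorted(pending.glob("*.md")):
        post(path.read_text(encoding="utf-8"))
        path.rename(done / path.name)  # only reached if post() did not raise
        published.append(path.name)
    return published


with tempfile.TemporaryDirectory() as tmp:
    kind_dir = Path(tmp) / "content" / "posts"
    (kind_dir / "pending").mkdir(parents=True)
    (kind_dir / "pending" / "ci-costs.md").write_text("# CI costs", encoding="utf-8")
    sent: list[str] = []
    result = publish_pending(kind_dir, sent.append)
    rerun = publish_pending(kind_dir, sent.append)  # nothing left in pending/
print(result, rerun, sent)  # → ['ci-costs.md'] [] ['# CI costs']
```

Because the loop only reads files, what goes out is byte-for-byte what was reviewed in the commit.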
5. Honor [skip ci] on docs commits
Adding [skip ci] (or [ci skip]) to commit messages that only touch README.md or other docs stops the runners from firing for changes that cannot affect build output. GitHub Actions honors it natively for push and pull_request events; no configuration is required.
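The weak point is remembering to type the token, so it can help to add it automatically, for example from a commit-message hook. This sketch assumes "docs" means the top-level README.md or anything under docs/ (adjust DOC_FILES and DOC_DIRS to the repo's layout); with_skip_ci and the other names are hypothetical. The token list follows GitHub's documented skip strings.

```python
from pathlib import PurePosixPath

# Strings GitHub Actions recognizes in a commit message (push / pull_request only).
SKIP_TOKENS = ("[skip ci]", "[ci skip]", "[no ci]", "[skip actions]", "[actions skip]")
DOC_FILES = {"README.md"}  # hypothetical docs layout
DOC_DIRS = {"docs"}


def _is_doc(path: str) -> bool:
    parts = PurePosixPath(path).parts
    return path in DOC_FILES or (len(parts) > 1 and parts[0] in DOC_DIRS)


def is_docs_only(changed_paths: list[str]) -> bool:
    """True when the commit changes at least one file and every file is docs."""
    return bool(changed_paths) and all(_is_doc(p) for p in changed_paths)


def with_skip_ci(message: str, changed_paths: list[str]) -> str:
    """Append [skip ci] to the subject line of a docs-only commit message."""
    if not is_docs_only(changed_paths) or any(t in message for t in SKIP_TOKENS):
        return message
    subject, sep, body = message.partition("\n")
    return f"{subject} [skip ci]{sep}{body}"


print(with_skip_ci("Fix typo in README", ["README.md"]))
# → Fix typo in README [skip ci]
print(with_skip_ci("Bump tweepy", ["README.md", "requirements.txt"]))
# → Bump tweepy
```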
What we did not bother with
We deliberately skipped Docker layer caching, custom base images, and self-hosted runners. They might pay off at our next tier of usage, but at our current scale the five changes above gave us the budget headroom we needed.
Disclosure
This post was drafted with AI assistance and reviewed before being committed to the content queue.