I mentioned that I’ve been contributing to my company’s blog and, as a result, haven’t been posting here as much recently. That said, I have been cranking out about a post a week over there, so I wanted to link to those here.

Syncing Dev & Ops – The meetings & processes that keep Dev & Ops on the same page at Rally. I think Rally has a really strong Dev/Ops culture where Ops is well integrated into the Engineering process. We’re always trying to make it better, but it’s already pretty good.

Post Event Retrospective (The Rally version of a Post-Mortem) – Part I, Part II, and Part III. This is the process that we use to understand an outage, make sure we accurately capture all the events related to that outage, and follow them through to completion. It’s a little different from what I’ve done in the past, and we use it for events that go right as well as those that go wrong – you can learn from both.

The Value of Service Instrumentation, an Example – A walkthrough of using the instrumentation we have at Rally to diagnose an issue. Rally has instrumented its systems to expose a very rich set of metrics about the state of its applications, and this post talks about some of that.

Sorting out Deployment before building features – Why we figure out how to deploy an application before we actually build it. This was based on a recent change we made when building a new component (which we’ll be repeating for a second component) to automate deployment & have systems in production as soon as possible. As soon as there is some application scaffolding that can be deployed and tested, we deploy it. In this case, the app just returns “Hello World”, but it gets deployed every week.
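
To give a feel for how little scaffolding that takes, here is a minimal sketch of a deployable “Hello World” app. This is not the actual Rally component: the language (Python), the standard-library `wsgiref` server, the `app` function name, and port 8000 are all assumptions picked just for illustration.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Hypothetical scaffolding app: answers every request with "Hello World"."""
    body = b"Hello World"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    with make_server("", 8000, app) as server:
        server.serve_forever()
```

The point is that even something this small is enough to exercise the whole deployment pipeline before any real features exist.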

