Applying resilient design to my own systems

When I teach a course in modern distributed systems design, I spend a lot of time on topics such as engineering for resilience and automating production processes. Applying those big ideas to the small matters of my own life, it turns out, is a real challenge.

Some big, good ideas

I teach the usual things for such courses, including:

Nothing especially surprising or novel, really.

Some local, small procedures

Within hours of finishing a class, I find myself in a situation in which one or more of the above principles apply:

and many other small-scale maintenance tasks.

Big ideas are ill-fitted to small slots

While uploading the backup, signing on to the important site, and doing all those other small things, I hear the advice I gave my students in class earlier that day. Really, shouldn’t I do in my own life the things I counselled them to do in their profession?

Most of the time, though, the fit is awkward or altogether impractical. The reasons vary:

  1. Possible but a lot of work: In the case of uploading lecture audio files, I am slowly increasing the automation. By the end of the semester, I might have it down to the absolute minimum of three clicks and a single command.
  2. Consumer tools lack the customization of production tools: The software I use to record lectures on my phone and the software I use to draw diagrams limit my control over the names of the files they produce. Production tools such as logging libraries let filenames be customized to a format that supports further automated processing, whereas the consumer tools require me to adjust filenames by hand to match my conventions.
  3. Testing a full disaster response is impractical: It’s one thing to regularly test that I can restore individual files from backup (and I do), but it’s impractical to test a full machine restore. That said, writing this post did suggest to me that I ought to try living an entire day solely on my earthquake preparedness supplies. That would probably reveal some gaps.
  4. Why seek out inconvenience? The sheer number of items I would have to test adds up to considerable effort. Do I want to pursue even more disruption in my daily activities?
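
The filename adjustment in items 1 and 2 is the kind of step a small script can absorb. The sketch below is a minimal, hypothetical example under stated assumptions: it assumes the recording app saves `.m4a` files with arbitrary names, and that the desired convention is `lecture-YYYY-MM-DD.m4a`, taken from each file's modification time. The convention and the function names `conventional_name` and `rename_recordings` are invented for illustration.

```python
"""Hypothetical sketch: rename consumer-app recordings to a date-based convention."""
from datetime import datetime
from pathlib import Path


def conventional_name(path: Path, prefix: str = "lecture") -> str:
    """Build a name like 'lecture-2024-03-05.m4a' from the file's modification time.

    The 'prefix-YYYY-MM-DD' pattern is an assumed convention, not a standard one.
    """
    stamp = datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y-%m-%d")
    return f"{prefix}-{stamp}{path.suffix.lower()}"


def rename_recordings(directory: Path, pattern: str = "*.m4a",
                      prefix: str = "lecture") -> list[Path]:
    """Rename every file in `directory` matching `pattern`; return the new paths."""
    renamed = []
    # Materialize the listing first so renames don't disturb the iteration.
    for path in sorted(directory.glob(pattern)):
        target = path.with_name(conventional_name(path, prefix))
        if target == path or target.exists():
            # Already conventional, or the name is taken: leave it for manual handling.
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

Because it skips any name that already exists, a second recording made on the same day is left untouched for manual handling; a fuller version might append a sequence number instead. Even this small script needs testing and upkeep, which is the trade-off item 1 describes.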

So I muddle through

The principles of stress-testing and resilience engineering are powerful and worth considering, but actually applying them to the smaller-scale processes of personal life often requires more resources (time, money, space, or computing power) than the benefits justify. I adopt such principles as seem practical and reconsider my choices every so often. Not pursuing these principles in every applicable case, however small, is not hypocrisy; rather, it acknowledges that the principles apply at larger scales than much individual activity.