Why does rebooting fix everything?

2018-04-20

This comes from a long career in software testing, during which I went through more than a dozen release cycles and noted which testing methods work best.

We’re all familiar with the following troubleshooting steps when encountering a problem on our computers or phones:

Quit and relaunch an app if that’s where the problem lies
Log out and log back in (on a Mac)
Reboot

In fact, rebooting is often the first thing many people try. I reboot all my devices every morning.

More often than not, the problem goes away. Why is that? Well, a few possible things can explain it:

Sometimes caches can be cleared on reboot that were causing problems
Sometimes background processes were running that were affecting an app you were running
The overall system became constrained for resources and rebooting starts everything from scratch

But I’d offer another explanation: It’s because of how software is tested.

To minimize the number of possible variables that we need to analyze when we encounter a problem, we often end up doing the following, especially with automated tests:

Clean installs of builds (no baggage from stale caches and preferences, among other things)
Reboot the system first, then let it settle down, just to make sure nothing interferes with a test
Start tests by launching an app, killing it, then launching it again to avoid first-time launch issues or slow first-launch performance
Start with empty data sets or artificially created data sets
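
As a rough illustration, the sterile preparation steps above could be sketched as a single setup routine. This is only a sketch under stated assumptions: `FakeMachine`, `prepare_sterile`, and the step names are hypothetical, and the machine here just records each step instead of driving a real device.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMachine:
    """Hypothetical stand-in for a test machine; it only records each step."""
    log: list = field(default_factory=list)

    def do(self, step: str) -> None:
        self.log.append(step)


def prepare_sterile(machine: FakeMachine, build: str, app: str) -> list:
    """Run the sterile preparation steps in the order listed above."""
    machine.do(f"clean install {build}")  # no stale caches or preferences
    machine.do("reboot")                  # start from a known state
    machine.do("wait until idle")         # let the system settle down
    machine.do(f"launch {app}")           # absorb first-time launch cost
    machine.do(f"kill {app}")
    machine.do(f"launch {app}")           # the launch the test actually uses
    machine.do("load empty data set")     # or an artificially created one
    return machine.log
```

Every test that starts from `prepare_sterile` sees the same state, which is what makes the resulting bugs easy to reproduce, and also what makes them unrepresentative of a machine that has been running for weeks.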

This makes for more easily reproducible bugs because everything is controlled. Especially with automated tests, you don’t want erratic results caused by a “dirty” system (i.e. one that hasn’t been rebooted in a while). Otherwise you end up spending more time fixing the tests than doing testing. (This is also a hint that manual testing may be more appropriate in this case.)

When the bugs found in this manner get fixed, it does improve the quality of the product. But often, those bugs only exist in cases where people reboot and restart their apps regularly.

That’s why rebooting “fixes” things, because we fixed the things found by testing that involved rebooting first. Serious bugs that are less predictable and more difficult to reproduce often get pushed aside to make room for these more artificial issues.

There’s a place for the sterile approach in certain cases (e.g. performance testing), especially if there is parallel testing going on that is not so sterile. And there’s no harm in having some sterile tests and some real-world tests going on. If you go too far toward real-world setups, you might miss clean-launch failures, for example.

Reboot daily

My regular routine is to reboot all of my devices before I first use them in the morning. Try this and if you don’t notice a difference in a few months, I’d be surprised. I would expect:

Performance improvements
Fewer apps getting hung
Fewer bugs in general
Fewer problems with AirPlay especially

Summary

The advice I’d give people is:

Predominantly test on “real world” setups, using consumer means of updating software, and real user data. This makes it easier to find issues
When writing automated tests, try to use more realistic setups, or at least mix some of them in with your cleaner systems
For manual testing, use clean systems only for regression and to help hone the reproduction steps
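
One way to mix realistic setups into an automated suite is to describe each setup as data and schedule every test against more than one of them. The sketch below is a minimal illustration, not a definitive implementation: the `Setup` fields, the two named setups, and `plan_runs` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of how a test machine is prepared."""
    name: str
    clean_install: bool
    reboot_first: bool
    real_user_data: bool


# Assumed setups: one mirrors the sterile routine, one mirrors a consumer device.
STERILE = Setup("sterile", clean_install=True, reboot_first=True, real_user_data=False)
REAL_WORLD = Setup("real-world", clean_install=False, reboot_first=False, real_user_data=True)


def plan_runs(test_names, setups=(REAL_WORLD, STERILE)):
    """Pair every test with every setup, real-world first."""
    return [(test, setup.name) for test in test_names for setup in setups]
```

For example, `plan_runs(["launch", "sync"])` schedules each test once on the real-world setup and once on the sterile one. In a pytest suite the same idea could be expressed with `pytest.mark.parametrize` over the setups.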