Unix and Linux administrators have historically relied on the 1-, 5- and 15-minute load averages reported by utilities like uptime, w or procinfo to get a feel for how loaded a system is. For example, here is the output of uptime on one of my servers:
    [pankaj@pankaj-k pankaj]$ uptime
    15:30:16 up 17 days, 23:13, 1 user, load average: 0.22, 0.27, 0.29
But what exactly are these numbers? How are they computed? How should one interpret them when deciding what is good and what is bad (time to upgrade or add more CPUs)? Are these numbers always between 0 and 1?
One can only surmise that a lower value means the CPU is less loaded than a higher one. Though helpful, this simple interpretation is not always satisfactory.
The man pages for the above-mentioned utilities are not always very helpful, but some are more insightful than others. The best I have seen so far is the Ubuntu (Debian) man page:
System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
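To make the per-CPU normalization point concrete, here is a minimal Python sketch (assuming a Linux/Unix host where os.getloadavg() is available) that reads the same three figures uptime reports and divides them by the number of CPUs:

    import os

    # Fetch the 1-, 5- and 15-minute load averages -- the same figures
    # that uptime, w and /proc/loadavg report.
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1   # guard against cpu_count() returning None

    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        # A per-CPU value near 1.0 means every CPU has, on average,
        # one process either running or waiting for it.
        print(f"{label}: {value:.2f}  (per CPU: {value / cpus:.2f})")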
But wait ... this is not all. There is a lot more to know about UNIX/Linux load averages. I found the following two entries, which also happen to be the first two entries in the Google search results for "load average", though in reverse order:
- TeamQuest article on UNIX Load Average by Dr. Neil Gunther.
- Wikipedia entry on Load (Computing).
The most interesting part is about the exponential dampening of the averages!
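To see what that dampening looks like, here is a small Python sketch of the idea (a simplified floating-point model, not the kernel's actual fixed-point code): the run queue is sampled every 5 seconds, and each sample is folded into the running average with a decay factor of exp(-5 s / period), where the period is 60, 300 or 900 seconds:

    import math

    SAMPLE_INTERVAL = 5.0                        # seconds between run-queue samples
    PERIODS = {"1 min": 60.0, "5 min": 300.0, "15 min": 900.0}

    def update(load, active, period):
        """Fold one run-queue sample into the damped average for one period."""
        decay = math.exp(-SAMPLE_INTERVAL / period)
        return load * decay + active * (1.0 - decay)

    # Example: simulate one minute on a machine with exactly one runnable task.
    loads = {name: 0.0 for name in PERIODS}
    for _ in range(12):                          # 12 samples of 5 s = 60 s
        for name, period in PERIODS.items():
            loads[name] = update(loads[name], active=1, period=period)

    # The 1-minute figure reacts fastest; the 15-minute figure barely moves.
    print({name: round(value, 2) for name, value in loads.items()})

After one minute of a constant load of 1, this model puts the 1-minute average at roughly 0.63, the 5-minute average at about 0.18, and the 15-minute average at only around 0.06, which is exactly the smoothing behaviour the articles above describe: recent activity dominates the short average, while the longer averages respond slowly.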