Learning Curves for Software Engineers

Introduction

Learning curves are all about ongoing improvement. Managers and researchers noticed, in field after field, from aerospace to mining to manufacturing to writing, that stable processes improve year after year rather than remain the same. Learning curves describe these patterns of long-term improvement. Learning curves help answer the following questions.

How fast can you improve to a productivity of x?
What are the limitations to improvement?
Are aggressive goals achievable?

Learning curves describe the relationship between unit cost and cumulative output in stable processes. They also describe the relationship between unit defect rates and cumulative output in stable processes.

Learning curves can be used as metaphors, even if they are not used to estimate productivity. Advice drawn from learning curves is more than common sense: it follows directly from the basics of learning curves and echoes the everyday experience of processes in many industries.

Figure 1: The classic learning curve shape, in Linear-Linear space. You may expect your unit cost to remain the same, but odds on, it will actually improve slowly.

The Log-Linear Pattern of Improvement

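The most common pattern of improvement is defined by the Log-Linear equation: y = a (x) ^ n, as shown in Figure 2. Here y is the unit cost and x is the cumulative output. The a represents the cost of the first unit. The n represents the slope of the curve in log-log space, which is negative when unit cost falls as output accumulates.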

Figure 2: The Log-Linear Equation, shown in Log-Log space.
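
As a rough illustration of the Log-Linear equation, the following Python sketch evaluates y = a (x) ^ n and recovers a and n from observed unit costs by fitting a straight line in log-log space. The function names and the sample numbers (a first unit costing 100 hours, and cost falling to 80% with each doubling of cumulative output) are assumptions made for this example, not data from a real project.

```python
import math


def unit_cost(x, a, n):
    """Log-Linear learning curve: cost of the x-th unit, y = a * x**n."""
    return a * x ** n


def fit_log_linear(xs, ys):
    """Least-squares fit of log(y) = log(a) + n * log(x); returns (a, n)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    # Slope of the best-fit line in log-log space.
    n = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    # Intercept of that line, converted back from log space.
    a = math.exp(mean_y - n * mean_x)
    return a, n


# Hypothetical data: first unit costs 100 hours, cost drops to 80% per doubling.
n_true = math.log2(0.8)
xs = [1, 2, 4, 8, 16]
ys = [unit_cost(x, 100.0, n_true) for x in xs]

a_fit, n_fit = fit_log_linear(xs, ys)
print(f"a = {a_fit:.2f}, n = {n_fit:.4f}, cost ratio per doubling = {2 ** n_fit:.2f}")
```

Because the curve is a straight line in log-log space, each doubling of x multiplies the unit cost by 2 ^ n, so a slope of about -0.32 corresponds to the 80% curve assumed here.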

The Short-Term and Long-Term Phases

The graph in Figure 3 is divided into two parts: the Ramp-Up Phase and the Production Phase. Stable processes in all industries have both short-term losses and long-term gains.

Developers seldom begin a project at full productivity. During the Ramp-Up phase, developers slowly reach full productivity. The short-term productivity losses during this phase can dramatically affect the fiscal planning of a project. To estimate the short-term productivity losses, integrate the difference between the two curves.

Developers seldom stop improving. During the Production phase, developers slowly surpass full productivity. The improvements during this phase are often ignored, but they can produce substantial fiscal savings. To estimate the long-term productivity gains, integrate the difference between the two curves.

Figure 3: Short-Term Losses Occur During the Ramp-Up Phase and Long-Term Gains Occur During the Production Phase, shown in Log-Log space.
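
The integration described above can be sketched in Python. The text does not give the two curves as formulas, so this sketch assumes they are a Log-Linear learning curve y = a (x) ^ n and a flat line at a planned "full productivity" unit cost c. The values of a, n, c, and the 100-unit horizon are hypothetical.

```python
import math


def area_between(a, n, c, x0, x1):
    """Integral of (a * x**n - c) dx from x0 to x1, assuming n != -1."""
    def antiderivative(x):
        return a * x ** (n + 1) / (n + 1) - c * x
    return antiderivative(x1) - antiderivative(x0)


a = 100.0              # hypothetical cost of the first unit, in hours
n = math.log2(0.8)     # hypothetical slope: cost falls to 80% per doubling
c = 50.0               # hypothetical planned unit cost at "full productivity"
horizon = 100.0        # hypothetical cumulative output at the end of the project

# Cumulative output at which the learning curve meets the planned cost.
x_cross = (c / a) ** (1 / n)

# Ramp-Up phase: the learning curve lies above the planned cost.
ramp_up_loss = area_between(a, n, c, 1.0, x_cross)

# Production phase: the learning curve lies below the planned cost.
production_gain = -area_between(a, n, c, x_cross, horizon)

print(f"curves cross near unit {x_cross:.1f}")
print(f"ramp-up loss: {ramp_up_loss:.0f} hours, production gain: {production_gain:.0f} hours")
```

Under these assumptions the crossing point marks the boundary between the two phases, and extending the horizon only ever adds to the long-term gain.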

Maximize Stability

Stability is vital for improvement. Stability minimizes the short-term losses from restarting the process and maximizes the long-term gains from learning. So to improve faster, increase the stability of your process.

Minimize disruptions to the process. Changing products, moving offices, and upgrading tools and processes will all disrupt productivity.
Minimize employee turnover. Shifting workloads from one developer to another and training new project members will disrupt productivity.
Remove process bottlenecks. Bottlenecks that limit productivity also limit improvement.
Motivate workers consistently. When motivation lapses, productivity may stabilize or worsen.

Keep teams together on long-term projects. Longer projects give developers more time to learn and benefit from their learning.

Keep teams together throughout each project. Slowly adding or removing team members disrupts learning. When the team changes, the workloads will shift, which will disrupt productivity. Developers who join the team late or leave the team early also have less time to learn and benefit from their learning.
Give projects as much time as possible. When given a choice between scheduling more workers or more time, and all other factors are equal, take more time.
Combine many small projects into one large project. Small projects disrupt learning.


Figure 4: Two Views of Loss to Disruption. Disruptions include changing products, moving offices, upgrading tools, and employee turnover.

Figure 5: Two Views of Loss to Bottleneck. Bottlenecks include slow hardware, poor applications, and pokey employees.

Figure 6: Two Views of Loss to Inconsistent Motivation. Inconsistent motivation comes from poor management.

Rate of Improvement

The rate of improvement is not arbitrary; it is a function of the process itself. You cannot simply choose the rate of improvement for a process. To improve faster, you must change the process itself by removing its limitations to improvement. This often requires a capital investment to improve tools and skills. Of course, such an investment must genuinely improve the process and not just reshuffle the work or reflect wishful thinking.

Capital investments that can improve a process:

Train workers to improve their skills.
Upgrade tools and infrastructure to enhance workers' productivity.

A Family of Equations

Of the dozens of mathematical models of learning curves, the four most important equations are:

Log-Linear: $y = a (x) ^ n$
Stanford-B: $y = a (x + b) ^ n$
DeJong: $y = a + b (x) ^ n$
S-Curve: $y = a + b (x + c) ^ n$

The Log-Linear equation is the simplest and most common equation, and it applies to a wide variety of processes. The Stanford-B equation is used to model processes where experience carries over from one production run to another, so workers start out more productively than the asymptote predicts. The Stanford-B equation has been used to model airframe production and mining. The DeJong equation is used to model processes where a portion of the process cannot improve. The DeJong equation is often used in factories where the assembly line ultimately limits improvement. The S-Curve equation combines the Stanford-B and DeJong equations to model processes where experience carries over from one production run to the next and a portion of the process cannot improve.

The Log-Linear equation has been shown to model future productivity very effectively. In some cases, the DeJong and Stanford-B equations work better. The S-Curve equation often models past productivity more accurately, but usually models future productivity less accurately, than the other equations.

Figure 7: The Four Main Equations, in Log-Log space.
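
For readers who want to experiment with the four equations, here is a minimal Python sketch that transcribes them using the same symbols as the list above. The function names are chosen for this example. The comments interpreting b in Stanford-B as carried-over experience and a in DeJong as the portion that cannot improve are one reading of the descriptions above, not definitions taken from the cited papers.

```python
def log_linear(x, a, n):
    """Log-Linear: y = a * x**n."""
    return a * x ** n


def stanford_b(x, a, b, n):
    """Stanford-B: y = a * (x + b)**n; b shifts x, read here as prior experience."""
    return a * (x + b) ** n


def dejong(x, a, b, n):
    """DeJong: y = a + b * x**n; for n < 0, y levels off toward a as x grows."""
    return a + b * x ** n


def s_curve(x, a, b, c, n):
    """S-Curve: y = a + b * (x + c)**n; combines the shift and the floor."""
    return a + b * (x + c) ** n


# Hypothetical parameters, chosen only to show how the shapes differ.
n = -0.32
for x in (1, 10, 100, 1000):
    print(
        x,
        round(log_linear(x, 100.0, n), 1),
        round(stanford_b(x, 100.0, 4.0, n), 1),
        round(dejong(x, 20.0, 80.0, n), 1),
        round(s_curve(x, 20.0, 80.0, 4.0, n), 1),
    )
```

Setting b = 0 in Stanford-B, or a = 0 in DeJong, reduces each to the Log-Linear form, and setting c = 0 in the S-Curve reduces it to DeJong, which is one way to see the four as a single family.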

Webliography

A variety of resources about learning curves exist across the WWW. The following links point to a few of the more interesting pages.

Stephen R. Lawrence's Learning Curve Micro-Tutorial

NASA's Learning Curve Calculator

Wikipedia's Learning Curves

Learning Curves

Learning Curves at Research Institutions

Other Learning Curve Pages

Learning Curves at the University of Buffalo
Learning Curves at Baarns Publishing (Search for the phrase "learning curve")
Learning Curve Computation in Matlab

Bibliography

Hundreds of papers and dozens of books have been written about learning curves. Any reasonable index at a research library refers to dozens of papers. Yet, I know of only two papers that address software engineers directly.

*** Adedeji B. Badiru (1992) Computational Survey of Univariate and Multivariate Learning Curve Models, in Transactions on Engineering Management, Volume 39, Number 2, Pages 176 to 188, May 1992, IEEE Press.
*** Ahmed Riahi-Belkaoui (1986) The Learning Curve: A Management Accounting Tool, Quorum Books.
** R. W. Conway and Andrew Schultz, Jr. (1959) The Manufacturing Progress Function, in Journal of Industrial Engineering, Volume 10, Number 1, Pages 39 to 54, Jan-Feb 1959.
*** John G. Everett and Sheriff Farghal (1994) Learning Curve Predictors for Construction and Field Operations, in Journal of Construction Engineering and Management, Volume 120, Number 3, Pages 603 to 616, September 1994, American Society of Civil Engineers.
* Arthur Fries (1993) Discrete Reliability-Growth Models Based on a Learning-Curve Property, in Transactions on Reliability, Volume 42, Number 2, Pages 303 to 306, June 1993, IEEE Press.
* Natalie S. Glance, Tad Hogg, and Bernardo A. Huberman (1994) Training and Turnover in Organizations, in Working Notes of the AAAI 1994 Spring Symposium on Computational Agent Design, AAAI.
*** R. A. Harvey and D. R. Towill (1981) Applications of Learning Curves and Progress Functions: Past, Present, and Future, in Industrial Applications of Learning Curves and Progress Functions, Pages 1 to 15.
* Jacqueline Holdsworth (1994) Software Process Design: Out of the Tar Pit, McGraw Hill.
* Stan Kelly-Bootle (1995) The Computer Contradictionary, Second Edition, MIT Press.
**** Chris F. Kemerer (1992) How the Learning Curve Affects CASE Tool Adoption, in Software, Volume 9, Number 3, Pages 23 to 28, May 1992, IEEE Press.
* Stellan Ohlsson (1992) The Learning Curve For Writing Books: Evidence from Professor Asimov, in Psychological Science, Volume 3, Number 6, Pages 380 to 382, November 1992, American Psychological Society.
*** Gene Pierson (1981) Learning Curves Make Productivity Gains Predictable, in Engineering and Mining Journal, Volume 182, Number 8, Pages 56 to 64, August 1981, McGraw Hill.
**** L. B. S. Raccoon (1996) A Learning Curve Primer for Software Engineers, in Software Engineering Notes, Volume 21, Number 1, Pages 77 to 86, January 1996, ACM Press.
** T. P. Wright (1936) Factors Affecting the Cost of Airplanes, in Journal of Aeronautical Science, Volume 3, Number 2, February 1936.
*** (1981) Industrial Applications of Learning Curves and Progress Functions, Proceedings Number 52, December 1981, Institution of Electronic and Radio Engineers.

I rated each paper and book on the following scale.

**** = Learning Curves in Software Engineering
*** = Good Survey or Lots of References
** = Historically Important Paper
* = Other
