# Regression Test Selection Twenty Years

Last year Regression Test Selection celebrated its twentieth year as a field of research. It was in 1993 that G Rothermel and MJ Harrold published their seminal paper on regression test selection. With Continuous Integration being the top agile practice, RTS remains important. How did scholars celebrate “regression test selection twenty years”?

[bibshow file=http://www.citeulike.org/bibtex/user/greger/]

## Definition of Regression Test Selection

Let’s start by defining regression test selection. It’s useful to agree on the topic, isn’t it? The definition of regression test selection is usually given along these lines:

• Let $P$ be a version of a (software) Product.
• Let $T$ be a test suite for $P$.
• Let $P'$ be a new version of $P$.
• Let $T'$ be a new test suite.

Regression testing is the application of the older test cases in $T \cap T'$ to $P'$ to ensure that the change affects only the new functionality. If supposedly unchanged parts stop working, the product has regressed and a regression error has occurred. Regression testing is the testing for regression errors. Note: What’s up with the $\cap T'$? Removing those tests that are no longer relevant is called Regression Test Reduction, cf. [bibcite key=”citeulike:12907398″].

As the product grows with time, so does the test suite. A full regression test might require weeks of calendar time and several years of work time. It would seem to make sense to limit the test scope to the most important test cases in the test suite. (It would also make sense to run them in descending order of importance. This is called regression test prioritization and is a topic for a whole post of its own. [bibcite key=”citeulike:3852130″])

• Now, let $S \subset (T \cap T')$ be a subset of regression test cases such that
• $|S|\ll |T \cap T'|$ and that
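• $faults(S,P') + \delta = faults(T \cap T',P')$ for some small $\delta \geq 0$, where $faults(X,P')$ is the number of faults that the tests in $X$ reveal in $P'$.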

In other words, $S$ is a subset of the full regression test suite that is significantly smaller than the original but finds almost as many faults when applied to the new program. I won’t go into how $S$ is selected here, but you can read my previous post on the topic of regression test selection approaches.
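To make the definition a bit more concrete, here is a minimal Python sketch with toy data. Everything in it is an assumption made up for illustration: the test names, the fault labels and the `detects` mapping (which in real life you would not know in advance). It only shows how $T \cap T'$, $S$ and $\delta$ relate to each other.

```python
def faults(tests, detects):
    """Return the set of faults revealed by running `tests` on P'.

    `detects` is a hypothetical mapping: test name -> set of faults it reveals.
    """
    found = set()
    for test in tests:
        found |= detects.get(test, set())
    return found


# Toy data, invented for this example only.
T = {"t1", "t2", "t3", "t4", "t5"}        # test suite for P
T_new = {"t2", "t3", "t4", "t5", "t6"}    # test suite T' for P'
detects = {
    "t2": {"f1"},
    "t3": {"f1", "f2"},
    "t4": set(),
    "t5": {"f3"},
    "t6": {"f4"},
}

regression_suite = T & T_new              # T ∩ T': old tests that are still relevant
S = {"t3"}                                # a selected subset of the regression suite

# delta is the number of faults the full regression suite finds but S misses.
delta = len(faults(regression_suite, detects)) - len(faults(S, detects))
print(sorted(regression_suite), len(S), delta)
```

In this toy case a single test finds two of the three faults that the four-test regression suite finds, so $\delta = 1$. A real suite would be far larger, which is the whole point of $|S| \ll |T \cap T'|$.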

## A Brief History of Regression Test Selection


Mary Jean Harrold, co-creator of the field of regression test selection. Image from ICST 2008

In order to limit and define the field, I have used Google Scholar with the following search: intitle:"regression test selection". The reason for that is to limit myself to a scope that is reasonable for a blog post. If you are interested in a more thorough systematic literature review, I recommend the work of Engström et al. [bibcite key=”citeulike:5351564″]

Gregg Rothermel, co-creator of the field of regression test selection. Image from Spotlight on Gregg Rothermel.

In 1993 Gregg Rothermel and Mary Jean Harrold created Regression Test Selection (RTS) as an area of research with their seminal paper “A safe, efficient algorithm for regression test selection” [bibcite key=”citeulike:12904399″]. They remain the most influential authors in the field. They have authored or co-authored the seven most cited publications in the field [bibcite key=”citeulike:6902293,citeulike:423366,citeulike:12904403,citeulike:895461,citeulike:2283362,citeulike:12904399″].

So, it’s regression test selection twenty years. What has happened in the field during that time? After a slow start in the first ten years, the field of regression test selection has grown quite steadily in the last ten years.

## Regression Test Selection in 2013

I don’t know if there was a party or ceremony to celebrate regression test selection twenty years. Perhaps there was, perhaps there wasn’t. It doesn’t matter because the most important way science is celebrated is through more science, right?

While MJ Harrold passed away in 2013, Gregg Rothermel is still active in the field of regression test selection [bibcite key=”citeulike:12904857″]. The “regression test selection twenty years” publications followed a few themes:

• Evaluation of performance of regression test selection algorithms and implementations. [bibcite key=”citeulike:12904998,citeulike:12904995,citeulike:12904757,citeulike:12904756,citeulike:12904755,citeulike:12904750,citeulike:12904748″]
• Adaptation of regression test selection to new fields. [bibcite key=”citeulike:12904992, citeulike:12904864,citeulike:12904862″]
• New algorithms for regression test selection identification. [bibcite key=”citeulike:12904857,citeulike:12904862,citeulike:12904853,citeulike:12904756,citeulike:12904750,citeulike:12730774″]
• New algorithms for composing an optimal regression test selection test suite. [bibcite key=”citeulike:12904839,citeulike:12904752″]

Many of the fundamental problems of regression test selection seem to remain after twenty years. It is hard to link test cases to changes in the code. Even when the link is created, it is hard to efficiently calculate an efficient test suite. That is mainly because we’re basically dealing with a variant of the knapsack problem, which is NP-hard as an optimization problem.
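To illustrate the knapsack flavour of the problem, here is a rough Python sketch of a greedy heuristic. It is not taken from any of the papers cited above. It assumes that we already know which code entities each test covers and how long each test takes to run, and the function name `greedy_select`, the file names and the numbers are all made up. Being greedy, it gives no guarantee of finding the optimal suite.

```python
def greedy_select(tests, changed, budget):
    """Pick tests that cover changed entities without exceeding a time budget.

    tests:   dict of test name -> (cost, set of covered entities), cost > 0
    changed: set of entities modified between P and P'
    budget:  total running time we can afford
    Greedy heuristic: repeatedly take the affordable test with the best ratio
    of newly covered changed entities to cost.
    """
    selected, covered = [], set()
    remaining = budget
    candidates = dict(tests)
    while candidates:
        best, best_ratio = None, 0.0
        for name, (cost, covers) in sorted(candidates.items()):
            gain = len((covers & changed) - covered)
            if cost <= remaining and gain > 0 and gain / cost > best_ratio:
                best, best_ratio = name, gain / cost
        if best is None:          # nothing affordable adds new coverage
            break
        cost, covers = candidates.pop(best)
        selected.append(best)
        covered |= covers & changed
        remaining -= cost
    return selected, covered


# Hypothetical test suite: name -> (running time in minutes, files exercised).
tests = {
    "t_login": (2.0, {"auth.py", "session.py"}),
    "t_checkout": (5.0, {"cart.py", "payment.py"}),
    "t_search": (1.0, {"search.py"}),
    "t_full": (9.0, {"auth.py", "cart.py", "payment.py", "search.py"}),
}
changed = {"auth.py", "payment.py", "search.py"}

selected, covered = greedy_select(tests, changed, budget=8.0)
print(selected, sorted(covered))
```

With a budget of eight minutes the big end-to-end test does not fit, so the sketch picks the three smaller tests, which together still touch every changed file. The hard part in practice is the input: building a trustworthy mapping from tests to code is exactly the linking problem mentioned above.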

## Conclusion

The field of Regression Test Selection is still going strong after twenty years. While advances have been made, the holy grail remains to be found. While you’re waiting for it to be found, here is something to remember: Random regression test selection is always better than manual selection.

[/bibshow]


## About Greger Wikstrand

Greger Wikstrand, Ph.D., M.Sc., is a TOGAF 9 certified enterprise architect with an interest in e-health, m-health and all things agile as well as processes, methods and tools. Greger Wikstrand works as a consultant at Capgemini where he alternates between enterprise agile coaching, problem solving and designing large scale e-health services ...