Eclipse Programmers Should Avoid the IROP Keys

Posted: October 3rd, 2011 | Filed under: Professionalism, Software Engineering | 3 Comments

In a brilliant and hilarious article, Zeller, Zimmermann and Bird point out how easy it is to find correlations when mining software archives. Their (mock) argument is that all program errors must enter the source code through the keyboard, so certain keys must introduce more errors than others. Through statistical analysis of the Eclipse 3.0 source code they determine that the I, R, O and P keys are especially error-prone and should be banned!

The IROP keyboard is like a normal keyboard but with the I, R, O and P keys removed.

Using the IROP keyboard will not reduce errors in Eclipse code by 73%!

They then go on to explain why this kind of research is fundamentally flawed. Yet we see a lot of it everywhere, not just in SBSE and related fields. I liked this article because it made me question my own work on correlation-based regression test selection. But then, that work is based on an algorithm by Zimmermann et al., so it should be free of at least this error.
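To get a feeling for how easily such "findings" appear, here is a minimal sketch in Python. It is not the analysis from the paper (which, according to its abstract, used logistic regression on real Eclipse data). The number of files, the seed and the function names are all made up for illustration. The sketch generates purely random keystroke counts and defect flags, correlates every key with the defects, and still names four "most defect-prone" keys.

```python
import random
import string


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (vx * vy) ** 0.5


def most_defect_prone_keys(n_files=200, top=4, seed=2011):
    """Rank keys by correlation with defects on data that is pure noise.

    Hypothetical setup: each "file" gets a random defect flag and a random
    count for every key A-Z. No key has any real relation to the defects.
    """
    rng = random.Random(seed)
    defects = [rng.randint(0, 1) for _ in range(n_files)]
    counts = {
        key: [rng.randint(0, 100) for _ in range(n_files)]
        for key in string.ascii_uppercase
    }
    scores = {key: pearson(values, defects) for key, values in counts.items()}
    # Testing 26 keys at once means some will look "predictive" by chance.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    for key, r in most_defect_prone_keys():
        print(f"{key}: r = {r:+.3f}")
```

Whichever four keys come out on top depends only on the seed. With 26 candidate keys and nothing but noise, some of them will always correlate with the defects by chance, which is the trap the parody points at.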

Interestingly enough, the same Google Scholar alert turned up two articles whose authors have performed correlational studies to determine fault-prone features of software. Krishnan et al. and Bell, Ostrand and Weyuker both found that the level of change in a software artefact is a good predictor of fault proneness. What to make of that? I think there are at least three reasons for the finding:

  • Software testers work from the same requirement delta as developers and so write test cases for the same code that the programmers are changing. If they do their job properly, they should find errors in exactly the changed code.
  • Already released code should have been thoroughly tested and thus not contain additional errors, at least not errors that can be found by the existing test cases, which have already passed. Unless, of course, there is already a bug report, in which case that code would be changed again.
  • It is much more likely that you break the code by changing it than by not changing it, even though the latter is certainly also possible.

So how interesting are these results? How actionable are they?

References

  • [2011, inproceedings]
    R. M. Bell, T. J. Ostrand, and E. J. Weyuker, "Does measuring code change improve fault prediction?," in Proceedings of the 7th International Conference on Predictive Models in Software Engineering, New York, NY, USA, 2011.
    @inproceedings{citeulike:9846488, abstract = {Background: Several studies have examined code churn as a variable for predicting faults in large software systems. High churn is usually associated with more faults appearing in code that has been changed frequently. Aims: We investigate the extent to which faults can be predicted by the degree of churn alone, whether other code characteristics occur together with churn, and which combinations of churn and other characteristics provide the best predictions. We also investigate different types of churn, including both additions to and deletions from code, as well as overall amount of change to code. Method: We have mined the version control database of a large software system to collect churn and other software measures from 18 successive releases of the system. We examine the frequency of faults plotted against various code characteristics, and evaluate a diverse set of prediction models based on many different combinations of independent variables, including both absolute and relative churn. Results: Churn measures based on counts of lines added, deleted, and modified are very effective for fault prediction. Individually, counts of adds and modifications outperform counts of deletes, while the sum of all three counts was most effective. However, these counts did not improve prediction accuracy relative to a model that included a simple count of the number of times that a file had been changed in the prior release. Conclusions: Including a measure of change in the prior release is an essential component of our fault prediction method. Various measures seem to work roughly equivalently.},
      address = {New York, NY, USA},
      author = {Bell, Robert M. and Ostrand, Thomas J. and Weyuker, Elaine J.},
      booktitle = {Proceedings of the 7th International Conference on Predictive Models in Software Engineering},
      citeulike-article-id = {9846488},
      citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=2020392},
      citeulike-linkout-1 = {http://dx.doi.org/10.1145/2020390.2020392},
      doi = {10.1145/2020390.2020392},
      isbn = {978-1-4503-0709-3},
      keywords = {20111003-irop},
      location = {Banff, Alberta, Canada},
      posted-date = {2011-10-03 10:06:16},
      priority = {2},
      publisher = {ACM},
      series = {Promise '11},
      title = {Does measuring code change improve fault prediction?},
      url = {http://dx.doi.org/10.1145/2020390.2020392},
      year = {2011}
    }
  • [2011, inproceedings]
    S. Krishnan, C. Strasburg, R. R. Lutz, and K. G. Popstojanova, "Are change metrics good predictors for an evolving software product line?," in Proceedings of the 7th International Conference on Predictive Models in Software Engineering, New York, NY, USA, 2011.
    @inproceedings{citeulike:9846487, abstract = {Background: Previous research on three years of early data for an Eclipse product identified some predictors of failure-prone files that work well for that data set. Additionally, Eclipse has been used to explore characteristics of product line software in previous research. Aims: To assess whether change metrics are good predictors of failure-prone files over time for the family of products in the evolving Eclipse product line. Method: We repeat, to the extent possible, the decision tree portion of the prior study to assess our ability to replicate the method, and then extend it by including four more recent years of data. We compare the most prominent predictors with the previous study's results. We then look at the data for three additional Eclipse products as they evolved over time. We explore whether the set of good predictors change over time for one product and whether the set differs among products. Results: We find that change metrics are consistently good and incrementally better predictors across the evolving products in Eclipse. There is also some consistency regarding which change metrics are the best predictors. Conclusion: Change metrics are good predictors for failure-prone files for the Eclipse product line. A small subset of these change metrics is fairly stable and consistent across products and releases.},
      address = {New York, NY, USA},
      author = {Krishnan, Sandeep and Strasburg, Chris and Lutz, Robyn R. and Popstojanova, Katerina G.},
      booktitle = {Proceedings of the 7th International Conference on Predictive Models in Software Engineering},
      citeulike-article-id = {9846487},
      citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=2020397},
      citeulike-linkout-1 = {http://dx.doi.org/10.1145/2020390.2020397},
      doi = {10.1145/2020390.2020397},
      isbn = {978-1-4503-0709-3},
      keywords = {20111003-irop},
      location = {Banff, Alberta, Canada},
      posted-date = {2011-10-03 10:05:53},
      priority = {2},
      publisher = {ACM},
      series = {Promise '11},
      title = {Are change metrics good predictors for an evolving software product line?},
      url = {http://dx.doi.org/10.1145/2020390.2020397},
      year = {2011}
    }
  • [2011, inproceedings]
    A. Zeller, T. Zimmermann, and C. Bird, "Failure is a four-letter word: a parody in empirical research," in Proceedings of the 7th International Conference on Predictive Models in Software Engineering, New York, NY, USA, 2011.
    @inproceedings{citeulike:9846469, abstract = {Background: The past years have seen a surge of techniques predicting failure-prone locations based on more or less complex metrics. Few of these metrics are actionable, though. Aims: This paper explores a simple, easy-to-implement method to predict and avoid failures in software systems. The {IROP} method links elementary source code features to known software failures in a lightweight, easy-to-implement fashion. Method: We sampled the Eclipse data set mapping defects to files in three Eclipse releases. We used logistic regression to associate programmer actions with defects, tested the predictive power of the resulting classifier in terms of precision and recall, and isolated the most defect-prone actions. We also collected initial feedback on possible remedies. Results: In our sample set, {IROP} correctly predicted up to 74\% of the failure-prone modules, which is on par with the most elaborate predictors available. We isolated a set of four easy-to-remember recommendations, telling programmers precisely what to do to avoid errors. Initial feedback from developers suggests that these recommendations are straightforward to follow in practice. Conclusions: With the abundance of software development data, even the simplest methods can produce "actionable" results.},
      address = {New York, NY, USA},
      author = {Zeller, Andreas and Zimmermann, Thomas and Bird, Christian},
      booktitle = {Proceedings of the 7th International Conference on Predictive Models in Software Engineering},
      citeulike-article-id = {9846469},
      citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=2020395},
      citeulike-linkout-1 = {http://dx.doi.org/10.1145/2020390.2020395},
      doi = {10.1145/2020390.2020395},
      isbn = {978-1-4503-0709-3},
      keywords = {20111003-irop},
      location = {Banff, Alberta, Canada},
      posted-date = {2011-10-03 09:38:50},
      priority = {0},
      publisher = {ACM},
      series = {Promise '11},
      title = {Failure is a four-letter word: a parody in empirical research},
      url = {http://dx.doi.org/10.1145/2020390.2020395},
      year = {2011}
    }
  • [2010, article]
    E. Engstrom, P. Runeson, and G. Wikstrand, "An Empirical Evaluation of Regression Testing Based on Fix-Cache Recommendations," Software Testing, Verification, and Validation, 2008 International Conference on, pp. 75-78, 2010.
    @article{citeulike:7426424, abstract = {Background: The fix-cache approach to regression test selection was proposed to identify the most fault-prone files and corresponding test cases through analysis of fixed defect reports. Aim: The study aims at evaluating the efficiency of this approach, compared to the previous regression test selection strategy in a major corporation, developing embedded systems. Method: We launched a post-hoc case study applying the fix-cache selection method during six iterations of development of a multi-million {LOC} product. The test case execution was monitored through the test management and defect reporting systems of the company. Results: From the observations, we conclude that the fix-cache method is more efficient in four iterations. The difference is statistically significant at alpha = 0.05. Conclusions: The new method is significantly more efficient in our case study. The study will be replicated in an environment with better control of the test execution.},
      address = {Los Alamitos, CA, USA},
      author = {Engstrom, Emelie and Runeson, Per and Wikstrand, Greger},
      citeulike-article-id = {7426424},
      citeulike-linkout-0 = {http://doi.ieeecomputersociety.org/10.1109/ICST.2010.40},
      citeulike-linkout-1 = {http://dx.doi.org/10.1109/icst.2010.40},
      doi = {10.1109/icst.2010.40},
      isbn = {978-0-7695-3990-4},
      journal = {Software Testing, Verification, and Validation, 2008 International Conference on},
      keywords = {20110817, 20111003-irop},
      pages = {75--78},
      posted-date = {2010-07-08 01:34:23},
      priority = {0},
      publisher = {IEEE Computer Society},
      title = {An Empirical Evaluation of Regression Testing Based on {Fix-Cache} Recommendations},
      url = {http://dx.doi.org/10.1109/icst.2010.40},
      volume = {0},
      year = {2010}
    }

3 Comments on “Eclipse Programmers Should Avoid the IROP Keys”

  1. Greger Wikstrand said at 18:55 on October 5th, 2011:

    This article, http://www.technologyreview.com/computing/38775/, is just another example of scientists warning about correlation and data mining studies from large data sets.

  2. Greger Wikstrand said at 23:23 on November 30th, 2011:

    As always, there is a Dilbert strip for this: http://www.dilbert.com/strips/comic/2011-11-28/

  3. Agile Project Manager » Blog Archive » Agile Developers Trust their Teams said at 09:28 on March 20th, 2012:

    [...] I wrote about the methodological challenges of correlation based research on software artefacts. The article which is the base for this post [...]

