
Eclipse Programmers Should Avoid the IROP Keys

In a brilliant and hilarious article, Zeller, Zimmermann and Bird point out how easy it is to find correlations when mining software archives. Their (mock) argument is that all program errors must enter the source code through the keyboard, and thus certain keys introduce more errors. By statistical analysis of the Eclipse 3.0 source code they are able to determine that the keys I, R, O and P are extra error-prone and that programmers should avoid the IROP keys!
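
To make the parody concrete: the IROP "method" essentially counts how often each character occurs in a file and fits a classifier relating those counts to known defects. A minimal sketch of that kind of analysis (Python with scikit-learn; the file contents and defect labels below are made up for illustration, not the actual Eclipse data):

    from collections import Counter
    import string

    from sklearn.linear_model import LogisticRegression

    # Hypothetical input: per-file source text and a defect label, the kind of
    # data one would mine from the Eclipse bug and version archives.
    files = [
        ("public class Foo { int i; }", 1),
        ('final String s = "hello";', 0),
        ("for (int j = 0; j < n; j++) sum += j;", 1),
        ("return Collections.emptyList();", 0),
    ]

    LETTERS = string.ascii_lowercase

    def letter_counts(source):
        """Feature vector: how often each letter occurs in the file."""
        counts = Counter(c for c in source.lower() if c in LETTERS)
        return [counts[c] for c in LETTERS]

    X = [letter_counts(src) for src, _ in files]
    y = [label for _, label in files]

    # Regress defect-proneness on letter frequencies -- exactly the kind of
    # spurious correlation the parody is poking fun at.
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # The letters with the largest coefficients come out as "error-prone keys".
    ranked = sorted(zip(LETTERS, model.coef_[0]), key=lambda kv: kv[1], reverse=True)
    print("most 'error-prone' keys:", [c for c, _ in ranked[:4]])

The point, of course, is that such a model will happily report something, no matter how meaningless the features are.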

No, really, it's not true that programmers should avoid the IROP keys

The IROP keyboard is like a normal keyboard but with the I, R, O and P keys removed since programmers should avoid the IROP keys.
Using the IROP keyboard will not reduce errors in Eclipse code by 73%!

They then move on to explain why this kind of research is fundamentally flawed. Yet we see a lot of it everywhere, not just in SBSE and related fields. I liked this article because it made me question my own work on correlation-based regression test selection. But then, that work is based on an algorithm by Zimmermann et al., so it should at least be free of this particular error.
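
For readers who have not seen it, the fix-cache idea behind that work is roughly: keep a bounded cache of files and, whenever a defect fix touches a file, move that file to the front of the cache, evicting the least recently used entry when the cache is full; files currently in the cache are treated as fault-prone and their test cases are prioritized. A heavily reduced sketch of that core loop (the cache size and commit format here are purely illustrative, and the full algorithm also pre-fetches co-changed and newly added files):

    from collections import OrderedDict

    def fix_cache(commits, cache_size=10):
        """Reduced fix-cache: only the changed-file locality is modelled."""
        cache = OrderedDict()  # file path -> None, ordered by recency
        for commit in commits:
            if not commit["is_fix"]:
                continue
            for path in commit["files"]:
                cache.pop(path, None)          # refresh recency if already cached
                cache[path] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used file
        return list(cache)  # files currently predicted to be fault-prone

    # Hypothetical usage: prioritize regression tests covering the cached files.
    commits = [
        {"is_fix": True,  "files": ["ui/Editor.java", "core/Parser.java"]},
        {"is_fix": False, "files": ["docs/readme.txt"]},
        {"is_fix": True,  "files": ["core/Parser.java"]},
    ]
    print(fix_cache(commits))  # ['ui/Editor.java', 'core/Parser.java']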

Interestingly enough, in the same alert from Google Scholar I find two articles where the authors have performed correlational studies to determine fault-prone features of software. Krishnan et al. and Bell, Ostrand and Weyuker have both determined that the level of change in a software artefact is a good predictor of fault-proneness. What to make of that? I think there are at least three reasons for the finding (a sketch of the kind of model these studies fit follows the list):

  • Software testers work from the same requirements delta as the developers and thus write test cases for the same code the programmers are changing, so if they do their job properly they should find errors in exactly the changed code.
  • Already tested and released code should have been thoroughly tested and thus not contain additional errors, at least not errors that are found by the existing test cases, which have already passed. Unless, of course, there is already a bug report, in which case that code would be changed again.
  • It is much more likely that you break the code by changing it than by not changing it, even though the latter is certainly also possible.
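
For concreteness, the kind of model Bell et al. and Krishnan et al. evaluate is essentially a classifier over per-file change metrics. A minimal sketch, with made-up column names and data standing in for metrics mined from a version control system:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical per-file change metrics; in the studies these are mined from
    # the version control and defect tracking systems of the product.
    data = pd.DataFrame({
        "lines_added":   [120, 3, 45, 0, 200, 10],
        "lines_deleted": [ 30, 0, 10, 0,  80,  2],
        "prior_changes": [ 14, 1,  6, 0,  25,  2],  # times changed in prior release
        "had_fault":     [  1, 0,  1, 0,   1,  0],
    })

    X = data[["lines_added", "lines_deleted", "prior_changes"]]
    y = data["had_fault"]

    # Churn-based fault prediction: files with high change activity are flagged.
    model = LogisticRegression(max_iter=1000).fit(X, y)
    data["predicted_fault_prone"] = model.predict(X)
    print(data)

Unlike the letter-counting model above, the features here at least have a plausible causal story behind them, which is what the three bullets try to spell out.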

So how interesting are these results? How actionable are they?

References

  • [DOI] R. M. Bell, T. J. Ostrand, and E. J. Weyuker, “Does measuring code change improve fault prediction?,” in Proceedings of the 7th international conference on predictive models in software engineering, New York, NY, USA, 2011.
    [Bibtex]
    @inproceedings{citeulike:9846488,
    abstract = {Background: Several studies have examined code churn as a variable for predicting faults in large software systems. High churn is usually associated with more faults appearing in code that has been changed frequently. Aims: We investigate the extent to which faults can be predicted by the degree of churn alone, whether other code characteristics occur together with churn, and which combinations of churn and other characteristics provide the best predictions. We also investigate different types of churn, including both additions to and deletions from code, as well as overall amount of change to code. Method: We have mined the version control database of a large software system to collect churn and other software measures from 18 successive releases of the system. We examine the frequency of faults plotted against various code characteristics, and evaluate a diverse set of prediction models based on many different combinations of independent variables, including both absolute and relative churn. Results: Churn measures based on counts of lines added, deleted, and modified are very effective for fault prediction. Individually, counts of adds and modifications outperform counts of deletes, while the sum of all three counts was most effective. However, these counts did not improve prediction accuracy relative to a model that included a simple count of the number of times that a file had been changed in the prior release. Conclusions: Including a measure of change in the prior release is an essential component of our fault prediction method. Various measures seem to work roughly equivalently.},
    address = {New York, NY, USA},
    author = {Bell, Robert M. and Ostrand, Thomas J. and Weyuker, Elaine J.},
    booktitle = {Proceedings of the 7th International Conference on Predictive Models in Software Engineering},
    doi = {10.1145/2020390.2020392},
    isbn = {978-1-4503-0709-3},
    location = {Banff, Alberta, Canada},
    publisher = {ACM},
    series = {Promise '11},
    title = {Does measuring code change improve fault prediction?},
    url = {http://dx.doi.org/10.1145/2020390.2020392},
    year = {2011}
    }
  • [DOI] S. Krishnan, C. Strasburg, R. R. Lutz, and K. G. Popstojanova, “Are change metrics good predictors for an evolving software product line?,” in Proceedings of the 7th international conference on predictive models in software engineering, New York, NY, USA, 2011.
    [Bibtex]
    @inproceedings{citeulike:9846487,
    abstract = {Background: Previous research on three years of early data for an Eclipse product identified some predictors of failure-prone files that work well for that data set. Additionally, Eclipse has been used to explore characteristics of product line software in previous research. Aims: To assess whether change metrics are good predictors of failure-prone files over time for the family of products in the evolving Eclipse product line. Method: We repeat, to the extent possible, the decision tree portion of the prior study to assess our ability to replicate the method, and then extend it by including four more recent years of data. We compare the most prominent predictors with the previous study's results. We then look at the data for three additional Eclipse products as they evolved over time. We explore whether the set of good predictors change over time for one product and whether the set differs among products. Results: We find that change metrics are consistently good and incrementally better predictors across the evolving products in Eclipse. There is also some consistency regarding which change metrics are the best predictors. Conclusion: Change metrics are good predictors for failure-prone files for the Eclipse product line. A small subset of these change metrics is fairly stable and consistent across products and releases.},
    address = {New York, NY, USA},
    author = {Krishnan, Sandeep and Strasburg, Chris and Lutz, Robyn R. and Popstojanova, Katerina G.},
    booktitle = {Proceedings of the 7th International Conference on Predictive Models in Software Engineering},
    doi = {10.1145/2020390.2020397},
    isbn = {978-1-4503-0709-3},
    location = {Banff, Alberta, Canada},
    publisher = {ACM},
    series = {Promise '11},
    title = {Are change metrics good predictors for an evolving software product line?},
    url = {http://dx.doi.org/10.1145/2020390.2020397},
    year = {2011}
    }
  • [DOI] A. Zeller, T. Zimmermann, and C. Bird, “Failure is a four-letter word: a parody in empirical research,” in Proceedings of the 7th international conference on predictive models in software engineering, New York, NY, USA, 2011.
    [Bibtex]
    @inproceedings{citeulike:9846469,
    abstract = {Background: The past years have seen a surge of techniques predicting failure-prone locations based on more or less complex metrics. Few of these metrics are actionable, though. Aims: This paper explores a simple, easy-to-implement method to predict and avoid failures in software systems. The {IROP} method links elementary source code features to known software failures in a lightweight, easy-to-implement fashion. Method: We sampled the Eclipse data set mapping defects to files in three Eclipse releases. We used logistic regression to associate programmer actions with defects, tested the predictive power of the resulting classifier in terms of precision and recall, and isolated the most defect-prone actions. We also collected initial feedback on possible remedies. Results: In our sample set, {IROP} correctly predicted up to 74\% of the failure-prone modules, which is on par with the most elaborate predictors available. We isolated a set of four easy-to-remember recommendations, telling programmers precisely what to do to avoid errors. Initial feedback from developers suggests that these recommendations are straightforward to follow in practice. Conclusions: With the abundance of software development data, even the simplest methods can produce "actionable" results.},
    address = {New York, NY, USA},
    author = {Zeller, Andreas and Zimmermann, Thomas and Bird, Christian},
    booktitle = {Proceedings of the 7th International Conference on Predictive Models in Software Engineering},
    doi = {10.1145/2020390.2020395},
    isbn = {978-1-4503-0709-3},
    location = {Banff, Alberta, Canada},
    publisher = {ACM},
    series = {Promise '11},
    title = {Failure is a four-letter word: a parody in empirical research},
    url = {http://dx.doi.org/10.1145/2020390.2020395},
    year = {2011}
    }
  • [DOI] E. Engstrom, P. Runeson, and G. Wikstrand, “An empirical evaluation of regression testing based on Fix-Cache recommendations,” in 2010 International Conference on Software Testing, Verification, and Validation (ICST), pp. 75-78, 2010.
    [Bibtex]
    @article{citeulike:7426424,
    abstract = {Background: The fix-cache approach to regression test selection was proposed to identify the most fault-prone files and corresponding test cases through analysis of fixed defect reports. Aim: The study aims at evaluating the efficiency of this approach, compared to the previous regression test selection strategy in a major corporation, developing embedded systems. Method: We launched a post-hoc case study applying the fix-cache selection method during six iterations of development of a multi-million {LOC} product. The test case execution was monitored through the test management and defect reporting systems of the company. Results: From the observations, we conclude that the fix-cache method is more efficient in four iterations. The difference is statistically significant at alpha = 0.05. Conclusions: The new method is significantly more efficient in our case study. The study will be replicated in an environment with better control of the test execution.},
    address = {Los Alamitos, CA, USA},
    author = {Engstrom, Emelie and Runeson, Per and Wikstrand, Greger},
    doi = {10.1109/icst.2010.40},
    isbn = {978-0-7695-3990-4},
journal = {Software Testing, Verification, and Validation, 2010 International Conference on},
    pages = {75--78},
    publisher = {IEEE Computer Society},
    title = {An Empirical Evaluation of Regression Testing Based on {Fix-Cache} Recommendations},
    url = {http://dx.doi.org/10.1109/icst.2010.40},
    volume = {0},
    year = {2010}
    }

