Effectiveness of Performance Bisection

The goal of this empirical study is to analyze the effectiveness of bisection, a common technique for localizing software bugs, when it is applied to software performance regressions. To support this analysis, an effectiveness measure was derived. The measure also gives developers a way to rigorously assess whether bisection suits their specific use case. The study identified several properties that contribute to this effectiveness, and these properties were analyzed alongside 310 performance regression bug reports to understand how they manifest in the real world.

Journal Paper (EMSE'22): On the effectiveness of bisection in performance regression localization

Motivation

Bisection is often used to localize functional regressions, and it is intuitively suited to them because functional regressions tend to be mostly monotonic (i.e., the functionality flips from "working" to "not working" at one well-defined point in the range of commits) and binary (i.e., either the regression is present in a commit or it is not). Developers also use bisection to localize software performance regressions, but such regressions are neither binary nor monotonic, because performance measurements are inherently noisy. This matters because bisections tend to be costly, and an ineffective bisection wastes developer time. This work analyzes what makes a bisection effective by examining the properties that contribute to its effectiveness, and then explores what these properties look like in practice.
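To make the contrast concrete, here is a minimal sketch of textbook bisection over an ordered commit history, assuming the oldest commit is known good and the newest is known bad. The names `bisect_first_bad` and `noisy_is_bad` and the runtime model (a hypothetical 100 ms benchmark that slows by 5 ms at the culprit commit and is measured with Gaussian noise) are illustrative assumptions, not code or data from the study.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bisect_first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[int, int]:
    """Return (index of the first commit judged bad, number of tests run).

    Assumes commits[0] is good, commits[-1] is bad, and that is_bad is
    monotonic along the sequence (all False, then all True).
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return bad, tests


# Functional regression: binary and monotonic, so bisection is exact.
CULPRIT = 637
commits = list(range(1000))
found, tests = bisect_first_bad(commits, lambda c: c >= CULPRIT)
print(found, tests)  # prints: 637 10


# Hypothetical performance regression: a 5 ms slowdown on a 100 ms
# benchmark, observed through Gaussian noise (sd = 4 ms).
rng = random.Random(42)


def noisy_is_bad(c: int, threshold_ms: float = 102.5) -> bool:
    true_ms = 100.0 + (5.0 if c >= CULPRIT else 0.0)
    measured_ms = rng.gauss(true_ms, 4.0)
    return measured_ms > threshold_ms  # a single noisy verdict per commit


noisy_found, _ = bisect_first_bad(commits, noisy_is_bad)
print(noisy_found)  # may differ from 637
```

With the exact predicate the search finds commit 637 in 10 tests. With the noisy predicate, a single misjudged midpoint sends the search into the wrong half, and plain bisection never revisits that decision.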

Methodology

This study explores two overarching questions:

What impact do the contributing properties have on the effectiveness of the bisection?

What characteristics do the contributing properties have in practice?

The first question is answered by formulating an "effectiveness measure", a numerical value that represents the probability that a bisection is effective, and then analyzing each property that contributes to this measure separately through one-at-a-time sensitivity analysis (varying one property while holding the others fixed). The second question is answered by qualitatively analyzing 310 performance regression bug reports in light of these contributing properties.
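The study's effectiveness measure is defined in the paper, and the sketch below does not reproduce it. As a hedged illustration of the two ideas in this paragraph, it uses a Monte Carlo simulation to estimate the probability that bisection returns the true culprit commit under a hypothetical toy model (a step-shaped slowdown plus Gaussian noise, with each tested commit judged by the median of repeated runs), and then runs a one-at-a-time sensitivity analysis over that model's parameters. All function names, parameters, and values here are assumptions chosen for illustration, not the study's contributing properties or results.

```python
import random
import statistics


def bisect_once(n_commits: int, culprit: int, shift_ms: float, noise_sd: float,
                repeats: int, rng: random.Random) -> int:
    """Bisect a simulated history and return the index judged to be the culprit.

    Hypothetical toy model: commit i runs in 100 ms, plus shift_ms if
    i >= culprit, plus Gaussian noise. Each tested commit is measured
    `repeats` times and judged bad if its median exceeds the midpoint
    between the good and bad baselines.
    """
    threshold = 100.0 + shift_ms / 2
    good, bad = 0, n_commits - 1  # commit 0 known good, last commit known bad
    while bad - good > 1:
        mid = (good + bad) // 2
        true_ms = 100.0 + (shift_ms if mid >= culprit else 0.0)
        samples = [rng.gauss(true_ms, noise_sd) for _ in range(repeats)]
        if statistics.median(samples) > threshold:
            bad = mid
        else:
            good = mid
    return bad


def estimate_success(n_commits: int = 256, shift_ms: float = 5.0, noise_sd: float = 4.0,
                     repeats: int = 1, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(bisection returns the true culprit)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        culprit = rng.randint(1, n_commits - 1)  # any commit after the known-good one
        if bisect_once(n_commits, culprit, shift_ms, noise_sd, repeats, rng) == culprit:
            hits += 1
    return hits / trials


# One-at-a-time sensitivity analysis: vary each parameter alone,
# holding the others at their baseline values.
BASELINE = {"n_commits": 256, "shift_ms": 5.0, "noise_sd": 4.0, "repeats": 1}
SWEEPS = {
    "n_commits": [16, 256, 4096],
    "shift_ms": [1.0, 5.0, 20.0],
    "noise_sd": [0.5, 4.0, 16.0],
    "repeats": [1, 5, 25],
}

for name, values in SWEEPS.items():
    for value in values:
        params = dict(BASELINE, **{name: value})
        p = estimate_success(**params, trials=500)
        print(f"{name}={value}: estimated P(success) = {p:.2f}")
```

Varying one parameter at a time keeps each sweep easy to read, but it cannot reveal interactions between parameters (for example, how the benefit of more repeats depends on the noise level).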

Presentation

Data

Raw Impact Results: impact_excels.zip

Bug Reports: [Link]