Innovations on Automatic Fault Localization
By: Farid Feyzi, Ph.D.
Fall 2018
Spectrum-Based Fault Localization (SBFL) techniques estimate the suspiciousness of each program statement by analyzing the spectra of passing and failing test cases. They rank program elements according to their presence (and absence) in failing and passing executions: the stronger the correlation between an element and the observed failures, the higher the suspiciousness assigned to it. Despite the proven applicability of SBFL methods in automatic fault localization, these approaches are biased by the data collected from different executions of the program. This bias can result in unstable statistical models that vary depending on the test data provided for trial executions. To resolve this difficulty, we introduced FPA-FL [1], a 'fault-proneness'-aware statistical approach based on Elastic-Net regression. The main idea behind FPA-FL is to consider the static structure and the fault-proneness of program statements in addition to their dynamic correlations with the program termination state.
The effectiveness of SBFL methods can also be adversely affected by coincidental correctness, which occurs when a buggy statement is executed but the program execution does not result in failure. According to the Propagation–Infection–Execution (PIE) model, executing a defect is not a sufficient condition for failure; the infectious state must also propagate to the output. Recent studies have demonstrated that coincidental correctness is prevalent and is a safety-reducing factor for SBFL: when coincidentally correct tests are present, the faulty statement is likely to be ranked as less suspicious than when they are absent. To mitigate this negative impact on the performance of SBFL techniques, several researchers have investigated techniques that cleanse test suites of such tests.
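To illustrate how spectra are turned into suspiciousness scores, the widely used Ochiai formula can be sketched in a few lines of Python. The coverage and outcome data structures below are illustrative assumptions, not the implementation used in our work:

```python
import math

def ochiai_scores(coverage, outcomes):
    """Rank statements by Ochiai suspiciousness from test spectra.

    coverage: dict mapping statement id -> set of tests that execute it
    outcomes: dict mapping test id -> True if the test failed
    """
    total_failed = sum(outcomes.values())
    scores = {}
    for stmt, tests in coverage.items():
        ef = sum(1 for t in tests if outcomes[t])  # failing tests covering stmt
        ep = len(tests) - ef                       # passing tests covering stmt
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores

# Toy spectrum: statement s3 is covered only by the single failing test.
coverage = {"s1": {"t1", "t2", "t3"}, "s2": {"t1", "t2"}, "s3": {"t3"}}
outcomes = {"t1": False, "t2": False, "t3": True}
ranking = sorted(ochiai_scores(coverage, outcomes).items(),
                 key=lambda kv: kv[1], reverse=True)
# s3 ranks first: it appears in every failing run and in no passing run.
```

Note how a statement such as s1, executed in both passing and failing runs, is pulled down the ranking; this is exactly the mechanism that coincidentally correct tests exploit to hide a faulty statement.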
Existing approaches for identifying coincidentally correct tests can be classified into three categories: clustering-based, classification-based, and probabilistic approaches. In this context, given a test suite in which each test has been categorized as passing or failing, we proposed a slicing-based technique [2] that identifies the subset of passing tests that are likely to be coincidentally correct. The approach first identifies a set of statements that may affect the program failure, called Failure Candidate Causes (FCC). It then applies the Ochiai fault localization technique to estimate the fault suspiciousness of the statements in the FCC set. Finally, it uses two heuristics to assign every passing test case, p, a value representing the likelihood that p is coincidentally correct: the first is the average suspiciousness score of the FCC statements that directly affect the program output, and the second is the coverage ratio of those statements. Since none of these approaches are deterministic, they cannot identify all coincidentally correct test cases, and in some cases high false positive and false negative ratios are inevitable. We also recently demonstrated [3] that a large number of faults are caused by undesired interactions between statements, so SBFL techniques that analyze statements in isolation cannot perform well. In such cases, because the faulty statement appears in many coincidentally correct tests, it may receive a low suspiciousness score if each statement's impact on program failure is analyzed in isolation. To reduce the negative impact of coincidentally correct tests on fault localization performance and to effectively localize this type of bug, we proposed locating failure-causing statements by considering their combinatorial effect on program failure [3].
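The two heuristics can be sketched as follows. This is a simplified illustration only: the fcc_direct set is computed via program slicing in the actual technique, and the names and the way the two values are reported here are assumptions, not the paper's implementation:

```python
def cc_likelihood(test_coverage, fcc_direct, suspiciousness):
    """Estimate how likely a passing test is coincidentally correct.

    test_coverage: set of statements executed by the passing test
    fcc_direct: FCC statements that directly affect the program output
                (hypothetical set; obtained by slicing in the real method)
    suspiciousness: dict mapping statement -> Ochiai score
    Returns (avg_susp, cov_ratio), the two heuristic values.
    """
    if not fcc_direct:
        return 0.0, 0.0
    covered = test_coverage & fcc_direct
    # Heuristic 1: average suspiciousness of the FCC statements the test covers.
    avg_susp = (sum(suspiciousness[s] for s in covered) / len(covered)
                if covered else 0.0)
    # Heuristic 2: fraction of the direct-impact FCC set the test covers.
    cov_ratio = len(covered) / len(fcc_direct)
    return avg_susp, cov_ratio

# A passing test that executes a highly suspicious FCC statement receives
# a high likelihood of being coincidentally correct.
fcc_direct = {"s2", "s3"}
suspiciousness = {"s2": 0.8, "s3": 0.4}
avg_susp, cov_ratio = cc_likelihood({"s1", "s2"}, fcc_direct, suspiciousness)
```

A passing test scoring high on both heuristics is flagged as likely coincidentally correct and can then be cleansed from, or down-weighted in, the test suite before SBFL is applied.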
The idea is inspired by the observation that most program failures are only revealed when a specific combination of correlated statements is executed.
References:
[1] Feyzi, F., & Parsa, S. (2018). FPA-FL: Incorporating static fault-proneness analysis into statistical fault localization. Journal of Systems and Software, 136, 39-58.
[2] Feyzi, F., & Parsa, S. (2018). A program slicing-based method for effective detection of coincidentally correct test cases. Computing, DOI: 10.1007/s00607-018-0591-z.
[3] Feyzi, F., & Parsa, S. (2017). Inforence: Effective fault localization based on information-theoretic analysis and statistical causal inference. Frontiers of Computer Science. DOI: 10.1007/s11704-017-6512-z.