Mutation Testing and Three Variations

by
Irene Koo
Date: Nov 29, 1996.

Abstract

Mutation testing is a powerful fault-based testing technique for unit testing. It provides high test coverage and detects many simple syntactic faults. However, it requires huge computational resources. Various methods have been proposed to reduce the computational expense of mutation testing while trying to maintain the effectiveness of test sets. Based on studies done by different researchers, this paper presents strong, weak, firm, and selective mutation testing. Statistics obtained from the different studies are used to compare the cost and effectiveness of the methods presented. Selective mutation is found to have the lowest requirement on computational resources while maintaining relatively high mutation scores.

1. Introduction

Mutation testing is a powerful fault-based testing technique for unit-level testing. Since it is a fault-based technique, it is aimed at uncovering specific kinds of faults, namely simple syntactic changes to a program. Mutation testing is based on two assumptions: the competent programmer hypothesis and the coupling effect. The competent programmer hypothesis assumes that competent programmers tend to write nearly "correct" programs [14]. That is, programs written by experienced programmers may not be correct, but they will differ from a correct version by relatively simple faults, such as an off-by-one fault. The coupling effect states that a set of test data that can uncover all simple faults in a program is also capable of detecting more complex faults [14].

Mutation testing can be described in the following way. A large number of simple faults, called mutations, are introduced into a program one at a time. The resulting changed versions of the test program are called mutants. Test data are then constructed to cause these mutants to fail. The effectiveness of the test data set is measured by the percentage of mutants killed.

Since the number of mutants that can be generated is large (usually on the order of N^2, where N is the number of variable references in the program), methods have been suggested to reduce the computational expense of this testing technique. Methods proposed over the years to combat this problem include weak mutation, firm mutation, and selective mutation. All of these methods are variations of the original technique, which is now referred to as strong mutation testing. Weak mutation only requires the test data to cause a mutated component to take on a different value in at least one execution, rather than requiring the whole program to produce an output different from the expected result. Firm mutation requires only a fragment of the mutated program to be executed instead of the whole program. Selective mutation is similar to strong mutation in that it requires the execution of the entire program; it reduces the amount of computation by selectively choosing the types of mutants to be generated, with the expectation that the mutants generated will also uncover the faults that could have been detected by the mutants not chosen. This paper discusses mutation testing and its variations in detail and provides a comparison among them with respect to their cost and effectiveness.

2. Strong Mutation Testing

Mutation testing was first introduced by R. A. DeMillo et al. in 1978 [1]. In that paper, the authors proposed a way to demonstrate that a test program does not contain a specific set of faults: find a set of test data that causes the program, with the faults introduced, to fail, thereby detecting the faults. Strong mutation is a testing technique through which test data are constructed to uncover a set of simple, but specific, syntactic faults. To perform mutation testing, simple syntactic faults are introduced into the test program one at a time. The resulting faulty versions of the test program are called mutants. Test data are generated to cause the mutated program to fail. In strong mutation, failure means that the mutated program produces an output different from the expected result. If the program fails, the mutant is said to be killed by the test case. If the mutant is still live at the end of the test, the set of test data may need to be refined or enhanced. Sometimes a mutant cannot be killed by any set of test data; such mutants are said to be functionally equivalent to the original program [2]. The quality of test data is measured by the mutation score, the percentage of non-equivalent mutants killed. Using the notation specified in [3], the mutation score is calculated as:

MS(P, T) = Mk / (Mt - Mq)                                        (1)

where P is the test program;
T is the set of test data;
Mk is the number of mutants killed by T;
Mt is the total number of mutants generated for the program;
Mq is the number of equivalent mutants for the program.

A set of test data is mutation-adequate if its mutation score is 100%.
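
As a small illustration of equation (1), the sketch below computes the mutation score from the three counts; the numbers used in main are hypothetical and serve only as an example.

#include <stdio.h>

/* Mutation score as defined in equation (1):
 *     MS(P, T) = Mk / (Mt - Mq)
 * where Mk is the number of mutants killed, Mt the total number of
 * mutants generated, and Mq the number of equivalent mutants. */
double mutation_score(int killed, int total, int equivalent)
{
    return (double)killed / (double)(total - equivalent);
}

int main(void)
{
    /* Hypothetical counts: 950 of 1000 generated mutants killed,
     * 20 of the 1000 found to be equivalent. */
    printf("Mutation score: %.4f\n", mutation_score(950, 1000, 20));
    return 0;
}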

2.1 Mutation Operators

Mutation operators are at the heart of mutation testing. In the Mothra mutation testing environment [4], twenty-two mutation operators are defined. These mutation operators identify the syntactic modifications responsible for mutant programs [5]. The Mothra mutation operators described in [4] are listed in Table 1.

Table 1. Mothra Mutation Operators

Type   Description
aar    array for array replacement
abs    absolute value insertion
acr    array constant replacement
aor    arithmetic operator replacement
asr    array for variable replacement
car    constant for array replacement
cnr    comparable array replacement
csr    constant for scalar replacement
der    DO statement end replacement
lcr    logical connector replacement
ror    relational operator replacement
sar    scalar for array replacement
scr    scalar for constant replacement
svr    scalar variable replacement
crp    constant replacement
dsa    data statement alterations
glr    goto label replacement
rsr    return statement replacement
san    statement analysis
sdl    statement deletion
src    source constant replacement
uoi    unary operation insertion
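
As a minimal sketch of how a mutation operator generates mutants (an illustration only, not the Mothra implementation), the program below enumerates, in its simplest form, the ror replacements for the relational condition that appears in the "max" example of Fig 1; each replacement of the operator ">" yields one mutant.

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* In its simplest form, ror replaces a relational operator with
     * each of the other relational operators, so the condition
     * "input[j] > input[result]" yields five mutants. */
    const char *relational_ops[] = { ">", ">=", "<", "<=", "==", "!=" };
    const char *original = ">";
    int i, n = sizeof relational_ops / sizeof relational_ops[0];

    for (i = 0; i < n; i++) {
        if (strcmp(relational_ops[i], original) == 0)
            continue;                      /* skip the original operator */
        printf("ror mutant: if (input[j] %s input[result])\n",
               relational_ops[i]);
    }
    return 0;
}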

Consider an example used in [4] and [6], which determines the index of the first occurrence of a maximum element in a single-dimensioned array. The C version of the program is given in Fig 1.

int max(int* input, int size)
{
    int j, result;
    result = 0;
    for (j = 1; j < size; j++) {
        if (input[j] > input[result])
            result = j;
    }
    return result;
}

Fig 1. C Program "max"

Test sets for the original "max" program are shown in Table 2.

Table 2. Initial Test Sets for "max"
Data Set #    input[0]  input[1]  input[2]  Expected result
Data set 1    1         2         3         2
Data set 2    3         1         2         0
Data set 3    1         3         2         1

A relational mutation operator (ror) is applied to the "max" program. The resulting mutant is shown in Fig 2.

int max(int* input, int size)
{
    int j, result;
    result = 0;
    for (j = 1; j < size; j++) {
        if (input[j] >= input[result])
            result = j;
    }
    return result;
}

Fig 2. C Program "max" with Relational Operator Replacement.

The test set in Table 2 is applied to the relationally mutated program. The results obtained are shown in Table 3 below.

Table 3. Outputs Obtained from Mutated "max" Using Initial Test Data.
Data Set #    input[0]  input[1]  input[2]  Output of mutated "max"  Expected result
Data set 1    1         2         3         2                        2
Data set 2    3         1         2         0                        0
Data set 3    1         3         2         1                        1

From Table 3, the initial set of test data cannot kill the mutant. The set of test data is enhanced (Table 4) to kill the live mutant.

Table 4. An Enhanced Test Set for "max"

Data Set #    input[0]  input[1]  input[2]  Output of mutated "max"  Expected result
Data set 1    1         2         3         2                        2
Data set 2    3         1         2         0                        0
Data set 3    1         3         2         1                        1
Data set 4    1         2         2         2                        1

A functionally equivalent mutant of "max" is illustrated in Fig 3. The crp mutation changes the initial value of j from 1 to 0; the extra iteration compares input[0] with itself and therefore never changes result, so the equivalent mutant cannot be killed by any set of test data.

int max(int* input, int size)
{
    int j, result;
    result = 0;
    for (j = 0; j < size; j++) {
        if (input[j] > input[result])
            result = j;
    }
    return result;
}

Fig 3. An Equivalent Mutant (CRP) to "max".
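
To make the kill criterion concrete, the sketch below (an illustration, not part of the Mothra system) runs the original "max" of Fig 1 and the ror mutant of Fig 2 on the test data of Table 4 and reports which data sets kill the mutant. Only data set 4 produces differing outputs, which is why the enhanced test set is needed.

#include <stdio.h>

/* Original "max" from Fig 1: index of the first maximum element. */
int max_original(int *input, int size)
{
    int j, result = 0;
    for (j = 1; j < size; j++)
        if (input[j] > input[result])
            result = j;
    return result;
}

/* ror mutant from Fig 2: ">" replaced by ">=". */
int max_ror(int *input, int size)
{
    int j, result = 0;
    for (j = 1; j < size; j++)
        if (input[j] >= input[result])
            result = j;
    return result;
}

int main(void)
{
    int data[4][3] = { {1, 2, 3}, {3, 1, 2}, {1, 3, 2}, {1, 2, 2} };  /* Table 4 inputs */
    int i;

    for (i = 0; i < 4; i++) {
        int orig = max_original(data[i], 3);
        int mut  = max_ror(data[i], 3);
        /* Strong mutation kill criterion: the outputs differ. */
        printf("Data set %d: original=%d mutant=%d -> %s\n",
               i + 1, orig, mut, orig != mut ? "killed" : "live");
    }
    return 0;
}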

2.2 The Cost of Strong Mutation Testing

Analyses of the cost of mutation testing have been carried out in the past [11][12]. The major cost of strong mutation testing has been identified as the computational cost of running the mutants against test cases [5]. Budd [11] found the number of mutants generated for a program to be approximately proportional to the product of the number of data references and the size of the set of data objects [5]. Acree et al [12] suggested that the number of mutants is on the order of the square of the number of lines in the program [5]. The examples presented in Table 5 illustrate this point. For example, BUBBLE is an eleven-line subroutine that sorts an array of integers. CALENDAR has 29 lines, and it computes the number of days between two given dates. FIND is a 28-line subroutine. GCD has 55 lines, and it computes the greatest common divisor of the elements of an array. Rearranging equation 1 gives equation 2: the difference between the number of mutants generated and the number of equivalent mutants equals the number of mutants killed divided by the mutation score.

Mt - Mq = Mk / MS(P, T)                                          (2)

Since the number of equivalent mutants in an example is usually low, Mt - Mq gives a good indication of the number of mutants generated. Thus, the 11-line BUBBLE has about three hundred mutants generated; the 29-line CALENDAR has around 2600 mutants; FIND has about 950 mutants; GCD has around 4750 mutants. Each mutant must be executed at least once, and many are executed more than once, so the amount of computation required for a test set is huge. Methods have been proposed to improve the performance of strong mutation testing. Mathur [10] suggested the use of parallel computers. He attempted to exploit different architectures such as vector machines, SIMD machines, and MIMD machines. Programs whose compile time is short compared to their execution time benefit from the use of parallel machines. A different approach, known as Compiler Integrated Testing (CIT), integrates testing methods into a compiler. This approach takes advantage of the compiler's ability to generate mutants efficiently, thereby increasing the parallel machine's utilization. Besides using better hardware, variations of mutation testing have been introduced to reduce the computational expense while maintaining reasonably good test coverage. Three better-known variations of strong mutation testing are weak mutation, firm mutation, and selective mutation.

2.3 Effectiveness of Strong Mutation Testing

Mutation testing has been empirically found to be one of the most effective testing methods for finding faults: it is effective in finding both faults of omission and faults of commission [7]. The notion of effectiveness in testing has been defined in several ways. Goodenough and Gerhart defined effectiveness in terms of reliability and validity: a test is reliable for an error if its use is guaranteed to detect the presence of the error [8]. According to Weyuker and Ostrand [9], a test set is effective if it reveals all errors that are represented by the difference between a function and a mutated version of the function. Weyuker and Ostrand's definition of effectiveness can be measured by mutation scores.

Table 5 and Table 6 summarize the averaged results of strong mutation testing done in [4] and [5] respectively. Data were collected over five different test sets. The tables describe the number of test cases generated, the number of mutants killed, and the mutation score of various programs.

Table 5. Results of Tests Done in [4]

Program    Number of Test Cases  Number of Mutants Killed  Mutation Score
BUBBLE     6.4                   304                       1.00
CALENDAR   83.8                  2624                      .95
FIND       11.6                  953                       .99
GCD        65                    4747                      .99
TRITYP     107.2                 959                       .997

Table 6. Results of Tests Done in [5]

Program    Number of Test Cases  Number of Mutants Killed  Mutation Score
BANKER     57.4                  2765                      1.000
BUBBLE     7.2                   338                       1.000
CALENDAR   50.0                  3010                      1.000
EUCLID     3.8                   196                       1.000
FIND       14.6                  1019                      0.9994
INSERT     3.8                   460                       1.000
MID        25.2                  183                       1.000
QUAD       12.2                  359                       1.000
TRITYP     44.6                  950                       0.9998
WARSHALL   7.2                   305                       1.000

From the results in Tables 5 and 6, one can conclude that strong mutation testing is indeed effective: the mutation scores of the various programs range from 95% to 100%. The test cases generated using strong mutation testing can kill almost all of the non-equivalent mutants.

3. Weak Mutation Testing

Strong mutation testing is a powerful unit-testing technique, but it carries a high computational cost. Howden [7] proposed a modification known as weak mutation testing. Weak mutation testing is computationally more efficient because it imposes a less stringent test. Weak mutation testing can be described as follows:

Given a program P and a simple component C of P, let C' be a mutated version of C, and let P' be the mutant of P containing C'. A test set is adequate for weak mutation testing if it causes C' to compute a value different from the value computed by C on at least one execution of C'. Note that even though C' may produce a different value from C, P' may still produce the same result as P. Weak mutation requires a less stringent test set because it only requires C and C' to produce different values [7].

3.1 Definition of a Component

When Howden first proposed weak mutation testing, he did not define what a component is. Instead, he listed five types of program components that he considered:

  1. variable reference
  2. variable assignment
  3. arithmetic expression
  4. relational expression
  5. boolean expression

Offutt and Lee [2], in their later Leonardo weak mutation system, defined a component as the location in the program at which the states of the original and the mutated program are compared. They varied the location of the comparison in order to determine the best comparison point. Using their terminology, the variants that they used are:

  1. EXpression-WEAK/1. EX-WEAK/1 compares the states immediately after the execution of the innermost expression containing the mutation.
  2. STatement-WEAK/1. ST-WEAK/1 compares the states after the first execution of the mutated statement.
  3. Basic-Block-WEAK/1. BB-WEAK/1 compares the states after the first execution of the mutated basic block.
  4. Basic-Block-WEAK/N. BB-WEAK/N compares the states after the Nth execution of the mutated basic block.

Let us return to the "max" example. "max" with a scalar for array replacement (sar) mutation operator applied is shown in Fig 4.

1   int max(int* input, int size)
2   {
3       int j, result;
4       result = 0;
5       for (j = 1; j < size; j++) {
6           if (input[j] > result)
7               result = j;
8       }
9       return result;
10  }

Fig 4. C Program "max" with a Constant for Array Replacement

The sar mutation operator replaces the array reference input[result] with the scalar variable result in line 6. Under EX-WEAK/1, the value of the expression (result) will be compared with the value of the expression (input[result]). Under ST-WEAK/1, the value of the predicate (input[j] > result) will be compared with that of (input[j] > input[result]). Under BB-WEAK/1, the program state at line 8 will be compared after the first execution of the block. Under BB-WEAK/N, the program state will be compared every time line 8 is reached. The points of comparison for the four variants are shown in Fig 5.

Fig 5. Locations of Comparisons for Weak Mutation Variants
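
The hand-instrumented sketch below illustrates the ST-WEAK/1 comparison described above for the mutant of Fig 4; the instrumentation scheme and the function name are illustrative assumptions, not the Leonardo implementation. The original and mutated predicates are evaluated side by side on the first execution of the mutated statement, and the test case is weakly adequate as soon as they differ.

#include <stdio.h>

/* Hand-instrumented "max": on the first execution of the statement
 * containing the mutation (line 6 of Fig 4), the original predicate
 * (input[j] > input[result]) and the mutated predicate
 * (input[j] > result) are both evaluated and compared, which is the
 * ST-WEAK/1 comparison described in the text.  The function still
 * computes the original program's result. */
int max_st_weak(int *input, int size, int *component_differs)
{
    int j, result = 0;
    int first_execution = 1;

    *component_differs = 0;
    for (j = 1; j < size; j++) {
        if (first_execution) {
            int original_pred = input[j] > input[result];
            int mutated_pred  = input[j] > result;      /* mutated component */
            if (original_pred != mutated_pred)
                *component_differs = 1;                 /* weakly killed */
            first_execution = 0;
        }
        if (input[j] > input[result])
            result = j;
    }
    return result;
}

int main(void)
{
    int data1[3] = {1, 2, 3};   /* data set 1 of Table 2: predicates agree  */
    int data2[3] = {3, 1, 2};   /* data set 2 of Table 2: predicates differ */
    int differs;

    max_st_weak(data1, 3, &differs);
    printf("data set 1: mutant %s under ST-WEAK/1\n", differs ? "killed" : "live");
    max_st_weak(data2, 3, &differs);
    printf("data set 2: mutant %s under ST-WEAK/1\n", differs ? "killed" : "live");
    return 0;
}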

3.2 The Cost of Weak Mutation Testing

A good cost measure of mutation testing is the number of test cases generated for programs. Table 7 lists the number of test cases generated for various programs in Offutt and Lee's experiment.

Table 7. Number of Test Cases Generated by Weak Mutation Testing

Program    EX-WEAK/1  ST-WEAK/1  BB-WEAK/1  BB-WEAK/N
BUB        7          11         10         7
CAL        13         25         25         20
EUCLID     4          5          4          3
FIND       11         15         14         7
INSERT     5          7          7          4
PAT        8          11         10         12
WARSHALL   5          9          9          3

Comparing the values above with those in Table 5 and Table 6, the number of test cases generated for weak mutation testing is smaller for most programs. Thus, the cost of weak mutation testing is lower than that of strong mutation testing.

3.3 Effectiveness of Weak Mutation Testing

According to Offutt and Lee [2], if the same set of test data is applied to both strong mutation testing and weak mutation testing, the adequacy score under weak mutation will usually be at least as high as the adequacy score under strong mutation. Weak mutation scores higher than strong mutation most of the time, not because weak mutation is more effective, but because it imposes a less stringent kill criterion; test sets created for weak mutation are therefore weaker than those created for strong mutation. In one of Offutt and Lee's experiments, they created 100% adequate test sets for all four variants of weak mutation testing on 11 programs. The 100% adequate test sets were then applied to strong mutation testing. The resulting adequacy scores are compared in Fig 6.

Fig 6. Strong Mutation Score on Weak Mutation Test Sets [2]

As seen from Fig 6, test sets from EX-WEAK/1 tend to be weaker than those of the other variants. Test sets from ST-WEAK/1 and BB-WEAK/1 tend to be the strongest, giving strong mutation scores ranging from 94.8% to 100%. The test set for BB-WEAK/N performs better than BB-WEAK/1 in only one case, namely when the state of the mutated block does not change at the end of the first iteration but changes at least once in the following N-1 iterations. The 100% adequate weak mutation test sets do not give rise to 100% strong mutation scores in most cases. Even though test sets from weak mutation testing may cause a mutated program to produce different values for a component, the program may still output a correct result. Therefore, test sets from weak mutation testing are effective only at the component level.

4. Firm Mutation Testing

The strategies used in strong mutation testing and weak mutation testing lie at the two extremes of the testing scale. Strong mutation testing gives effective results but carries a high computational cost. Weak mutation testing reduces the cost of testing drastically, but its test cases are effective only at the component level. Firm mutation testing has been proposed to give a better balance between computational cost and effectiveness.

Firm mutation testing is actually an extension of weak mutation testing. In weak mutation testing, the state of the program is compared after every execution of the component. In strong mutation testing, the output of the program is compared at the end of the program execution. Firm mutation compares the state of the program over some time slice of the program execution that is at least as long as the execution of the mutated statement [13]. The exact point of comparison in firm mutation is left to the user's choice. As described by Woodward and Halewood [13], firm mutation offers users a great degree of control in the following ways:

  1. code selection. The user-selectable section of a program can be any program structure or sequence of structures. The possible code selections for the "max" example are shown in Table 8.

    Table 8. Possible Code Selections for "max"
    1.  j

    2.  if (input[j] > input[result])
            result = j;

    3.  for (j = 1; j < size; j++) {
            if (input[j] > input[result])
                result = j;
        }

    4.  result = 0;
        for (j = 1; j < size; j++) {
            if (input[j] > input[result])
                result = j;
        }

    5.  int max(int* input, int size)
        {
            int j, result;
            result = 0;
            for (j = 1; j < size; j++) {
                if (input[j] > input[result])
                    result = j;
            }
            return result;
        }

  2. result comparison. Any object that is either defined, or referenced and defined, in the selected region can be used for result comparison [13]. Table 9 shows, for each code selection of Table 8, some of the variables that can be used for comparison.

    Table 9. Possible Result Comparisons for "max"

    Code selection #    Possible result comparisons
    1                   j
    2                   result, input[result], j
    3                   result, input[result], input[j], j, size
    4                   result, input[result], input[j], j, size
    5                   result, input[result], input[j], j, input, size
  3. mutant operator selection. Any mutation operator specified in Table 1 can be applied to the selected code.

The biggest advantage of firm mutation testing is that users can specify the part of the code to be tested, the variables used for comparison, and the mutation operator to be used in the test. That is, users have great control over the criteria needed to kill a mutant.

Some combinations of choices for firm mutation testing are listed in Table 10.

Table 10. Possible Combinations of Code Selections, Result Comparisons, and Mutation Operators

Combination 1
  Code selection:
      for (j = 1; j < size; j++) {
          if (input[j] > input[result])
              result = j;
      }
  Result comparison:  result
  Mutation operator:  asr on j:  result = j;  -->  result = input[j];
  Effect:  compare the value of "result" at the end of the for loop execution

Combination 2
  Code selection:
      if (input[j] > input[result])
          result = j;
  Result comparison:  result
  Mutation operator:  ror on >:  >  -->  <
  Effect:  compare the value of "result" after every execution of the "result = j;" statement

Firm mutation testing is computationally less expensive than strong mutation testing since only a selected part of the program need be executed. It gives more control to users since it allows them to specify the section of code to be tested, the point of comparison, and the type of mutation. If the changes made to the program can be reversed during program execution, several test cases can be combined into a single test run, saving further computational expense.
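
As a rough illustration of the idea (an assumed sketch, not the implementation of [13]), the code below runs the ror mutant of the if-statement from Table 10 alongside the original selected region and compares the chosen variable, result, each time the region finishes executing; the mutant is considered killed as soon as the two values disagree at that comparison point.

#include <stdio.h>

/* Firm-mutation sketch: the selected region is the if-statement inside
 * the loop of "max", the mutation is ror on ">" (replaced by "<"), and
 * the value of "result" is compared after each execution of the region.
 * Returns 1 if the mutant is killed at the firm comparison point. */
int firm_kill(int *input, int size)
{
    int j;
    int result_orig = 0;    /* "result" in the original program */
    int result_mut  = 0;    /* "result" in the mutated program  */

    for (j = 1; j < size; j++) {
        if (input[j] > input[result_orig])    /* original region   */
            result_orig = j;
        if (input[j] < input[result_mut])     /* ror mutant region */
            result_mut = j;
        if (result_orig != result_mut)        /* firm comparison   */
            return 1;                         /* killed            */
    }
    return 0;                                 /* live              */
}

int main(void)
{
    int data[3] = {1, 2, 3};    /* data set 1 of Table 2 */
    printf("firm mutation: mutant %s\n",
           firm_kill(data, 3) ? "killed" : "live");
    return 0;
}

Under strong mutation the comparison would be made only on the program's final output, and under weak mutation only on the first execution of the mutated component; the firm comparison point lies between these two extremes.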

5. Selective Mutation Testing

Both weak and firm mutation testing reduce computational expenses by executing only part of each mutant; the number of mutants generated, however, is the same as in strong mutation testing. Offutt et al suggested that the cost of mutation testing can instead be reduced by reducing the number of mutants generated [5]. Mutated programs are the results of simple syntactic modifications of the original program, determined by the set of mutation operators listed in Table 1. The idea of selective mutation is to omit the mutation operators that produce the most mutants. Models have been formed to predict the number of mutants that can be generated [5][10][12]. Offutt [5] found that the number of mutants generated in a multi-unit program is best predicted by the number of variables and the number of variable references. For single-unit programs, the best predictor is the number of variables in the program [5][12]. Both Mathur [10] and Offutt et al [5] suggested that the two mutation operators to be omitted are SVR (scalar variable replacement) and ASR (array for variable replacement). These two operators generate the most mutants because SVR replaces every scalar variable in a program with every compatible scalar variable found in the program, and ASR replaces every scalar variable with every array reference of compatible type.
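
The selection step itself is mechanical. The sketch below uses hypothetical per-operator mutant counts (not the data of [5]) to illustrate how the N most prolific operators could be dropped for N-selective mutation.

#include <stdio.h>
#include <stdlib.h>

/* One entry per mutation operator: its Mothra name and the number of
 * mutants it generated for some program.  The counts here are made up
 * purely for illustration. */
struct op_count {
    const char *name;
    int mutants;
};

static int by_mutants_desc(const void *a, const void *b)
{
    const struct op_count *x = a, *y = b;
    return y->mutants - x->mutants;
}

int main(void)
{
    struct op_count ops[] = {
        { "svr", 420 }, { "asr", 310 }, { "csr", 250 }, { "scr", 190 },
        { "acr", 120 }, { "aor",  90 }, { "ror",  60 }, { "lcr",  20 },
    };
    int n_ops = sizeof ops / sizeof ops[0];
    int N = 2;                            /* N-selective: drop N operators */
    int i;

    /* Sort the operators by the number of mutants they generate and
     * omit the N most prolific ones, as in N-selective mutation. */
    qsort(ops, n_ops, sizeof ops[0], by_mutants_desc);

    printf("operators kept for %d-selective mutation:\n", N);
    for (i = N; i < n_ops; i++)
        printf("  %s (%d mutants)\n", ops[i].name, ops[i].mutants);
    return 0;
}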

5.1 The Cost of Selective Mutation Testing

Mathur's hypothesis on selective mutation was confirmed by the experiment done in [5]: selective mutation testing is essentially as powerful as strong mutation testing. The SVR and ASR operators are omitted in 2-selective mutation; SVR, ASR, CSR, and SCR mutants are excluded in 4-selective mutation; 6-selective mutation ignores the ASR, CSR, SCR, SVR, ARC, and ACR mutants. According to Fig 7, SVR and ASR contribute about 30% of all mutants generated. CSR and SCR contribute an additional 20%. ASR, CSR, SCR, SVR, ARC, and ACR together account for about 63% of the total number of mutants. Eliminating these operators in 2-, 4-, and 6-selective mutation, reductions of roughly 30%, 50%, and 60% in computational cost are expected, respectively. The expected results can be observed in Offutt's experiment [5]:

Table 11. Savings Obtained by Selective Mutation

Program    Strong   2-Selective  % Saved   4-Selective  % Saved   6-Selective  % Saved
           Mutants  Mutants                Mutants                Mutants
Banker     2765     1944         29.69     1546         44.09     1186         57.11
Bub        338      277          18.05     224          33.73     197          41.71
Cal        3010     2472         17.87     1804         33.72     738          75.48
Euclid     196      142          27.56     119          40.07     119          40.07
Find       1022     622          39.15     525          39.29     505          50.59
Insert     460      352          23.48     276          48.63     206          55.22
Mid        183      133          27.32     133          27.32     133          27.32
Quad       359      279          22.28     212          40.95     200          44.29
Trityp     951      810          14.83     579          39.12     533          43.95
Warshall   305      259          15.08     205          32.79     162          46.89
Total      9589     7290         23.98     5623         41.36     3782         60.56

2-selective mutation obtained an average saving of 23.98% and the highest saving at 39.15% for FIND. 4-selective mutation obtained an average saving of 41.36%. The highest saving was 48.63% from INSERT. 6-selective mutation obtained an average saving of 60.56%. The highest saving was 75.48% from CAL. Since selective mutation testing can achieve a saving that exceeds 60% without affecting the effectiveness of test sets, it seems to be a very good alternative to strong mutation testing.

5.2 Effectiveness of Selective Mutation Testing

Mathur hypothesized that selective mutation is as powerful as strong mutation testing. Offutt et al [5] performed an experiment to confirm Mathur's hypothesis. 2-selective mutation is the term used by Offutt et al for the selective mutation in which the two operators (SVR and ASR) that generate the most mutants are excluded. Similarly, N-selective mutation excludes the N operators that generate the most mutants. The distribution of mutants over the mutation operators for the programs used by Offutt is shown in Fig 7.


Fig 7. Distribution of Mutation Operators [5]

In their experiment, Offutt et al generated test sets that were 100% mutation adequate for 2-selective mutation. The test sets were then applied to strong mutation testing. The averaged results from 5 different test sets are shown in Table 12.

Table 12. Strong Mutation Scores Achieved by 2-Selective Mutation Test Sets

Program    Test Cases  Number of Live Mutants  Strong Mutation Score
Banker     57.4        0                       1.00
Bub        7.2         0                       1.00
Cal        50.0        0                       1.00
Euclid     3.8         0                       1.00
Find       14.6        0.6                     0.9994
Insert     3.8         0                       1.00
Mid        25.2        0                       1.00
Quad       12.2        0                       1.00
Trityp     44.6        0.2                     0.9998
Warshall   7.2         0                       1.00
Average    22.6        0.1                     0.9999

The test sets that were 100% adequate for 2-selective mutation were almost 100% adequate for strong mutation. In fact, eight out of the ten 2-selective test sets were 100% adequate for strong mutation. The 2-selective mutation test sets achieve high scores on strong mutation testing because the mutants generated by the two omitted operators can easily be killed by test cases aimed at other types of mutants [5]. Offutt further extended the experiment by performing 4-selective and 6-selective mutation testing. The four operators omitted in 4-selective mutation testing were SVR, ASR, CSR, and SCR. The six mutation operators omitted in 6-selective mutation testing were ASR, CSR, SCR, SVR, ARC, and ACR. The averaged results from 5 different test sets are shown in Table 13 and Table 14.

Table 13. Strong Mutation Scores Achieved by 4-Selective Mutation Test Sets

Program    Test Cases  Number of Live Mutants  Strong Mutation Score
Banker     38.4        0.2                     0.9999
Bub        6.6         0                       1.00
Cal        31.4        0.8                     0.9997
Euclid     2.4         2.2                     0.9872
Find       13.2        0                       1.00
Insert     5.0         0                       1.00
Mid        24.8        0                       1.00
Quad       10.2        0                       1.00
Trityp     44.8        2.0                     0.9976
Warshall   7.4         0                       1.00
Average    18.4        0.5                     0.9984

Table 14. Strong Mutation Scores Achieved by 6-Selective Mutation Test Sets

Program    Test Cases  Number of Live Mutants  Strong Mutation Score
Banker     28.4        1.2                     0.9996
Bub        6.6         0                       1.00
Cal        26.0        5.2                     0.9981
Euclid     2.8         1.8                     0.9895
Find       13.4        2.2                     0.9977
Insert     4.0         0                       1.00
Mid        24.8        0                       1.00
Quad       10.0        2.0                     0.9939
Trityp     39.8        0                       1.00
Warshall   4.2         2.0                     0.9926
Average    16.0        1.4                     0.9971

The 2-selective test sets had an average strong mutation score of 99.99%; the 4-selective test sets had an average of 99.84%; the 6-selective test sets had an average of 99.71%. The lowest score for 2-selective was 99.94%; the lowest for 4-selective was 98.72%; the lowest for 6-selective was 98.95%. The results from the three selective mutation test sets were not significantly different, yet the cost was reduced by a wide margin.

6. Conclusion

Mutation testing is a powerful fault-based testing technique. Test sets generated by applying simple syntactic modifications to a program guarantee, to a certain confidence level, that those simple faults are absent from the program. The confidence level is quantified by the mutation score. Mutation testing has several advantages over other testing methods: it offers more test coverage; it detects more faults [3]; and it can be applied to non-procedural programs, as well as a wide variety of programming languages and styles [6]. However, mutation testing suffers from a high computational cost. Variations such as weak mutation, firm mutation, and selective mutation testing have been proposed to combat the problem. Strong, weak, firm, and selective mutation testing are studied in this paper. The effectiveness of the variations is compared in terms of mutation score, and the savings are compared in terms of the number of mutants or test cases generated. A summary of their performance is presented in Table 15 and Table 16.

Table 15. Strong Mutation Scores of Various Mutation Testing Methods

           ------------- Weak Mutation -------------    -------- Selective Mutation --------
Program    EX-WEAK/1  ST-WEAK/1  BB-WEAK/1  BB-WEAK/N    2-Selective  4-Selective  6-Selective
Banker     -          -          -          -            1.00         0.9999       0.9996
Bub        0.986      0.994      0.993      0.99         1.00         1.00         1.00
Cal        0.976      0.992      0.99       0.982        1.00         0.9997       0.9981
Euclid     0.998      0.998      0.9975     0.99         1.00         0.9872       0.9895
Find       0.972      0.98       0.982      0.972        0.9994       1.00         0.9977
Insert     1.00       1.00       1.00       1.00         1.00         1.00         1.00
Mid        0.9675     1.00       1.00       1.00         1.00         1.00         1.00
Quad       0.989      1.00       1.00       1.00         1.00         1.00         0.9939
Trityp     -          0.948      0.955      -            0.9998       0.9976       1.00
Warshall   -          -          -          -            1.00         1.00         0.9926
Average    0.984      0.99       0.99       0.99         0.9999       0.9984       0.9971

Table 16. The Number of Mutants Generated for Various Mutation Methods

Program    Strong Mutation  Weak Mutation  2-Selective  4-Selective  6-Selective
Banker     2765             -              1944         1546         1186
Bub        338              338            277          224          197
Cal        3010             3009           2472         1804         738
Euclid     196              195            142          119          119
Find       1022             1022           622          525          505
Insert     460              460            352          276          206
Mid        183              183            133          133          133
Quad       359              359            279          212          200
Trityp     951              951            810          579          533
Warshall   305              305            259          205          162

Firm mutation testing can be seen as an extension of weak mutation testing; its cost and effectiveness have yet to be determined empirically. From Tables 15 and 16, selective mutation testing is found to perform better than weak mutation testing in terms of both mutation score and computational savings. Of the four weak mutation methods, ST-WEAK/1 achieves the highest mutation scores but has the highest computational cost. 2-selective mutation gives the best mutation scores of the three selective mutation methods, but it has the least computational saving. A trade-off exists between effectiveness and computational saving. Testers can choose the most appropriate method for a program according to their desired confidence level and the available resources.

7. References

  1. R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Hints on Test Data Selection: Help for the Practicing Programmer. IEEE Computer, 11(4):34-41, April 1978.
  2. A. J. Offutt and S. D. Lee. An Empirical Evaluation of Weak Mutation. IEEE Transactions on Software Engineering, vol. 20, no. 5, 337-344, May 1994.
  3. A. J. Offutt, J. Pan, K. Tewary, and T. Zhang. An Experimental Evaluation of Data Flow and Mutation Testing. Software-Practice and Experience, vol. 26(2): 165-176, February 1996.
  4. R. A. DeMillo, D. S. Guindi, W. M. McCracken, A. J. Offutt, and K. N. King. An Extended Overview of the Mothra Software Testing Environment. IEEE Proceedings of Second Workshop on Software Testing, Verification, and Analysis, 142-151, July 1988.
  5. A. J. Offutt, G. Rothermel, and C. Zapf. An Experimental Evaluation of Selective Mutation. Proceedings of the International Conference on Software Engineering, 100-107, 1993.
  6. M. R. Woodward. Mutation Testing - An Evolving Technique. IEE Colloquium on Software Testing for Critical Systems. No. 108. 3/1-3/6. 1990.
  7. W. E. Howden. Weak Mutation Testing and Completeness of Test Sets. IEEE Transactions on Software Engineering, vol. SE-8, no. 4, 371-379, July 1982.
  8. W. E. Howden. Reliability of the Path Analysis Testing Strategy. IEEE Transactions on Software Engineering, vol. SE-2, 1976.
  9. E. J. Weyuker and T. J. Ostrand. Theories of Program Testing and the Application of Revealing Subdomains. IEEE Transactions on Software Engineering, vol. SE-6, 1980.
  10. A. P. Mathur. Performance, Effectiveness, and Reliability Issues in Software Testing. IEEE Proceedings of the 15th Annual International Computer Software & Applications Conference, 604-605, 1991.
  11. T. A. Budd. Mutation Analysis of Program Test Data. PhD thesis, Yale University, New Haven CT, 1980.
  12. A. T. Acree, T. A. Budd, R. A. DeMillo, R. J. Lipton, and F. G. Sayward. Mutation Analysis. Technical report GIT-ICS-79/08, School of Information and Computer Science, Georgia Institute of Technology, Atlanta GA, September 1979.
  13. M. R. Woodward and K. Halewood. From Weak to Strong, Dead or Alive? An Analysis of Some Mutation Testing Issues. IEEE Proceedings of the Second Workshop on Software Testing, Verification, and Analysis, 152-158, July 1988.
  14. A. J. Offutt. Investigations of the Software Testing Coupling Effect. ACM Transactions on Software Engineering and Methodology, Vol. 1, No. 1, 5-20, January 1992.