Systematic software testing
Peter Sestoft
IT University of Copenhagen, Denmark [1]
Version 2, 2008-02-25

This note introduces techniques for systematic functionality testing of software.
Contents
1 Why software testing?
2 White-box testing
3 Black-box testing
4 Practical hints about testing
5 Testing in perspective
6 Exercises
1 Why software testing?
Programs often contain errors (so-called bugs), even though the compiler accepts the program as well-formed: the compiler can detect only errors of form, not of meaning. Many errors and inconveniences in programs are discovered only by accident when the program is being used. However, errors can be found in more systematic and effective ways than by "random experimentation". This is the goal of software testing.
You may wonder why we don't just fix errors when they are discovered. After all, what harm can a program do? Consider some effects of software errors:
- In the 1991 Gulf War, some Patriot missiles failed to hit incoming Iraqi Scud missiles, which therefore killed people on the ground. Accumulated rounding errors in the control software's clocks caused large navigation errors.
- Errors in the software controlling the baggage handling system of Denver International Airport delayed the entire airport's opening by a year (1994-1995), causing losses of around 360 million dollars. Since September 2005 the computer-controlled baggage system has not been used; manual baggage handling saves one million dollars a month.
- The first launch of the European Ariane 5 rocket failed (1996), causing losses of hundreds of millions of dollars. The problem was an arithmetic overflow (a 64-bit floating-point value converted to a 16-bit integer) in control software taken over from Ariane 4. The software had not been re-tested, to save money.
[1] Original 1998 version written for the Royal Veterinary and Agricultural University, Denmark.
- Errors in a new train control system deployed in Berlin (1998) caused train cancellations and delays for weeks.
- Errors in poorly designed control software in the Therac-25 radiotherapy equipment (1987) exposed several cancer patients to heavy doses of radiation, killing some.
A large number of other software-related problems and risks have been recorded by the RISKS digest since 1985; see the archive at http://catless.ncl.ac.uk/risks.
1.1 Syntax errors, semantic errors, and logic errors
A program in Java, C#, or any other language may contain several kinds of errors:
- syntax errors: the program may be syntactically ill-formed (e.g. contain while x {}, where there are no parentheses around x), so that, strictly speaking, it is not a Java program at all;
- semantic errors: the program may be syntactically well-formed, but attempt to access non-existing local variables or non-existing fields of an object, or apply operators to the wrong type of arguments (as in true * 2, which attempts to multiply a logical value by a number);
- logical errors: the program may be syntactically well-formed and type-correct, but compute the wrong answer anyway.
Errors of the first two kinds are relatively trivial: the Java compiler javac will automatically discover them and tell us about them. Logical errors (the third kind) are harder to deal with: they cannot be found automatically, and it is our own responsibility to find them, or, even better, to convince ourselves that there are none.
In these notes we shall assume that all errors discovered by the compiler have been fixed. We present simple systematic techniques for finding logical errors and thereby making it plausible that the program works as intended (when we can find no more errors).
1.2 Quality assurance and different kinds of testing
Testing fits into the more general context of software quality assurance; but what is software quality? ISO Standard 9126 (2001) distinguishes six quality characteristics of software:
- functionality: does this software do what it is supposed to do; does it work as intended?
- usability: is this software easy to learn and convenient to use?
- efficiency: how much time, memory, and network bandwidth does this software consume?
- reliability: how well does this software deal with wrong inputs, external problems such as network failures, and so on?
- maintainability: how easy is it to find and fix errors in this software?
- portability: how easy is it to adapt this software to changes in its operating environment, and how easy is it to add new functionality?
The present note is concerned only with functionality testing, but note that usability testing and performance testing address quality characteristics two and three (usability and efficiency). Reliability can be addressed by so-called stress testing, whereas maintainability and portability are rarely tested systematically.
1.3 Debugging versus functionality testing
The purpose of testing is very different from that of debugging. It is tempting to confuse the two, especially if one mistakenly believes that the purpose of debugging is to remove the last bug from the program. In reality, debugging rarely achieves this.
The real purpose of debugging is diagnosis. After we have observed that the program does not work as intended, we debug it to answer the question: why doesn't this program work? When we have found out, we modify the program to (hopefully) work as intended.
By contrast, the purpose of functionality testing is to strengthen our belief that the program works as intended. To do this, we systematically try to show that it does not work. If our best efforts fail to show that the program does not work, then we have strengthened our belief that it does work.
Using systematic functionality testing we might find some cases where the program does not work. Then we use debugging to find out why. Then we fix the problem. And then we test again to make sure we fixed the problem without introducing new ones.
1.4 Profiling versus performance testing
The distinction between functionality testing and debugging has a parallel in the distinction between performance testing and profiling. Namely, the purpose of profiling is diagnosis. After we have observed that the program is too slow or uses too much memory, we use profiling to answer the question: why is this program so slow, why does it use so much memory? When we have found out, we modify the program to (hopefully) use less time and memory.
By contrast, the purpose of performance testing is to strengthen our belief that the program is efficient enough. To do this, we systematically measure how much time and memory it uses on different kinds and sizes of inputs. If the measurements show that it is efficient enough for those inputs, then we have strengthened our belief that the program is efficient enough for all relevant inputs.
Using systematic performance testing we might find some cases where the program is too slow. Then we use profiling to find out why. Then we fix the problem. And then we test again to make sure we fixed the problem without introducing new ones.
Schematically, we have:

Purpose \ Quality    Functionality           Efficiency
Diagnosis            Debugging               Profiling
Quality assurance    Functionality testing   Performance testing
1.5 White-box testing versus black-box testing
Two important techniques for functionality testing are white-box testing and black-box testing.
White-box testing, sometimes called structural testing or internal testing, focuses on the text of the program. The tester constructs a test suite (a collection of inputs and corresponding expected outputs) that demonstrates that all branches of the program's choice and loop constructs (if, while, switch, try-catch-finally, and so on) can be executed. The test suite is said to cover the statements of the program.
Black-box testing, sometimes called external testing, focuses on the problem that the program is supposed to solve; or more precisely, the problem statement or specification for the program. The tester constructs a test data set (inputs and corresponding expected outputs) that includes 'typical' as well as 'extreme' input data. In particular, one must include inputs that are described as exceptional or erroneous in the problem description.
White-box testing and black-box testing are complementary approaches to test case generation. White-box testing does not focus on the problem area, and therefore may not discover that some subproblem is left unsolved by the program, whereas black-box testing should. Black-box testing does not focus on the program text, and therefore may not discover that some parts of the program are completely useless or have an illogical structure, whereas white-box testing should.
Software testing can never prove that a program contains no errors, but it can strengthen one's faith in the program. Systematic software testing is necessary if the program will be used by others, if the welfare of humans or animals depends on it (so-called safety-critical software), or if one wants to base scientific conclusions on the program's results.
1.6 Test coverage
Given that we cannot make a perfect test suite, how do we know when we have a reasonably good one? A standard measure of a test suite's comprehensiveness is coverage. Here are some notions of coverage, in increasing order of strictness:
- method coverage: does the test suite make sure that every method (including function, procedure, constructor, property, indexer, action listener) gets executed at least once?
- statement coverage: does the test suite make sure that every statement of every method gets executed at least once?
- branch coverage: does the test suite make sure that every transfer of control gets executed at least once?
- path coverage: does the test suite make sure that every execution path through the program gets executed at least once?
Method coverage is the minimum one should expect from a test suite; in principle we know nothing at all about a method that has not been executed by the test suite.
Statement coverage is achieved by the white-box technique described in Section 2, and is often the best coverage one can achieve in practice.
Branch coverage is more demanding, especially in relation to virtual method calls (so-called virtual dispatch) and exception throwing. Namely, consider a single method call statement a.m() where expression a has type A, and class A has many subclasses A1, A2 and so on, that override method m(). Then to achieve branch coverage, the test suite must make sure that a.m() gets executed for a being an object of class A1, an object of class A2, and so on.
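The following minimal Java sketch makes this requirement concrete. The classes A, A1 and A2 and the method m() are the hypothetical names used in the text; their bodies are invented for illustration, and A is assumed to be instantiable. A single call of callM achieves statement coverage, but branch coverage of the call a.m() requires one test per class whose m() can be the target of that call.

```java
// Hypothetical classes A, A1, A2 (names from the text, bodies invented)
// illustrating why one call site a.m() has several branches to cover.
public class DispatchCoverage {
    static class A {
        String m() { return "A"; }
    }

    static class A1 extends A {
        @Override
        String m() { return "A1"; }
    }

    static class A2 extends A {
        @Override
        String m() { return "A2"; }
    }

    // A single call statement whose target depends on the runtime class of a.
    static String callM(A a) {
        return a.m();
    }

    public static void main(String[] args) {
        // Branch coverage of a.m() needs one test per possible target.
        A[] receivers = { new A(), new A1(), new A2() };
        String[] expected = { "A", "A1", "A2" };
        for (int i = 0; i < receivers.length; i++) {
            String actual = callM(receivers[i]);
            System.out.println(expected[i] + " -> " + actual
                + (expected[i].equals(actual) ? "  ok" : "  FAILED"));
        }
    }
}
```

Dropping, say, the A2 object from the receivers array would leave the statement covered but one branch of the call untested.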
Similarly, there is a transfer of control from an exception-throwing statement throw exn to the corresponding exception handler, if any, so to achieve branch coverage, the test suite must make sure that each such statement gets executed in the context of every relevant exception handler.
Path coverage is usually impossible to achieve in practice, because any program that contains a loop will usually have an infinite number of possible execution paths.
2 White-box testing
The goal of white-box testing is to make sure that all parts of the program have been executed, for some notion of part, as described in Section 1.6 on test coverage. The approach described in this section gives statement coverage. The resulting test suite includes enough input data sets to make sure that all methods have been called, that both the true and false branches have been executed in if statements, that every loop has been executed zero, one, and more times, that all branches of every switch statement have been executed, and so on. For every input data set, the expected output must also be specified. Then, the program is run with all the input data sets, and the actual outputs are compared to the expected outputs.
White-box testing cannot demonstrate that the program works in all cases, but it is a surprisingly efficient (fast), effective (thorough), and systematic way to discover errors in the program. In particular, it is a good way to find errors in programs with complicated logic, and to find variables that are initialized with the wrong values.
2.1 Example 1 of white-box testing
The program below receives some integers as arguments, and is expected to print out the smallest and the greatest of these numbers. We shall see how one performs a white-box test of the program. (Be forewarned that the program is actually erroneous; is this obvious?)
    public static void main ( String[] args )
    {
      int mi, ma;
      if (args.length == 0)                                /* 1 */
        System.out.println("No numbers");
      else
      {
        mi = ma = Integer.parseInt(args[0]);
        for (int i = 1; i < args.length; i++)              /* 2 */
        {
          int obs = Integer.parseInt(args[i]);
          if (obs > ma) ma = obs;                          /* 3 */
          else if (mi < obs) mi = obs;                     /* 4 */
        }
        System.out.println("Minimum = " + mi + "; maximum = " + ma);
      }
    }
The choice statements are numbered 1-4 in the margin. Number 2 is the for statement. First we construct a table that shows, for every choice statement and every possible outcome, which input data set covers that choice and outcome:
Choice            Input property                                      Input data set
1 true            No numbers                                          A
1 false           At least one number                                 B
2 zero times      Exactly one number                                  B
2 once            Exactly two numbers                                 C
2 more than once  At least three numbers                              E
3 true            Number > current maximum                            C
3 false           Number ≤ current maximum                            D
4 true            Number ≤ current maximum and > current minimum      E, 3rd number
4 false           Number ≤ current maximum and ≤ current minimum      E, 2nd number
While constructing the above table, we also construct a table of the input data sets:
Input data set   Input contents   Expected output   Actual output
A                (no numbers)     No numbers        No numbers
B                17               17 17             17 17
C                27 29            27 29             27 29
D                39 37            37 39             39 39
E                49 47 48         47 49             49 49
When running the above program on the input data sets, one sees that the outputs are wrong (they disagree with the expected outputs) for input data sets D and E. Now one may run the program manually on, e.g., input data set D, which will lead one to discover that the condition in the program's choice 4 is wrong. When we receive a number which is less than the current minimum, the variable mi is not updated correctly. The statement should be:

    else if (obs < mi) mi = obs;                           /* 4a */
After correcting the program, it may be necessary to reconstruct the white-box test. It may be very time-consuming to go through several rounds of modification and re-testing, so it pays off to make the program correct from the outset! In the present case it suffices to change the entries in the last two lines of the table of choices and outcomes, because all we did was to invert the condition in choice 4:
Choice            Input property                                      Input data set
1 true            No numbers                                          A
1 false           At least one number                                 B
2 zero times      Exactly one number                                  B
2 once            Exactly two numbers                                 C
2 more than once  At least three numbers                              E
3 true            Number > current maximum                            C
3 false           Number ≤ current maximum                            D
4a true           Number ≤ current maximum and < current minimum      E, 2nd number
4a false          Number ≤ current maximum and ≥ current minimum      E, 3rd number
The input data sets remain the same. The corrected program produces the expected output for all input data sets A-E.
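Re-running a white-box test after each correction is easier if it is automated. A minimal sketch, under our own assumption that the program's logic is moved into a method that returns its output line instead of printing it; the names MinMaxTest and minMax are hypothetical, and the choice numbers from the text are kept as comments. It runs the corrected program (with choice 4a) on data sets A-E and flags any output that differs from the expected one.

```java
import java.util.Arrays;

// Sketch: the corrected Example 1 program as a testable method,
// plus a runner for the white-box data sets A-E.
public class MinMaxTest {
    // Returns the line the original program would print.
    static String minMax(String[] args) {
        if (args.length == 0)                            /* 1 */
            return "No numbers";
        int mi, ma;
        mi = ma = Integer.parseInt(args[0]);
        for (int i = 1; i < args.length; i++) {          /* 2 */
            int obs = Integer.parseInt(args[i]);
            if (obs > ma) ma = obs;                       /* 3 */
            else if (obs < mi) mi = obs;                  /* 4a */
        }
        return "Minimum = " + mi + "; maximum = " + ma;
    }

    public static void main(String[] args) {
        String names = "ABCDE";
        String[][] inputs = {
            {}, {"17"}, {"27", "29"}, {"39", "37"}, {"49", "47", "48"}
        };
        String[] expected = {
            "No numbers",
            "Minimum = 17; maximum = 17",
            "Minimum = 27; maximum = 29",
            "Minimum = 37; maximum = 39",
            "Minimum = 47; maximum = 49"
        };
        for (int k = 0; k < inputs.length; k++) {
            String actual = minMax(inputs[k]);
            boolean ok = actual.equals(expected[k]);
            System.out.println(names.charAt(k) + " " + Arrays.toString(inputs[k])
                + ": " + actual + (ok ? "" : "   <-- expected " + expected[k]));
        }
    }
}
```

Putting the original choice 4 (mi < obs) back into minMax makes the runner flag exactly data sets D and E, as in the table.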
2.2 Example 2 of white-box testing
The program below receives some non-negative numbers as input, and is expected to print out the two smallest of these numbers, or the smallest, in case there is only one. (Is this problem statement unambiguous?) This program, too, is erroneous; can you find the problem?
    public static void main ( String[] args )
    {
      int mi1 = 0, mi2 = 0;
      if (args.length == 0)                                /* 1 */
        System.out.println("No numbers");
      else
      {
        mi1 = Integer.parseInt(args[0]);
        if (args.length == 1)                              /* 2 */
          System.out.println("Smallest = " + mi1);
        else
        {
          int obs = Integer.parseInt(args[1]);
          if (obs < mi1)                                   /* 3 */
            { mi2 = mi1; mi1 = obs; }
          for (int i = 2; i < args.length; i++)            /* 4 */
          {
            obs = Integer.parseInt(args[i]);
            if (obs < mi1)                                 /* 5 */
              { mi2 = mi1; mi1 = obs; }
            else if (obs < mi2)                            /* 6 */
              mi2 = obs;
          }
          System.out.println("The two smallest are " + mi1 + " and " + mi2);
        }
      }
    }
As before we tabulate the program's choices 1-6 and their possible outcomes:
Choice            Input property                                          Input data set
1 true            No numbers                                              A
1 false           At least one number                                     B
2 true            Exactly one number                                      B
2 false           At least two numbers                                    C
3 false           Second number ≥ first number                            C
3 true            Second number < first number                            D
4 zero times      Exactly two numbers                                     D
4 once            Exactly three numbers                                   E
4 more than once  At least four numbers                                   H
5 true            Third number < current minimum                          E
5 false           Third number ≥ current minimum                          F
6 true            Third number ≥ current minimum and < second least       F
6 false           Third number ≥ current minimum and ≥ second least       G
The corresponding input data sets might be:
Input data set   Contents        Expected output   Actual output
A                (no numbers)    No numbers        No numbers
B                17              17                17
C                27 29           27 29             27 0
D                39 37           37 39             37 39
E                49 48 47        47 48             47 48
F                59 57 58        57 58             57 58
G                67 68 69        67 68             67 0
H                77 78 79 76     76 77             76 77
Running the program with these test data, it turns out that data sets C and G produce wrong results: 27 and 0, and 67 and 0. Looking at the program text, we see that this is because, when the second number is not less than the first, variable mi2 retains its initial value, namely 0, unless a later number is less than the current minimum. The program must be fixed by inserting an assignment mi2 = obs just before the line labelled 3. We do not need to change the white-box test, because no choice statements were added or changed. The corrected program produces the expected output for all input data sets A-H.
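The corrected program can be checked automatically in the same way. This is a minimal sketch, again refactored (our own choice) into a hypothetical method twoSmallest that returns the output line. One small deviation from the fix described above: mi2 is declared, without an initial value, exactly where the text inserts mi2 = obs, so Java's definite-assignment check would reject any path that used mi2 before assigning it. The runner reruns data sets A-H from the table.

```java
// Sketch of the corrected Example 2 program as a testable method.
public class TwoSmallestTest {
    static String twoSmallest(String[] args) {
        if (args.length == 0)                            /* 1 */
            return "No numbers";
        int mi1 = Integer.parseInt(args[0]);
        if (args.length == 1)                            /* 2 */
            return "Smallest = " + mi1;
        int obs = Integer.parseInt(args[1]);
        int mi2 = obs;                                   // the fix: mi2 = obs before choice 3
        if (obs < mi1) { mi2 = mi1; mi1 = obs; }         /* 3 */
        for (int i = 2; i < args.length; i++) {          /* 4 */
            obs = Integer.parseInt(args[i]);
            if (obs < mi1) { mi2 = mi1; mi1 = obs; }     /* 5 */
            else if (obs < mi2) mi2 = obs;               /* 6 */
        }
        return "The two smallest are " + mi1 + " and " + mi2;
    }

    public static void main(String[] args) {
        String[] names = { "A", "B", "C", "D", "E", "F", "G", "H" };
        String[][] inputs = {
            {}, {"17"}, {"27", "29"}, {"39", "37"}, {"49", "48", "47"},
            {"59", "57", "58"}, {"67", "68", "69"}, {"77", "78", "79", "76"}
        };
        String[] expected = {
            "No numbers", "Smallest = 17",
            "The two smallest are 27 and 29", "The two smallest are 37 and 39",
            "The two smallest are 47 and 48", "The two smallest are 57 and 58",
            "The two smallest are 67 and 68", "The two smallest are 76 and 77"
        };
        for (int k = 0; k < inputs.length; k++) {
            String actual = twoSmallest(inputs[k]);
            System.out.println(names[k] + ": " + actual
                + (actual.equals(expected[k]) ? "" : "   <-- expected: " + expected[k]));
        }
    }
}
```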
Note that if the declaration of mi2 had not included the initialization mi2 = 0, the Java compiler would have complained that mi2 might be used before its first assignment. If so, the error would have been detected even without testing.
This is not the case in many other current programming languages (e.g. C, C++, Fortran), where one may well use an uninitialized variable: its value is just whatever happens to be at that location in the computer's memory. The error may even go undetected by testing, when the value of mi2 equals the expected answer by accident. This is more likely than it may sound if one runs the same (C, C++, Fortran) program on several input data sets, and the same data values are used in several data sets. Therefore it is a good idea to choose different data values in the data sets, as done above.
2.3 Summary, white-box testing
Program statements should be tested as follows:

Statement           Cases to test
if                  Condition false and true
for                 Zero, one, and more than one iterations
while               Zero, one, and more than one iterations
do-while            One, and more than one, iterations
switch              Every case and default branch must be executed
try-catch-finally   The try clause, every catch clause, and the finally clause must be executed
A conditional expression such as (x != 0 ? 1000/x : 1) must be tested for the condition (x != 0) being true and being false, so that both alternatives have been evaluated.
Short-cut logical operators such as (x != 0) && (1000/x > y) must be tested for all possible combinations of the truth values of the operands. That is,

(x != 0)   (1000/x > y)
false      (not evaluated)
true       false
true       true
Note that the second operand in a short-cut (lazy) conjunction will be computed only if the first operand is true (in Java, C#, C, and C++). This is important, for instance, when the condition is (x != 0) && (1000/x > y), where the second operand cannot be computed if the first one is false, that is, if x == 0. Therefore it makes no sense to require that the combinations (false, false) and (false, true) be tested.
In a short-cut disjunction (x == 0) || (1000/x > y) it holds, dually, that the second operand is computed only if the first one is false. Therefore, in this case too there are only three possible combinations:

(x == 0)   (1000/x > y)
true       (not evaluated)
false      false
false      true
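As a minimal sketch of testing these short-cut operators, the two conditions above are wrapped in hypothetical methods conj and disj, and main runs one test per feasible combination of operand values. The y values are our own choice: with x = 10, 1000/x is 100, so y = 5 makes the second operand true and y = 500 makes it false. The cases with x = 0 also check that the division by zero is never evaluated, since evaluating it would throw ArithmeticException.

```java
// Tests for the short-cut conditions (x != 0) && (1000/x > y)
// and (x == 0) || (1000/x > y), one per feasible combination.
public class ShortCutTest {
    static boolean conj(int x, int y) { return (x != 0) && (1000 / x > y); }

    static boolean disj(int x, int y) { return (x == 0) || (1000 / x > y); }

    public static void main(String[] args) {
        // Conjunction: (false, not evaluated), (true, false), (true, true)
        System.out.println(conj(0, 5));     // false; no ArithmeticException
        System.out.println(conj(10, 500));  // false, since 100 > 500 is false
        System.out.println(conj(10, 5));    // true
        // Disjunction: (true, not evaluated), (false, false), (false, true)
        System.out.println(disj(0, 5));     // true; no ArithmeticException
        System.out.println(disj(10, 500));  // false
        System.out.println(disj(10, 5));    // true
    }
}
```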
Methods: The test suite must make sure that all methods have been executed. For recursive methods one should also test the case where the method calls itself.
The test data sets are conveniently presented by two tables, as demonstrated in this section. One table presents, for each statement, what data sets are used, and which property of the input is demonstrated by the test. The other table presents the actual contents of the data sets, and the corresponding expected output.
3 Black-box testing
The goal of black-box testing is to make sure that the program solves the problem it is supposed to solve; to make sure that it works. Thus one must have a fairly precise idea of the problem that the program must solve, but in principle one does not need the program text when designing a black-box test. Test data sets (with corresponding expected outputs) must be created to cover 'typical' as well as 'extreme' input values, and also inputs that are described as exceptional cases or illegal cases in the problem statement. Examples:
- In a program to compute the sum of a sequence of numbers, the empty sequence will be an extreme, but legal, input (with sum 0).
- In a program to compute the average of a sequence of numbers, the empty sequence will be an extreme, and illegal, input. The program should give an error message for this input, as one cannot compute the average of no numbers.
One should avoid creating a large collection of input data sets 'just to be on the safe side'. Instead, one must carefully consider what inputs might reveal problems in the program, and use exactly those. When preparing a black-box test, the task is to find errors in the program; thus destructive thinking is required. As we shall see below, this is just as demanding as programming, that is, as constructive thinking.
3.1 Example 1 of black-box testing
Problem: Given a (possibly empty) sequence of numbers, find the smallest and the greatest of these numbers.
This is the same problem as in Section 2.1, but now the point of departure is the above problem statement, not any particular program which claims to solve the problem.
First we consider the problem statement. We note that an empty sequence does not contain a smallest or greatest number. Presumably, the program must give an error message if presented with an empty sequence of numbers.
The black-box test might consist of the following input data sets. The sequence may be empty (A). A non-empty sequence can have one element (B), or two or more elements. In a sequence with two elements, the elements can be equal (C1), or different, the smallest one first (C2) or the greatest one first (C3). If there are more than two elements, they may appear in increasing order (D1), decreasing order (D2), with the greatest element in the middle (D3), or with the smallest element in the middle (D4). All in all we have these cases:
Input property                           Input data set
No numbers                               A
One number                               B
Two numbers, equal                       C1
Two numbers, increasing                  C2
Two numbers, decreasing                  C3
Three numbers, increasing                D1
Three numbers, decreasing                D2
Three numbers, greatest in the middle    D3
Three numbers, smallest in the middle    D4
The choice of these input data sets is not arbitrary. It is influenced by our own ideas about how the problem might be solved by a program, and in particular how it might be solved the wrong way. For instance, the programmer might have forgotten that the sequence could be empty, or that the smallest number equals the greatest number if there is only one number, etc.
The choice of input data sets may be criticized. For instance, it is not obvious that data set C1 is needed. Could the problem really be solved (wrongly) in a way that would be discovered by C1, but not by any of the other input data sets?
The data sets C2 and C3 check that the program does not just answer by returning the first (or last) number from the input sequence; this is a relevant check. The data sets D3 and D4 check that the program does not just compare the first and the last number; it is less clear that this is relevant.
Input data set   Contents       Expected output   Actual output
A                (no numbers)   Error message
B                17             17 17
C1               27 27          27 27
C2               35 36          35 36
C3               46 45          45 46
D1               53 55 57       53 57
D2               67 65 63       63 67
D3               73 77 75       73 77
D4               89 83 85       83 89
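Because a black-box test depends only on the problem statement, it can be written once and run against any implementation. The following minimal Java sketch does this for the data sets above by passing the program under test as a function. The output format ("min max", or "Error message" for an empty sequence), the class and method names, and the sample implementation minMax are all our own assumptions; the problem statement does not fix them.

```java
import java.util.Arrays;
import java.util.function.Function;

// Sketch: the black-box data sets above, runnable against any implementation.
public class MinMaxBlackBox {
    static final String[] NAMES = { "A", "B", "C1", "C2", "C3", "D1", "D2", "D3", "D4" };
    static final int[][] INPUTS = {
        {}, {17}, {27, 27}, {35, 36}, {46, 45},
        {53, 55, 57}, {67, 65, 63}, {73, 77, 75}, {89, 83, 85}
    };
    static final String[] EXPECTED = {
        "Error message", "17 17", "27 27", "35 36", "45 46",
        "53 57", "63 67", "73 77", "83 89"
    };

    // Runs every data set against the given program; returns the number of failures.
    static int runAll(Function<int[], String> program) {
        int failures = 0;
        for (int k = 0; k < INPUTS.length; k++) {
            String actual = program.apply(INPUTS[k]);
            if (!actual.equals(EXPECTED[k])) {
                failures++;
                System.out.println(NAMES[k] + ": expected " + EXPECTED[k] + ", got " + actual);
            }
        }
        return failures;
    }

    // One possible implementation under test (hypothetical).
    static String minMax(int[] xs) {
        if (xs.length == 0) return "Error message";
        int mi = Arrays.stream(xs).min().getAsInt();
        int ma = Arrays.stream(xs).max().getAsInt();
        return mi + " " + ma;
    }

    public static void main(String[] args) {
        System.out.println(runAll(MinMaxBlackBox::minMax) + " failures");
    }
}
```

For instance, an implementation that only compares the first and the last number passes every data set except D3 and D4, which illustrates what those two data sets add.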
3.2 Example 2 of black-box testing
Problem: Given a (possibly empty) sequence of numbers, find the greatest difference between two consecutive numbers.
We shall design a black-box test for this problem. First we note that if there are zero numbers or only one number, then there are no two consecutive numbers, and the greatest difference cannot be computed. Presumably, an error message must be given in this case. Furthermore, it is unclear whether the 'difference' is signed (possibly negative) or absolute (always non-negative). Here we assume that only the absolute difference should be taken into account, so that the difference between 23 and 29 is the same as that between 29 and 23.
This gives rise to at least the following input data sets: no numbers (A), exactly one number (B), and exactly two numbers. Two numbers may be equal (C1), or different, in increasing order (C2) or decreasing order (C3). When there are three numbers, the difference may be increasing (D1) or decreasing (D2). That is:
Input property                           Input data set
No numbers                               A
One number                               B
Two numbers, equal                       C1
Two numbers, increasing                  C2
Two numbers, decreasing                  C3
Three numbers, increasing difference     D1
Three numbers, decreasing difference     D2
The data sets and their expected outputs might be:
Input data set   Contents       Expected output   Actual output
A                (no numbers)   Error message
B                17             Error message
C1               27 27          0
C2               36 37          1
C3               48 46          2
D1               57 56 59       3
D2               69 65 67       4
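A minimal sketch of an implementation run against these data sets, under the text's assumption that the difference is absolute. Reporting the error case by throwing IllegalArgumentException, and the names GreatestDifference and greatestDifference, are our own assumptions; the runner turns the exception into "Error message" so that it can be compared with the table.

```java
// Sketch: greatest absolute difference between consecutive numbers,
// with the black-box data sets A-D2 from the table.
public class GreatestDifference {
    static int greatestDifference(int... xs) {
        if (xs.length < 2)
            throw new IllegalArgumentException("need at least two numbers");
        int best = Math.abs(xs[1] - xs[0]);
        for (int i = 2; i < xs.length; i++)
            best = Math.max(best, Math.abs(xs[i] - xs[i - 1]));
        return best;
    }

    public static void main(String[] args) {
        String[] names = { "A", "B", "C1", "C2", "C3", "D1", "D2" };
        int[][] inputs = { {}, {17}, {27, 27}, {36, 37}, {48, 46}, {57, 56, 59}, {69, 65, 67} };
        Integer[] expected = { null, null, 0, 1, 2, 3, 4 };  // null: error expected
        for (int k = 0; k < inputs.length; k++) {
            String actual;
            try {
                actual = String.valueOf(greatestDifference(inputs[k]));
            } catch (IllegalArgumentException e) {
                actual = "Error message";
            }
            String exp = expected[k] == null ? "Error message" : String.valueOf(expected[k]);
            System.out.println(names[k] + ": expected " + exp + ", actual " + actual
                + (exp.equals(actual) ? "" : "   <-- FAILED"));
        }
    }
}
```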
One might consider whether there should be more variants of D1 (and similarly of D2), in which the three numbers would appear in increasing order (56,57,59), or decreasing (59,58,56), or increasing and then decreasing (56,57,55), or decreasing and then increasing (57,56,58). Although these data sets might reveal errors that the above data sets would not, they do appear more contrived. However, this shows that black-box testing may be carried on indefinitely: you will never be sure that all possible errors have been detected.
3.3 Example 3 of black-box testing
Problem: Given a day of the month day and a month mth, decide whether they determine a legal date in a non-leap year. For instance, 31/12 (the 31st day of the 12th month) and 31/8 are both legal, whereas 29/2 and 1/13 are not. The day and month are given as integers, and the program must respond with Legal or Illegal.
To simplify the test suite, one may assume that if the program classifies e.g. 1/4 and 30/4 as legal dates, then it will consider 17/4 and 29/4 legal, too. Correspondingly, one may assume that if the program classifies 31/4 as illegal, then also 32/4, 33/4, and so on. There is no guarantee that these assumptions actually hold; the program may be written in a contorted and silly way. Assumptions such as these should be written down along with the test suite.
Under those assumptions one may test only 'extreme' cases, such as 0/4, 1/4, 30/4, and 31/4, for which the expected outputs are Illegal, Legal, Legal, and Illegal.
Day   Month   Expected output   Actual output
0     1       Illegal
1     0       Illegal
1     1       Legal
31    1       Legal
32    1       Illegal
28    2       Legal
29    2       Illegal
31    3       Legal
32    3       Illegal
30    4       Legal
31    4       Illegal
31    5       Legal
32    5       Illegal
30    6       Legal
31    6       Illegal
31    7       Legal
32    7       Illegal
31    8       Legal
32    8       Illegal
30    9       Legal
31    9       Illegal
31    10      Legal
32    10      Illegal
30    11      Legal
31    11      Illegal
31    12      Legal
32    12      Illegal
1     13      Illegal
It is clear that the black-box test becomes rather large and cumbersome. In fact it is just as long as a program that solves the problem! To reduce the number of data sets, one might consider just some extreme values, such as 0/1, 1/0, 1/1, 31/12 and 32/12; some exceptional values around February, such as 28/2, 29/2 and 1/3; and a few typical cases, such as 30/4, 31/4, 31/8 and 32/8. But that would weaken the test a little: it would not discover whether the program mistakenly believes that June (not July) has 31 days.
4 Practical hints about testing
- Avoid test cases where the expected output is zero. In Java and C#, static and non-static fields in classes automatically get initialized to 0. The actual output may therefore equal the expected output by accident.
- In languages such as C, C++ and Fortran, where local variables are not initialized automatically, testing will not necessarily reveal uninitialized variables. The accidental value of an uninitialized variable may happen to equal the expected output. This is not unlikely if one uses the same input data in several test cases. Therefore, choose different input data in different test cases, as done in the preceding sections.
- Automate the test, if at all possible. Then it can conveniently be rerun whenever the program has been modified. This is usually done as so-called unit tests. For Java, the JUnit framework from www.junit.org is a widely used tool, well supported by integrated development environments such as BlueJ and Eclipse. For C#, the NUnit framework from www.nunit.org is widely used. Microsoft's Visual Studio Team System also contains unit test facilities.
- As mentioned in Section 3, one should avoid creating an excessively large test suite that has redundant test cases. Software evolves over time, and the test suite must evolve together with the software. For instance, if you decide to change a method in your software so that it returns a different result for certain inputs, then you must look at all test cases for that method to see whether they are still relevant and correct; in that situation it is unpleasant to discover that the same functionality is tested by 13 different test cases. A test suite is a piece of software too, and should have no superfluous parts.
- When testing programs that have graphical user interfaces with menus, buttons, and so on, one must describe carefully, step by step, what actions (menu choices, mouse clicks, and so on) the tester must perform, and what the program's expected reactions are. Clearly, this is cumbersome and expensive to carry out manually, so professional software houses use various tools to simulate user actions.
5 Testing in perspective
- Testing can never prove that a program has no errors, but it can considerably improve the confidence one has in its results.
- Often it is easier to design a white-box test suite than a black-box one, because one can proceed systematically on the basis of the program text. Black-box testing requires more guesswork about the possible workings of the program, but can check that the program does what is required by the problem statement.
- It is a good idea to design a black-box test at the same time you write the program. This reveals unclear and subtle points in the problem statement, so that you can take them into account while writing the program, instead of having to fix the program later.
- Writing the test cases and the documentation at the same time is also valuable. When attempting to write a test case, one often realizes what information users of a method or class will be looking for in the documentation. Conversely, when one makes a claim ('when n+i > arr.length, then FooException is thrown') about the behaviour of a class or method in the documentation, that should lead to one or more test cases that check this claim.
- If you further use unit test tools to automate the test, you can actually implement the tests before you implement the corresponding functionality. Then you can more confidently implement the functionality and measure your implementation progress by the number of test cases that succeed. This is called test-driven development.
- From the tester's point of view, testing is successful if it does find errors in the program; in this case it was clearly not a waste of time to do the test. From the programmer's point of view the opposite holds: hopefully the test will not find errors in the program. When the tester and the programmer are one and the same person, then there is a psychological conflict: one does not want to admit to making mistakes, neither when programming nor when designing test suites.
- It is a useful exercise to design a test suite for a program written by someone else. This is a kind of game: the goal of the programmer is to write a program that contains no errors; the goal of the tester is to find the errors in the program anyway.
• Designing a test suite takes much time. One learns to avoid needless choice statements when programming, because this reduces the number of test cases in the white-box test. It also leads to simpler programs that are usually more general and easier to understand.²
• It is not unusual for a test suite to be as large as the software it tests. The implementation of the C5 Generic Collection Library for C#/.NET (http://www.itu.dk/research/c5) has 27,000 lines of code, and its unit tests have 28,000 lines.
• How much testing is needed? The effort spent on testing should be in proportion to the consequences of possible program errors. A program used just once for computing one's taxes needs no testing. However, a program must be tested if errors could affect the safety of people or animals, or could cause considerable economic losses. If scientific conclusions will be drawn from the outputs of a program, then it must be tested too.
² A program may be hard to understand even when it has no choice statements; see Exercises 10 and 11.
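The bullet about documentation claims can be made concrete. This Java sketch invents a hypothetical method sumRange, which does not come from this note. Its documentation makes the claim 'when n+i > arr.length, FooException is thrown', and the sketch turns that claim into test cases on both sides of the boundary n+i = arr.length. The class names FooException and RangeSum are assumptions made for illustration. So is the behaviour of sumRange, apart from the quoted claim.

```java
// Hypothetical exception named in the documentation claim.
class FooException extends RuntimeException {
    FooException(String msg) { super(msg); }
}

class RangeSum {
    /**
     * Hypothetical method: returns arr[i] + ... + arr[i+n-1].
     * Assumes i >= 0 and n >= 0.
     * When n+i > arr.length, FooException is thrown.
     */
    static int sumRange(int[] arr, int i, int n) {
        if (n + i > arr.length)
            throw new FooException("n+i > arr.length");
        int sum = 0;
        for (int k = i; k < i + n; k++)
            sum += arr[k];
        return sum;
    }

    // Returns true if sumRange(arr, i, n) throws FooException.
    static boolean throwsFoo(int[] arr, int i, int n) {
        try {
            sumRange(arr, i, n);
            return false;
        } catch (FooException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] arr = { 3, 5, 7 };
        // On the boundary, n+i == arr.length: the claim says no exception.
        System.out.println(throwsFoo(arr, 1, 2));
        // Just past the boundary, n+i == arr.length+1: FooException expected.
        System.out.println(throwsFoo(arr, 2, 2));
        // Empty range at the end of the array, again n+i == arr.length.
        System.out.println(throwsFoo(arr, 3, 0));
    }
}
```

Running RangeSum prints false, true and false. The first and third calls lie exactly on the boundary and the second lies just past it, which is where an off-by-one error in the check would show up.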
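The bullet on test-driven development can also be sketched. In Java one would normally use a unit test tool such as JUnit. To keep the sketch self-contained, it uses a hand-written loop instead. The test cases for a hypothetical method max are written down as data first. max returns the largest element of a non-empty array and is not an example from this note. countPassing then reports how many cases succeed, which is the progress measure described in the bullet.

```java
class TddSketch {
    // Test cases, written before max was implemented (hypothetical example):
    // INPUTS[t] is an argument and EXPECTED[t] the correct result for it.
    static final int[][] INPUTS = { { 7 }, { 1, 2, 3 }, { 3, 2, 1 }, { -5, -2, -9 }, { 4, 4 } };
    static final int[] EXPECTED = { 7, 3, 3, -2, 4 };

    // The functionality under development: the largest element of a
    // non-empty array.
    static int max(int[] xs) {
        int best = xs[0];
        for (int x : xs)
            if (x > best)
                best = x;
        return best;
    }

    // Runs every test case and returns how many succeed; this count
    // measures implementation progress.
    static int countPassing() {
        int passed = 0;
        for (int t = 0; t < INPUTS.length; t++)
            if (max(INPUTS[t]) == EXPECTED[t])
                passed++;
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(countPassing() + " of " + INPUTS.length + " test cases succeed");
    }
}
```

With the implementation shown, the run reports 5 of 5. If the body of max is replaced by a stub such as return 0;, the same run reports 0 of 5, so the count rises as the implementation is completed.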
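The bullet on avoiding needless choice statements can be illustrated with table lookup, one common technique for removing them. The sketch applies it to a hypothetical helper that returns the number of days in a month of a non-leap year; this helper is related to, but not taken from, the problem in Section 3.3. The first version has three branches that a white-box test must cover. The second has no choice statements at all, although its table should still be checked entry by entry. mth is assumed to lie in 1..12.

```java
class DaysInMonth {
    // Version with choice statements: a white-box test needs cases for
    // February, for the 30-day months, and for the 31-day months.
    static int daysBranchy(int mth) {
        if (mth == 2)
            return 28;
        else if (mth == 4 || mth == 6 || mth == 9 || mth == 11)
            return 30;
        else
            return 31;
    }

    // Version without choice statements: a table lookup for a non-leap year.
    static final int[] DAYS = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

    static int daysTable(int mth) {
        return DAYS[mth - 1];
    }

    public static void main(String[] args) {
        int total = 0;
        for (int mth = 1; mth <= 12; mth++) {
            if (daysBranchy(mth) != daysTable(mth))
                System.out.println("versions disagree for month " + mth);
            total += daysTable(mth);
        }
        System.out.println("days in a non-leap year: " + total);
    }
}
```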
6 Exercises
1. Problem: Given a sequence of integers, find their average.
   Use black-box techniques to construct a test suite for this problem.
2. Write a program to solve the problem from Exercise 1. The program should take its input from the command line. Run the test suite you made.
3. Use white-box techniques to construct a test suite for the program written in Exercise 2, and run it.
4. Problem: Given a sequence of numbers, decide whether they are sorted in increasing order. For instance, 17 18 18 22 is sorted, but 17 18 19 18 is not. The result must be "Sorted" or "Not sorted".
   Use black-box techniques to construct a test suite for this problem.
5. Write a program that solves the problem from Exercise 4. Run the test suite you made.
6. Use white-box techniques to construct a test suite for the program written in Exercise 5. Run it.
7. Write a program to decide whether a given (day, month) pair in a non-leap year is legal, as discussed in Section 3.3. Run your program with the (black-box) test suite given there.
8. Use white-box techniques to construct a test suite for the program written in Exercise 7. Run it.
9. Problem: Given a (day, month) pair, compute the number of the day in a non-leap year. For instance, (1, 1) is number 1; (1, 2), which means 1 February, is number 32; (1, 3) is number 60; and (31, 12) is number 365. This is useful for computing the distance between two dates, e.g. the length of a course, the duration of a bank deposit, or the time from sowing to harvest. The day and month can be assumed to be legal for a non-leap year.
   Use black-box techniques to construct a test suite for this problem.
10. We claim that this Java method solves the problem from Exercise 9:

      static int dayno(int day, int mth)
      {
        int m = (mth+9)%12;
        return (m/5*153+m%5*30+(m%5+1)/2+59)%365+day;
      }

    Test this method with the black-box test suite you made above.
11. Use white-box techniques to construct a test suite for the method shown in Exercise 10. This appears trivial and useless, since there are no choice statements in the program at all. Instead one may consider jumps (discontinuities) in the processing of data. In particular, integer division (/) and remainder (%) produce jumps of this sort. For mth < 3 we have m = (mth + 9) mod 12 = mth + 9, and for mth ≥ 3 we have m = (mth + 9) mod 12 = mth − 3. Thus there is a kind of hidden choice when going from mth = 2 to mth = 3. The expressions m / 5 and (m % 5 + 1) / 2 contain similar hidden choices. These can be used for choosing test cases for the white-box test. Do that.
12. Consider a method String toRoman(int n) that is supposed to convert a positive integer to the Roman numeral representing that integer, using the symbols I = 1, V = 5, X = 10, L = 50, C = 100, D = 500 and M = 1000. The following rules determine the Roman numeral corresponding to a positive number:
    • In general, the symbols of a Roman numeral are added together from left to right, so II = 2, XX = 20, XXXI = 31, and MMVIII = 2008.
    • The symbols I, X and C may appear up to three times in a row; the symbol M may appear any number of times; and the symbols V, L and D cannot be repeated.
    • When a lesser symbol appears before a greater one, the lesser symbol is subtracted, not added. So IV = 4, IX = 9, XL = 40 and CM = 900.
      The symbol I may appear once before V and X; the symbol X may appear once before L and C; the symbol C may appear once before D and M; and the symbols V, L and D cannot appear before a greater symbol.
      So 45 is written XLV, not VL; 49 is written XLIX, not IL; and 1998 is written MCMXCVIII, not IIMM.
    Exercise: Use black-box techniques to construct a test suite for the method toRoman. This can be done in two ways. The simplest way is to call toRoman(n) for suitably chosen numbers n and check that it returns the expected string. The more ambitious way is to implement (and test!) the method fromRoman described in Exercise 13 below, and use that to check toRoman.
13. Consider a method int fromRoman(String s) with this specification: the method checks that string s is a well-formed Roman numeral according to the rules in Exercise 12, and if so, returns the corresponding number; otherwise it throws an exception.
    Use black-box techniques to construct a test suite for this method. Remember to include some ill-formed Roman numerals as well.
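For Exercises 10 and 11, a black-box test is easier to rerun when the test cases are stored as data. This sketch copies the dayno method from Exercise 10 and checks it against only the four examples stated in Exercise 9; your own suite from Exercise 9 should add more cases. It then prints the intermediate values m, m / 5, m % 5 and (m % 5 + 1) / 2 for every month, which shows where the hidden choices described in Exercise 11 occur. The class name DaynoCheck and the helper failures are assumptions made for illustration.

```java
class DaynoCheck {
    // The method from Exercise 10, unchanged apart from spacing.
    static int dayno(int day, int mth) {
        int m = (mth + 9) % 12;
        return (m / 5 * 153 + m % 5 * 30 + (m % 5 + 1) / 2 + 59) % 365 + day;
    }

    // Black-box cases: only the examples stated in Exercise 9,
    // as (day, month, expected day number).
    static final int[][] CASES = {
        {  1,  1,   1 },
        {  1,  2,  32 },
        {  1,  3,  60 },
        { 31, 12, 365 },
    };

    // Runs CASES against dayno, reports each failing case, returns the count.
    static int failures() {
        int failed = 0;
        for (int[] c : CASES) {
            int actual = dayno(c[0], c[1]);
            if (actual != c[2]) {
                failed++;
                System.out.println("dayno(" + c[0] + ", " + c[1] + ") = "
                                   + actual + ", expected " + c[2]);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(failures() + " failures in " + CASES.length + " cases");
        // Intermediate values per month, to locate the jumps of Exercise 11.
        System.out.println("mth   m  m/5  m%5  (m%5+1)/2");
        for (int mth = 1; mth <= 12; mth++) {
            int m = (mth + 9) % 12;
            System.out.printf("%3d %3d %4d %4d %10d%n",
                              mth, m, m / 5, m % 5, (m % 5 + 1) / 2);
        }
    }
}
```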
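For Exercises 12 and 13, here is one possible sketch of both methods; it is not taken from this note. toRoman is a greedy conversion over a table that includes the subtractive pairs CM, CD, XC, XL, IX and IV. fromRoman adds the symbols from left to right and subtracts a lesser symbol that appears before a greater one. It then treats a numeral as well-formed exactly when it equals what toRoman produces for its value. That reading of the rules in Exercise 12 is an assumption, and so is the choice of IllegalArgumentException. The main method performs the round-trip check suggested in Exercise 12 and tries some ill-formed numerals mentioned in the rules.

```java
class RomanCheck {
    // Values and symbols, largest first, including the subtractive pairs
    // allowed by the rules (CM, CD, XC, XL, IX, IV).
    static final int[] VALUES =
        { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    static final String[] SYMBOLS =
        { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };

    // Greedy conversion: repeatedly append the largest symbol or pair
    // whose value does not exceed what remains of n.
    static String toRoman(int n) {
        if (n <= 0)
            throw new IllegalArgumentException("not positive: " + n);
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < VALUES.length; k++)
            while (n >= VALUES[k]) {
                sb.append(SYMBOLS[k]);
                n -= VALUES[k];
            }
        return sb.toString();
    }

    // Value of a single symbol, or -1 if c is not one of IVXLCDM.
    static int symbolValue(char c) {
        switch (c) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default:  return -1;
        }
    }

    // Adds symbols left to right, subtracting a lesser symbol that stands
    // before a greater one; then accepts s only if it is the numeral that
    // toRoman produces for that value (assumed meaning of well-formed).
    static int fromRoman(String s) {
        if (s.isEmpty())
            throw new IllegalArgumentException("empty numeral");
        int total = 0;
        for (int k = 0; k < s.length(); k++) {
            int v = symbolValue(s.charAt(k));
            if (v < 0)
                throw new IllegalArgumentException("bad symbol in " + s);
            int next = k + 1 < s.length() ? symbolValue(s.charAt(k + 1)) : 0;
            total += v < next ? -v : v;
        }
        if (total <= 0 || !toRoman(total).equals(s))
            throw new IllegalArgumentException("ill-formed numeral: " + s);
        return total;
    }

    // True if fromRoman rejects s with an exception.
    static boolean rejects(String s) {
        try {
            fromRoman(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Round trip: fromRoman should undo toRoman.
        int bad = 0;
        for (int n = 1; n <= 4000; n++)
            if (fromRoman(toRoman(n)) != n)
                bad++;
        System.out.println("round-trip failures: " + bad);
        // Ill-formed numerals mentioned in the rules of Exercise 12.
        for (String s : new String[] { "IIII", "VV", "VL", "IL", "IIMM" })
            System.out.println(s + " rejected: " + rejects(s));
    }
}
```

This fromRoman relies on toRoman to decide well-formedness, so the round trip cannot catch certain errors in toRoman. For example, if the IV entry were missing from the table, toRoman would produce IIII for 4, and fromRoman would accept it. Checking toRoman(n) directly against expected strings, the simpler way described in Exercise 12, guards against that.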