Chapter 4: Software Testing

Acronyms

  • API: Application Program Interface
  • TDD: Test-Driven Development
  • TTCN3: Testing and Test Control Notation Version 3
  • XP: Extreme Programming

Introduction

Software testing consists of the dynamic verification that a program provides expected behaviors on a finite set of test cases, suitably selected from the usually infinite execution domain. In the above definition, the words dynamic, finite, selected, and expected correspond to key issues in describing the Software Testing knowledge area (KA):

  • Dynamic: This term means that testing always implies executing the program on selected inputs. To be precise, the input value alone is not always sufficient to specify a test, since a complex, nondeterministic system might react to the same input with different behaviors, depending on the system state (a brief sketch after this list illustrates this point). In this KA, however, the term “input” will be maintained, with the implied convention that its meaning also includes a specified input state in those cases for which it is important. Static techniques are different from and complementary to dynamic testing; they are covered in the Software Quality KA. It is worth noting that terminology is not uniform among different communities, and some communities also use the term “testing” in reference to static techniques.
  • Finite: Even in simple programs, so many test cases are theoretically possible that exhaustive testing could require months or years to execute. This is why, in practice, a complete set of tests can generally be considered infinite, and testing is conducted on a subset of all possible tests, which is determined by risk and prioritization criteria. Testing always implies a tradeoff between limited resources and schedules on the one hand and inherently unlimited test requirements on the other.
  • Selected: The many proposed test techniques differ essentially in how the test set is selected, and software engineers must be aware that different selection criteria may yield vastly different degrees of effectiveness. How to identify the most suitable selection criterion under given conditions is a complex problem; in practice, risk analysis techniques and software engineering expertise are applied.
  • Expected: It must be possible, although not always easy, to decide whether the observed outcomes of program testing are acceptable or not; otherwise, the testing effort is useless. The observed behavior may be checked against user needs (commonly referred to as testing for validation), against a specification (testing for verification), or, perhaps, against the anticipated behavior from implicit requirements or expectations (see Acceptance Tests in the Software Requirements KA).
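
The role of the input state can be made concrete with a minimal sketch in Python; the Account class and its behavior below are illustrative assumptions, not part of the SWEBOK material. The same input value leads to different expected behaviors depending on the state in which it is applied, so a test case here is really a triple of input state, input value, and expected outcome.

    class Account:
        """Hypothetical stateful unit under test."""
        def __init__(self, balance):
            self.balance = balance

        def withdraw(self, amount):
            # The outcome depends on the current state, not on the input alone.
            if amount > self.balance:
                return "rejected"
            self.balance -= amount
            return "accepted"

    # Same input (withdraw 50), different input states, different expected outcomes.
    assert Account(balance=100).withdraw(50) == "accepted"
    assert Account(balance=10).withdraw(50) == "rejected"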

In recent years, the view of software testing has matured into a constructive one. Testing is no longer seen as an activity that starts only after the coding phase is complete with the limited purpose of detecting failures. Software testing is, or should be, pervasive throughout the entire development and maintenance life cycle. Indeed, planning for software testing should start with the early stages of the software requirements process, and test plans and procedures should be systematically and continuously developed—and possibly refined—as software development proceeds. These test planning and test designing activities provide useful input for software designers and help to highlight potential weaknesses, such as design oversights/contradictions, or omissions/ambiguities in the documentation.

For many organizations, the approach to software quality is one of prevention: it is obviously much better to prevent problems than to correct them. Testing can be seen, then, as a means for providing information about the functionality and quality attributes of the software and also for identifying faults in those cases where error prevention has not been effective. It is perhaps obvious but worth recognizing that software can still contain faults, even after completion of an extensive testing activity. Software failures experienced after delivery are addressed by corrective maintenance. Software maintenance topics are covered in the Software Maintenance KA.

In the Software Quality KA (see Software Quality Management Techniques), software quality management techniques are notably categorized into static techniques (no code execution) and dynamic techniques (code execution). Both categories are useful. This KA focuses on dynamic techniques.

Software testing is also related to software construction (see Construction Testing in the Software Construction KA). In particular, unit and integration testing are intimately related to software construction, if not part of it.

Breakdown of Topics for Software Testing

The breakdown of topics for the Software Testing KA is shown in Figure 4.1. A more detailed breakdown is provided in the Matrix of Topics vs. Reference Material at the end of this KA.

The first topic describes Software Testing Fundamentals. It covers the basic definitions in the field of software testing, the basic terminology and key issues, and software testing’s relationship with other activities.

The second topic, Test Levels, consists of two (orthogonal) subtopics: the first subtopic lists the levels in which the testing of large software is traditionally subdivided, and the second subtopic considers testing for specific conditions or properties and is referred to as Objectives of Testing. Not all types of testing apply to every software product, nor has every possible type been listed.

The test target and test objective together determine how the test set is identified, both with regard to its consistency—how much testing is enough for achieving the stated objective—and with regard to its composition—which test cases should be selected for achieving the stated objective (although usually “for achieving the stated objective” remains implicit and only the first part of each of these two questions is posed). Criteria for addressing the first question are referred to as test adequacy criteria, while those addressing the second question are called test selection criteria.

Several Test Techniques have been developed in the past few decades, and new ones are still being proposed. Generally accepted techniques are covered in the third topic.

Test-Related Measures are dealt with in the fourth topic, while the issues relative to Test Process are covered in the fifth. Finally, Software Testing Tools are presented in topic six.

1 Software Testing Fundamentals

1.1 Testing-Related Terminology

1.1.1 Definitions of Testing and Related Terminology

Definitions of testing and testing-related terminology are provided in the cited references and summarized as follows.

1.1.2 Faults vs. Failures

Many terms are used in the software engineering literature to describe a malfunction: notably fault, failure, and error, among others. This terminology is precisely defined in [3, c2]. It is essential to clearly distinguish between the cause of a malfunction (for which the term fault will be used here) and an undesired effect observed in the system’s delivered service (which will be called a failure). Indeed there may well be faults in the software that never manifest themselves as failures (see Theoretical and Practical Limitations of Testing in section 1.2, Key Issues). Thus testing can reveal failures, but it is the faults that can and must be removed [3]. The more generic term defect can be used to refer to either a fault or a failure, when the distinction is not important [3].

However, it should be recognized that the cause of a failure cannot always be unequivocally identified. No theoretical criteria exist to definitively determine, in general, the fault that caused an observed failure. It might be said that it was the fault that had to be modified to remove the failure, but other modifications might have worked just as well. To avoid ambiguity, one could refer to failure-causing inputs instead of faults—that is, those sets of inputs that cause a failure to appear.
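
As a minimal illustration (the code below is a hypothetical sketch, not taken from the cited references), consider a fault that stays latent until a particular kind of input turns it into an observable failure; testing can only observe the failure on such failure-causing inputs, while removing the defect means correcting the fault itself.

    def largest(values):
        """Intended to return the largest element of a non-empty list."""
        best = 0                      # fault: should be initialized to values[0]
        for v in values:
            if v > best:
                best = v
        return best

    print(largest([3, 7, 2]))         # 7 -> the fault stays latent; no failure observed
    print(largest([-3, -7, -2]))      # 0 -> failure: the correct result is -2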

1.2 Key Issues

1.2.1 Test Selection Criteria / Test Adequacy Criteria (Stopping Rules)

A test selection criterion is a means of selecting test cases or determining that a set of test cases is sufficient for a specified purpose. Test adequacy criteria can be used to decide when sufficient testing will be, or has been, accomplished [4] (see Termination in section 5.1, Practical Considerations).
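
As a hedged sketch of how an adequacy criterion can act as a stopping rule (the criterion "every branch outcome of the unit under test is exercised at least once" and all names below are assumptions chosen for illustration, not a prescription from the cited references):

    executed_branches = set()

    def classify(x):
        """Hypothetical unit under test, instrumented to record branch outcomes."""
        if x < 0:
            executed_branches.add("x<0:true")
            return "negative"
        executed_branches.add("x<0:false")
        return "non-negative"

    ALL_BRANCHES = {"x<0:true", "x<0:false"}

    def adequate(test_inputs):
        """Run the candidate tests; report whether the adequacy criterion is met."""
        executed_branches.clear()
        for x in test_inputs:
            classify(x)
        return ALL_BRANCHES <= executed_branches

    print(adequate([5]))       # False: by this criterion, testing should not stop yet
    print(adequate([5, -1]))   # True: both branch outcomes have been exercised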

1.2.2 Test Effectiveness / Objectives for Testing

Testing effectiveness is determined by analyzing a set of program executions. Selection of tests to be executed can be guided by different objectives: it is only in light of the objective pursued that the effectiveness of the test set can be evaluated.

1.2.3 Testing for Defect Discovery

In testing for defect discovery, a successful test is one that causes the system to fail. This is quite different from testing to demonstrate that the software meets its specifications or other desired properties, in which case testing is successful if no failures are observed under realistic test cases and test environments.
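
A small, hypothetical sketch may help (the unit under test and its assumed specification below are illustrative only): the test is written to provoke a failure on an edge case rather than to confirm nominal behavior, and it "succeeds" as a defect-discovery test precisely when it makes the unit fail.

    import unittest

    def safe_ratio(numerator, denominator):
        # Hypothetical faulty unit: the assumed specification says a zero
        # denominator should yield None, but the guard is missing.
        return numerator / denominator

    class DefectDiscoveryTest(unittest.TestCase):
        def test_zero_denominator_yields_none(self):
            # Edge case chosen to expose the defect; the resulting test failure
            # (here a ZeroDivisionError) is the useful outcome.
            self.assertIsNone(safe_ratio(1, 0))

    if __name__ == "__main__":
        unittest.main()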

1.2.4 The Oracle Problem

An oracle is any human or mechanical agent that decides whether a program behaved correctly in a given test and accordingly results in a verdict of “pass” or “fail.” There exist many different kinds of oracles; for example, unambiguous requirements specifications, behavioral models, and code annotations. Automation of mechanized oracles can be difficult and expensive.
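
As one minimal, assumed example of a mechanized oracle (the function names below are illustrative; the source does not prescribe this approach), a trusted reference implementation can decide the verdict for each test execution:

    import random

    def fast_sort(items):
        """Hypothetical implementation under test (a stand-in for a hand-written sort)."""
        return sorted(items)

    def oracle_verdict(test_input):
        """Compare the observed behavior against a trusted reference implementation."""
        expected = sorted(test_input)          # reference implementation acts as the oracle
        actual = fast_sort(list(test_input))
        return "pass" if actual == expected else "fail"

    for _ in range(5):
        case = [random.randint(-100, 100) for _ in range(10)]
        print(oracle_verdict(case))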

1.2.5 Theoretical and Practical Limitations of Testing

Testing theory warns against ascribing an unjustified level of confidence to a series of successful tests. Unfortunately, most established results of testing theory are negative ones, in that they state what testing can never achieve as opposed to what is actually achieved. The most famous quotation in this regard is the Dijkstra aphorism that “program testing can be used to show the presence of bugs, but never to show their absence” [5]. The obvious reason for this is that complete testing is not feasible in realistic software. Because of this, testing must be driven based on risk [6, part 1] and can be seen as a risk management strategy.

1.2.6 The Problem of Infeasible Paths

Infeasible paths are control flow paths that cannot be exercised by any input data. They are a significant problem in path-based testing, particularly in automated derivation of test inputs to exercise control flow paths.
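
A brief illustrative sketch (hypothetical code, not from the cited references): no input can take the true branch of both conditions below, so the corresponding control flow path is infeasible, even though a path-based criterion that enumerates all branch combinations would ask for a test covering it.

    def label(x):
        tags = []
        if x > 10:
            tags.append("large")
        if x < 5:   # with x unchanged, no input satisfies x > 10 and x < 5 together
            tags.append("small")
        return tags

    # Only three of the four true/false branch combinations are feasible;
    # automated derivation of an input for the (true, true) path must fail.
    print(label(12), label(3), label(7))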

1.2.7 Testability

The term “software testability” has two related but different meanings: on the one hand, it refers to the ease with which a given test coverage criterion can be satisfied; on the other hand, it is defined as the likelihood, possibly measured statistically, that a set of test cases will expose a failure if the software is faulty. Both meanings are important.

1.3 Relationship of Testing to Other Activities

Software testing is related to, but different from, static software quality management techniques, proofs of correctness, debugging, and program construction. However, it is informative to consider testing from the point of view of software quality analysts and of certifiers.

  • Testing vs. Static Software Quality Management Techniques (see Software Quality Management Techniques in the Software Quality KA [1*, c12]).
  • Testing vs. Correctness Proofs and Formal Verification (see the Software Engineering Models and Methods KA [1*, c17s2]).
  • Testing vs. Debugging (see Construction Testing in the Software Construction KA and Debugging Tools and Techniques in the Computing Foundations KA [1*, c3s6]).
  • Testing vs. Program Construction (see Construction Testing in the Software Construction KA [1*, c3s2]).

2 Test Levels

2.1 The Target of the Test

2.1.1 Unit Testing

2.1.2 Integration Testing

2.1.3 System Testing

2.2 Objectives of Testing

2.2.1 Acceptance / Qualification Testing

2.2.2 Installation Testing

2.2.3 Alpha and Beta Testing

2.2.4 Reliability Achievement and Evaluation

2.2.5 Regression Testing

2.2.6 Performance Testing

2.2.7 Security Testing

2.2.8 Stress Testing

2.2.9 Back-to-Back Testing

2.2.10 Recovery Testing

2.2.11 Interface Testing

2.2.12 Configuration Testing

2.2.13 Usability and Human Computer Interaction Testing

3 Test Techniques

3.1 Based on the Software Engineer's Intuition and Experience

3.1.1 Ad Hoc

3.1.2 Exploratory Testing

3.2 Input Domain-Based Techniques

3.2.1 Equivalence Partitioning

3.2.2 Pairwise Testing

3.2.3 Boundary-Value Analysis

3.2.4 Random Testing

3.3 Code-Based Techniques

3.3.1 Control Flow-Based Criteria

3.3.2 Data Flow-Based Criteria

3.3.3 Reference Models for Code-Based Testing

3.4 Fault-Based Techniques

3.4.1 Error Guessing

3.4.2 Mutation Testing

3.5 Usage-Based Techniques

3.5.1 Operational Profile

3.5.2 User Observation Heuristics

3.6 Model-Based Testing Techniques

3.6.1 Decision Tables

3.6.2 Finite-State Machines

3.6.3 Formal Specifications

3.6.4 Workflow Models

3.7 Techniques Based on the Nature of the Application