Test Case Similarity-Based Fuzzing
Information technology is advancing rapidly, and with it the number of software security violations has grown, with serious consequences for organizations and individuals. Over the past few years, many methods have been proposed to identify and prevent weaknesses in software programs. “Fuzzing was first proposed by Miller et.al in the year 1990 to detect software vulnerabilities” (Zhang, Liu, Lei, Kung, Csallner, Nystrom & Wang, 2012, p.102). In fuzzing, program inputs are modified to form new inputs that exercise the various possible paths through the program. The run-time behavior of the program is monitored on these inputs to detect exceptions. If an exception occurs, it indicates a weakness in the program, and the program is considered vulnerable.
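The mutate-and-monitor loop described above can be sketched as follows. This is only a minimal illustration, not the method of Miller et al. or of Zhang et al.: the names `toy_target`, `mutate`, and `fuzz` are hypothetical, the target is an invented parser with a deliberate weakness, and replacing a single random byte is just one simple way to change an input.

```python
import random


def toy_target(data: bytes) -> int:
    """Hypothetical program under test: reads a length-prefixed record."""
    if len(data) < 2:
        raise ValueError("input too short")
    length = data[0]
    payload = data[1:]
    # Deliberate weakness (assumption for the demo): the length byte is
    # trusted, so an IndexError is raised when it exceeds the payload size.
    return payload[length - 1]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of the seed with one randomly chosen byte replaced."""
    data = bytearray(seed)
    position = rng.randrange(len(data))
    data[position] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed: bytes, rounds: int, rng_seed: int = 0):
    """Run the target on mutated inputs and collect the ones that raise."""
    rng = random.Random(rng_seed)  # fixed seed keeps the run repeatable
    failures = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except Exception as exc:  # any exception is treated as a finding
            failures.append((candidate, type(exc).__name__))
    return failures


failures = fuzz(toy_target, b"\x03abc", rounds=200)
print(len(failures), "failing inputs found")
```

With the seed input `b"\x03abc"`, every failing candidate is one whose first byte (the length field) was replaced by a value larger than the payload, so the exception recorded in this toy setup is always `IndexError`.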
There are two kinds of fuzzing: black box and white box. Black box fuzzing does not take the program's source code into consideration; it probes for weaknesses only through the different inputs that can be given to the program. White box fuzzing, in contrast, uses the source code to test the different possible paths of a program. However, both categories face challenges. According to Zhang et al. (2012), white box fuzzing fails to identify paths that contain complex data structures and unsolvable branch conditions, and black box fuzzing fails to test deep, complex program semantics (p.103).
Therefore, to address the challenges of both kinds of fuzzing, Zhang et al. (2012) proposed a two-stage fuzzing process to test complex program semantics effectively (p.103). The aim of the process is to generate test cases that discover new paths which are close to the paths exercised by well-formed inputs but are not covered by them. “This fuzzing process uses the techniques of black box fuzzing, code analysis and combination testing to test deep program semantics” (Zhang et al., 2012, p.103).
In the first stage, incremental fuzzing is performed: a part of the input is copied exactly and attached to the original well-formed input to form a new input. The new input is then given to the program, and the execution is recorded to compute the test case similarity between the new input and the original well-formed input. If the test case similarity is high, most of the execution path followed by the new input matches that of the well-formed input. If it is low, other parts of the well-formed input have to be changed to form a different new input. In the second stage, the test cases with high test case similarity values are chosen and combined to form new test inputs.
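One possible reading of the two stages can be sketched as follows. Everything in this sketch is an assumption made for illustration and is not taken from Zhang et al. (2012): the instrumented target `run_with_trace`, the field-based input format, the threshold of 0.8, the pairwise combination in stage two, and above all the `similarity` function, which substitutes a Jaccard index over sets of branch labels for the similarity measure used in the paper.

```python
from itertools import combinations


def run_with_trace(fields):
    """Hypothetical instrumented target: returns the branch labels it hit."""
    trace = {"start"}
    headers = numbers = texts = 0
    for field in fields:
        if field == "HDR":
            headers += 1
            if headers > 1:
                trace.add("reject_duplicate_header")
                return trace  # early exit: the rest is never parsed
            trace.add("header")
        elif field.isdigit():
            numbers += 1
            trace.add("number" if numbers == 1 else "extra_number")
        else:
            texts += 1
            trace.add("text" if texts == 1 else "extra_text")
    trace.add("done")
    return trace


def similarity(trace_a, trace_b):
    """Stand-in metric (assumption): Jaccard index of two branch sets."""
    return len(trace_a & trace_b) / len(trace_a | trace_b)


def stage_one(well_formed, threshold):
    """Copy one part at a time, attach it, keep the high-similarity cases."""
    base = run_with_trace(well_formed)
    kept = []
    for part in well_formed:
        candidate = well_formed + [part]
        score = similarity(base, run_with_trace(candidate))
        if score >= threshold:
            kept.append((part, score))
    return kept


def stage_two(well_formed, kept):
    """Combine pairs of the parts kept by stage one into new test inputs."""
    return [well_formed + [a, b] for (a, _), (b, _) in combinations(kept, 2)]


well_formed = ["HDR", "42", "body"]
kept = stage_one(well_formed, threshold=0.8)
combined = stage_two(well_formed, kept)
print(kept)
print(combined)
```

In this toy setup, duplicating the header makes the target exit early, so that candidate scores below the threshold and is dropped. Duplicating the number or the text field each adds one new branch while keeping the rest of the path, and the combined input from stage two reaches both new branches in a single run.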
Zhang et al. (2012) say that the test case similarity of two test cases can be...