Stylometry is the quantitative study of the characteristics of an author’s style. Lann (1995) defines the term as a technique “to grasp the often elusive character of an author's style, or at least part of it, by quantifying some of its features” (1995:271). Matthews and Merriam (1993) agree, claiming that “Stylometry attempts to capture quantitatively the essence of an individual’s use of language” (1993:203). Put simply, stylometric analysis investigates the characteristics of literary works through quantitative methods. The link between quantitative measures and literary phenomena is long established. Numerous studies have attempted to explain the stylistic and linguistic properties of authors in quantitative terms, and such work has expanded with the availability of computational methods, which many accept as more accurate than non-computational ones.
Many scholars (Jockers et al., 2008; Altintas et al., 2007; Burrows, 2007; Burrows, 2005; Paton and Can, 2004; Burrows, 2003; Holmes, 1998; Holmes and Forsyth, 1995; Burrows, 1987) agree that the development of computational methods has enhanced the efficiency and accuracy of stylometric studies, since computer systems can analyze large quantities of data. Stylometry nevertheless meets objections from many critics, who argue that its computational approach can never give results that are universally accepted as definitive (Delcourt, 1992; Smith, 1992; Smith, 1985). Holmes (1998; 1994) argues that two main problems inhibit the acceptance of Stylometry within humanities scholarship. First, there is no consensus as to correct methodology or technique (Holmes, 1998). Second, “no stylometrist has managed to establish a methodology which is better able to capture the style of a text than that based on lexical items” (Holmes, 1994: 87). In the face of such contradictory views, the present study agrees with the many studies indicating that stylometric analysis, aided by modern computational tools, has had reasonable success in identifying the linguistic and stylistic characteristics of many authors and even in confirming the results of conventional criticism.
Stylometry & Multivariate Analysis
Stylometric studies have begun to draw on multivariate analysis techniques (Binongo and Smith, 1999). The application of multivariate analysis in stylometric studies dates back to the early 1960s, when Mosteller and Wallace (1964), two American statisticians, employed statistical analysis to investigate the mystery of the authorship of the Federalist Papers, using function words as discriminators. The approach succeeded in identifying the writers of the Federalist Papers, who tried to persuade the citizens of New York to ratify the Constitution (Mosteller and Wallace, 1984; Mosteller and Wallace, 1964)....