Wednesday, June 18, 2014

Literary Detection: Part 2

Oversimplifying the Complex

Regarding the authorship of the Homeric poems, A. Q. Morton wrote:
“. . . his role is not that of the composer but, at least in part, that of the compiler. The role of the compiler, as of the modern editor, can range from literally using scissors and paste to make a narrative from fragments of other sources without adding one word of original composition, to the other extreme of taking material and completely digesting it so that even when the source is known and exists separately the version included in the compilation cannot be distinguished from an original composition of the redactor. . . . “Between these extremes of altering nothing or transforming everything, there lies a complete spectrum. . . . So before anyone can ask the question, Did Homer create the Iliad and/or the Odyssey? must come a preliminary question--Who do you define Homer to be?” p. 159
Questions about style changes due to author/editor/scribe interactions are immensely relevant to Book of Mormon stylometry. Instead of asking who we define Homer to be, we are faced with asking, Who wrote the Book of Mormon? Joseph Smith alone doesn't match the stylometric evidence. Scribes can't explain differences within the Book of Mormon, but can they explain differences between the Book of Mormon, Joseph's personal writings, and other LDS scripture? If we accept the Book of Mormon's own narrative, how did scribes and abridgements affect the style? The questions for Homer are very relevant, although the answers are quite different.

In discussing authorship of the Iliad and the Odyssey, Morton illustrates how the mode of composition can constrain stylometric features so that two authors appear to be the same in some features simply because they are using the same mode of composition (e.g. iambic pentameter, listing ships and armies, etc.). This observation strengthens the assertion that Book of Mormon language appears 19th century because Joseph was dictating with pseudo-biblical language. It also suggests an explanation for why the war chapters would have so many superficial similarities with a book on the war of 1812 written imitating the same biblical style. Certain features are going to match. Discovering that they do is not a grand revelation, but a confirmation of what would have been predicted from earlier stylometric studies.

But getting back to Morton's conclusion regarding the Homeric poems: statistically, there is no evidence of multiple authorship for the majority of the books in the Homeric poems. There are two exceptions. The first (Iliad Book 2) is a list of ships and armies involved in the war, and is believed to predate Homer. For the second (Iliad Book 11), statistics cannot decide whether it was a later addition or an undigested source.
“It may seem a very negative attitude to take, to say that the problem of authorship and integrity in Homer is unique and so a scientific answer to either question is unlikely to emerge, but this is the case. The poems are composed in a manner quite different from any others which have survived, and without comparable material the differences which are detectable cannot be separated into those due to the mode of composition and those due to a difference in authorship or of origin. The best service which an investigation of this kind may do is to show that many scholars have been too simple-minded in their approach to Homer. The poems contain many more problems than some people have supposed.
“It may well be that one of the oldest arguments, one with some scientific basis, is the best. No one knows what the population of ‘Greece’ in Homer’s day was, but that the country should nourish two geniuses of such stature at much the same time is a coincidence beyond acceptance. Against this can be set the view that the poems have no author, but are the result of generations of accretion and adaptation. Of this hypothesis it can be said that it is a scholar’s dream. It can be endlessly argued but never resolved.” p. 164
From stylometry alone, with minimal reference to content, value, or date of creation, we have a similar choice with the Book of Mormon. Was it translated from the writings of multiple ancient authors with their varied styles preserved, or was it constructed in a matter of months by a genius young man who absorbed and synthesized in some unexplained way ideas from a vast variety of sources and then dictated without revision in several self-consistent authorship styles? If the second hypothesis weren't competing with people's disbelief in Mormonism, there would be no debate between the two.

Jane Austen and the Other Lady 

“. . . the foundations of stylometry are habits shown not to change under the circumstances of the particular problem being investigated.” p. 189

In the 1970s, an unknown writer and great fan of Jane Austen completed a book which Austen had left unfinished. Stylometry was able to clearly distinguish between the portions written by Austen and those written by the "Other Lady". After reading how relatively easily stylometric measurements were fooled through imitation of a different author in the adversarial authorship studies I reviewed, I wondered how stylometry could so confidently distinguish Jane Austen's writing from a skilled writer carefully crafting an imitation. In the adversarial studies, most of the authors were minimally trained (college education, but not typically in writing), and they only had a short time to write the passages. How could they fool stylometry while the "Other Lady" failed? I can identify a few differences:
  • The Austen study made absolute comparisons rather than forcing a choice of the closest match. In other words it was an open-set analysis which asked, "Is this passage like Jane Austen or not?" rather than a closed-set analysis which asked, "Which of these five authors is this passage most like?"
  • The test samples written by the "Other Lady" were probably a few thousand words long, compared with the 1,000 words written by each imitator in the other study. This meant that fewer meaningful stylometric features could be extracted from the adversarial authorship tests than from the Austen texts.
  • Noncontextual word pairings were analyzed in the Austen study. The automated programs tested in the adversarial authorship study did not use word pairings. 
  • The adversarial authorship study selected an author with an unusually distinct prose style from which the untrained imitators extracted obvious traits. The obvious traits included long words and descriptive language (I'm interpreting a little). Word and sentence lengths and vocabulary richness were among the features measured, and all would be affected by such an attempt at imitation. The Austen study did not use such intuitively reproducible features for analysis.
In other words, the adversarial authorship study was in some ways designed to give the imitators the greatest chance of success. They didn’t have to match the target author’s style; they just had to get closer to that style than to their own. They only had to write 1,000 words, so stylometric features that might have given them away in 5,000 words would not have been an issue. The study employed Method 1: word lengths, letter usage, and punctuation; Method 2: number of different words, lexical density, Gunning-Fog readability index, character count without whitespace, average syllables per word, sentence count, average sentence length, and an alternative readability measure; Method 3: a measure of vocabulary richness. None of these methods employed noncontextual function words or word pairings, because such pairings typically require larger text sizes to be statistically significant and are not as easily automated. The “Other Lady” did match some characteristic punctuation habits and other features, but missed enough noncontextual word pair frequencies to be clearly identified as different.
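The difference between the closed-set and open-set questions can be made concrete with a small sketch. This is only an illustration, not the method of either study: the three-number feature vectors (standing in for something like average word length, average sentence length, and vocabulary richness), the plain Euclidean distance, and the threshold are all made-up assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closed_set_match(sample, candidates):
    """Closed-set: return the nearest candidate, however far away it is."""
    return min(candidates, key=lambda name: distance(sample, candidates[name]))

def open_set_match(sample, target, threshold):
    """Open-set: accept only if the sample lies within `threshold` of the target."""
    return distance(sample, target) <= threshold

# Hypothetical feature vectors (illustrative numbers, not real measurements).
# A real analysis would standardize features so no single one dominates.
candidates = {
    "target_author": [4.9, 28.0, 0.62],
    "imitator_own_style": [4.1, 17.0, 0.48],
}
imitation = [4.6, 24.0, 0.57]  # moved toward the target, but not all the way

closed = closed_set_match(imitation, candidates)
opened = open_set_match(imitation, candidates["target_author"], threshold=1.0)

print("closed-set answer:", closed)  # the imitation is nearer the target
print("open-set answer:", opened)    # but still not close enough to accept
```

With these made-up numbers the imitation "wins" the closed-set test simply by being nearer to the target than to the imitator's own style, while the open-set test still rejects it.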

Imitating Sherlock Holmes

Another example where imitation was distinguishable from the original is the attempt by two writers, both Sherlock Holmes fans, to write a new Sherlock Holmes story. Many knowledgeable readers agreed that the two imitators had successfully copied the style of Holmes, yet Morton reports the following results of stylometric examination of the imitation:
“. . . when perpetrating a counterfeit which a number of people well qualified in literary criticism regard as being very like Holmes, neither imitator can reproduce the habits of Holmes which are of value to the stylometrist nor can he apparently suppress his own habits.” p. 193 
It was known from the beginning that the text was an imitation, and even that the first part was written by one author and the second by another. However, noncontextual word pairings easily differentiated the imitators from the original, and were also able to differentiate between the two different imitators. The study was further able to assign the portions of the text to the appropriate imitators through comparison with stories published by one of the imitators. As might be expected, Morton touts the value of careful statistical work and criticizes those who are overconfident in their non-statistical literary analysis--even if they are highly skilled at it:
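To give a feel for what a noncontextual word pairing is, here is a minimal sketch that counts how often one common function word is immediately followed by another. It is not Morton's procedure: the short word list, the tokenizer, and the sample sentence are my own assumptions, and it ignores sentence boundaries for simplicity.

```python
import re
from collections import Counter

# Hypothetical short list of function words; a real study would choose its own.
FUNCTION_WORDS = {"a", "an", "and", "as", "at", "but", "by", "in", "of", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def function_word_pairs(text):
    """Count adjacent pairs in which both words are function words."""
    words = tokenize(text)
    pairs = Counter()
    for first, second in zip(words, words[1:]):
        if first in FUNCTION_WORDS and second in FUNCTION_WORDS:
            pairs[(first, second)] += 1
    return pairs

def pair_rate(text, first, second):
    """Fraction of occurrences of `first` that are immediately followed by `second`."""
    words = tokenize(text)
    occurrences = words.count(first)
    if occurrences == 0:
        return 0.0
    followed = sum(1 for a, b in zip(words, words[1:]) if a == first and b == second)
    return followed / occurrences

# Made-up sample sentence, only to show the counting.
sample = "And the ship sailed, and the men sang, and a storm rose in the night."
print(function_word_pairs(sample))
print(pair_rate(sample, "and", "the"))
```

On a sample this short the rates mean nothing statistically, which is the point made earlier about text size: habits like these only become usable evidence over thousands of words.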
“It is not what we do not know that causes us to remain ignorant, it is what we assume that we know when we do not know. In the resolution of cases of disputed authorship a feeling for literature is not without value but it comes a long way after a few well chosen experiments.” p. 194

The Book of Mormon Fraud Debate

The Book of Mormon, it seems, falls firmly with the Austen and Holmes studies rather than with the adversarial authorship studies in a number of regards. It is an open-set question, and the LDS researchers have treated it as such. Text sizes are large enough to extract features that appear very hard to imitate, including noncontextual word pairings, and LDS researchers have examined these features and thoroughly reported their results. Additional factors further support our confidence in stylometric findings regarding the Book of Mormon. In many ways it is not unique linguistically. We have examples of hundreds of texts written in the early 19th century using pseudo-biblical language, so we have great potential to control for how using pseudo-biblical language affects shifts in stylometric features. Joanna Southcott failed to produce such a wide range of styles, but the potential is there for a critic to show that stylistic imitation could explain the variety of styles evident in the Book of Mormon. I personally think, based solely on objective stylometric measurements and nothing else, that it's generous to give the chances of authorial fraud at one in a million, but show me statistics that disagree and I will reconsider.
