Wednesday, March 5, 2014

Fraudulent Authorship Signatures and Mormon Scripture

Summary
  • Under certain conditions, authors with no training in linguistics or statistical authorship attribution can obscure their own writing style or imitate the style of another author.
  • Without intent to obscure or change authorship style, many current stylometric methods are able to assign authorship with high accuracy, even with large sets of candidate authors and relatively small text samples.
  • Machine translation through as many as two additional languages does not obscure an author's style (e.g., writing in English and translating through German and Japanese before returning to English).
  • The conditions for successful obfuscation and imitation are different in a number of ways from those necessary for Joseph Smith to have copied, forged, imitated, or invented Mormon scripture.
  • These differences define the stylometric challenge that must be met to establish the plausibility of fraud on the part of Joseph Smith.
Rachel Greenstadt's research group at Drexel University has been studying how well stylometric methods identify the authors of texts when those authors are trying to obscure their own style or imitate the style of another author. I looked at two of her published papers to better understand how easy it is to commit 'authorship fraud', since that is the implicit (and sometimes explicit) claim regarding Joseph Smith's production of Mormon scripture.
Michael Brennan, Sadia Afroz, and Rachel Greenstadt. 2012. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Trans. Inf. Syst. Secur. 15, 3, Article 12 (November 2012).
Michael Brennan and Rachel Greenstadt. 2009. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, California, July 2009.
The papers aren't too technical and are fun reads, if you are interested. Most of the technical material is in other, referenced papers, since the authors used already established methods. I'm just going to give you the bottom-line results and the details of the writing data sets that are relevant to thinking about authorship of Mormon scripture.

The text samples

  • Authors submitted 5000 or 6500 words of previously written academic text, like essays, papers, articles, business letters, etc.
  • Authors were told to write 500 words describing their neighborhood to a friend who hadn't seen it before, but to disguise their style.
  • Authors were then given a 2500-word passage from a very distinctive author and told to describe their day, from the time they woke up, in the style of this author. They were to use third person, and again, they each wrote approximately 500 words.
  • The texts from 15 to 45 authors were then randomly split into many smaller groups of 2 to 40 authors. The assignments were thoroughly randomized and many different combinations were tried; depending on the stylometric test and on computational feasibility, sometimes every possible combination was tested.
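The papers' sampling code isn't described here. What follows is only a minimal sketch, and it assumes each group is a uniform random subset of the authors. It highlights two numbers worth keeping in mind when reading the results. First, the chance baseline for a group of k authors is 1/k: 50% for 2 authors, 2.5% for 40. Second, the number of possible groups grows so quickly that testing every combination is only feasible when groups are small. The function names are hypothetical.

```python
import random
from math import comb


def random_author_groups(authors, group_size, n_groups, seed=0):
    """Draw n_groups random groups, each with group_size distinct authors (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.sample(authors, group_size) for _ in range(n_groups)]


def chance_accuracy(group_size):
    """Expected accuracy of guessing an author uniformly at random within a group."""
    return 1.0 / group_size


def number_of_groups(n_authors, group_size):
    """How many distinct groups exist; exhaustive testing needs all of them."""
    return comb(n_authors, group_size)
```

For example, number_of_groups(15, 2) is 105, which is easy to test exhaustively. number_of_groups(45, 5) is already 1221759.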

The results

  • All of the stylometric measures tried were able to assign normal texts to the correct authors with high probability (>90%, typically).
  • Almost none of the tests did better than random guessing when trying to assign texts where the author disguised his or her style (some authors did better than others, and some authors had disguised styles assigned to them more often than others).
  • All of the tests were very likely to assign the imitation styles to the author being imitated.
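Burrows's Delta is a classic, simple attribution method. It is not among the techniques the Drexel papers evaluated, and it appears here only as a minimal sketch of the general idea behind the "normal text" results. The method builds a profile of how often each candidate uses the most common words, standardizes those frequencies across candidates, and attributes the unknown text to the nearest profile. Seen this way, an author disguising their style is trying to move their profile away from their own reference texts. The function names, the 50-word cutoff, and the toy texts are illustrative assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens, vocab):
    """Frequency of each vocabulary word per token of running text."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return [counts[word] / total for word in vocab]


def burrows_delta(known, unknown, n_words=50):
    """Attribute `unknown` to one of the authors in `known` (author -> text).

    Returns (best_author, deltas); a smaller delta means a closer style.
    """
    if len(known) < 2:
        raise ValueError("need at least two candidate authors")
    tokens = {author: tokenize(text) for author, text in known.items()}

    # Vocabulary: the most common words across all the reference texts.
    pooled = Counter()
    for toks in tokens.values():
        pooled.update(toks)
    vocab = [word for word, _ in pooled.most_common(n_words)]

    profiles = {a: relative_frequencies(t, vocab) for a, t in tokens.items()}

    # Mean and spread of each word's frequency across the candidates;
    # words with no spread cannot discriminate and are skipped.
    stats = [(mean(col), pstdev(col)) for col in zip(*profiles.values())]
    usable = [i for i, (_, sd) in enumerate(stats) if sd > 0]
    if not usable:
        raise ValueError("no words whose frequency varies between authors")

    def zscores(freqs):
        return [(freqs[i] - stats[i][0]) / stats[i][1] for i in usable]

    target = zscores(relative_frequencies(tokenize(unknown), vocab))
    deltas = {
        author: mean(abs(t - p) for t, p in zip(target, zscores(freqs)))
        for author, freqs in profiles.items()
    }
    return min(deltas, key=deltas.get), deltas


# Toy example with made-up texts: the unknown text shares author A's habits.
known = {
    "A": "the cat sat on the mat and the dog sat on the rug " * 20,
    "B": "a bird flew over a hill while a fox ran under a bridge " * 20,
}
unknown = "the cow sat on the step and the hen sat on the fence " * 5
best, deltas = burrows_delta(known, unknown)
print(best)  # → A
```

Real studies use far more text, more candidates, and richer feature sets. This toy only shows the mechanics of comparing standardized word-frequency profiles.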

What does this mean for studies of Book of Mormon authorship? From Holmes's data, some type of fraud seems to be the most scientifically defensible alternative if one rules out claims involving some sort of divine intervention or other magical explanations. Thus, our ability to detect obfuscation or imitation in Mormon scripture texts would lend support to this conclusion, while finding that fraud is unlikely lends support to selecting among the various 'miraculous' or 'supernatural' claims. Here is my tentative comparison and estimate of what pursuing the hypothesis of fraud will mean for Mormon scripture.

Comparison of Adversarial Authorship Studies with Mormon Scripture Historical and Stylometric Studies

| Fooling Stylometric Measures | Mormon Scripture | Favors/Disfavors Fraud |
|---|---|---|
| 6500-word reference samples | Multiple 10000-word reference samples | Disfavors. Longer reference texts give more information about authors' styles. |
| 500-word samples for classification | 10000-word samples for classification | Disfavors. It is presumably harder to hide your style over longer texts. |
| Simple, familiar topics | Complex, unfamiliar topics | Disfavors. Greenstadt and her coauthors assume it is harder to concentrate on obfuscation or imitation while inventing or remembering complex, new material. |
| Written in short times | Written or dictated in short times | |
| 'Dumbed down' to obscure personal style | Less rich vocabulary in Mormon scripture than in Joseph Smith's personal papers. The same is seen for Joanna Southcott's prophetic voice, although to a lesser degree. | Favors. This was the most common technique for obscuring a personal style. |
| Distinctive authorial style for imitation | No historically verified texts being imitated | Disfavors. Joseph Smith apparently created distinct and consistent authorial styles for Nephi, the Doctrine and Covenants, Moroni, and Alma, and a nearly consistent style for Mormon, all without any known reference authors to copy. (A rigorous stylometric comparison with The Late War could suggest a text from which one or more of the Book of Mormon styles was imitated. This is an obvious test for those interested in Book of Mormon authorship.) |
| Closed set of authors | Open set of authors | Favors. If we can't detect fraud within a closed set, how can we hope to in an open set? |
| Adversarial authorship attack known | No direct evidence of fraud | Disfavors. Stylometric methods are demonstrably highly effective at identifying authors when sample sizes are as large as those from Mormon scripture. The only known exceptions are cases where authors deliberately disguise their style or copy another's. |
| Machine translation doesn't disguise style | Claimed to be translations | Disfavors. Authors' styles are preserved through multiple machine translations, consistent with Joseph Smith having 'translated' texts by multiple authors. |
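The sketch below illustrates, under a deliberately crude assumption, why longer samples carry more stylistic information, the point at issue in the first two comparisons above. It treats each word token as an independent draw. Under that model, a word's observed relative frequency has a binomial standard error of sqrt(p(1 - p)/n). Real text is not that independent, so this only shows the direction of the effect. Going from 500 to 10000 words shrinks the noise in each frequency estimate by a factor of sqrt(20), about 4.5. The 5% word rate is a hypothetical value, not a measurement from either corpus.

```python
from math import sqrt


def freq_standard_error(p, n_words):
    """Standard error of a word's observed relative frequency, assuming each of
    n_words tokens is an independent draw with probability p of being that word
    (a crude binomial model; real text is not independent)."""
    return sqrt(p * (1 - p) / n_words)


# Hypothetical word that makes up 5% of an author's running text.
for n in (500, 6500, 10000):
    print(n, round(freq_standard_error(0.05, n), 4))
```

Under these assumptions the standard errors are roughly 0.0097, 0.0027, and 0.0022 for 500, 6500, and 10000 words, respectively.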

Nearly every one of my above observations is debatable to differing degrees. Many of them could be partially tested, strengthening or weakening the hypothesis of fraud. Of course, progress either way is unlikely to be convincing to all parties. Without direct historical evidence of fraud, all that could be shown in its favor is that Mormon scripture has very similar styles to some proposed source text or texts. And for those who refuse to entertain possibilities like actual ancient texts, communication with God or angels, inexplicable inhuman genius, or inexplicable access to obscure information on the part of Joseph Smith, all evidence against fraud will do is require that the fraud be more ingenious and conspiratorial. Still, some of the questions seem worth investigating, since the methods are becoming more and more automated.
