Friday, March 14, 2014

Finding Darwin's God--a structural summary

Kenneth R. Miller has been a hero of mine ever since I heard one of his rebuttals of Michael Behe's example of a mousetrap for Intelligent Design. The idea is, a mousetrap has five parts. Take any one away and it doesn't work. This is evidence that all five parts had to be made and put together simultaneously, or you would never have a mousetrap. By analogy to biochemistry, an eye has lots of parts needed to see. Take away one part, and the eye can't detect light correctly, and you won't be able to see. None of these parts does anything else, so it's unbelievably improbable that they were created and assembled into an eye through the processes of Natural Selection or other evolutionary mechanisms.

Ken Miller's very visual response to the debate tactics of the mousetrap imagery is to start taking apart the mousetrap. First he shows how you can make a functioning mousetrap with only 4 parts. Then with 3 parts. Then he starts using the parts for different things, like a tie clip or a paperweight. In the end, he has used every piece, and shown how they might have come about for different reasons than just being put in a mousetrap. I loved it.

You see, I first encountered Michael Behe through his book, Darwin's Black Box. It was on deep discount in the textbook section of the BYU bookstore (now, I wonder what class it was for and am a little bothered by it), it was about Evolution and God, and I had no idea what Intelligent Design was, so I picked it up. I never finished it. I didn't have a good response for it at the time, but somehow it didn't sit well with me. This wasn't my God. Eventually, Ken Miller and others like him gave me words for my impressions, and that's how he became one of my heroes.

I don't really know who Ken Miller is. I know he's a biologist, an award winning teacher at Brown University, and a believing Christian of some mainstream denomination. I don't know what his scientific claim to fame is, but I share his love of both science and faith.

I recently borrowed Ken Miller's book, Finding Darwin's God: a scientist's search for common ground between God and Evolution. I'm not reading it in detail, because so little of it is new to me. His style is engaging, so I've read parts I thought would be really boring, anyway. I recommend the book highly for anyone just entering into this debate on Evolution and faith in God. Here I want to lay out the structure of his arguments so that I have it to refer back to in the future.

  1. Darwin's Apple: How the author found out about the conflict between Evolution and religion.
  2. Eden's Children: What the Theory of Evolution is and a number of striking evidences supporting it.
  3. God the Charlatan: Young Earth Creationism
    • Young Earth Creationists make God a charlatan. Not only must they reject the evidences for evolution of species, but fundamental methods and discoveries of geology, physics, and astronomy. The entire scientific process must be doubted to the point of near uselessness.
    • Various observations from radioisotope dating are the primary evidence used in Ken Miller's response.
  4. God the Magician: Some critics of evolution look for evidences of problems or incompleteness within scientific explanations.
    • Punctuated equilibrium becomes evidence that God must be making the sudden changes in species.
    • This is really about casting doubt on the conclusions of scientists to allow room for a particular conception of an interventionist God (I believe in a different kind of interventionist God).
    • Response: rapid evolutionary change has now been measured both in the lab and in the field. Scientific discovery continuously reduces the incompleteness and resolves the problems. 
    • "In the final analysis, God is not a magician who works cheap tricks. Rather, His magic lies in the fabric of the universe itself. The fossil record is not a series of sequential tricks fabricated for no purpose other than to mislead. The fossil record represents, with all of its imperfections, an epic of evolutionary change, the history of life on this planet grand in its range and diversity and magnificent in its detail. It is the record of the historical process that led to us. It is the real thing, and so are we."
  5. God the Mechanic: God's handiwork is pushed back to one of the distant, hard problems of life, like putting together a vast array of complex cells and biochemical structures. Evolution can happen now that those exist, but a designer must have put together the basic parts.
    • "irreducible complexity" is a biochemical repackaging of the 19th century "argument from design". Complexity is evidence of design.
    • Miller refers to a powerful example against irreducible complexity found in Richard Dawkins's The Blind Watchmaker. Echolocation in bats.
    • cilium taken apart by reference to 'incomplete', but functioning cilia found in nature.
    • combinatorial, random mutation used to 'design' proteins.
    • evolution of new germs through enforced changes in environment.
    • evolution of different isocitrate dehydrogenase enzymes has been successfully explained.
    • evolution of antifreeze proteins has been explained
    • evolution of cytochrome c oxidase proton pump has been explained with evolutionary mechanisms
    • Krebs cycle machinery shown to have individual, independent functions.
    • etc.
  6. The Gods of Disbelief: In presenting Evolution as antithetical to religious worldviews, scientists and the scientific establishment are in part responsible for creating the backlash against acceptance of Evolution and the scepticism regarding scientific evidence and reasoning.
    • Richard Dawkins, Stephen Jay Gould, William Provine, Daniel Dennett, E. O. Wilson quoted espousing views that science is antithetical to God. That life is inherently meaningless.
    •  Academia can deal with religion in many ways. It can't deal with belief.
    • Stephen Jay Gould has produced a Harvard PhD who is now a young earth creationist, because of the bleak picture of life that some scientists have espoused.
    • This polarization is beyond science, is a problem, and Miller thinks it is also wrong.
  7. Beyond Materialism
    • ". . . . The Western Deity, God of the Jews, the Christians, and the Muslims, has always been regarded more as the architect of the universe than the magician of nature. . .
    • "This very Western idea of God as supreme lawgiver and cosmic planner helped to give the scientific enterprise its start. . . . [the Westerner could not avoid the] feeling that the workings of nature might reflect the glories of the Lord. . . . and this is one of the reasons we can say . . . that true, empirical, experimental science developed first in the West. . . . Western scholars, inspired by the one true God of Moses and Muhammad, developed algebra, calculated the movements of the stars, and explained the cycle of the seasons.
    • "In the past, the idea that nature was a complete, functional, self-sufficient system was seldom thought to be an argument against the existence of God. Quite the contrary, it was regarded as proof of the wisdom and skill and care of that great architect."
    • Nature is fundamentally non-deterministic. This is the reality shown by quantum mechanical experiments.
    • Schrödinger
      • argued that our bodies must be so big to insulate us from the unpredictability of atomic-level events.
      • predicted an aperiodic crystal (DNA) 'code-script' that would maintain the fidelity of the genetic code.
      • predicted that atomic-level unpredictability would cause mutation in this script, despite its relative stability
    • Science only incompletely describes nature--and seemingly will always remain in this state.
    • "the science-versus-religion argument has been framed within the classical view of physics that shaped such dialogues in the last century. Such misunderstandings persist because the field of biology has not yet fully reacted to the revolution in modern physics. It hasn't had to, because nearly all experimental biology takes place in a realm where quantum effects are averaged out into statistical laws."
    • indeterminate is not a synonym for random.
    • strict, predictable determinism is the only alternative to unpredictability.
    • "What matters is the straightforward, factual, strictly scientific recognition that matter in the universe behaves in such a way that we can never achieve complete knowledge of any fragment of it, and that life itself is structured in a way that allows biological history to pivot directly on these uncertainties."
    • If one believes God can do anything now, and we can also understand the events in materialistic ways, why would that believer insist God work in a non-materialistic way in the past?
    • Scientific materialism is pervasively self-limiting.
    • "It could be just a puzzling, curious fact about the nature of the universe. Or it could be the clue that allows us to bind everything, including evolution, into a worldview in which science and religion are partners, not rivals, in extending human understanding a step beyond the bounds of mere materialism."
  8. The Road Back Home
    • "When I tell my students I believe in God, they suppose that I couldn't possibly mean anything traditional, but rather something smart, modern, and sophisticated. Something subtle. Maybe I mean that God is love, or God is the universe itself, or, being a scientist, maybe I mean that God is the laws of nature.
    • "Well, I don't. Such views, however carefully stated, dilute religion to the point of meaninglessness."
    • Three common beliefs:
      • the primacy of God in the universe
      • we exist as the direct result of God's will
      • God has revealed Himself to us
    • "Any God worthy of the name has to be capable of miracles. . ."
    • "God has fashioned a self-consistent reality in nature, and He allows us to work within it."
    • "Ian Barbour claims that human freedom would be impossible without God's willingness to limit His actions."
    • And more
  9. Finding Darwin's God
    •  "Over the years I have struggled to come up with a simple but precise answer to [the question 'what kind of God do you believe in']. Eventually I found it. . . 
    • [from On the Origin of Species] "'There is a grandeur in this view of life; with its several powers having been originally breathed by the Creator into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being evolved.' What kind of God do I believe in? The answer is in those words. I believe in Darwin's God."
Miller has many good things to say, and many helpful ways to understand how God and Nature can seamlessly fit together. I like many of them. I don't think some of them are helpful, because I don't believe in all of the same characteristics of God as the Christian creeds. I gave up on trying to summarize his points, because with chapter 7 they start ranging far and wide. In the final analysis, I believe in the God of Evolution, too. I believe in the God who is present in our lives and present in all we see around us, who organized and governs it, who lets it all act with agency. I believe in Darwin's God.

Wednesday, March 12, 2014

Book of Mormon and The Late War similarities

With what likelihood can the narrative element similarities between the Book of Mormon and The Late War be explained by similarities in style and genre?

Duane and Chris Johnson have proposed statistical tools to show direct literary dependence of texts on earlier texts based on stylometric features. The Johnsons use these tools to argue for naturalistic, 19th century sources for the Book of Mormon--specifically that the Late War is one of them. It looks like a solid data set, but it is a naive application of stylometry. From my reading on authorship attribution, I don't get the sense that anyone claims stylometric similarities imply derivation. They may reveal common authorship, similarity of style or genre, imitation, or plagiarism, but not derivation. The Johnsons have, in fact, provided a piece of data that is a clear counterexample to their derivation hypothesis. The Book of Nullification, published the same year as the Book of Mormon, can be neither a source for, nor a derivative of, the Book of Mormon. Despite this, the Book of Nullification is nearly twice as similar to the Book of Mormon as the Book of Mormon is to the Late War. It's rather a stretch in this context to say that derivation from a common source will result in two things more similar to each other than either is to the source. Evidences beyond stylistic similarity are needed to make claims regarding derivation.

The Johnsons' data may provide insight into many other questions. I agree with the analysis of Christopher Smith (here and here) regarding a more likely significance of linguistic similarities between the Book of Mormon, the Late War, and other early 19th century texts. In essence, the Johnsons' study shows that Book of Mormon English falls cleanly within a context of early 19th century pseudo-biblical language. When this language is imitated in other books with similar genres, unusual numbers of similarities appear between various texts. Such an understanding doesn't require that Joseph Smith ever have read books for which there is no evidence of possession or even connection. Joseph would only have had to participate in the quite common practice of imitating Biblical style in his dictation--whether original or a translation.

A second approach to explain the Book of Mormon in naturalistic terms, as a 19th century product, is to demonstrate specific sources for Book of Mormon content. The idea is that if enough Book of Mormon content can be shown to match 19th century sources and ideas, then the number of apparently ancient elements in the Book of Mormon decreases, or at least becomes more questionable. A smaller number of ancient elements can more plausibly be attributed to coincidence, imagination, stylistic imitation, and blind luck--all origins acceptable to naturalistic sensibilities. This story is most believable if the sources can be limited to a small number that Joseph Smith might have had access to with some non-zero probability.

Some people find a more general explanation of Joseph Smith as a sponge of ideas, styles, stories, and ancient myths, followed by an impressively coherent, imaginative dictation, a sufficient explanation of the Book of Mormon. To me, this kind of explanation is as magical as Joseph's own--angels, gold plates, and revelation. No one can hope to test it, scientifically. Consequently, the most rational conclusion from this competition of untestable ideas is agnosticism--not rejection of Joseph's story. For a 'naturalistic' explanation of the Book of Mormon to claim the place of rationality (if a naturalism devoid of angels is not to be your predetermined starting place), it needs to provide more direct evidence of 19th century sources that can adequately replace Joseph's own explanations.

On Patheos, RT has attempted to do just that. RT has argued that too many similar narrative elements appear in both the Book of Mormon and the Late War for Joseph Smith to not have read and absorbed the Late War. He candidly recognizes enough significant differences that he turns to the following explanation of influence:

"A more appealing explanation than strict literary borrowing is that the LW account had made a strong impression on Joseph Smith at an earlier point at some remove from his work on the Book of Mormon and that elements of the story had had time to percolate in his mind. Eventually, these elements were called forth when circumstances necessitated, perhaps even semi-consciously, and then adapted in a unique way."

Again, this relies on pure speculation that Joseph ever encountered the Late War. There are some additional problems with this view. If the influence of books like the Late War on the Book of Mormon is too indirect and soft, then it amounts to little more than claiming that the Book of Mormon language (and possibly some narrative elements) was influenced by the cultural milieu of Joseph Smith--an argument I have no problem with, since I expect it to be the case for a translator of an ancient record. Such a relationship is already established by the stylometric similarities between the Book of Mormon and many other pseudo-biblical, 19th century writings. It is also predicted by Alma's dictum that God speaks to us according to our language and understanding. RT seems to be claiming a stronger reliance on the Late War. RT points out a large number of similar narrative elements in the Late War and the Book of Mormon. In several passages these elements are found in the same order in both books. For example, one passage in the Late War might have 5 independent elements in close succession. These same narrative elements might be found in close succession in the Book of Mormon. RT claims, in essence, that Joseph Smith absorbed and remembered those plot elements, and recalled them in order in the Book of Mormon. Here are possible explanations I can imagine:
  1. A real, ancient record with genre similarities to the Late War produced parallels in sequence by chance and/or similar orders of real events.
  2. Joseph Smith invented pseudo-ancient narrative elements which fell into the same sequence as elements of the Late War by chance.
  3. Joseph Smith copied sequences of elements from the Late War, consciously or unconsciously.
For anyone wishing to claim direct sources for Book of Mormon content, similar alternatives could be proposed. Is there a way to test the likelihood of these various scenarios without resorting to appeals to ideology? I'm not sure, but here is a proposal:
  1. Identify the total number of independent textual elements in each book (e.g. the Late War might have 5000 elements, and the Book of Mormon 10000). Elements which could not logically appear in any other order are not independent.
  2. Identify the numbers of repeated textual elements. Two elements that each appear 20 times in both texts are much more likely to appear in close sequence in both books; elements appearing only once are much less likely to appear in specific orders.
  3. Identify the number of common textual elements between the two books.
  4. Simulate the creation of two books with the respective numbers of textual elements (e.g. rearrange the 5000 elements of the Late War and the 10000 elements of the Book of Mormon several thousand times).
  5. See how many common elements line up in the same orders.
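Steps 4 and 5 amount to a shuffle (permutation) test, and a rough sketch may make the idea concrete. This is only an illustration under assumptions of my own: each book is already reduced to a list of element labels, every element is treated as independent, and "lining up in the same order" is measured by counting runs of k shared elements that occur consecutively in both books. The function names `shared_runs` and `shuffle_test` are hypothetical, not anything RT or the Johnsons have used.

```python
import random


def shared_runs(book_a, book_b, k=3):
    """Count distinct runs of k shared elements found in the same order in both books."""
    common = set(book_a) & set(book_b)
    # Keep only elements the two books share, preserving each book's own order.
    a = [e for e in book_a if e in common]
    b = [e for e in book_b if e in common]
    runs_a = {tuple(a[i:i + k]) for i in range(len(a) - k + 1)}
    runs_b = {tuple(b[i:i + k]) for i in range(len(b) - k + 1)}
    return len(runs_a & runs_b)


def shuffle_test(book_a, book_b, k=3, trials=1000, seed=0):
    """Return the observed run count and the fraction of shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = shared_runs(book_a, book_b, k)
    a, b = list(book_a), list(book_b)  # copies, so the inputs are left untouched
    hits = 0
    for _ in range(trials):
        rng.shuffle(a)
        rng.shuffle(b)
        if shared_runs(a, b, k) >= observed:
            hits += 1
    return observed, hits / trials
```

Repeated elements are handled simply by letting a label appear more than once in a list. A large fraction from `shuffle_test` would suggest the observed sequences could arise by chance; a very small one would suggest a real pattern, always subject to how the elements were defined in the first place.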
At least two tests could be performed from these data. From 3, the question can be asked if the number of common textual elements is unusual for books of similar genre. As noted by RT, none of the parallels are compelling, individually. Can similar numbers of commonalities be identified between the Book of Mormon and other books about war? If not, that lowers the probability that Joseph Smith invented the war plots independently without even addressing the element of sequence. As regards the Late War, specifically, the remaining choices are an ancient war with real events, some of which paralleled the War of 1812, or direct borrowing (through memory) of plot elements from the Late War. If the numbers of common elements can be explained simply by similarities of genre, then RT's assertion of direct influence needs to be supported by the similarities of sequence.

This test would be difficult to perform, directly. It would require defining narrative elements and labeling them in many different war books. It would require identifying war books which are similar in genre, but not in style to the Book of Mormon. Alternatively, it could be done with books that match other parts of Book of Mormon genre, but are not proposed as sources for the Book of Mormon. Any way you look at it, this would be a painstaking endeavor. Until it is done, however, no proper control has been performed on RT's analysis.

A second test could be performed from the data generated by 4 and 5. We can see if RT has observed a real pattern, or if the parallels are likely to arrange in common sequences just by chance. If independent elements arrange in common sequences by chance, in numbers roughly equal to those observed by RT, then claims of direct influence--even distant influence--are weakened, and claims of either a more general cultural influence or an ancient origin are strengthened.

The Johnson study has revealed a handful of other books with high stylistic similarity to the Book of Mormon, including one published in 1830 which could be neither a source for, nor derivative of, the Book of Mormon. I suspect that these other books will also show large numbers of common narrative elements with the Book of Mormon. If this is true, then justifying claims that we can identify Book of Mormon sources through similarity of narrative elements will require comparison of sequences. If plot elements are arranged in common sequences with unlikely frequency, then claims of 19th century sources are strengthened. If not, 19th century source claims are returned to the state of their evidences on historical origins, namely, psychomagical claims regarding the extensive knowledge and genius of Joseph Smith.

Problems with conducting this study
  1. We need a measure of textual elements and of their independence. For example, how many independent elements are in a battle at a river with lots of blood, rallied troops, vanquishing of a more numerous enemy, and casualty counts afterward? It seems like lots of battles have happened at rivers, blood doesn't usually precede a battle, rallying troops happens after a battle has started and lots of soldiers have been wounded, more numerous foes are sometimes vanquished (and always after these other events), and casualty counts are rarely made before a battle is over. From this perspective it would appear to be a single element. However, these same events could have occurred on a plain, or in a forest, but you probably wouldn't rally from inside a fort. Troops might not have been rallied at all after initial setbacks. The foes could have won the battle. Casualties might not have been reported at all. Other observations could have been inserted within these events. Defining narrative elements and their independence seems prohibitively difficult for making any sort of statistical claims of derivation, unless you are trying to show plagiarism.
  2. If there are multiple sources for the Book of Mormon, the Late War might have only been a primary source for the war chapters. Thus, comparing similarities with the whole Book of Mormon will introduce systematic errors. Most likely, these errors would dilute the concentration of similarities and make the probabilities of similar sequences appear much less likely than they really are.
Ideas to manage these difficulties
  1. Choose a stylometric feature, such as word n-grams, to represent similarity. Use these features to approximate a percent similarity for the two texts. Estimate a length for narrative elements. Assume that plot elements will have the same percent similarity between the texts as the stylometric feature, since both style and content are believed to be unusually similar for these two texts. Use this narrative element length to estimate the total numbers of independent narrative elements as well as the numbers and frequency of elements shared between the two texts.
  2. Perform the comparison using different portions of the Book of Mormon. Do it once with the full text. Repeat it with only the war chapters. Repeat it with only the chapters with the highest degree of similarity. The Johnsons have identified four other books with unusually high degrees of stylometric similarity to the Book of Mormon. Take only the chapters which are more similar to the Late War than they are to the other texts. See if the results change in ways that would be expected for similarities resulting by chance as the result of similar genres, or if the likelihood of RT's observed narrative element sequence similarities remains as small as RT asserts.
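The first idea can be sketched in a few lines. This is a minimal illustration, not the Johnsons' actual scoring method: it assumes similarity is measured as the percent of one text's distinct word n-grams that also occur in the other, and it assumes a guessed narrative element length in words. The helper names and the crude tokenizer are my own inventions for the example.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of distinct word n-grams in a text (lowercased, letters only)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_shared(text_a, text_b, n=3):
    """Percent of text_a's distinct n-grams that also occur in text_b."""
    grams_a = word_ngrams(text_a, n)
    if not grams_a:
        return 0.0
    return 100.0 * len(grams_a & word_ngrams(text_b, n)) / len(grams_a)


def estimated_shared_elements(total_words, element_length, percent):
    """Estimate total and shared narrative elements from an assumed element length."""
    total = total_words // element_length
    return total, round(total * percent / 100.0)
```

For the second idea, the same functions could be run on the full text, on the war chapters alone, and on the most similar chapters alone, to see how much the estimates move with the choice of portion and with the assumed element length.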
My suspicion is that one set of numbers will show the probability of sequence similarities to be small, and another will show the probability of sequence similarities to be large enough to be explicable simply through coincidence. We will be left to debate the assumptions behind the analyses without a practical way to decide which assumptions are most accurate. In the end we will once again be left to choose whether we prefer psychomagic or angels/revelation/golden plates--or we will have to remain rationally agnostic as to Book of Mormon origins until we have weighed other evidence from other sources.

Additional Notes:

Observations made through the Johnsons' data and other comparisons with the Late War make a few things clear:
  • Pseudo-biblical style was common enough in the early 19th century to explain the stylistic dissimilarity between Joseph Smith's personal writings and his scriptural writings.
  • Pseudo-biblical style is sufficient to explain the existence of many, short Hebraisms in the Book of Mormon text. I have doubts as to its ability to explain all, when examined in detail. I think searches for extended chiasm in the Late War confirm the improbability of accidental insertion of Alma 36, since no meaningful extended chiasms have been found in the Late War, despite someone's having taken the time to identify a huge, meaningless chiasm in the Late War.
  • The uniqueness and complexity of story in the Book of Mormon, which was noted by RT, still requires explanation. The proposed candidates still remain Joseph's imagination or revelation/ancient translation.
  • Other pseudo-biblical texts (many of which are identified by the Johnsons' extended data set) could be tested for variety in their stylometric features. Do any of them come close to the variety of styles in the Book of Mormon? Do authors just accidentally shift styles multiple times when imitating Biblical style?

Wednesday, March 5, 2014

Fraudulent Authorship Signatures and Mormon Scripture

Summary
  • Under certain conditions, it is possible for authors uninformed regarding linguistics and statistical authorship attribution to obscure their own writing style, or imitate the style of another author.
  • Without intent to obscure or change authorship style, many current stylometric methods are able to assign authorship with high accuracy, even with large sets of candidate authors and relatively small text samples.
  • Machine translation through as many as two additional languages does not obscure authorship styles (i.e. write in English and translate through German and Japanese before returning to English).
  • The conditions for successful obfuscation and imitation are different in a number of ways from those necessary for Joseph Smith to have copied, forged, imitated, or invented Mormon scripture.
  • The differences represent the stylometric challenge needed to establish the plausibility of fraud on the part of Joseph Smith.
Rachel Greenstadt's research group at Drexel University has been doing research into the ability of stylometric methods to identify authors of texts when the authors are attempting to obscure their own style or imitate the style of another author. I looked at two of her published papers to better understand how easy it is to commit 'authorship fraud', since that is the implicit (and sometimes explicit) claim regarding Joseph Smith's production of Mormon scripture.
  • Michael Brennan, Sadia Afroz, and Rachel Greenstadt. 2012. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Trans. Inf. Syst. Secur. 15, 3, Article 12 (November 2012).
  • Michael Brennan and Rachel Greenstadt. 2009. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, California, July 2009.
The papers aren't too technical, and are pretty fun reads, if you are interested. Most of the technical parts are in other, referenced papers, since the authors employed already established methods. I'm just going to give you the bottom line results and the details of the writing data sets which are relevant to thinking about authorship of Mormon scripture.

The text samples

  • Authors submitted 5000 or 6500 words of previously written academic text, like essays, papers, articles, business letters, etc.
  • Authors were told to write 500 words describing their neighborhood to a friend who hadn't seen it before, but to disguise their style.
  • Authors were then given a 2500 word passage from a very distinctive author, and told to describe their day, from the time they woke up, in the style of this author. They were to use third person, and again, they each wrote approximately 500 words.
  • The texts from 15-45 authors were then split up randomly into many smaller groups of 2-40 authors. Things were randomized quite thoroughly, and lots of different combinations were tried. Sometimes every possible combination, depending on the stylometric test being used and computational feasibility.

The results

  • All of the stylometric measures tried were able to assign normal texts to the correct authors with high probability (>90%, typically).
  • Almost none of the tests did better than random guessing when trying to assign texts where the author disguised his or her style (some authors did better than others, and some authors had disguised styles assigned to them more often than others).
  • All of the tests were very likely to assign the imitation styles to the author being imitated.

What does this mean for studies of Book of Mormon authorship? From Holmes's data, some type of fraud seems to be the most scientifically defensible alternative if one rules out claims involving some sort of divine intervention or other magical explanations. Thus, our ability to detect obfuscation or imitation in Mormon scripture texts would lend support to this conclusion, while finding that fraud is unlikely lends support to selecting among the various 'miraculous' or 'supernatural' claims. Here is my tentative comparison and estimate of what pursuing the hypothesis of fraud will mean for Mormon scripture.

Comparison of Adversarial Authorship Studies with Mormon Scripture Historical and Stylometric Studies
| Fooling Stylometric Measures | Mormon Scripture | Favors/Disfavors Fraud |
| --- | --- | --- |
| 6500 word reference samples | Multiple 10000 word reference samples | Disfavors. Longer reference texts give more information regarding authors' styles. |
| 500 word samples for classification | 10000 word samples for classification | Disfavors. It is presumably harder to hide your style over longer texts. |
| Simple, familiar topics | Complex, unfamiliar topics | Disfavors. Greenstadt and her coauthors assume it is harder to concentrate on obfuscation or imitation while inventing or remembering complex, new material. |
| Written in short times | Written or dictated in short times | |
| 'Dumbed down' to obscure personal style | Less rich vocabulary in Mormon scripture than in Joseph Smith's personal papers. The same is seen for Joanna Southcott's prophetic voice, although to a lesser degree. | Favors. This was the most common technique to obscure a personal style. |
| Distinctive authorial style for imitation | No historically verified texts being imitated | Disfavors. Joseph Smith apparently created distinct and consistent authorial styles for Nephi, the Doctrine and Covenants, Moroni, and Alma, and nearly consistent for Mormon--all without having any known reference authors to copy. (A rigorous stylometric comparison with The Late War could suggest a text from which one or more of the Book of Mormon styles was imitated. This is an obvious test to be attempted by those interested in Book of Mormon authorship.) |
| Closed set of authors | Open set of authors | Favors. If we can't detect fraud within a closed set, how can we hope to in an open set? |
| Adversarial authorship attack known | No direct evidence of fraud | Disfavors. Stylometric methods are demonstrably highly effective at identifying authors when sample sizes are as large as those from Mormon scripture. The only time this is known to be untrue is when authors are deliberately disguising their style or copying another. |
| Machine translation doesn't disguise style | Claimed to be translations | Disfavors. Authors' styles are preserved through multiple machine translations, consistent with Joseph Smith having 'translated' texts by multiple authors. |

Nearly every one of my above observations is debatable to differing degrees. Many of them could be partially tested, strengthening or weakening the hypothesis of fraud. Of course, progress either way is unlikely to be convincing to all parties. Without direct historical evidence of fraud, all that could be shown in its favor is that Mormon scripture has very similar styles to some proposed source text or texts. And for those who refuse to entertain possibilities like actual ancient texts, communication with God or angels, inexplicable inhuman genius, or inexplicable access to obscure information on the part of Joseph Smith, all evidence against fraud will do is require that the fraud be more ingenious and conspiratorial. Still, some of the questions seem worth investigating, since the methods are becoming more and more automated.

Sunday, March 2, 2014

Prophetic Voice and Holmes Tries New Methods

Summary
  • Separation of data points by measures of vocabulary richness is useful for distinguishing different authors. Clustering of data points is uninformative for determining authorship. Thus, a cluster may represent multiple authors.
  • Joanna Southcott's prophetic prose has a vocabulary richness signal that is approximately an average of her personal writings (diary) and her prophetic verse. Thus, by changing genre, an author is able to significantly change her stylometric signal according to vocabulary richness measures. Therefore, genre must be controlled for when using vocabulary richness for author identification.
  • The Book of Revelation is more similar in vocabulary richness to both Southcott's prophetic voice and to the Book of Isaiah than Southcott's own different styles are to each other.
  • For Joseph Smith to have written all of his personal writings, the Doctrine and Covenants, the Book of Mormon, and the Book of Abraham, he would have had to employ a variation in vocabulary richness several times larger than that of any single author studied by Holmes--even including Southcott's entire variation.
  • Holmes used methods of noncontextual word analysis to determine authorship of contested Federalist Papers, and found that these methods were more successful and informative than vocabulary richness.
  • Holmes's data continue to support multiple authorship for the Book of Mormon, and strongly suggest that Joseph Smith was not the sole author of Mormon scripture.

Papers

Holmes, David I. (1991), "Vocabulary Richness and the Prophetic Voice," Literary and Linguistic Computing, 6(4):259-268.
Holmes, David I. and R. S. Forsyth (1995), "The Federalist Revisited: New Directions in Authorship Attribution," Literary and Linguistic Computing, 10(2):111-127.
(I can help interested people obtain copies for personal or educational use.)

The Prophetic Voice 

Holmes continued to study statistical methods for authorship attribution for at least several more years after examining authorship of Mormon scripture. The immediate follow-up to his 1990 paper, which I discussed in a previous post, was to look at how varied an author's styles could be. He investigated the vocabulary richness of Joanna Southcott, a moderately prolific author who kept personal diaries, wrote extensive inspired verse, and recorded many prophetic sayings. Holmes made three collections from her writings:

I think it's important to note, here, the three very different literary styles: diaries, verse, and prophecy. Let's get right to the main figure now. In fact, this is the only figure, aside from a dendrogram, that Holmes included in this paper, justifying my identification of the equivalent figure in the previous paper as the most important. Here it is:
I've done the same type of normalization as before. I measured the greatest distance between Isaiah samples (I1 and I2) and set that as 1.0. I measured other distances in terms of this unit length. Let's make two comparisons. First, let's take Holmes's position that Southcott (SD, SP, and SV) is representative of how varied an author can be when employing a prophetic voice. Southcott still only manages to change her style by 3.2 units, or 6.1 units if you include prophetic verse. Joseph Smith's writings (J1-3 and D1-3) change by as much as 8.8 units between his personal writings and his revelations in the Doctrine and Covenants. Adding the data from Holmes's last paper to this one, we can make the following observations regarding the variability in vocabulary richness in the first two principal components:


| Sources | 1st Principal Component Variability | 2nd Principal Component Variability |
| --- | --- | --- |
| Isaiah | 1.0 | 0.4 |
| Joanna Southcott | 4.8 | 3.8 |
| Joseph Smith (personal) | 2.1 | 1.2 |
| Joseph Smith and Doctrine and Covenants | 8.7 | 2.4 |
| Joseph Smith and Mormon Scripture | 9 | 6 |

In this analysis I measured the greatest variability in the first and second principal components separately. In the first principal component (PC1), Isaiah varies by 1.0, setting our unit. In the direction of PC2, Isaiah shows almost no variability in signature. Joseph Smith's personal writings roughly double Isaiah's variability in PC1 and triple it in PC2. Each of Southcott's changes in genre is a little larger than the variation within the personal writings of Joseph Smith. Joseph Smith and the Doctrine and Covenants, combined, show a variability almost twice that of Southcott in PC1 (the most significant component of authorial signature), and a still large variation in PC2 (6 times Isaiah, 2 times Smith's personal writings, 2/3 of Southcott's total change, and 1 1/3 times each of Southcott's individual shifts in genre).
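For anyone who wants to check the ratios quoted in this section, here is a minimal Python sketch. The numbers are copied from the variability table above; the dictionary layout and the `ratio` helper are my own illustrative choices, not anything taken from Holmes's papers.

```python
# Variability values copied from the table above, expressed in units of
# Isaiah's greatest PC1 variation. Each tuple is (PC1, PC2).
variability = {
    "Isaiah": (1.0, 0.4),
    "Joanna Southcott": (4.8, 3.8),
    "Joseph Smith (personal)": (2.1, 1.2),
    "Joseph Smith and Doctrine and Covenants": (8.7, 2.4),
    "Joseph Smith and Mormon Scripture": (9.0, 6.0),
}

PC1, PC2 = 0, 1  # index of each principal component in the tuples


def ratio(source, reference, component):
    """Return how many times larger `source` varies than `reference`."""
    return variability[source][component] / variability[reference][component]


# Print every source relative to Isaiah and to Southcott in both components.
for name in variability:
    for ref in ("Isaiah", "Joanna Southcott"):
        print(
            f"{name} vs {ref}: "
            f"PC1 x{ratio(name, ref, PC1):.2f}, "
            f"PC2 x{ratio(name, ref, PC2):.2f}"
        )
```

The point of the sketch is only that every comparison here is a simple quotient of two table entries, so the comparisons are easy to redo if the measured distances are revised.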

Holmes's interpretation of Southcott's prophetic voice as an imitation of Revelation is very plausible, both from the perspective of vocabulary richness and history. Southcott was, apparently, trying to imitate biblical style. Joseph Smith, on the other hand, had his own, original prophetic voice unlike the biblical styles which Smith often quoted in Mormon scripture.

Vocabulary Richness in Mormon Scripture Revisited

Adding in the rest of Mormon Scripture from the Holmes 1990 paper (this is necessarily approximate, but justified by the same variability ratios existing for Isaiah, Smith's personal writings, and the Doctrine and Covenants), writings authored and dictated by Joseph Smith vary by 9 units in PC1 and 6 units in PC2. Variation in the Book of Mormon alone is approximately 2.5 in PC1 and 5.5 in PC2. Claiming that the mind of Joseph Smith alone was responsible for all of these authorial signatures requires not the large but believable changes in genre illustrated by Southcott, but changes at least 1 1/2 times that large.

Now let's examine the ability of vocabulary richness measures to distinguish among known authors. The differences between Revelation and Isaiah, and Revelation and Southcott's prophetic prose are 1.2 and 1.5 units, respectively, and almost exclusively in PC1. By the discriminatory standards of Holmes's conclusions, Isaiah, Revelation, and some of Southcott's writings are indistinguishable, while J1-3 and D1-3 are clearly distinct. This evidence proves conclusively that Holmes's conclusion of single authorship for all of Mormon scripture is completely unjustified by the data. The most he can claim is that Mormon scripture was written with a very wide range of authorial stylometric signatures by an unknown number of authors.
I'm going to add a few additional observations:
  • Nephi, Alma, and Moroni each show reasonable sizes of variability in authorial signature based on variation in Smith's personal writings and Southcott's prose (personal and prophetic), and each of these is distinguishable from the others.
  • Lehi and Abraham are as believably different from Nephi, Alma, Moroni, and the Doctrine and Covenants as Revelation is from Isaiah and Southcott, especially if you add in the third principal component.
  • This multi-author grouping of Mormon scripture removes N1 and D1 as extreme outliers--a reality unexplained by Holmes in his 1990 paper.
  • The Mormon samples M2-M5 are believably by a single author, but M1 still remains as an outlier. A major genre shift, not even as extreme as the shift from Southcott's diaries to her prophetic verse, could explain this. Unfortunately, in lumping all of Mormon scripture together, Holmes failed to examine or report information that might have helped us understand M1 as an outlier.
  • It is also possible that single outliers of the degree of M1 may just occur randomly, from time to time. However, believing this renders Holmes's analysis meaningless, because he only has single samples for each of Southcott's voices, and for Revelation.
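For readers who have never seen a vocabulary richness measure computed, here is a minimal Python sketch of two common ones, the type-token ratio and Yule's K. I am not claiming these are the specific measures Holmes used; the function names and the simple tokenizer are my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct words divided by total words. Sensitive to text length."""
    return len(set(tokens)) / len(tokens)


def yules_k(tokens):
    """Yule's K: 10^4 * (sum(m^2 * V_m) - N) / N^2.

    V_m is the number of distinct words that occur exactly m times and
    N is the total number of tokens. Higher K means more repetition,
    that is, a less rich vocabulary.
    """
    n = len(tokens)
    freq_of_freq = Counter(Counter(tokens).values())
    s2 = sum(m * m * v_m for m, v_m in freq_of_freq.items())
    return 1e4 * (s2 - n) / (n * n)


sample = tokenize("the cat and the dog")
print(type_token_ratio(sample), yules_k(sample))
```

Real studies compute measures like these on samples of thousands of words, which is one reason the sample sizes discussed in these posts matter so much.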

The Federalist Papers

Holmes reported a study of the Federalist Papers in 1995. He applied various authorship attribution methods to the Federalist Papers of both known and disputed authorship. The paper seems interesting in its own right, with relevance well beyond the question of Book of Mormon authorship. I, however, will focus only on the one part relevant to that question.

Holmes compared his vocabulary richness measures with noncontextual word use measures. Here they are:
You'll notice in the top left panel that vocabulary richness was only able to distinguish among four known authors with some overlap and a fair amount of gerrymandering. The noncontextual word analysis was able to cleanly distinguish all four known authors (top right). In assigning authorship for the disputed Federalist Papers, vocabulary richness incorrectly assigned one text, and left several ambiguous. Noncontextual words also incorrectly assigned one text, but much more cleanly assigned all of the remaining 11. These data confirm that vocabulary richness is a useful measure for selecting among known authors, that it is a problematic measure for making really clean distinctions, and that noncontextual word analyses are at least sometimes significantly better. It should also be noted that there are no changes in genre for the Federalist Papers. We know from Southcott that changes in genre can adversely affect the ability of vocabulary richness to distinguish among authors. We don't know how they affect noncontextual word use.
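To make "noncontextual word use" a little more concrete, here is a rough Python sketch of the general idea: count how often a fixed list of topic-independent words appears per 1000 words, then compare texts by those rates. The word list, the function names, and the simple distance are all hypothetical choices of mine, not the list or method used by Holmes and Forsyth.

```python
import re
from collections import Counter

# Hypothetical marker list for illustration only.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "by", "while", "on"]


def function_word_rates(text, words=FUNCTION_WORDS):
    """Return occurrences per 1000 tokens for each word in `words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)
    return {w: 1000.0 * counts[w] / len(tokens) for w in words}


def manhattan_distance(rates_a, rates_b):
    """Sum of absolute rate differences over the shared word list."""
    return sum(abs(rates_a[w] - rates_b[w]) for w in rates_a)


text_a = "Upon the hill and upon the road"
text_b = "While on the hill and on the road"
rates_a = function_word_rates(text_a)
rates_b = function_word_rates(text_b)
print(manhattan_distance(rates_a, rates_b))
```

Because these words carry little meaning on their own, their rates are assumed to depend more on habit than on subject matter, which is why they are attractive for comparing texts on different topics.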

Authorship possibilities for Mormon scripture


Holmes's analysis allows for several possibilities to explain the authorship of Mormon scripture. I can imagine these categories of explanations:
  1. Joseph Smith wrote and dictated his personal writings in his 'normal' voice. He dictated Mormon scripture as revealed by God or 'translated' from ancient texts written by multiple authors. This is Joseph's explanation.
  2. As number 1, but rejecting divine or miraculous ancient origins. Possible explanations include: the Devil, hallucinations, psychotic breaks, aliens, mundane translations from ancient texts, individual genius, or whatever you want to imagine that can't be explicitly tested without magic or a time machine.
  3. The genres of Joseph Smith's personal writings, revelations, and other Mormon scripture are different, and this accounts for the vocabulary richness differences.
  4. Some other modern author (or authors) really wrote the Doctrine and Covenants, the Book of Mormon, and the Book of Abraham, rather than their being translated or dictated by Joseph Smith.
  5. Scribal influence changed the vocabulary signal.
  6. Joseph Smith was trying to disguise his signal to appear like different beings were giving the texts when it was really his imagination and invention.
(1) Mixed divine and ancient origins for Mormon scripture are not contradicted by any statistical evidence (yet examined). The very large variability in vocabulary richness found in texts dictated by Joseph Smith supports a multiple authorship hypothesis. Establishing who the authors actually were, in this scenario, will likely never be conclusive due at least to incomplete historical documentation.
(2) Alternative unexplained or supernatural origins are equally possible. These would include explanations claiming that Joseph Smith had access to obscure, detailed, but extant knowledge and documents regarding any number of topics (first century Christianity, ancient Mesoamerica, ancient temple rites, etc.), despite the lack of historical evidence supporting such assertions. They would also include explanations like, Joseph was a genius sponge, soaking up ideas around him that he might have heard about and constructing them into a coherent set of scriptures. Other naturalistic explanations involving mental aberrations in Joseph Smith would fall into this category, as would other, non-divine, supernatural explanations.
(3) It is perhaps possible that changes in genre could completely account for the variety of vocabulary richness measurements in Mormon scripture. Further statistical analysis of the works of single authors who wrote in many different genres could speak to the likelihood of this explanation. The example of Joanna Southcott is insufficient to explain the variety of styles in Mormon scripture. Do any other authors show a change of 8.8 units simply through changes in genre or through imitation of other styles? How many have done it? How successfully? Superficially it seems unlikely to be the sole source of variation, but further testing would be required for any statistical certainty.
(4) I am unaware of any historical evidence claiming that Joseph Smith did not dictate essentially all of Mormon Scripture (of course there are a couple of other-authored and perhaps co-authored sections in the Doctrine and Covenants, but I'm assuming Holmes correctly excluded those from his analysis). I am unaware of any contemporary historical evidence showing Joseph to have used external, modern sources for Mormon scripture. Third parties who were not present at the events of translation have claimed Joseph used the work of other authors. For these claims to fit the data, Joseph would have had to memorize long passages of unknown documents by these other authors. Alternatively, if Joseph didn't simply copy the work of other 19th century authors, you are back at (2), with Joseph's genius creating new content and new styles--even if based on documents extant in 19th century New England. We will have to keep these requirements in mind when we examine various authorship claims, after having looked at all of the stylometry papers.
(5) The effects of scribes on authorship styles are worth serious consideration, but at first glance seem unlikely as the explanation for the vocabulary richness variation in Mormon scripture. Almost everything Joseph Smith authored was dictated to scribes. The Book of Mormon was dictated largely to one scribe. His diaries and revelations were dictated to many scribes, with a small portion in his own hand. The Doctrine and Covenants and Book of Abraham were dictated to other scribes. Despite this, there appear to be clearly separate signals--even if broad compared to Isaiah--for Joseph Smith's personal writings, for the Doctrine and Covenants, and for much of the Book of Mormon. Further controls might establish this with greater certainty, but it appears that scribal influence does not erase the vocabulary richness signature.
(6) With (1) and (2) only partially testable (i.e. it may be possible to clearly establish multiple authorship without identifying the authors), (3) and (5) unlikely at first glance, and (4) contradicted by the best historical evidence, some form of fraud seems the most statistically and historically tenable alternative to (1) and (2). Joseph Smith could have disguised his vocabulary richness signature multiple times for multiple different authors. Some evidence, although using different metrics for authorship attribution, suggests that authors can intentionally disguise their own stylistic signal, and even copy another stylistic signal, if they are intending to deceive or imitate. (Southcott's prophetic voice is, according to Holmes, an example of imitation.) I will look at the above paper in a future post. This topic is worthy of further examination, because it is subject to a degree of objective verification. There are a number of conditions necessary for Joseph Smith to have intentionally and successfully created documents with fraudulent authorship signatures, so it may be possible to assign a statistical probability (or improbability as the case may be) to this scenario. Various authors throughout history have attempted to create new voices for their narrators within various books. In theory, these are subject to statistical examination, and I will look to see if I can find what is known about these shifts in narrative voice. As with (1) and (2), fraud of this kind can't be proven through statistics, but finding another author who successfully changed his or her vocabulary richness signature to the same degree as found in Mormon scripture would strengthen the statistical case for (6) while weakening it for (1) and (2). Failing to find such an author does the opposite for anyone who has not ruled out a priori the alternatives in (1) and (2).
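The test proposed in (3) and (6), finding another author whose signature shifts as much as Mormon scripture does, amounts to measuring that author's spread in the same Isaiah-based unit used throughout this post. Here is a minimal Python sketch of that measurement. The coordinates are made up purely to show the arithmetic, and the function names are my own; real input would be the principal component scores of each text sample.

```python
from itertools import combinations
from math import dist


def max_pairwise_distance(points):
    """Greatest Euclidean distance between any two sample points."""
    return max(dist(p, q) for p, q in combinations(points, 2))


def spread_in_reference_units(samples, reference_samples):
    """Spread of `samples`, divided by the spread of the reference samples.

    In this post the reference is the pair of Isaiah samples (I1 and I2),
    whose separation is defined as 1.0.
    """
    unit = max_pairwise_distance(reference_samples)
    return max_pairwise_distance(samples) / unit


# Hypothetical (PC1, PC2) coordinates, NOT values read from Holmes's figures.
reference = [(0.0, 0.0), (1.0, 0.0)]
candidate_author = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]

print(spread_in_reference_units(candidate_author, reference))
```

An author who writes in several genres and reaches a spread near 8.8 on this scale would support the genre or disguise explanations. A survey that finds no such author would point the other way.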