
Word-print studies and Jockers


MCB


Bruce, if you discount NSC when applied to authorship attribution, how do you account for Dale's results with word-strings, and for Delta, both of which came up with similar results? I have run a chi-square comparing Dale's results with the NSC's and found that, although both are imperfect, the agreement could not have been due to chance. (I used word counts, since the two studies used different methods for determining samples.)
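For anyone who wants to run the same check, here is a minimal sketch of the kind of chi-square comparison I mean, in Python; the counts are placeholders to show the mechanics, not the actual attribution tallies from either study:

```python
# Sketch of a chi-square test comparing two attribution methods, assuming
# each method assigns every chapter to one candidate author. The counts
# below are hypothetical placeholders, NOT the actual study results.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: author assigned by Dale's word-string method; columns: author
# assigned by NSC. Each cell counts chapters receiving that pair of labels.
table = np.array([
    [30,  5,  2],
    [ 4, 25,  3],
    [ 1,  2, 28],
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
# A very small p-value says the two methods' assignments line up far more
# often than independent (chance) labelings would.
```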

I am very interested in taking a look at those chapters for which Jockers's study did not come up with a distinct authorship attribution. My methodology would be this:

It could give us more information if we took the top assignment for each chapter. Then, for each word, we take the expected frequency for that author in the given section and measure how far the observed count deviates from expectation. Add the deviations together and divide by the number of words used for the author's sample. We would also have to control for length of chapter. Plot the resultant numbers and see how many cluster close to the 0 point. Those which fall far away are likely to be genuinely unassigned chapters. Such a procedure would be very laborious with the resources I have now.
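In rough code, the score I am proposing would look something like the sketch below; the names and the exact normalization are my own illustrative choices, not anything taken from the published studies:

```python
# Rough sketch of the proposed deviation score. Assumptions: `author_freqs`
# holds each word's relative frequency in the winning author's training
# sample, and `chapter_counts` holds raw word counts for one chapter.
from collections import Counter

def deviation_score(chapter_counts: Counter, author_freqs: dict) -> float:
    n = sum(chapter_counts.values())  # chapter length in words
    total_dev = 0.0
    for word, rate in author_freqs.items():
        expected = rate * n                        # expected count at this chapter length
        observed = chapter_counts.get(word, 0)
        total_dev += abs(observed - expected) / n  # control for chapter length
    # Divide by the size of the author's word sample, as proposed above.
    return total_dev / len(author_freqs)

# Plot these scores per chapter: values clustering near 0 fit the assigned
# author; outliers are candidates for "actually unassigned."
```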

However, we may never find an author for some of them. I believe that some of the letters, for example, may have been old Revolutionary War letters that Spalding adapted and plugged into the text.

I am also concerned about Rigdon's samples, because he so often wrote "scripture salad."

Link to comment

Bruce, if you discount NSC when applied to authorship attribution, how do you account for Dale's results with word-strings, and for Delta, both of which came up with similar results? I have run a chi-square comparing Dale's results with the NSC's and found that, although both are imperfect, the agreement could not have been due to chance. (I used word counts, since the two studies used different methods for determining samples.)

................................

I am not speaking for Bruce, but my understanding of his arguments is that he does not discount NSC itself, but rather some of the methodology, results, and conclusions that were drawn from the study.

There is a corpus of wordprint studies going back several years which have been used with good results to determine the authorship of various texts. When applied to the Book of Mormon, the NSC study came up with results strikingly different from those of most of the other studies. They could not all be correct, but which one was?

The Federalist papers are generally accepted as a test bed for authorship attribution because the authors of most of them are known. They were written by Alexander Hamilton, James Madison, and John Jay. The authors of all but twelve are known.

Bruce applied the NSC method to the 51 Federalist papers known to be written by Hamilton, using Sidney Rigdon and the other Book of Mormon candidates as the test authors. Quoting from the paper: "Rigdon was falsely classified as the author of 28 of the 51 Hamilton texts, with posterior probabilities ranging as high as 0.9999 (see Fig. 2). Pratt was classified as the author of 12 of the papers, and Cowdery was classified as the author of the remaining 11 papers."

Hamilton was not included as one of the test authors. Naturally, we are quite confident that none of the test authors had anything to do with writing any of the Federalist papers.

You can find the paper here.

This would indicate to me that the NSC method needs some work.

Glenn

Link to comment

You all know that the chapters are a later editorial innovation, right? And that there have been other editorial changes from time to time over the years?

Wouldn't using the original publication's editorial and lexical choices be a better source document for comparison?

Link to comment

Sure, if you force NSC to choose among the wrong authors, it will come up with results due to chance. That is a given.

Correct. If the real author is not present in the set of authors tested, the results will be skewed. That is one of the criticisms of the Jockers study. You need to read the paper. When Bruce adjusted the procedure a bit, using the first 25 Hamilton papers as training texts and adding Hamilton to the mix, the NSC method picked Hamilton as the author every time. There were no false positives.
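To see why a closed set always produces a "winner," here is a minimal sketch using scikit-learn's NearestCentroid with shrinkage as a stand-in for the NSC classifier; the training texts are placeholders, not the actual Federalist or candidate-author samples:

```python
# Sketch of the closed-set problem. NearestCentroid(shrink_threshold=...)
# approximates the shrunken-centroid idea; all texts are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestCentroid

train_texts = [
    "placeholder writing sample one by rigdon",
    "placeholder writing sample two by rigdon",
    "placeholder writing sample one by cowdery",
    "placeholder writing sample two by cowdery",
    "placeholder writing sample one by pratt",
    "placeholder writing sample two by pratt",
]
train_labels = ["Rigdon", "Rigdon", "Cowdery", "Cowdery", "Pratt", "Pratt"]
disputed = ["placeholder text of a hamilton federalist paper"]

vec = CountVectorizer()
X = vec.fit_transform(train_texts).toarray()  # shrinkage needs dense input

clf = NearestCentroid(shrink_threshold=0.5)
clf.fit(X, train_labels)

# Hamilton is not among the candidates, so the classifier is FORCED to
# ascribe his paper to Rigdon, Cowdery, or Pratt, exactly as in the test.
print(clf.predict(vec.transform(disputed).toarray()))
```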

Edited to add: Bruce did extend the NSC method to account for the possibility that the real author was not in the mix. It worked very well for the test "authors."

Glenn

Link to comment

Bruce, if you discount NSC when applied to authorship attribution, how do you account for Dale's results with word-strings, and for Delta, both of which came up with similar results? I have run a chi-square comparing Dale's results with the NSC's and found that, although both are imperfect, the agreement could not have been due to chance.

I discussed this in detail with Dale: he did no validation study of his methodology, and he admitted that his results may not be valid.

As you well know, running statistical calculations on invalid data yields invalid results. "Garbage in, garbage out."

You have to validate your methodology before paying any attention to the results. I find it curious that you do not know this.

Link to comment

However, NSC and Delta have validated his results. At the time of his study, there was no other word-print analysis method available. So it was a stand-alone study, and could not be validated until other methods were developed. This is the reason why he was reasonably comfortable within the RLDS church for so many years.

Link to comment

However, NSC and Delta have validated his results. At the time of his study, there was no other word-print analysis method available. So it was a stand-alone study, and could not be validated until other methods were developed. This is the reason why he was reasonably comfortable within the RLDS church for so many years.

I don't think that Dale is as comfortable with the NSC validation as you might expect, after his exchanges with Bruce.

Glenn

Link to comment

I don't think that Dale is as comfortable with the NSC validation as you might expect, after his exchanges with Bruce.

Glenn

I am not a mind-reader, especially when it comes to Dale. He likes to get along with people.
Link to comment

I am not a mind-reader, especially when it comes to Dale. He likes to get along with people.

So do I. I do know that Dale has received some information from Bruce and is doing whatever with it. As for the NSC, it would be good for you to read Bruce's paper.

Glenn

Link to comment

I have read Bruce's posts and, honestly, I don't understand them. I am not that accomplished at statistics, although I have taken graduate coursework in the area. I leave that for others, which is what Dale is also doing. The more complex the statistics become, the more one can obscure (intentionally or unintentionally) incorrect assumptions.

Simply because someone is an expert in the field doesn't necessarily mean that they are right, particularly when they are working for a university which has a vested interest in preserving a peculiar belief system.

Link to comment

I have read Bruce's posts and, honestly, I don't understand them. I am not that accomplished at statistics, although I have taken graduate coursework in the area. I leave that for others, which is what Dale is also doing. The more complex the statistics become, the more one can obscure (intentionally or unintentionally) incorrect assumptions.

Simply because someone is an expert in the field doesn't necessarily mean that they are right, particularly when they are working for a university which has a vested interest in preserving a peculiar belief system.

MCB, I cannot follow the math either, but I can look at the results of those tests. The Jockers NSC study used a closed set of authors. Using any set of authors will produce "winners" and "losers," even if the real author is not included. That was demonstrated very succinctly when Bruce used Rigdon, Cowdery, Spalding, et al. as test authors for the Hamilton Federalist papers using the Jockers NSC classification technique.

Please refrain from the cop-out that Bruce's work is somehow tainted because he is LDS, working for an LDS university. That impugns his integrity, without any justification whatsoever, just because of his beliefs.

Of course Bruce's work needs to be checked. It will be checked. He has submitted it for publication in the journal Literary and Linguistic Computing.

But at the same time, his work is available right now. The problems that have been identified with the Jockers study need to be addressed. Once corrections are made that eliminate proposed candidates who are not authors, the study will have much more validity.

There are other statistical methods that have gained acceptance among scholars using the Federalist papers as a test bed. These methods reject false authors with a high degree of probability. The NSC method must do the same to be truly valid.

Glenn

Link to comment

You all know that the chapters are a later editorial innovation, right? And that there have been other editorial changes from time to time over the years?

Wouldn't using the original publication's editorial and lexical choices be a better source document for comparison?

I have long advocated using the pagination of the 1830 BoM as the text segments for computerized examination. Or, if for some reason the number of words on those pages is too few for proper data quantification, then perhaps we could join each set of two pages, in order, throughout the book, for our experimental text segments.

I think that would make more sense than using modern chapter divisions (when the 1830 text is already being used).

For example, if I were asked "Which part of the Book of Mormon best matches Solomon Spalding's use of vocabulary?" I would not point my questioners to an entire chapter, but rather to the one page in the 1830 edition where the Spalding-words-used percentage is numerically the highest.
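In rough Python, the joining rule I have in mind might look like the sketch below; the per-page word threshold is an arbitrary illustrative figure, and producing the per-page text list is left open:

```python
# Sketch of the page-based segmentation idea, assuming the 1830 text has
# already been split into a list of per-page strings. MIN_WORDS is an
# arbitrary illustrative threshold, not a figure from any study.
MIN_WORDS = 250

def page_segments(pages: list[str], min_words: int = MIN_WORDS) -> list[str]:
    """Use single 1830 pages as text segments; if any page is too short
    for proper data quantification, join each set of two pages in order."""
    if all(len(page.split()) >= min_words for page in pages):
        return pages
    return [" ".join(pages[i:i + 2]) for i in range(0, len(pages), 2)]
```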

But, nobody ever listens to me. I'm like the mother who raises her son all his life to be a doctor -- and he opens a deli. Oy!

UD

Link to comment

...the Federalist papers...

As I understand it, scholars had already determined a closed set of potential authors for those texts. In their recent paper, Jockers and Witten used PCA charts to also demonstrate that certain authors' word-prints plotted out in close relationship to the texts in question.

Thus, already possessed of the information that one or more of their chosen potential authors was at least the author of PART of the texts to be examined by computers, the NSC and Delta methods were appropriate.

Bruce argues (if I can paraphrase him correctly) that it is not known that Spalding, Rigdon, etc. wrote ANY of the Book of Mormon -- and, furthermore, that PCA charting excludes them as potential authors. Therefore it is useless and inappropriate to subject the BoM, Spalding, Rigdon, etc. to NSC and Delta methods, and the results are meaningless.

At least I think I got that argument right.
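A bare-bones sketch of the kind of PCA charting I understand them to have used is below; the texts are placeholders, and the details of their actual charts are assumptions on my part:

```python
# Sketch of the PCA check: project relative word frequencies into two
# dimensions and see which points plot near which. All texts are
# placeholders, not actual candidate or disputed samples.
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import normalize

labels = ["Rigdon sample", "Spalding sample", "disputed text"]
texts = [
    "placeholder writing sample attributed to rigdon",
    "placeholder writing sample attributed to spalding",
    "placeholder disputed text to be charted against the candidates",
]

X = normalize(CountVectorizer().fit_transform(texts), norm="l1")  # word rates
coords = PCA(n_components=2).fit_transform(X.toarray())

for label, (x, y) in zip(labels, coords):
    print(f"{label}: ({x:+.3f}, {y:+.3f})")
# Candidates plotting far from the disputed texts are poor fits; Bruce's
# point, as I read it, is that the BoM candidates plot out that way.
```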

UD

Link to comment

Simply because someone is an expert in the field doesn't necessarily mean that they are right, particularly when they are working for a university which has a vested interest in preserving a peculiar belief system.

Why does it always come back to this argument? When you can't answer the argument, you resort to impugning someone's honesty and integrity. Really, it makes me wonder about your own integrity.

Link to comment

How does the Book of Mormon's status as a translation from ancient authors affect this process? It seems that, to be valid, the original authors would have to be included in the test, and this would be impossible because there are no other samples of their work.

Link to comment

How does the Book of Mormon's status as a translation from ancient authors affect this process? It seems that, to be valid, the original authors would have to be included in the test, and this would be impossible because there are no other samples of their work.

That is one suggestion I have for further research. Use Moroni 1-8 as one author's sample, for example. Then find other sections of the text that might have been written by that particular author. If such sections were to be found in, say, Jacob, then we would have to say that the authorship the book claims for itself is not true.
Link to comment

You all know that the chapters are a later editorial innovation, right? And that there have been other editorial changes from time to time over the years?

Wouldn't using the original publication's editorial and lexical choices be a better source document for comparison?

Don't confuse us with facts -- cheesh. It might not work as well that way!

Link to comment

That is one suggestion I have for further research. Use Moroni 1-8 as one author's sample, for example. Then find other sections of the text that might have been written by that particular author. If such sections were to be found in, say, Jacob, then we would have to say that the authorship the book claims for itself is not true.

It would seem to me that the Book of Mormon would not lend itself well to that kind of analysis, because you have the original authors, then Mormon's abridgement, then Joseph Smith's translation of Mormon's abridgement. It seems like there is a lot of dilution going on here.

Link to comment

That is one suggestion I have for further research. Use Moroni 1-8 as one author's sample, for example. Then find other sections of the text that might have been written by that particular author. If such sections were to be found in, say, Jacob, then we would have to say that the authorship the book claims for itself is not true.

This has been done in the Berkeley Group study, if I recall correctly, and the internal authorship claims did stand up; i.e., Nephi did not line up with Alma, Alma did not line up with Moroni, and Moroni did not line up with Nephi, which leads to the conclusion that the Nephi, Moroni, and Alma sections had different authors.

I think Rencher, Layton, et al. did a more exhaustive comparison, because they used smaller chunks of text and were able to identify more authors. I would have to go back and dig up all of the reports to confirm this.

Glenn

Link to comment

This has been done in the Berkeley Group study, if I recall correctly, and the internal authorship claims did stand up; i.e., Nephi did not line up with Alma, Alma did not line up with Moroni, and Moroni did not line up with Nephi, which leads to the conclusion that the Nephi, Moroni, and Alma sections had different authors.

I think Rencher, Layton, et al. did a more exhaustive comparison, because they used smaller chunks of text and were able to identify more authors. I would have to go back and dig up all of the reports to confirm this.

Glenn

Could you reproduce their BoM chapter attributions? Or, at least, point us to a publication in which each of the BoM chapters is identified by author?

Unless we can get down to that level of inspection, it means very little to say that Nephi's writings are unlike Moroni's writings.

First of all, we need to see data demonstrating how consistent Nephi's sections of text really are -- and the same for Moroni. Only then can we effectively compare the various text sections of the two authors.
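As a sketch of the sort of consistency data I mean, something like the following might do; the section texts are placeholders, and cosine distance on relative word frequencies is just one plausible choice of measure:

```python
# Sketch of an intra- vs. inter-author consistency check: compare how
# close Nephi's sections are to each other versus to Moroni's sections.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_distances
from sklearn.preprocessing import normalize

nephi = ["placeholder nephi section one", "placeholder nephi section two"]
moroni = ["placeholder moroni section one", "placeholder moroni section two"]

X = normalize(CountVectorizer().fit_transform(nephi + moroni), norm="l1")
D = cosine_distances(X)

n = len(nephi)
within = D[:n, :n][np.triu_indices(n, k=1)].mean()  # Nephi vs. Nephi
between = D[:n, n:].mean()                          # Nephi vs. Moroni
print(f"within-Nephi: {within:.3f}  Nephi-vs-Moroni: {between:.3f}")
# Only if within-author distances run clearly smaller than between-author
# distances does "Nephi is unlike Moroni" carry much weight.
```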

UD


Link to comment

How does the Book of Mormon as a translation from ancient authors affect this process? It seems that to be valid the original authors would have to be included in the test and this would be impossible because there is no other samples of their work.

That is part of the validation process.

You take a translation of a text, and test it against a set of authors, one of whom is the translator. You test whether the translator is identified by your method, or whether the results are random.
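In code terms, the validation loop is almost trivial to express. Here is a minimal sketch, where `classify` stands in for whatever attribution method is under test; nothing here comes from any particular study:

```python
# Sketch of the validation step: measure an attribution method's hit
# rate on texts of KNOWN origin before trusting it on unknown texts.
def validate(classify, known_texts, true_author, candidates):
    """classify(text, candidates) -> predicted author name."""
    hits = sum(classify(t, candidates) == true_author for t in known_texts)
    return hits / len(known_texts)

# Chance level is 1 / len(candidates). A method that cannot beat chance
# on known texts tells us nothing about texts of unknown origin.
```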

Link to comment

That is part of the validation process.

You take a translation of a text, and test it against a set of authors, one of whom is the translator. You test whether the translator is identified by your method, or whether the results are random.

Thanks for the explanation.

Link to comment

That is one suggestion I have for further research. Use Moroni 1-8 as one author's sample, for example. Then find other sections of the text that might have been written by that particular author. If such sections were to be found in, say, Jacob, then we would have to say that the authorship the book claims for itself is not true.

You are exactly correct. You test for multiple authors, which is the claim of the BOM.

But, again, you must first validate that your method is valid BEFORE you look at your data from the BOM. If you have a bad methodology, your conclusions are invalid. If you find the "same" author in both selections, it may be the result of invalid methodology.

This is not rocket science. This is basic stuff, folks.

This is precisely the problem when you do statistical analysis BEFORE you validate. Garbage in, garbage out. Statistical analysis does not fix the problems of invalid methodology or bad data. It only serves to impress the uneducated.

Link to comment
