The Vocabulary Statistics Fallacy

I'll be working my way back into the rotation here now that I've completed some obligations. For today, here's an archive item about a bad argument I still see resurrected.

***

In terms of the authorship of the New Testament documents, an argument used in many contexts for the letters of Paul (but also Peter and others) makes much of alleged vocabulary differences that supposedly indicate a difference in author. In this entry, I’d like to discuss this sort of argument in general terms; I consider it to be of little to no use in determining the authorship of ancient documents.

My first misgiving about such arguments is that statistically, they are generally without merit. This is often illustrated by modern comparative examples. In one instance, R. C. Sproul relates the following story in at least two of his books:

One of the least scientific methods used to criticize authorship is the study of what is called the incidence of hapax legomena. The phrase hapax legomena refers to the appearance of words in a particular book that are found nowhere else in the author's writings. For example, if we find 36 words in Ephesians that are found nowhere else in Paul's writings, we might contend that Paul could not have written Ephesians.

The folly of putting too much stock in hapax legomena came home to me when I had to learn the Dutch language in a hurry for my graduate work in the Netherlands. I studied Dutch by the "inductive method." I was assigned several volumes of theology written by G.C. Berkouwer. I started my study by reading his volume on The Person of Christ which was in Dutch. I started on the first page with the first word and looked it up in the dictionary. I wrote the Dutch word on one side of a card and the English word on the other side and set about the task of learning Berkouwer's vocabulary. After doing this on every page of The Person of Christ, I had over 6,000 words on cards. The next volume I studied was Berkouwer's The Work of Christ. I found over 3,000 words in that book that were not found in the first one. That's significant evidence that The Work of Christ was not written by Berkouwer! Note that Berkouwer wrote The Work of Christ only one year after he wrote The Person of Christ. He was dealing with the same general theme (Christology) and writing to the same general audience, yet there were thousands of words found in the second volume that were not found in the first.

Note also that the quantity of Berkouwer's writing in the first volume far exceeds the total quantity of writing that survives from the pen of the Apostle Paul. Paul's letters were much more brief. They were written to a wide variety of audiences, covering a wide diversity of subjects and issues, and were written over a long period of time. Yet people get excited when they find a handful of words in a given Epistle that are found nowhere else. Unless Paul had the vocabulary of a six-year-old and had no literary talent whatsoever, we should pay little attention to such unbridled speculation. 
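To make the kind of count Sproul describes concrete, here is a minimal sketch in Python. The two "books" are invented toy strings, not real data, and the function names are hypothetical. It computes two different things: words found in one work and nowhere else in the author's other works, and words that occur exactly once across the whole body of work (the stricter sense in which "hapax legomena" is often used).

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def words_only_in(work, other_works):
    """Return the set of words in `work` that appear in none of `other_works`."""
    seen_elsewhere = set()
    for other in other_works:
        seen_elsewhere.update(tokenize(other))
    return set(tokenize(work)) - seen_elsewhere

def hapax_legomena(works):
    """Return the words that occur exactly once across all `works` combined."""
    counts = Counter()
    for work in works:
        counts.update(tokenize(work))
    return {word for word, n in counts.items() if n == 1}

# Toy stand-ins for two books by the same author (invented text, not real data).
first_book = "grace and peace to you from the Lord"
second_book = "grace to you and mercy and mercy from the Lord"

print(sorted(words_only_in(second_book, [first_book])))   # ['mercy']
print(sorted(hapax_legomena([first_book, second_book])))  # ['peace']
```

Even in this tiny example the two counts differ: "mercy" is found only in the second text but occurs twice, while "peace" occurs once overall. Neither count, by itself, says anything about who wrote the text.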

While Sproul gives us excellent food for thought, his analogy is not as good as we might like. The reason is that modern languages like Dutch and English may have upwards of a million words. In contrast, an ancient language like Hebrew or Koine Greek may have only a few thousand or tens of thousands. According to several sources, the New Testament itself has a vocabulary of only about 5,000 words, and one source claims that around 300 of those account for 80% of the words in the New Testament. This would tend to accord with our own use of vocabulary: though English may have a million or more words, most of us use only a few thousand on a regular basis.
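A claim of the form "around 300 words account for 80% of the text" can be checked against any word list with a few lines of code. The sketch below is only an illustration under stated assumptions: the token list is invented, and `words_needed_for_coverage` is a hypothetical helper, not a standard tool. It counts how many of the most frequent distinct words are needed to cover a given share of all word occurrences.

```python
from collections import Counter

def words_needed_for_coverage(tokens, share):
    """Count how many of the most frequent distinct words cover `share` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    # Walk the words from most to least frequent, accumulating occurrences.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return rank
    return len(counts)

# Invented toy data: 11 word occurrences, 4 distinct words.
tokens = ["the"] * 6 + ["and"] * 3 + ["grace", "peace"]

print(words_needed_for_coverage(tokens, 0.8))  # 2
```

Here two of the four distinct words cover over 80% of the occurrences, which is the same shape of distribution described above: a small core of very common words and a long tail of rare ones.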

Thus in a sense, ancient people did “have the vocabulary of a six year old” – a modern one – because they had so few words to begin with. This does not mean they were more ignorant, of course: In their high context society, a single word might have multiple “duties” assigned according to its context. This is one reason why it is oversimplified to stroll through a concordance and attempt to ascertain the meaning of a NT word across the entirety of the NT based on one or two uses. Skeptics frequently engage in this sort of erroneous practice (what one scholar calls “illegitimate totality transfer”). 

Still: Does the fact that Koine Greek may have only a few thousand words in any sense make the arguments about vocabulary more likely? Statistically, yes – but only more likely in the sense that it is more likely that an asteroid will hit a particular state or nation than that it will hit a particular person. The general logic still holds: The nature of vocabulary is such that a certain small percentage of words will appear most frequently as our most used words. For the remainder, it is foolish to use these as a basis for deciding authorship, especially given the rather narrow window we have into the literary lives of the NT authors (a point on which Sproul remains manifestly correct). 

This would be the case even if the NT authors were indeed the writers of their own works – that’s another point that needs developing:

As I have related in other contexts before, the role of a scribe in antiquity makes arguments based on vocabulary highly questionable. As authors like Richards have shown in The Secretary in the Letters of Paul, a scribe could be assigned responsibility for a work along a wide range of potentialities: They might serve purely as receivers of dictation, or they might be full-fledged authors who are merely told what to do in very general terms, with the credited author simply reviewing the work and signing off. 

My research for Trusting the New Testament indicates that many Biblical scholars simply do not give this aspect of ancient composition enough consideration. Many simply dismiss it, with a trace of impatience, as some sort of excuse manufactured by those who wish to preserve the authority of the New Testament. But this is a non-response to a genuine phenomenon of ancient composition. It seems rather that these authors do not wish to accept that scribal activity renders many of their carefully crafted arguments essentially useless. 

Indeed, the resulting “chaos” for deniers of the authorship of NT documents could be considerable: They have almost unanimously accepted Romans as Paul’s work, yet it is also one of his letters where the work of a scribe is most apparent in terms of testimonial evidence: "I Tertius, who wrote this epistle, salute you in the Lord" (Rom 16:22). If Romans, the letter among those most certainly ascribed to Paul, was influenced by a scribe, what will this do to using Romans as a guide whereby other letters like Ephesians might be judged?

In the end, the hands of scribes make the burden much heavier on those who would deny authority (which, for ancient people, also amounted to authorship) to any New Testament book. Beyond such elements as anachronisms, factors like vocabulary simply become useless as tools for making judgments, as do numerous other factors associated with writing style. One can readily understand why critics would be hesitant to abandon what they would consider one of their “star players”.

There is one final factor we might discuss related to vocabulary as a determining factor in reckoning authorship, and that is the frequency of quotations and allusions. Since ancient writers didn’t have quote marks, we can recognize these only occasionally. But it could be a larger factor than we realize. The classic case is that of Ephesians and Colossians, which both seem to use a great deal of creedal and hymnal material, which would obviously skew any attempt to argue against their Pauline authority based on vocabulary. Since this was a high context society, though, it also seems likely that there would be a great many other allusions in the NT texts that we would be unable to recognize, especially if they were allusions to something a person once said, or something in a document not available to us.

In summary: Tests based on vocabulary are questionable enough as is, and require some stringent rules to be valid. However, even more stringent rules would be needed for an evaluation of ancient documents – and the use of vocabulary tests to determine authorship is therefore far less effective than many critics are willing to concede.

Comments

Anonymous said…
The flip side is, of course, that we cannot use the vocabulary test to establish confidence that any of the letters were by the same author.

Pix
im-skeptical said…
The phrase hapax legomena refers to the appearance of words in a particular book that are found nowhere else in the author's writings. For example, if we find 36 words in Ephesians that are found nowhere else in Paul's writings, we might contend that Paul could not have written Ephesians.

- That definition is not correct. If a word appears a dozen times in Ephesians, but nowhere else in the works of the author, that is not an example of hapax legomenon, which actually refers to a singular occurrence. But it may indeed be useful in distinguishing between two different authors whose vocabularies are different.
Differing vocabulary is a valid argument for authorship, but it doesn't prove a given author didn't write a given work; it may be a good probability argument.
The Pixie said…
Joe: Differing vocabulary is a valid argument for authorship, but it doesn't prove a given author didn't write a given work; it may be a good probability argument.

Sure, but "a good probability argument" is all we can have about events that happened 2000 years ago.
J. P Holding said…
>>>- That definition is not correct. If a word appears a dozen times in Ephesians, but nowhere else in the works of the author, that is not an example of hapax legomenon, which actually refers to a singular occurrence. But it may indeed be useful in distinguishing between two different authors whose vocabularies are different.

Good parody of a dumb fundy atheist as usual, IMS! You misread my explanation exactly as an idiot in over his head would, and then offered a "corrected" definition that was exactly what I said. Brilliant!
J. P Holding said…
>>>Sure, but "a good probability argument" is all we can have about events that happened 2000 years ago.

Sure, up until you decide it is an argument you favor, then it's 100% definite, hypocrite.

>>>>The flip side is, of course, that we cannot use the vocabulary test to establish confidence that any of the letters were by the same author.

And who gave you permission to fantasize thusly? Wand-wavers like you have no systematic tests for authorship; you just stick your finger in the air (or up your nose, or...) and accept as valid whatever gets you the results you want for that particular moment, even if you have to embrace contradictory epistemologies in the process. Which is why in the end, Richard Carrier's asinine nihilism in which he denies everything for the sake of denying the New Testament is, while stupid, at least ethically consistent. Far be it for you to emerge from your moral sewer and adopt such consistency.
im-skeptical said…
Good parody of a dumb fundy atheist as usual, IMS! You misread my explanation exactly as an idiot in over his head would, and then offered a "corrected" definition that was exactly what I said. Brilliant!
- While I agree that hapax legomena (as defined by scholars) do not provide a sound basis for declaring that a work must have come from a different author, I still maintain that your explanation of what the term means is incorrect. And furthermore, it seems that you have conflated two different things, because most of the critical analysis of Paul's writing that concludes different authorship for certain books is based on vocabulary differences - not on hapax legomena. Sorry, JP, but it is you who are confused.
Anonymous said…
JPH: Sure, up until you decide it is an argument you favor, then it's 100% definite, hypocrite.

So you are calling me a hypocrite based on your stereotyping of atheists, rather than what I have actually done? Classy.

JPH: And who gave you permission to fantasize thusly? Wand-wavers like you have no systematic tests for authorship...

Perhaps you can explain the "systematic tests for authorship" you have. I will not be holding my breath...

Pix
David Madison said…
"The flip side is, of course, that we cannot use the vocabulary test to establish confidence that any of the letters were by the same author."

So it is as likely that we have letters written by 13 people pretending to be Paul as it is that at least some of them are authentic?

David Madison
The Pixie said…
DM: So it is as likely that we have letters written by 13 people pretending to be Paul as it is that at least some of them are authentic?

Are you using the vocabulary test to determine probabilities? If not (and I cannot imagine how you can be), how is your comment relevant?
David Madison said…
Well, consider those two options. Letters written by 13 people pretending to be Paul and at least some letters written by Paul. The first option is utterly unparsimonious. So that leaves the second option. We should start by assuming that at least some of the letters are authentically Pauline. Now, if we can't rely on analyses of vocabulary to identify fakes then we have no choice but to assume Pauline authorship of all the epistles unless we can find some other reason to doubt it.

DM
One major way of deciding authorship is style. Vocabulary and word choice may be part of style, but they are not all there is to it.

Other matters include contradiction of established positions, logistical considerations, and historical attestations.
Anonymous said…
DM: Well, consider those two options. Letters written by 13 people pretending to be Paul and at least some letters written by Paul. The first option is utterly unparsimonious. So that leaves the second option. We should start by assuming that at least some of the letters are authentically Pauline. Now, if we can't rely on analyses of vocabulary to identify fakes then we have no choice but to assume Pauline authorship of all the epistles unless we can find some other reason to doubt it.

So that is the game - denigrate every way of judging ancient manuscripts, then declare yourself the winner by saying your position is the default!

Personally, I think the default position is "We do not know". When you remove the vocabulary test, the result is that we have no way to tell if the letters were written by one man, two or a dozen.

Pix
Anonymous said…
JH: One major way of deciding authorship is style. Vocabulary and word choice may be part of style, but they are not all there is to it.

Other matters include contradiction of established positions, logistical considerations, and historical attestations.


Fair comment. But I wonder how the OP feels about style as a test? I suppose it has the great advantage that it is subjective, so you can always convince yourself it is just the opinion of nasty atheist/liberal scholars who want to destroy Christianity.

The big problem you have is that we KNOW people back then wrote texts that they pretended were written by someone else. The vast majority never made it into the canon, but we cannot be sure about the texts that did.

Prove Paul wrote any of them!

Pix
First, in terms of style as a test, I see that used a lot more than vocabulary, although I see that too. If you look at scholars writing about authorship of the NT, they more often argue style than anything else.

The big problem you have is that we KNOW people back then wrote texts that they pretended were written by someone else. The vast majority never made it into the canon, but we cannot be sure about the texts that did.

You can't charge a text with being pseudepigrapha merely because there was a lot of it going around in the day. There has to be some other reason for thinking it... The strongest case against Pauline authorship is made against the pastoral epistles, especially 1 and 2 Timothy. The major reason there is that the apparent church structure seems to be of a period later than Paul lived, especially the keeping of a widows list. The office of church widow was totally second century.

Nevertheless, there are strong defences for Pauline authorship for all the books.



Undisputed Pauline books
First Epistle to the Thessalonians
Epistle to the Galatians
First Epistle to the Corinthians
Second Epistle to the Corinthians
Epistle to the Philippians
Epistle to Philemon
Epistle to the Romans


Disputed Pauline books
Deutero-Pauline: may be authentic
Epistle to the Ephesians
Epistle to the Colossians
Second Epistle to the Thessalonians

Pastoral epistles: probably not authentic

First Epistle to Timothy
Second Epistle to Timothy
Epistle to Titus

You only have three epistles that are really suspected of not being genuine, and they can be defended.
DM, I wrote that in answer to your statement above:

Blogger Unknown said...
"Well, consider those two options. Letters written by 13 people pretending to be Paul and at least some letters written by Paul. The first option is utterly unparsimonious. So that leaves the second option. We should start by assuming that at least some of the letters are authentically Pauline. Now, if we can't rely on analyses of vocabulary to identify fakes then we have no choice but to assume Pauline authorship of all the epistles unless we can find some other reason to doubt it."


Really only three have a strong case against them.
David Madison said…
Right, Joe. The idea that *none* of the epistles are Pauline is far out even by the usual standard of those seeking to debunk Christianity. But let's consider it for a moment; let's suppose that Acts is a work of fiction and that the letters were faked to give Acts authenticity. It was certainly a cunning plan!

The forger doesn't go too far in making the letters tie in with Acts; instead, he contents himself with subtle allusions. So in Galatians “Paul” subtly implies that his conversion happened in or near Damascus. The forger also notes a fleeting reference in Acts (24:17) to the bringing of alms to Jerusalem and decides to make this a major issue in several of the letters.

He then includes a very subtle reference to part of the story in Acts involving Apollos. We can see in Acts that Apollos went to Corinth after Paul had been there to do his own missionary work. So the forger has Paul telling the Corinthians that he “planted the seed and Apollos watered it” (1 Cor. 3:6).

There are numerous correspondences of this nature between Acts and the epistles. The obvious implication is that Paul was a real person whose story is recounted in Acts and who actually wrote the letters purporting to be from him.
The Pixie said…
JH: You can't charge a text with being pseudepigrapha merely because there was a lot of it going around in the day. There has to be some other reason for thinking it... The strongest case against Pauline authorship is made against the pastoral epistles, especially 1 and 2 Timothy. The major reason there is that the apparent church structure seems to be of a period later than Paul lived, especially the keeping of a widows list. The office of church widow was totally second century.

That there was a lot of pseudepigrapha around is in itself reason to doubt any text. It certainly does not prove pseudepigraphy in any specific instance, but it does lead to the conclusion "We do not know".

Of course, there are other indicators, one way or another, that do allow some certainty either for or against. I fully accept Paul authored some of the epistles attributed to him. Equally there is good reason to think some were not.
J. P Holding said…
>>> I still maintain that your explanation of what the term means is incorrect. And furthermore, it seems that you have conflated two different things, because most of the critical analysis of Paul's writing that concludes different authorship for certain books is based on vocabulary differences - not on hapax legomena.

Again, excellent parody of a dumb fundy atheist here! No actual answer, just assertion, and an incomprehensible one at that. Brilliant!
J. P Holding said…
>>>So you are calling me a hypocrite based on your stereotyping of atheists, rather than what I have actually done?

Nope. It's based on what you've done. You're always a bag of wind when it comes to your favorite ideas.

>>>Perhaps you can explain the "systematic tests for authorship" you have.

I wrote a whole book called Trusting the New Testament analyzing such tests and applying them to the NT. I won't hold my breath waiting for you to answer it.
J. P Holding said…
>>>Personally, I think the default position is "We do not know".

Yes, it is quite like you to pretend that it is a virtue to posture as though stupid and ignorant. It helps you in your laziness and gives you an excuse to avoid addressing arguments that are over your head.
J. P Holding said…
>>>That there was a lot of pseudepigrapha around is in itself reason to doubt any text.

What a mind-numbingly stupid remark. There were also a variety of "pseudo" documents among secular texts. Congratulations, you just implemented a scorched earth policy for all authorship issues as a way of again avoiding arguments too difficult for your minimal mind to engage. It's like debating big brain Trump with his attention span of a 2 year old.
im-skeptical said…
First, in terms of style as a test, I see that used a lot more than vocabulary, although I see that too. If you look at scholars writing about authorship of the NT, they more often argue style than anything else.
- Of course that's correct. My comment was meant as a comparison between vocabulary and hapax legomenon. I could have stated it a little better.
im-skeptical said…
Again, excellent parody of a dumb fundy atheist here! No actual answer, just assertion, and an incomprehensible one at that. Brilliant!
- And you, sir, are a moron. Read the definition here. Note that it says "The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works, but more than once in that particular work." But that fits your own incorrect understanding of the term: "The phrase hapax legomena refers to the appearance of words in a particular book that are found nowhere else in the author's writings."

Too bad they didn't have a dictionary in that prison library where you gained your professional experience.
Anonymous said…
JPH (previously): Sure, up until you decide it is an argument you favor, then it's 100% definite, hypocrite.

JPH: Nope. It's based on what you've done. You're always a bag of wind when it comes to your favorite ideas.

Right, so originally you said I was a hypocrite based on what you expected me to do. Now that you have been called on it, it is based on some vague claim about what I have done in the past. As I say, classy.

JPH: I wrote a whole book called Trusting the New Testament analyzing such tests and applying them to the NT. I won't hold my breath waiting for you to answer it.

So you should be able to state with ease your "systematic tests for authorship". However, and as predicted, you failed to do so.

However, you are correct about me not reading your book. A whole book of your substance-free vitriol really does not appeal.

JPH: Yes, it is quite like you to pretend that it is a virtue to posture as though stupid and ignorant. It helps you in your laziness and gives you an excuse to avoid addressing arguments that are over your head.

I think you have misunderstood my point (perhaps it went over your head!). What I am saying is that the certainty of any claim must be judged on its merits. If there are two competing claims, then it would be wrong to say that we know that one is true and the other false, even if we judge one to be 60% likely.

The default position when the answer is not known is "We do not know", it is not whatever your pet theory would like it to be. That is the lazy way.

I would certainly prefer to know for sure than not to, but to just cling to a claim because we want it to be true is intellectually dishonest - you are just fooling yourself. Far better to look at the arguments on both sides, to look at the evidence, and determine just how sure you can be about a claim. If it is not at all sure, then the default is "We do not know".

But I am curious... What argument do you think you have given that is over my head?

JPH: What a mind-numbingly stupid remark. There were also a variety of "pseudo" documents among secular texts. Congratulations, you just implemented a scorched earth policy for all authorship issues as a way of again avoiding arguments too difficult for your minimal mind to engage. It's like debating big brain Trump with his attention span of a 2 year old.

The authorship of any ancient text has to be considered suspect at best. It is only persons who need to convince themselves that authorship is certain that would say otherwise. But that does not mean such texts have no value. Even pseudepigrapha give an insight into the thinking and beliefs at the date (even if the date is not what we first thought), and there are techniques that allow some confidence for many ancient documents (I would not dispute that many of the Pauline Epistles were written by Paul, for example, because I think the evidence supports that).

Pix
David Madison said…
Right, so originally you said I was a hypocrite based on what you expected me to do. Now that you have been called on it, it is based on some vague claim about what I have done in the past.

Not to be pedantic, Pixie, but if you expect someone to do something, isn't it likely that your expectation would be based on that person's past behaviour?

DM
Anonymous said…
DM: Not to be pedantic, Pixie, but if you expect someone to do something, isn't it likely that your expectation would be based on that person's past behaviour?

How much of my past behaviour has JPH seen? He said: Sure, up until you decide it is an argument you favor, then it's 100% definite, hypocrite. He is guessing what I will do in the future, and I am guessing that is based on a stereotype, rather than something I have actually done. Hey, if anyone thinks I have done that, do please point it out.

Pix
im-skeptical said…
JP doesn't base his judgment of atheists on their past behavior. He has a stereotyped image of atheists in his mind, and that serves as justification for his behavior toward them. The actual arguments and statements they make don't figure into JP's assessment. But his own past behavior is well-known, and has been noted by many. See here for some examples.
J. P Holding said…
>>>- Of course that's correct. My comment was meant as a comparison between vocabulary and hapax legomenon. I could have stated it a little better.

The brilliance of this parody of a dumb fundy atheist is astonishing at times! Raising definitions that are so undifferent as to make no difference is indeed their normal tactic. It has to do with the mental damage caused by reading Chick tracts.
J. P Holding said…
>>>Now that you have been called on it, it is based on some vague claim about what I have done in the past. As I say, classy.

Spoken like a true guilt-ridden bloviator. Your sample size is enormous, unlike your honesty and your intelligence quotient.

>>>So you should be able to state with ease your "systematic tests for authorship". However, and as predicted, you failed to do so.

The patent stupidity of the likes of you is that you idiotically believe that everything is reducible to "easy" statement. This is the mental deficiency of the Wikipedia generation and only demonstrates how little you actually know.

>>>A whole book of your substance-free vitriol really does not appeal.

Frightened? What a shame.

>>>I think you have misunderstood my point (

I have understood that you HAVE no point and are merely blowing slogans out of your backside as a way of not actually engaging the data and arguments and proving even more conclusively that you are a fraudulent bag of wind.

>>>But I am curious... What argument do you think you have given that is over my head?

The better question is which one has NOT been over your head.

>>>The authorship of any ancient text has to be considered suspect at best.

Scorched earth policy predicted and fulfilled. What a shame those who actually are experts in the field aren't blowing the same bubbles out of their own backsides.
