The Most Important Person in the Bible

Today’s guest blogger is Sean Boisen, senior information architect at Logos.

Logos Bible Software is continually undertaking new projects to expand our tools for Bible study. Many of these involve wading through data, usually lots and lots of data.

For example, the Biblical People feature (described in a previous post) provides Bible references, family relationships, social roles, and other information for every person mentioned in the Bible, some 3000 different individuals in all.

I’m currently working to enrich this data set much further to include place names, other named entities (like ethnic groups and languages), and an even richer set of relationships: people who knew each other or worked together, places they lived or visited, their beliefs, and many other kinds of information.

But too many projects chasing too little time means you have to prioritize. This raises an interesting question: how should we prioritize development of our people data so that we spend the most effort on the names that matter most to those studying the Bible?

Since I’m inherently a data-driven, quantitative type of guy, my practical answer is to:

  • assign a numeric weight to each name
  • start at the top and work my way down the list in order
  • stop when the available resources, enthusiasm, or both are exhausted
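In code, this three-step recipe amounts to a sort followed by a cutoff. Here is a minimal Python sketch, assuming the weights have already been computed and modeling “available resources” as a hypothetical fixed budget of names:

```python
from typing import Dict, List


def prioritize(weights: Dict[str, float], budget: int) -> List[str]:
    """Return the highest-weighted names, at most `budget` of them.

    `budget` is a hypothetical stand-in for "available resources,
    enthusiasm, or both": here just the number of names we can work on.
    """
    # Highest weight first; ties broken alphabetically so the order is stable.
    ranked = sorted(weights, key=lambda name: (-weights[name], name))
    return ranked[:budget]
```

For example, prioritize({"a": 0.9, "b": 0.5, "c": 0.7}, 2) returns ["a", "c"]. The interesting work is all in producing the weights, which is what the rest of this post is about.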

Since we’ve got the data that connects people to the passages that refer to them, a good starting place is simply to count how many times each person is mentioned in the Scriptures. There’s an important technical detail here: I really do mean references to people, not just names (as strings). To see why this matters, consider:

  • the same person can be known by several different names (Peter, Simon, Simeon and Cephas are all names used in the New Testament for Jesus’ disciple)
  • the same name can be used for several different people, or even different kinds of things

As an example of this second point, it’s not enough to find the string “Judah” in a verse: you want to know when it’s Judah the person, as opposed to a cover term for Israel or the Southern Kingdom. For hard cases like Judah, the only way to know is to go through verse by verse by hand and decide. (This investment of effort is one thing that makes Logos’ Biblical People data such a uniquely valuable resource.)

For many other cases, while the name is only used to refer to people, there are numerous individuals with the same name. Zechariah is the toughest case here: there are 30 distinct individuals by that name in our database. So just counting occurrences of the string “Zechariah” doesn’t get it right: you need to know whether it’s the prophet Zechariah (from the Old Testament book of the same name), the father of John the Baptist, or one of the 28 others (most of whom are mentioned only once in the entire Bible). So some pretty detailed data is required to do a reasonable job with this computation.
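Here is a minimal sketch of counting references rather than strings. It assumes, hypothetically, that each mention has already been annotated with a unique person ID (the expensive, by-hand step described above); the ID scheme and the sample annotations are made up for illustration. Counting distinct verses per ID keeps the different Zechariahs apart and merges the different names used for Peter:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A mention pairs a verse reference with the *person* it refers to.
# The person IDs ("Peter", "Zechariah.prophet", ...) are a hypothetical
# scheme; any key that is unique per individual would work.
Mention = Tuple[str, str]  # (verse_ref, person_id)


def verse_frequency(mentions: Iterable[Mention]) -> Dict[str, int]:
    """Count the distinct verses that mention each person.

    Collecting verses in a set means a verse that names someone twice
    (like "Simon, Simon" in Luke 22:31) is counted only once.
    """
    verses: Dict[str, Set[str]] = defaultdict(set)
    for verse_ref, person_id in mentions:
        verses[person_id].add(verse_ref)
    return {person: len(refs) for person, refs in verses.items()}


# Toy annotations for illustration only.
sample = [
    ("Luke 22:31", "Peter"),  # "Simon, Simon": one person, named twice
    ("Luke 22:31", "Peter"),
    ("John 1:42", "Peter"),  # "Simon" ... "Cephas"
    ("Zechariah 1:1", "Zechariah.prophet"),
    ("Luke 1:5", "Zechariah.father_of_John"),
]
```

With these toy annotations, verse_frequency(sample) gives Peter 2 and each Zechariah 1, whereas a plain search for the string “Zechariah” would lump the prophet and John’s father together.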

There are many different ways you could count and compute weights on a per-person basis. Here’s one (there are other reasonable possibilities too):

  • Let frequency be the number of verses that mention a given individual. A verse like Luke 22:31 (“Simon, Simon, Satan has desired to sift you like wheat”) counts only once, since naming Simon twice in one verse shouldn’t count as two observations of his significance as a Biblical character.
  • Let book dispersion be the number of books of the Bible that mention the individual. The intuition here is that, of two individuals with the same frequency, the one mentioned in more books is probably more important, broadly speaking.
  • Let chapter dispersion similarly be the number of chapters in which a mention occurs. This helps distinguish people who are mentioned frequently but within a relatively narrow range of verses.
  • Normalize each value by dividing it by the maximum for that factor (frequency = 1370, book mentions = 31, chapter mentions = 258), so that all three factors fall between 0 and 1.
  • Assign a weight to each of these three factors. I used 0.6 for frequency, 0.2 for book dispersion, and 0.2 for chapter dispersion; clearly this choice affects the outcome.
  • Multiply each factor by its weight and add the results to get a single number between 0 and 1.
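Putting these steps together, here is a minimal Python sketch of the composite metric. It assumes a hypothetical flat list of (book, chapter, verse, person_id) annotations; the real Biblical People data is almost certainly structured differently. It normalizes by the maxima found in whatever data it is given, which matches the post’s approach when run over the whole Bible:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical flat annotation format: (book, chapter, verse, person_id).
Mention = Tuple[str, int, int, str]

# Factor weights as given in the post; they sum to 1.
WEIGHTS = {"frequency": 0.6, "books": 0.2, "chapters": 0.2}


def composite_scores(mentions: Iterable[Mention]) -> Dict[str, float]:
    """Weighted sum of normalized frequency, book and chapter dispersion."""
    verses: Dict[str, Set[Tuple[str, int, int]]] = defaultdict(set)
    books: Dict[str, Set[str]] = defaultdict(set)
    chapters: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for book, chapter, verse, person in mentions:
        verses[person].add((book, chapter, verse))  # a verse counts once per person
        books[person].add(book)
        chapters[person].add((book, chapter))  # chapter numbers repeat across books
    if not verses:
        return {}

    raw = {
        "frequency": {p: len(v) for p, v in verses.items()},
        "books": {p: len(b) for p, b in books.items()},
        "chapters": {p: len(c) for p, c in chapters.items()},
    }
    # Divide by each factor's maximum so every factor lies in [0, 1].
    maxima = {factor: max(values.values()) for factor, values in raw.items()}
    return {
        p: sum(WEIGHTS[f] * raw[f][p] / maxima[f] for f in WEIGHTS)
        for p in verses
    }
```

Because the weights sum to 1 and each normalized factor is at most 1, the score is also at most 1, and only someone who tops all three factors reaches exactly 1.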

Here’s a graph that shows this metric for the top 50 people, along with the individual factors. (The image is linked to a larger version where the names can be read.)

While the top names (Jesus, David, Moses, Jacob, Abraham) are no surprise, there are some interesting observations farther down.

First, the composite metric really does change the rankings: Levi is #15 by this method, but #52 if you only ranked by frequency. Likewise, King Saul would be #51 if you only ranked by book mentions, because he’s mentioned in just a few books: but he’s clearly one of the most important characters in those books, and so it seems fitting that incorporating frequency and chapter dispersion boosts him up to #10 in the composite metric rank.
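Comparisons like “#15 by the composite metric but #52 by frequency alone” come from ranking the same people under each criterion and comparing their positions. Here is a small sketch; ranks and rank_shifts are hypothetical helper names, and ties are broken alphabetically, an arbitrary choice since the post doesn’t say how ties were handled:

```python
from typing import Dict


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each person to a 1-based rank, highest score first."""
    # Alphabetical tie-breaking is arbitrary; it just makes ranks deterministic.
    ordered = sorted(scores, key=lambda p: (-scores[p], p))
    return {person: i for i, person in enumerate(ordered, start=1)}


def rank_shifts(single: Dict[str, float], composite: Dict[str, float]) -> Dict[str, int]:
    """Places each person gains (positive) or loses (negative) when
    switching from a single-factor ranking to the composite ranking.
    Assumes both dicts cover the same people."""
    before, after = ranks(single), ranks(composite)
    return {p: before[p] - after[p] for p in composite}
```

A large positive shift marks someone, like Saul in the book-mentions comparison, whose single-factor rank understates their weight in the composite.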

Graphically, the places where the lines approach each other are the cases where the various factors are more equal, and the places where they’re farthest apart (Judah’s a good example) are where they’re most skewed. Back to the earlier point about counting genuine references to people rather than strings: only 99 of the approximately 780 occurrences of “Judah” actually refer to Jacob and Leah’s son, so counting strings would be highly misleading here.

Since names, like many linguistic phenomena, typically follow a Zipfian distribution (sometimes called a “long tail” or power-law distribution), it’s no surprise that the majority of these names (1634 of the 2987) occur exactly once in the Bible, while the 59 most frequent names account for about half of all name mentions. So clearly these top names deserve much more attention than the long tail. Important disclaimer: I’m not making any claims here about theological or historical importance. That’s a subjective matter, and you’d get different answers depending on your perspective.
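Both long-tail figures above (how many names occur exactly once, and how many top names cover about half of all mentions) can be read off a table of per-person frequencies. Here is a short sketch; long_tail_stats is a hypothetical helper, and it reports the smallest number of top names whose mentions reach at least half of the total:

```python
from typing import Dict, Tuple


def long_tail_stats(freq: Dict[str, int]) -> Tuple[int, int]:
    """Return (names mentioned exactly once,
    smallest number of top names covering at least half of all mentions)."""
    once = sum(1 for n in freq.values() if n == 1)
    total = sum(freq.values())
    running = 0
    # Walk down the names from most to least frequent.
    for k, n in enumerate(sorted(freq.values(), reverse=True), start=1):
        running += n
        if 2 * running >= total:  # integer test for running >= total / 2
            return once, k
    return once, 0  # only reached when freq is empty
```

Run over the full frequency table, this kind of calculation is what yields figures like “1634 of 2987” and “the top 59 names”.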

One advantage of making ideas explicit and quantifiable is that you can check their predictions against your intuitions. Some other factors that might improve the estimate even further (and remember, this is just an estimate):

  • Though we value the whole of Scripture, there’s a sense in which certain sections are broader in their implications. For example, anyone mentioned in the first chapters of Genesis should probably get an extra measure of importance: these are the foundational stories of Hebrew and Christian history.
  • We’re only counting proper names here: other descriptions and pronouns would help refine these measurements even further (we don’t have this data yet, however).
  • External sources (like Bible dictionaries) are a rich and quantifiable source of judgments about importance: the more words or sentences used to describe an individual, the more important they’re likely to be. By consulting several dictionaries, you can overcome the biases of an individual work or editorial slant. The key step here is connecting the described individual (often in a numbered paragraph) to the Biblical character: we don’t have that data yet, but it’s in our plans for the future, and with a bit of programming an approximation ought to be possible at better than 90% accuracy.
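As a sketch of how the dictionary idea might plug into the composite metric, the code below averages article lengths across several dictionaries (to dampen any single work’s slant) and blends the result in as a new factor. It assumes, hypothetically, that the hard part, linking each dictionary entry to a person ID, is already done; the function names and the blending weight are illustrative, not anything Logos has built:

```python
from statistics import mean
from typing import Dict, List


def dictionary_factor(article_words: List[Dict[str, int]]) -> Dict[str, float]:
    """Average each person's article length across several dictionaries
    (a missing article counts as 0 words), normalized by the maximum."""
    people = set().union(*article_words)
    avg = {p: mean(d.get(p, 0) for d in article_words) for p in people}
    top = max(avg.values(), default=0)
    return {p: (v / top if top else 0.0) for p, v in avg.items()}


def add_factor(base: Dict[str, float], extra: Dict[str, float], weight: float) -> Dict[str, float]:
    """Blend an existing [0, 1] score with a new [0, 1] factor.

    The existing score is scaled by (1 - weight), so the combined
    weights still sum to 1 and the result stays in [0, 1]."""
    people = base.keys() | extra.keys()
    return {p: (1 - weight) * base.get(p, 0.0) + weight * extra.get(p, 0.0) for p in people}
```

Scaling the old score by (1 - weight) is one simple way to make room for a new factor; reassigning all four weights by hand would work just as well.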

Postscripts

  • Some of this material was previously posted at my Blogos weblog. Unfortunately, as of this writing, problems with my service provider have made these posts unavailable.
  • This post at OpenBible.info is a response to the original series, with some interesting thoughts about alternative ways to rank names.

Related Posts

Follow-up posts here at the Logos Blog using Many Eyes to further analyze and visualize the data:

Update 5/25: Chris Anderson, author of the best-selling book The Long Tail and editor-in-chief of Wired magazine, wrote about this post on his blog! Check it out: The Long Tail of Bible People (AKA Jesus is #1!)

Comments

  1. What would be cool at some point is to tag pronominal and implicit references to individuals. This would pose some considerable challenges, but I think it would be an awesome addition to the data sets. An individual may be called by name at first, but then all subsequent times he may be referred to as “he” or “him” or “she” or “her.” Do you think this will be possible at some point down the road?

  2. That’s just what I meant by my comment on descriptions and pronouns near the end. As a concrete example, in Acts 24:1-2, we have both “the governor” (a description) and “when he had been summoned” (a pronoun), both of which refer all the way back to the proper name first introduced in Acts 23:24, “Felix the governor”.
    As you say, creating this data is hard work (this example shows it’s not nearly as easy as “find the last mentioned name”), but it would be really valuable. While we don’t have a specific timetable or plan, I hope we’ll do this at some point.

  3. I guess I didn’t read quite carefully enough! I totally missed that comment. Thanks for pointing me to it.
    Your desire to do this kind of hard work excites me. It sounds like you’re a great fit for Logos. I’m looking forward to more of your contributions here on the blog and in future versions of Logos.

  4. Rob Sutphen says:

    Interesting project, but you need to find a way to discount people who are primarily named in a “son of _____” formula. That’s clearly why Jesse and Nun made your list. Also, there should be a way to get Elisha into the top 50. Maybe almost all his appearances are in one book, and he is called the “man of God” in many of them, but he is still much more important in the scheme of things than Sihon or Uzziah.

  5. Lee Bradshaw says:

    How about the most important 100 women in the Bible?

  6. This seems a fantastic facility.
    But how does it work? How do I find out all the relatives of (e.g.) Aaron?
    Can I find out who all the “Philips” are and distinguish between them? Similarly for “James” and “Mary”.
    God bless,
    Noel.

  7. You’ve got a good point about “son of ___” constructions: this probably accounts for 7 or 8 of the 43 mentions of Jesse. This goes back to the earlier comment about also needing to label descriptions and pronouns: though Jesse’s name gets mentioned in such a phrase, clearly the primary semantics of “the son of Jesse” is a reference to David (and several instances don’t even mention David at all: another good reason why you need semantic information, not just words). Even more so for Nun: of all the mentions of his name, none (get it?) seem to be anything other than a reference to Joshua.
    As for Elisha, there are a number of ways one might attempt to capture your intuition of “more important in the scheme of things”:
    * his social roles (prophets and kings probably ought to get ranked more highly than farmers)
    * his relationships with other important figures like Elijah ought to be figured in as well
    These kinds of things only become possible when you have a richer set of semantic data: that’s one reason we’re headed in this direction.

  8. Noel:
    To use the Biblical People feature, go to Tools > Bible Data > Biblical People, and then type a name into the search box at the top. The Biblical People Add-in is included in all current versions of Logos Software except Christian Home Library and Original Languages Library (see the comparison chart at http://www.logos.com/products/info/comparison).
    If you do this for Philip, the first (default) person you’ll see is Jesus’ apostle: then under “See Also” there are links (they’ll change color if you hover over them) to two other Philips, the brother of Herod and Philip the evangelist.
    Note that the ovals in the relationship diagrams, and the person names under “Related People” are also clickable: so if you bring up the data for Aaron, you’ll get a complex diagram that includes his uncle Uzziel. If you click on Uzziel, now you’ll see all of his information (including the fact that Aaron is his nephew, the same relationship going the other direction).

  9. Good Day,
    We’re writing to you from the Christian Blog Awards, the first UK award ceremony designed to celebrate Christian websites and blogs. We’ve noticed your site and think you’d be perfect to enter.
    To check the rules of entry or find out about prizes and the awards ceremony on the 21st September, check out our site, http://www.christianblogawards.com, or to enter your site straight away, email christianblogawards@premier.org.uk
    We hope to hear from you soon!