Bible Taxonomy – Start Off Right (Minimized XML)

Note: This post is part of a series explaining how I created the new Bible Taxonomy tool as seen on DiscipleShare. To see it in action, or to find great, free curriculum to use in churches, visit: http://www.discipleshare.net/

To start, I looked online for existing data objects for the Bible. I considered the APIs of popular online Bible services, but didn’t find any that offered the backend database support I’d need for relational tables in MySQL.

I googled “NRSV xml” and found some good stuff, including a file that’s no doubt a copyright violation and might be taken down at any time.

(If I were starting over, I’d use the SBL GNT or a KJV edition that’s now in the public domain.)

The first bottleneck I stumbled on when building the plugin was the Bible structure. How could I get an XML file that was more than 5000 KB to the user without them leaving the site first?

Answer 1: Strip the verses out.

The second bottleneck: once I’d stripped the verses (the bulk of the data) out of the file, was there a way to compress it even further?

Answer 2: All of the data needed are book names + number of chapters + number of verses per chapter. At that point, the XML resembles a nested array of strings + integers — at least more than it resembles a dictionary or catalog of multi-layered data objects.
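As a rough sketch of what Answer 2 boils down to, the snippet below reduces a full Bible element to book names, chapter counts, and a verse count per chapter. It assumes the source XML uses book, chapter, and verse elements with a name attribute on books and chapters. The class name BibleMinSketch, the method Minimize, and the "verses" attribute are made up for illustration and are not taken from the actual tool.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only, not part of the original tool.
public static class BibleMinSketch
{
    // Reduces a full <bible> element to book names, chapter counts,
    // and a verse count per chapter. No verse text is kept.
    public static XElement Minimize(XElement bible)
    {
        return new XElement("bible",
            bible.Elements("book").Select(book =>
                new XElement("book",
                    new XAttribute("name", (string)book.Attribute("name")),
                    new XAttribute("chaptercount", book.Elements("chapter").Count()),
                    book.Elements("chapter").Select(chapter =>
                        new XElement("chapter",
                            new XAttribute("name", (string)chapter.Attribute("name")),
                            // "verses" is an assumed attribute name
                            new XAttribute("verses", chapter.Elements("verse").Count()))))));
    }

    public static void Main()
    {
        // Tiny made-up sample in the assumed source layout.
        XElement sample = XElement.Parse(
            "<bible><book name=\"Obadiah\"><chapter name=\"1\">" +
            "<verse>text one</verse><verse>text two</verse>" +
            "</chapter></book></bible>");

        XElement min = Minimize(sample);
        Console.WriteLine(min.ToString(SaveOptions.DisableFormatting));
    }
}
```

Each chapter collapses to one small element carrying two short attributes, which is why the result behaves more like a nested array of strings and integers than a document.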

I was enough of a newbie at data objects in PHP that I chose to use C# and the Visual Studio IDE so I could debug and troubleshoot quicker.

Here’s the resulting file (I just kept it all in the solution’s default file, Program.cs):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace BibleXmlStructurizer
{
    class Program
    {
        // Running counter that gives each chapter and verse a unique id
        public static int elementid;

        static void Main(string[] args)
        {
            elementid = 0;

            // Path to the source Bible XML
            string filepath = "";
            XDocument doc = XDocument.Load(filepath);

            // load Bible
            XElement bible = doc.Descendants("bible").FirstOrDefault();

            // oldtestament drives a tag that helps the JS parse books into separate visual elements
            bool oldtestament = true;

            // Generic to tell what types of data for keys, values.
            // Note: Dictionary does not formally guarantee enumeration order;
            // a List would preserve book order explicitly.
            Dictionary<string, string> bookDictionary = new Dictionary<string, string>();

            foreach (var book in bible.Elements("book"))
            {
                // Get name from the attribute value
                var name = book.Attribute("name").Value;

                // calls method to get XML data
                string bookXml = GetBookXML(book, oldtestament);

                // Add to dictionary with name as the key.
                bookDictionary[name] = bookXml;

                // Every book after Malachi is tagged NT. The flag is flipped here
                // because a bool passed to GetBookXML is copied, not shared.
                if (name == "Malachi")
                {
                    oldtestament = false;
                }
            }

            // Output
            StringBuilder sb = new StringBuilder();
            sb.Append("<bible>");
            foreach (var book in bookDictionary)
            {
                sb.Append(book.Value);
            }
            sb.Append("</bible>");

            // Path for the minimized XML; the file is written as ASCII
            string outputfilepath = "";
            File.WriteAllText(outputfilepath, sb.ToString(), Encoding.ASCII);
        }

        static string GetBookXML(XElement book, bool oldtestament)
        {
            var sb = new StringBuilder();
            XmlTextWriter output = new XmlTextWriter(new StringWriter(sb));
            output.WriteStartElement("book");

            // splits attributes based on " character ; for newbies, \ is escape character
            string splitter = "\"";
            string[] splitAttribute;

            splitAttribute = book.Attribute("name").ToString().Split(splitter.ToCharArray());
            output.WriteAttributeString("name", splitAttribute[1]);

            if (oldtestament)
                output.WriteAttributeString("section", "OT");
            else
                output.WriteAttributeString("section", "NT");

            int chaptercount = book.Elements("chapter").Count();
            output.WriteAttributeString("chaptercount", chaptercount.ToString());

            foreach (var chapter in book.Elements("chapter"))
            {
                int versecount = 0;

                output.WriteStartElement("chapter"); // <chapter>
                splitAttribute = chapter.Attribute("name").ToString().Split(splitter.ToCharArray());
                output.WriteAttributeString("name", splitAttribute[1]);

                output.WriteAttributeString("id", elementid.ToString());
                elementid++;

                foreach (var verse in chapter.Elements("verse"))
                {
                    versecount++;

                    // Only the verse number and id are written; the verse text is stripped
                    output.WriteStartElement("verse"); // <verse>
                    output.WriteAttributeString("name", versecount.ToString());
                    output.WriteAttributeString("id", elementid.ToString());
                    elementid++;
                    output.WriteEndElement(); // </verse>
                }

                output.WriteEndElement(); // </chapter>
            }

            output.WriteEndElement(); // </book>
            output.Close();

            return sb.ToString();
        }
    }
}

The third bottleneck: for the database structure, I needed easier references to each Bible element, so that I didn’t have to re-parse the XML on every GET or PUT to figure out what an element referenced.

Answer 3: add this to the class

 public static int elementid;

and then, for every element, write it out as an attribute:

output.WriteAttributeString("id", elementid.ToString());

then, each time you’ve written an id to the XML, increment the counter:

elementid++;
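To illustrate why sequential ids make references cheaper, here is a minimal sketch that stamps ids onto chapters and verses in document order and builds an id-to-reference lookup in the same pass. It is only an illustration under assumptions: the class ElementIdSketch, the method AssignIds, and the "Book chapter:verse" label format are hypothetical, and the real tool does its lookups in PHP against MySQL rather than in C#.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical sketch for illustration only, not the original tool's code.
public static class ElementIdSketch
{
    // Walks chapters and verses in document order, stamps each with a
    // sequential "id" attribute, and returns an id -> reference lookup.
    public static Dictionary<int, string> AssignIds(XElement bible)
    {
        var lookup = new Dictionary<int, string>();
        int nextId = 0;

        foreach (XElement book in bible.Elements("book"))
        {
            string bookName = (string)book.Attribute("name");

            foreach (XElement chapter in book.Elements("chapter"))
            {
                string chapterName = (string)chapter.Attribute("name");
                chapter.SetAttributeValue("id", nextId);
                lookup[nextId] = bookName + " " + chapterName;
                nextId++;

                int verseNumber = 0;
                foreach (XElement verse in chapter.Elements("verse"))
                {
                    verseNumber++;
                    verse.SetAttributeValue("id", nextId);
                    // Assumed label format: "Book chapter:verse"
                    lookup[nextId] = bookName + " " + chapterName + ":" + verseNumber;
                    nextId++;
                }
            }
        }

        return lookup;
    }

    public static void Main()
    {
        // Tiny made-up sample with the verse text already stripped.
        XElement sample = XElement.Parse(
            "<bible><book name=\"Jude\"><chapter name=\"1\">" +
            "<verse /><verse /></chapter></book></bible>");

        foreach (KeyValuePair<int, string> entry in AssignIds(sample))
            Console.WriteLine(entry.Key + " => " + entry.Value);
    }
}
```

With a lookup like this, a stored id can be resolved to a reference with a single key access instead of walking the XML again.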

While it’s not the most elegant code, it got the job done — and it gave me the resulting lightweight XML file I needed.

Full Bible: >5000 KB
Bible_min_with_IDs: 983 KB
Bible_min: 49 KB

49 KB is acceptable to store in a user’s browser cache so they have a good user experience every time. Also, I’m not too worried about the Bible_min_with_IDs because 983 KB will be loaded locally by the PHP code — not too big of an issue.

Next up: JavaScript + CSS to make this look pretty.


Adam Frieberg
