To start, I looked for Bible data sets that already existed online. I considered the APIs of popular online Bible services, but none of them offered the backend database support I'd need for relational tables in MySQL.
I googled “NRSV xml” and found some usable material, including a file that's almost certainly a copyright violation and might be taken down at any time.
(If I were starting over, I'd use the SBL GNT or a public-domain edition of the KJV.)
The first bottleneck I hit when building the plugin was the size of the Bible structure. How could I deliver an XML file of more than 5000 KB to the user before they gave up and left the site?
Answer 1: Strip the verses out.
The second bottleneck: once I’d stripped the verses (the bulk of the data) out of the file, was there a way to compress it even further?
Answer 2: The only data needed are the book names, the number of chapters in each book, and the number of verses in each chapter. At that point, the XML looks more like a nested array of strings and integers than a dictionary or catalog of multi-layered data objects.
I was still enough of a newbie with data objects in PHP that I chose C# and the Visual Studio IDE instead, so I could debug and troubleshoot more quickly.
Here's the resulting file (I kept everything in the solution's default file, Program.cs):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace BibleXmlStructurizer
{
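    class Program
    {
        // Running counter used to give each written element a unique id attribute
        public static int elementid;

        static void Main(string[] args)
        {
            elementid = 0;

            // Path to the source Bible XML (left blank here; fill in before running)
            string filepath = "";
            XDocument doc = XDocument.Load(filepath);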
            // load Bible
            XElement bible = doc.Descendants("bible").FirstOrDefault();
            
            // Section flag: written out as an attribute so the JS can parse the books into separate visual elements
            bool oldtestament = true;
            // Maps each book name to its generated XML fragment
            Dictionary<string, string> bookDictionary = new Dictionary<string, string>();
            foreach (var book in bible.Elements("book"))
            {
                // Get name from the attribute value
                var name = book.Attribute("name").Value;

                // Build this book's XML fragment
                string bookXml = GetBookXML(book, oldtestament);

                // Add to dictionary with name as the key.
                bookDictionary[name] = bookXml;

                // Malachi closes the Old Testament. The flag is passed to GetBookXML by value,
                // so it has to be flipped here for the following books to be tagged NT.
                if (name == "Malachi")
                    oldtestament = false;
            }
            // Output
            StringBuilder sb = new StringBuilder();
            sb.Append("<bible>");
            foreach (var book in bookDictionary)
            {
                sb.Append(book.Value);
            }
            sb.Append("</bible>");
            // Path for the generated file (left blank here; fill in before running)
            string outputfilepath = "";
            // Write as UTF-8 without a BOM; ASCII encoding would turn any non-ASCII character into '?'
            File.WriteAllText(outputfilepath, sb.ToString(), new UTF8Encoding(false));
            
        }
        
        static string GetBookXML(XElement book, bool oldtestament)
        {
            var sb = new StringBuilder();
            XmlTextWriter output = new XmlTextWriter(new StringWriter(sb));
            output.WriteStartElement("book");
            
            // Read attribute values directly instead of splitting the attribute's string form on quotes
            output.WriteAttributeString("name", book.Attribute("name").Value);
            if (oldtestament)
                output.WriteAttributeString("section", "OT");
            else
                output.WriteAttributeString("section", "NT");
            int chaptercount = book.Elements("chapter").Count();
            output.WriteAttributeString("chaptercount", chaptercount.ToString());
            foreach (var chapter in book.Elements("chapter"))
            {
                int versecount = 0;
                output.WriteStartElement("chapter"); // <chapter>
                output.WriteAttributeString("name", chapter.Attribute("name").Value);
                output.WriteAttributeString("id", elementid.ToString());
                elementid++; // the next element gets a fresh id
                foreach (var verse in chapter.Elements("verse"))
                {
                    versecount++;
                    output.WriteStartElement("verse"); // <verse>
                    output.WriteAttributeString("name", versecount.ToString());
                    output.WriteAttributeString("id", elementid.ToString());
                    elementid++;
                    output.WriteString(verse.Value);
                    output.WriteEndElement(); // </verse>
                }
                output.WriteEndElement(); // </chapter>
            }
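            // Malachi is the last Old Testament book. Note that bool parameters are passed
            // by value, so this assignment changes only the local copy of the flag.
            if (book.Attribute("name").Value == "Malachi")
            {
                oldtestament = false;
            }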
            
            output.WriteEndElement();
            output.Close();
            return sb.ToString();
        }
    }
}
The third bottleneck: for the database structure, I needed an easier way to reference each Bible element (so that I didn't have to re-parse the structure on every GET or PUT to figure out which element was meant).
Answer 3: Add a static counter to the class:
public static int elementid;
Then write it out as an attribute on every element:
output.WriteAttributeString("id", elementid.ToString());
Finally, each time an id has been written to the XML, increment the counter:
elementid++;
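The counter pattern in Answer 3 can be sketched as one small helper. This is only a minimal illustration, not the plugin's actual code: the class name IdWriterSketch and the methods WriteWithId and BuildSample are hypothetical, and it uses XmlWriter.Create rather than the XmlTextWriter from the listing.

```csharp
using System;
using System.Text;
using System.Xml;

static class IdWriterSketch
{
    // Shared counter, as in Answer 3; static so every element draws from one sequence.
    public static int elementid;

    // Hypothetical helper: opens an element, stamps it with the current id, then increments.
    public static void WriteWithId(XmlWriter output, string elementName, string name)
    {
        output.WriteStartElement(elementName);
        output.WriteAttributeString("name", name);
        output.WriteAttributeString("id", elementid.ToString());
        elementid++; // increment right after writing so no two elements share an id
    }

    // Writes one book > chapter > verse chain and returns the XML as a string.
    public static string BuildSample()
    {
        elementid = 0;
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var output = XmlWriter.Create(sb, settings))
        {
            WriteWithId(output, "book", "Genesis");
            WriteWithId(output, "chapter", "1");
            WriteWithId(output, "verse", "1");
            output.WriteEndElement(); // </verse>
            output.WriteEndElement(); // </chapter>
            output.WriteEndElement(); // </book>
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildSample());
    }
}
```

Keeping the write and the increment in the same helper makes it hard to forget one of them, which is the easiest way for duplicate ids to creep in.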
It's not the most elegant code, but it got the job done and gave me the lightweight XML files I needed:
Full Bible: >5000 KB
Bible_min_with_IDs: 983 KB
Bible_min: 49 KB
49 KB is small enough to keep in a user's browser cache, so they get a good experience on every visit. I'm not too worried about Bible_min_with_IDs either: its 983 KB will be loaded locally by the PHP code, so the size isn't much of an issue.
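The Program.cs listing writes each verse's text into its output, so a counts-only file along the lines of Bible_min needs a leaner pass. Here is a minimal sketch of one way that could look, assuming the source XML uses the same bible/book/chapter/verse elements as the listing. The versecount attribute and the names MinStructureSketch and Minify are invented for illustration and are not necessarily what the plugin uses.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class MinStructureSketch
{
    // Keeps book names, chapter counts, and per-chapter verse counts; drops all verse text.
    public static XElement Minify(XElement bible)
    {
        var books = bible.Elements("book").Select(book =>
            new XElement("book",
                new XAttribute("name", (string)book.Attribute("name")),
                new XAttribute("chaptercount", book.Elements("chapter").Count()),
                book.Elements("chapter").Select(chapter =>
                    new XElement("chapter",
                        new XAttribute("versecount", chapter.Elements("verse").Count())))));
        return new XElement("bible", books);
    }

    static void Main()
    {
        // Tiny stand-in input; the verse bodies are placeholders, not real text.
        var sample = XElement.Parse(
            "<bible><book name=\"Obadiah\"><chapter name=\"1\">" +
            "<verse>text a</verse><verse>text b</verse>" +
            "</chapter></book></bible>");
        Console.WriteLine(Minify(sample).ToString(SaveOptions.DisableFormatting));
    }
}
```

Because chapters and verses are numbered sequentially, the counts alone should be enough for client-side code to rebuild a book, chapter, and verse picker.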
Next up: JavaScript + CSS to make this look pretty.







