To start, I looked for data objects that already existed for the Bible on the internet. I looked at the existing APIs for popular online Bible services, but didn’t find any that offered the backend database support I’d need for relational tables in MySQL.
I googled “NRSV xml” and found some good stuff, including a file that’s no doubt a copyright violation and might be taken down at any time.
(If I were starting over, I’d use the SBL GNT or a KJV text that’s now public domain.)
The first bottleneck I stumbled on when building the plugin was the Bible structure. How could I get an XML file that was currently > 5000 KB to the user without them leaving the site first?
Answer 1: Strip the verses out.
The second bottleneck: once I’d stripped the verses (the bulk of the data) out of the file, was there a way to compress it even further?
Answer 2: All the data I needed was the book names, the number of chapters per book, and the number of verses per chapter. At that point, the XML resembles a nested array of strings and integers, at least more than it resembles a dictionary or catalog of multi-layered data objects.
I was enough of a newbie at data objects in PHP that I chose to use C# and the Visual Studio IDE so I could debug and troubleshoot quicker.
Here’s the resulting file (I kept it all in the solution’s default file, Program.cs):
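To make that reduction concrete, here’s a minimal sketch of boiling a book down to its name plus a list of verse counts, one per chapter. It isn’t the code I actually ran: the `verses` attribute, the `Minify` method, and the sample data are made up for illustration, and it assumes the same `<bible>`/`<book>`/`<chapter>`/`<verse>` layout as the source file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class MinimalStructureSketch
{
    // Reduces a full Bible document to book names and per-chapter verse counts.
    public static XElement Minify(XElement bible)
    {
        return new XElement("bible",
            bible.Elements("book").Select(book =>
                new XElement("book",
                    new XAttribute("name", (string)book.Attribute("name")),
                    // Hypothetical attribute: "31,25,24" would mean three chapters
                    // holding 31, 25, and 24 verses respectively.
                    new XAttribute("verses", string.Join(",",
                        book.Elements("chapter").Select(c => c.Elements("verse").Count()))))));
    }

    static void Main()
    {
        // Tiny made-up sample, not real Bible data.
        var sample = XElement.Parse(
            "<bible><book name=\"Sample\">" +
            "<chapter name=\"1\"><verse>a</verse><verse>b</verse></chapter>" +
            "<chapter name=\"2\"><verse>c</verse></chapter>" +
            "</book></bible>");

        // The sample book has two chapters with 2 and 1 verses.
        Console.WriteLine(Minify(sample).ToString(SaveOptions.DisableFormatting));
    }
}
```

The chapter count doesn’t need to be stored separately in this form, since it’s just the number of entries in the list.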
using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Linq;
using System.Text;
using System.Xml;
using System.IO;
using System.Xml.Linq;
using System.Dynamic;
using System.Reflection;
namespace BibleXmlStructurizer
{
    class Program
    {
        // Running counter used to give each chapter and verse a unique id
        public static int elementid;

        static void Main(string[] args)
        {
            elementid = 0;
            // Path to the source Bible XML (left blank here)
            string filepath = "";
            XDocument doc = XDocument.Load(filepath);
            // load Bible
            XElement bible = doc.Descendants("bible").FirstOrDefault();
            // oldtestament drives a tag that helps the JS parse books into separate visual elements
            bool oldtestament = true;
            // Generic to tell what types of data for keys, values
            Dictionary<string, string> bookDictionary = new Dictionary<string, string>();
            foreach (var book in bible.Elements("book"))
            {
                // Get name from the attribute value
                var name = book.Attribute("name").Value;
                // Calls method to get XML data; oldtestament is passed by ref so the
                // switch to "NT" after Malachi carries over to the following books
                string bookXml = GetBookXML(book, ref oldtestament);
                // Add to dictionary with name as the key.
                bookDictionary[name] = bookXml;
            }
            // Output
            StringBuilder sb = new StringBuilder();
            sb.Append("<bible>");
            foreach (var book in bookDictionary)
            {
                sb.Append(book.Value);
            }
            sb.Append("</bible>");
            // Note: ASCII encoding writes any non-ASCII character as '?'
            System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
            byte[] txt = encoding.GetBytes(sb.ToString());
            // Path for the generated file (left blank here)
            string outputfilepath = "";
            FileStream fs = new FileStream(outputfilepath, FileMode.Create, FileAccess.ReadWrite);
            BinaryWriter bw = new BinaryWriter(fs);
            bw.Write(txt);
            bw.Close();
        }

        static string GetBookXML(XElement book, ref bool oldtestament)
        {
            var sb = new StringBuilder();
            XmlTextWriter output = new XmlTextWriter(new StringWriter(sb));
            output.WriteStartElement("book");
            // .Value returns the attribute text without the name="..." wrapper
            output.WriteAttributeString("name", book.Attribute("name").Value);
            if (oldtestament)
                output.WriteAttributeString("section", "OT");
            else
                output.WriteAttributeString("section", "NT");
            int chaptercount = book.Elements("chapter").Count();
            output.WriteAttributeString("chaptercount", chaptercount.ToString());
            foreach (var chapter in book.Elements("chapter"))
            {
                int versecount = 0;
                output.WriteStartElement("chapter"); // <chapter>
                output.WriteAttributeString("name", chapter.Attribute("name").Value);
                output.WriteAttributeString("id", elementid.ToString());
                elementid++;
                foreach (var verse in chapter.Elements("verse"))
                {
                    versecount++;
                    output.WriteStartElement("verse"); // <verse>
                    output.WriteAttributeString("name", versecount.ToString());
                    output.WriteAttributeString("id", elementid.ToString());
                    elementid++;
                    // Copies the verse text through; remove this line to leave the verse elements empty
                    output.WriteString(verse.Value);
                    output.WriteEndElement(); // </verse>
                }
                output.WriteEndElement(); // </chapter>
            }
            // Malachi is the last Old Testament book, so every book after it is tagged "NT"
            if (book.Attribute("name").Value == "Malachi")
            {
                oldtestament = false;
            }
            output.WriteEndElement(); // </book>
            output.Close();
            return sb.ToString();
        }
    }
}
The third bottleneck: for the database structure, I needed easier references to each Bible element (so that I didn’t have to keep parsing the structure on every GET or PUT to figure out what it referenced).
Answer 3: add this to the class:
public static int elementid;
and then, for every element, write it out as an attribute:
output.WriteAttributeString("id", elementid.ToString());
Then, every time you’ve written an id to the XML file, increment the counter:
elementid++;
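Here’s the same idea as a small standalone sketch, in case it’s easier to follow outside the full file. It stamps ids onto an in-memory `XElement` tree instead of going through `XmlTextWriter`, so treat it as an illustration of the counter rather than the code I ran; the `WriteId` helper and the sample XML are made up for the example.

```csharp
using System;
using System.Xml.Linq;

static class ElementIdSketch
{
    // One shared counter, so no two elements ever receive the same id.
    static int elementid;

    // Stamps the element with the current id, then advances the counter.
    static void WriteId(XElement element)
    {
        element.SetAttributeValue("id", elementid);
        elementid++;
    }

    static void Main()
    {
        // Tiny made-up sample, not real Bible data.
        var bible = XElement.Parse(
            "<bible><book name=\"Sample\"><chapter name=\"1\">" +
            "<verse name=\"1\"/><verse name=\"2\"/>" +
            "</chapter></book></bible>");

        // Descendants() walks the tree in document order, so the ids increase
        // from book to chapter to verse as they would when writing sequentially.
        foreach (var element in bible.Descendants())
            WriteId(element);

        Console.WriteLine(bible);
    }
}
```

With ids like these, a database row can point at a single integer instead of a book name, chapter number, and verse number.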
While it’s not the most elegant code, it got the job done — and it gave me the resulting lightweight XML file I needed.
Full Bible: >5000 KB
Bible_min_with_IDs: 983 KB
Bible_min: 49 KB
49 KB is acceptable to store in a user’s browser cache so they have a good user experience every time. Also, I’m not too worried about the Bible_min_with_IDs because 983 KB will be loaded locally by the PHP code — not too big of an issue.
Next up: JavaScript + CSS to make this look pretty.







