To start, I looked for Bible data that already existed online. I checked the APIs of popular online Bible services, but didn’t find any that offered the backend database support I’d need for relational tables in MySQL.
I googled “NRSV xml” and found some good stuff, including a file that’s no doubt a copyright violation and might be taken down at any time.
(If I were starting over, I’d use the SBL GNT or a public-domain KJV text.)
The first bottleneck I stumbled on when building the plugin was the size of the Bible structure. How could I get an XML file of more than 5000 KB to the user before they gave up and left the site?
Answer 1: Strip the verses out.
The second bottleneck: once I’d stripped the verses (the bulk of the data) out of the file, was there a way to compress it even further?
Answer 2: All I needed were the book names, the number of chapters per book, and the number of verses per chapter. At that point, the XML resembles a nested array of strings and integers more than it resembles a dictionary or catalog of multi-layered data objects.
I was enough of a newbie at data objects in PHP that I chose to use C# and the Visual Studio IDE so I could debug and troubleshoot quicker.
Here’s the resulting file (I just kept it all in the solution’s default file, Program.cs):
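To make the "names plus counts" idea concrete, here is a minimal sketch of that reduction using LINQ to XML. It assumes the source file has the same `<bible>/<book>/<chapter>/<verse>` layout as the listing in this post; the class name `BibleMinifier` and the `chapters` and `verses` attribute names are made up for illustration and are not what the plugin actually uses.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BibleMinifier
{
    // Reduces a full <bible> element to one empty <book> element per book,
    // carrying the book name, chapter count, and verse count per chapter.
    // The "chapters" and "verses" attribute names are hypothetical.
    public static XElement Minify(XElement bible)
    {
        return new XElement("bible",
            bible.Elements("book").Select(book =>
            {
                // One integer per chapter: how many <verse> children it has.
                int[] counts = book.Elements("chapter")
                                   .Select(ch => ch.Elements("verse").Count())
                                   .ToArray();

                return new XElement("book",
                    new XAttribute("name", (string)book.Attribute("name")),
                    new XAttribute("chapters", counts.Length),
                    new XAttribute("verses", string.Join(",", counts)));
            }));
    }
}
```

Calling `BibleMinifier.Minify(doc.Root)` on a loaded `XDocument` would return a new tree with no verse text in it, which is where nearly all of the size savings come from. A comma-separated list is only one way to pack the counts; one child element per chapter would also work, at the cost of a few more bytes.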
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace BibleXmlStructurizer
{
    class Program
    {
        // Running ID written onto each chapter and verse element
        public static int elementid;

        static void Main(string[] args)
        {
            elementid = 0;
            string filepath = "";
            XDocument doc = XDocument.Load(filepath);

            // load Bible
            XElement bible = doc.Descendants("bible").FirstOrDefault();

            // oldtestament drives a tag that helps the JS parse it into separate visual elements
            bool oldtestament = true;

            // Generic to tell what types of data for keys, values
            Dictionary<string, string> bookDictionary = new Dictionary<string, string>();
            foreach (var book in bible.Elements("book"))
            {
                // Get name from the attribute value
                var name = book.Attribute("name").Value;
                // calls method to get XML data; passed by ref so the OT/NT switch carries over to later books
                string bookXml = GetBookXML(book, ref oldtestament);
                // Add to dictionary with name as the key.
                bookDictionary[name] = bookXml;
            }

            // Output
            StringBuilder sb = new StringBuilder();
            sb.Append("<bible>");
            foreach (var book in bookDictionary)
            {
                sb.Append(book.Value);
            }
            sb.Append("</bible>");

            System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
            byte[] txt = encoding.GetBytes(sb.ToString());
            string outputfilepath = "";
            FileStream fs = new FileStream(outputfilepath, FileMode.Create, FileAccess.ReadWrite);
            BinaryWriter bw = new BinaryWriter(fs);
            bw.Write(txt);
            bw.Close();
        }

        static string GetBookXML(XElement book, ref bool oldtestament)
        {
            var sb = new StringBuilder();
            XmlTextWriter output = new XmlTextWriter(new StringWriter(sb));

            output.WriteStartElement("book");
            output.WriteAttributeString("name", book.Attribute("name").Value);
            if (oldtestament)
                output.WriteAttributeString("section", "OT");
            else
                output.WriteAttributeString("section", "NT");

            int chaptercount = book.Elements("chapter").Count();
            output.WriteAttributeString("chaptercount", chaptercount.ToString());

            foreach (var chapter in book.Elements("chapter"))
            {
                int versecount = 0;
                output.WriteStartElement("chapter"); // <chapter>
                output.WriteAttributeString("name", chapter.Attribute("name").Value);
                output.WriteAttributeString("id", elementid.ToString());
                elementid++;

                foreach (var verse in chapter.Elements("verse"))
                {
                    versecount++;
                    output.WriteStartElement("verse"); // <verse>
                    output.WriteAttributeString("name", versecount.ToString());
                    output.WriteAttributeString("id", elementid.ToString());
                    elementid++;
                    output.WriteString(verse.Value);
                    output.WriteEndElement(); // </verse>
                }
                output.WriteEndElement(); // </chapter>
            }

            // Malachi is the last Old Testament book, so every book after it is tagged NT
            if (book.Attribute("name").Value == "Malachi")
            {
                oldtestament = false;
            }

            output.WriteEndElement(); // </book>
            output.Close();
            return sb.ToString();
        }
    }
}
The third bottleneck: for the database structure, I needed easier references to each Bible element, so that I didn’t have to re-parse the XML on every GET or PUT to figure out what it referenced.
Answer 3: add a static counter to the class:
public static int elementid;
then write it out as an attribute on every element:
output.WriteAttributeString("id", elementid.ToString());
and increment it right after each time it has been written, so that no two elements share an ID:
elementid++;
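The same numbering can also be done as a separate pass over an already-loaded tree, which makes it easier to see what the IDs buy you. This is only a sketch under the same assumed `<chapter>`/`<verse>` layout; `ElementIds`, `AssignIds`, and `FindById` are hypothetical names, not part of the plugin.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ElementIds
{
    // Stamps a sequential "id" attribute on every <chapter> and <verse>,
    // in document order, and returns how many IDs were handed out.
    public static int AssignIds(XElement bible)
    {
        int nextId = 0;

        // ToList() snapshots the elements before any attributes are changed.
        var targets = bible.Descendants()
                           .Where(e => e.Name == "chapter" || e.Name == "verse")
                           .ToList();

        foreach (var element in targets)
        {
            element.SetAttributeValue("id", nextId);
            nextId++;
        }
        return nextId;
    }

    // Returns the element carrying the given ID, or null if there is none.
    public static XElement FindById(XElement bible, int id)
    {
        return bible.Descendants()
                    .FirstOrDefault(e => (int?)e.Attribute("id") == id);
    }
}
```

With IDs in place, a database row only has to store one integer per reference, and resolving it is a single lookup such as `ElementIds.FindById(bible, 42)` rather than a book, chapter, and verse parse.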
While it’s not the most elegant code, it got the job done — and it gave me the resulting lightweight XML file I needed.
Full Bible: >5000 KB
Bible_min_with_IDs: 983 KB
Bible_min: 49 KB
49 KB is acceptable to store in a user’s browser cache so they have a good user experience every time. Also, I’m not too worried about the Bible_min_with_IDs because 983 KB will be loaded locally by the PHP code — not too big of an issue.
Next up: JavaScript + CSS to make this look pretty.