Tuesday, December 18, 2012

Implementing my Useful Resources Page part 1

Introduction

I'm going to go over some of how I implemented my Useful Resources page. This time I'm focusing on the editing portion: how I add, remove, and modify the structure of links and categories. I use an XML file for this. XML is a good way to represent hierarchical data and is easy to edit by hand. I also have a MySQL database which keeps track of shared data, for example when I want to use a sub-category's information in multiple places. This is mainly so I don't have to write the description more than once if I happen to have, say, a Java sub-category in the Documentations and Manuals section as well as the User Groups section. I won't discuss the MySQL portion much this time.

Defining an XML Structure

Now there are ways to define an XML schema and structure rules. You can do this using DTDs or XML Schemas. However, for my purposes I just decided to parse the structure implicitly.

I have two different items in my useful links XML: categories and links.

A category can contain any number of sub-categories and/or links. A category is identified by its tag attribute. This allows my program to re-use certain information such as titles and descriptions. Because I'm lazy I shortened the actual element name to cat.

A link contains a few attributes. There's an href attribute for the target link and a title attribute. The textual information between the start and end link tag is a description of the link. The tag name is link.

All other tags are ignored.

<page>
<cat tag="doc">
 <cat tag="cs">
  <cat tag="c">
   <link href="http://cdecl.org/" title="cdecl">Translates C code to and from English.</link>
  </cat>
 </cat>
</cat>
</page>
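
The parser later in this post builds Category and Link objects out of this structure, but those classes aren't shown. Here is a minimal sketch of what they might look like. The field names, the add methods, and the placeholder category titles are my assumptions, chosen only to line up with the way the parser code uses them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Link class: one <link> element.
class Link
{
 String href;
 String title;
 String description; // the text between <link> and </link>

 Link(String href, String title, String description)
 {
  this.href = href;
  this.title = title;
  this.description = description;
 }
}

// Hypothetical stand-in for the Category class: one <cat> element.
class Category
{
 String name;
 String tag;
 String description;
 Category parent; // null for a top-level category
 List<Category> subCategories = new ArrayList<Category>();
 List<Link> links = new ArrayList<Link>();

 Category(String name, String tag, String description)
 {
  this.name = name;
  this.tag = tag;
  this.description = description;
 }

 // Adds a sub-category. The caller sets child.parent itself.
 void add(Category child)
 {
  subCategories.add(child);
 }

 // Adds a link directly under this category.
 void add(Link link)
 {
  links.add(link);
 }
}

class ModelDemo
{
 public static void main(String[] args)
 {
  // Build part of the sample structure by hand; the titles are placeholders.
  Category doc = new Category("placeholder title", "doc", "");
  Category c = new Category("placeholder title", "c", "");
  c.parent = doc;
  doc.add(c);
  c.add(new Link("http://cdecl.org/", "cdecl", "Translates C code to and from English."));
  System.out.println(doc.subCategories.get(0).links.get(0).title);
 }
}
```

Keeping a parent reference on each category is what lets a parser walk back up the tree when it reaches a closing tag.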

Parsing XML in Java

There are two main ways to read/parse XML in Java. The first is a streaming/notification method where you handle the XML in a linear fashion as it is read. The API for this is the Simple API for XML, or SAX. The second method builds an object structure for the whole document. This is known as the Document Object Model, or DOM. Each method has its benefits and drawbacks.

A DOM is great if you want to perform multiple random-access operations. The XML only has to be parsed once, and then you have a structure you can do anything with: out of order tree navigation, adding/removing nodes, etc. Indeed, basically all modern web browsers use a DOM, since a webpage is more likely to be performing one or more of these operations than loading new HTML/XML data. The main drawback of the DOM parser is that the whole structure has to be loaded into memory. This takes time and memory.
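
To make the random-access point concrete, here is a small sketch using the standard javax.xml.parsers DOM API. The class name DomSketch and its helper methods are made up for this illustration, and the sample document is inlined as a string so the example stands alone. It jumps straight to the link element and then walks back up to its parent category, which is the kind of out of order navigation a DOM makes easy.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomSketch
{
 // The sample document, inlined so the sketch is self-contained.
 static final String XML = "<page><cat tag=\"doc\"><cat tag=\"cs\"><cat tag=\"c\">"
   + "<link href=\"http://cdecl.org/\" title=\"cdecl\">Translates C code.</link>"
   + "</cat></cat></cat></page>";

 // Parses the whole document into an in-memory tree.
 static Document load(String xml)
 {
  try
  {
   DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
   return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }
  catch (Exception e)
  {
   throw new IllegalStateException("could not parse XML", e);
  }
 }

 // Returns the tag attribute of the category enclosing the first link.
 static String categoryOfFirstLink(Document doc)
 {
  Element link = (Element) doc.getElementsByTagName("link").item(0);
  Element parent = (Element) link.getParentNode();
  return parent.getAttribute("tag");
 }

 public static void main(String[] args)
 {
  Document doc = load(XML);
  Element link = (Element) doc.getElementsByTagName("link").item(0);
  System.out.println(link.getAttribute("title") + " is in category " + categoryOfFirstLink(doc));
 }
}
```

Everything here happens after the parse has finished, so the order of the lookups doesn't matter.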

SAX is great when the operations you want to perform are mostly sequential/streamed. You handle elements as they come so it's quite fast, and you only have to keep track of a limited amount of information so there is a small memory footprint. However, since you're only allowed forward, linear navigation of the XML structure, performing out of order operations either isn't possible or is computationally expensive.

SAX is also quite good for building a custom DOM-like structure, and that's what I use it for here: the parser builds a structure which I can manipulate later.

The DefaultHandler Class

The DefaultHandler class is a basic event-driven handler for SAX parsing. When the parser finds an item of interest it will propagate an event to a DefaultHandler.

All you need to do is extend this class and then override the methods for the items you wish to support. The base implementation, I believe, doesn't do anything and ignores any propagated events.

For most general processing there are 3 methods you'll be interested in:

  1. public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
    This method is called whenever you get a starting tag. You have access to the basic tag information and the attributes associated with that tag.
  2. public void endElement(String uri, String localName, String qName) throws SAXException
    This method is called whenever you get an ending tag.
  3. public void characters(char ch[], int start, int length) throws SAXException
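    This method is called with any text data. The parser may split one run of text across several calls, so accumulate the text rather than assuming it arrives whole. Note that your handler must explicitly keep track of which tag you're in if that's something which is important to how you want to process the XML.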
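
The three callbacks above can be sketched in a tiny handler which does nothing but record what it sees. The class name TraceHandler, its events list, and the trace helper are made up for this illustration. It keeps a stack of open tags so that the text callback knows which element it belongs to, and it resets its text buffer at every start tag, which is a simplification that only suits elements without mixed content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TraceHandler extends DefaultHandler
{
 final List<String> events = new ArrayList<String>();
 private final Deque<String> open = new ArrayDeque<String>(); // currently open tags
 private final StringBuilder text = new StringBuilder();      // text collected so far

 @Override
 public void startElement(String uri, String localName, String qName, Attributes attributes)
 {
  open.push(qName);
  text.setLength(0);
  events.add("start " + qName);
 }

 @Override
 public void characters(char ch[], int start, int length)
 {
  // may be called several times for one run of text, so append
  text.append(ch, start, length);
 }

 @Override
 public void endElement(String uri, String localName, String qName)
 {
  String collected = text.toString().trim();
  if (!collected.isEmpty())
  {
   events.add("text in " + open.peek() + ": " + collected);
  }
  text.setLength(0);
  open.pop();
  events.add("end " + qName);
 }

 // Runs the handler over an XML string and returns the recorded events.
 static List<String> trace(String xml)
 {
  TraceHandler handler = new TraceHandler();
  try
  {
   SAXParserFactory.newInstance().newSAXParser()
     .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
  }
  catch (Exception e)
  {
   throw new IllegalStateException("could not parse XML", e);
  }
  return handler.events;
 }

 public static void main(String[] args)
 {
  for (String event : trace("<cat tag=\"c\"><link title=\"cdecl\">Translates C code.</link></cat>"))
  {
   System.out.println(event);
  }
 }
}
```

The events come out strictly in document order, which is the forward-only behaviour described earlier.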

Mkyong has a pretty good article describing the actual code required to use the SAX parser on an XML datasource. I've copied the basic code below.

public void parse(InputStream stream) throws ParserConfigurationException, SAXException, IOException
{
 SAXParserFactory factory = SAXParserFactory.newInstance();
 SAXParser parser = factory.newSAXParser();
 // Placeholder: swap in your own DefaultHandler subclass; the base class ignores every event.
 DefaultHandler some_handler = new DefaultHandler();
 parser.parse(stream, some_handler, null);
}

Here's my DefaultHandler implementation class. It makes use of a MySQL database to share category information. This allows me to create a Java category which describes the Java language, and then I don't have to rewrite or copy/paste anything to get the same information if I put it under multiple categories such as documentation or user groups. I use an enum which maps the XML tag information to the identifying category in my database. This is kind of a hack: I'm taking advantage of the ordinality and string name conversion provided by the Java implementation of enums.

import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StructureParser extends DefaultHandler
{
 private ArrayList<Category> categories;

 private Category current_cat;

 // the mysql connection
 private Connection conn;
 private PreparedStatement query;

 private Link current_link;

 public StructureParser() throws SQLException, ClassNotFoundException
 {
  // No, this isn't a valid username/password for my database!
  conn = DriverManager.getConnection("jdbc:mysql://10.0.0.2:3306", "username",
    "password");

  query = conn.prepareStatement("select * from useful_links.categories where id = ?;");
 }

 public ArrayList<Category> parse(InputStream stream) throws ParserConfigurationException, SAXException,
   IOException
 {
  categories = new ArrayList<Category>();
  SAXParserFactory factory = SAXParserFactory.newInstance();
  SAXParser parser = factory.newSAXParser();
  current_cat = null;
  current_link = null;
  parser.parse(stream, this, null);
  return categories;
 }

 @Override
 public void startElement(String uri, String localName, String qName, Attributes attributes)
   throws SAXException
 {
  if (qName.equalsIgnoreCase("cat"))
  {
   // query MySQL database for info
   int tag_id_val = tag_id.valueOf(attributes.getValue("tag")).ordinal();
   String tag = "";
   String desc = "";
   String name = "";
   try
   {
    query.setInt(1, tag_id_val);
    ResultSet result = query.executeQuery();
    if (result.next())
    {
     tag = result.getString("tag");
     name = result.getString("title");
     desc = result.getString("description");
    }
    else
    {
     // invalid tag id
     throw new RuntimeException("Invalid tag id");
    }
   }
   catch (SQLException e)
   {
    // abort the parse instead of continuing with an empty category
    throw new SAXException("Category lookup failed", e);
   }
   Category cat = new Category(name, tag, desc);
   if (current_cat != null)
   {
    // add a sub-category
    cat.parent = current_cat;
    current_cat.add(cat);

   }
   current_cat = cat;
  }
  else if (qName.equalsIgnoreCase("link"))
  {
   if (current_cat != null)
   {
    // add a link
    current_link = new Link(attributes.getValue("href"), attributes.getValue("title"),
      "");
    current_cat.add(current_link);
   }
  }
 }

 @Override
 public void endElement(String uri, String localName, String qName) throws SAXException
 {
  if (qName.equalsIgnoreCase("cat"))
  {
   if (current_cat != null)
   {
    if (current_cat.parent == null)
    {
     // a top-level category
     categories.add(current_cat);
    }
    current_cat = current_cat.parent;
   }
  }
  else if (qName.equalsIgnoreCase("link"))
  {
   // ends the current link
   current_link = null;
  }
 }

 @Override
 public void characters(char ch[], int start, int length) throws SAXException
 {
  if (current_link != null)
  {
   // append to the description of the current link; SAX may deliver
   // one run of text in several chunks
   current_link.description += new String(ch, start, length);
  }
 }
}

I could have better error handling code but this is functional enough for my uses. The documentation is definitely lacking, too.
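
The tag_id enum used in startElement isn't part of the listing. Here is a hypothetical sketch of how such an enum could work. The constant names come from the tag values in the sample XML, but their order, and the assumption that each ordinal equals the row id in the categories table, are mine. The TagLookup helper is also made up for the illustration.

```java
// Hypothetical enum: constant names match the tag="..." values in the XML,
// and each constant's position is assumed to equal its database row id.
enum tag_id
{
 doc, // ordinal 0
 cs,  // ordinal 1
 c    // ordinal 2
}

class TagLookup
{
 // Converts a tag attribute value into the id used for the database query.
 // Enum.valueOf throws IllegalArgumentException for an unknown name.
 static int idFor(String tag)
 {
  return tag_id.valueOf(tag).ordinal();
 }

 public static void main(String[] args)
 {
  System.out.println(idFor("c")); // prints 2
  try
  {
   idFor("nope");
  }
  catch (IllegalArgumentException e)
  {
   System.out.println("unknown tag: nope");
  }
 }
}
```

The hack is fragile in one specific way: inserting or reordering constants shifts every later ordinal, so the database ids have to be kept in step with the enum by hand.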

That's all for now. I'll go into more detail on how I implemented the rest of my Useful Resources page later.
