Drupal Vacation Week

I don’t know about where you live, but this week coming up is school vacation week around here. As the father of two of bundles of joy, I’ll be taking next week off have some Q.T. with the family as we parade around northern New England, doing what families do (e.g. goofing off, yelling, getting chocolate out of the kids’ hair, etc.).

That means, sadly, that I’ll have to tear myself away from Drupal 6 for a week (really, not a problem, believe me). In preparation for the week off I’ve been busy as all get out trying to reach the milestone that we’ve dubbed the first code merge. By that I mean it’s the first chance for Pete and I to merge the work we’ve been doing on the first phase of our rebuild, the new TV Programs and Schedule module.

Up until now, we’ve working separately (a separate peace, if you will), developing on our own Drupal installs on our development server. Pete has been creating a theme to match the current WGBH.org template (look and feel changes to the site will come in phase 2). I’ve been busy getting the PBS/TV Guide XML data import running, as well as working up the basic functionality of our Programs A-Z page and the Full Day Schedules by Channel pages.

Anyway, it’s high time that we finally incorporated each other’s changes into our own working install. So, this week was devoted to working out a process, not just for a one time code merge, but for rolling out our development work to each other, as well as to a staging area and to live installs going forward.

Currently, we’ve basically got a theme and a custom module for TV schedules. Managing the code is easy; everything is managed in a local SVN repository. Core Drupal and contributed modules and themes are pulled down via CVS and checked into SVN. So the entire code tree can be easily managed across multiple developers and pushed up to staging and live environments via SVN commands.

The tricky part - scratch that - the headache inducing part - no, sorry, scratch that too - the bang-your-head-against-the-wall-in-frustration part - is managing all of the information that’s stored in MySQL. How do we track changes to database structures (DDL) and the actual content that needs to be shared across installs (DML)?

Good question. The answer? Well, there is no easy answer that I’m aware of. The first guess is do a database dump (that’s what it’s called, I swear) of all of the structures and content. In theory that file can be managed via SVN or CVS. Problem there is you usually have lots of junk in your development database that you don’t want to replicate to staging or live environments, or even other development installs. So then you have to start weeding stuff out of the dump, which gets messy fast.

What to do, then? Here’s the strategy we’re going to use, and which i spent most of the week working on:

(1) Create a custom Drupal installation profile. This is used when first setting up a Drupal site. The WGBH profile installs a bunch of modules, sets the default theme (our WGBH.org theme) as well the administration theme (currently Zen Classic, a sub-theme of Zen - nice and clean and simple).

(2) TV Schedules module installation file. This is where most of the action is. Here we do a whole of bunch things like:

  • Create CCK content types (e.g. TV Channel, TV Program, etc.)
  • Create nodes for certain content types (e.g. channels)
  • Create Views (e.g. Programs A-Z)
  • Create a number of blocks and position them on our themes
  • Add menu links
  • Tweak a number of variable settings

I’d like to use the module install and upgrade files to manage changes to database objects and content. The install and upgrade files can be managed in SVN and used to roll out changes to development/staging/live environments. The tricky part is extracting the relevant changes from the database. That’s where it gets messy.

For this initial code merge, I relied mainly on the CCK export/import functionality to get the content type definitions. In theory, this is pretty straightforward. I used the export function on my development install to generate the code to recreate the content types. I added the code to module install file. That should have been it for the content types.

Except that it wasn’t.

CCK (like Views) on Drupal 6 is still in an alpha state, meaning there are unfinished bits, bugs, and general unforeseen shenanigans involved in using it. I found that the export/import process was a bit lacking. It copied the basic content type and field definitions just fine. However, field groupings were lost, and certain field attributes didn’t survive the transfer.

In the end what I did was perform the export/import in the module install file to create the basic types and fields (and related database objects) and then overwrite the data in the CCK definition tables using a database dump of that content from my development database. That part was put into a SQL file which, for now, has to be manually run in MySQL after the module is loaded.

It’s a bit clunky and not ideal, but I figure that part can go away once CCK for D6 is finished.

The good news is the Views export/import process worked much more smoothly. I was able to take the export code and run it from the module install file and the views were created as expected. No (or minimal) data tweaking was necessary. Yay!

Aside from that, creating actual nodes, blocks and menu links in the install file was all pretty straightforward. All in all, though, it took the better of the week to work it all out and get the process in good enough shape to hand off the Pete for testing.

Going forward, it will continue to be a bit of work to continue this process of tracking database changes. But, for now, it’s the best way I know of to proceed. If anybody has a better suggestion, I’m all ears! Please share.

Oh yeah, one annoying issue I’ve had with Views 2 is the lack of support for argument handling. Where the heck did that go? Am I missing something obvious? I found that the views object has a view_php attribute/method which seems to function the same way (I put my D5 view handling code in there), but there is no place in Views UI that I could find to add that code to a view. I had to do it manually.

Again, if you know more about this, let me know! That little problem alone gave 5 or 6 new gray hairs.

There you have it! For now, I need a break from the code and all that cursing.

Drupalcon Day 3 Rundown

What can beat the excitement of Drupalcon Days 1 and 2 - why, Drupalcon Day 3, of course!

I’ll spare you the coffee update today. Safe to say I was well lubed (caffeine-wise) all day. However, I do have a nice head cold now, so the mucous was flowing and I was burning through tissues at a healthy clip. Let this be a warning to all you young folks: if you’re thinking about having children someday, be aware that they’re walking germ factories and you will spend November through March sick as a dog. So, now you’ve been warned.

The first session I attended on Day 3 was SimpleTest: Because clicking on forms is for suckers. I’m certainly familiar with unit testing and had previously heard of the SimpleTest module, which is based on (and requires) the SimpleTest PHP library. But this session gave me a nicer rundown on all that SimpleTest offers, why it’s important to make use of it, and a primer on how to get started with it.

The panelists spoke of the importance of unit testing and recommended (ideally) a unit test for each if statement in your source code. They also warned against running SimpleTest on a production database (don’t do it), and mentioned the limitations of the module, namely that it doesn’t work for testing JavaScript or CSS (i.e. the visual interface).

SimpleTest is not only good for unit testing but also has an internal web browser that can be used for functionality testing (i.e. to test functionality end users will experience through a browser).

They also mentioned the new (as of Drupal 6) SimpleTest Automator module which really simplifies functionality testing by providing (essentially) a macro builder to record user actions (e.g. filling out forms, clicking on links, etc.).

SimpleTest can also be run via the command line using drush

Official Drupal SimpleTest documentation can be found here.

The panel also mentioned that unit testing is going to be added to core in Drupal 7; functional testing will remain a contributed module. The runkit php extension may be required to support all of this.

The keynote session of the day was given by Brian Aker from MySQL, speaking about, well, MySQL and what the fine ladies and gentlemen there are up to. I can sum it up in one word: scalability. Nuff said.

We broke up the day by having lunch with some new friends that we met here at Drupalcon. One was Margaret Rosas of Quiddities, who is currently working with KUSP, a public radio station in California. We also met two nice people from WXXI, a public broadcaster in Rochester, NY: Andrew Wheeland and Janele Robinson

After lunch I checked out the birds of a feather session on Solr led by Robert Douglass. This is a new approach to search on a Drupal site based on the Apache Solr search platform. Unfortunately, I came late to this session and missed a lot of the important stuff. However, having worked with Robert last year, he previously gave us a quick backgrounder on it. Solr offers the ability to have faceted search and performance improvements over core search. It also allows for indexing multiple sites or content not in your Drupal database using the Nutch crawler. Solr also supports use of synonym lists.

It looks pretty handy.

You can see Solr in action by using the search on Rob’s own site. He’s also set up a site that lets you use Solr to seach the Drupal.org site.

Rob has written a Drupal module to integrate Solr called Apache Solr. Note that this is different from the Solr Drupal module, which is an alternative implementation.

Rob also noted that one problem with current core search is that you can’t turn off content and user searching (you can only hide it). Rob has his own module, called Core Searches, which replicates the core content and user search functionality but allows you turn them off, which you may want to do if you run Solr.

Solr is also highly scalable since it is clusterable, meaning you can run as many Solr servers as you want and put a load balancer in front of it all.

Rob also noted that Netflix and Digg both run Solr. Them’s good company.

After the mid-afternoon break and deftly avoiding being included in the Drupalcon group photo (because, good lord, I don’t need photographic evidence that I ever attended a “con” of any sort) I attended my final session of the day: Creating custom workflows for Drupal applications - taking advantage of core hooks and context. The title was a bit deceiving, as it was really more about context and how it can be more efficiently identified, passed along and used by Drupal to make decisions about display or workflow.

The panelists pointed to their site Managing News as an example of how they implemented some nice context logic.

Basically, they used the same approach that the Drupal Internationalization module uses to set the display language for a site. The key here is the function custom_url_rewrite() (renamed to custom_url_rewrite_outbound() in Drupal 6). This function actually lives in settings.php, so it gets executed before Drupal starts doing it’s work on a page request.

The idea here is to prepend the (relative) URL gets with an identifier (e.g. http://www.domain.com/en/node/12345 for English sites). When a link is clicked, this information gets read and stripped out of the URLs using the custom URL function and translated to a normal Drupal URL, hence the context is set and can be used by block/menu/theme logic when the page is generated. The nice thing here is that the context identifier is removed before Drupal starts generating the page, so it doesn’t break pages that depend on URL arguments (e.g. views, panels).

The panelists used this approach to identify and set the context on page load. This is done with a custom function that sets a static variable. Another custom function can then be called by block/menu/theme logic in a simple and clean way to affect the display. This can have nice implications for workflow by allowing you to add links/tabs to pages to give users the ability to perform and action based on context. For example, on a blog entry page, display a link to add a new blog entry. Nice.

My head swimming with another day full of Drupal knowledge I called it a day and went home.

Tomorrow, the grand finale: Day 4!

Deleting is Fun!

I have come up with a solution for deleting large numbers of nodes that bypasses the normal node_delete PHP function, which can be quite slow. Below is a description of what I did which is not necessarily for the faint of heart. Use this method at your ow risk…

The Problem

As I described in our last post, I’ve now got a nice working version of our TV programs and schedules ingestor. In the course of development of this code we’ve previously ingested lots of TV program/episode/airing records creating thousands of nodes. Now that that code is in a somewhat complete and clean state, I wanted to delete all of the old content and start fresh, so we can proceed with the front end development.

The problem was I had well over 30,000 nodes, representing three different content types, that needed to be deleted. The usual method of deleting content in Drupal - the node_delete function - proved to be about as slow as my morning commute into Boston, and so was all but unusable in this situation.

What I needed was a way to delete all nodes of a given content type using SQL statements, rather than the standard Drupal/PHP functionality.

The Solution

Deleting records from the node table by content type is obviously pretty easy. The tricky part is that there can be all sorts of related database records in any number of tables generated by various modules that also need to be deleted. So, I needed an easy way to identify all database records for a node of a given content type.

The solution was to use the Devel module to identify everything that needs to be deleted. First, with this module installed and enabled I turned on the Collect query info and Display query log options. With these in place, each page called will display a list of the queries executed (along with number of executions and execution times) at the bottom.

Next, I created a page in Drupal that would delete a single record of the content type in question. The body of the page was a chunk of PHP code to use node_delete to delete a record, like so:

$res = db_query("SELECT n.nid FROM {node} n WHERE n.type = 'CONTENT_TYPE' LIMIT 1");
while ($n = db_fetch_object($res)) {
node_delete($n->nid);
}

In the above CONTENT_TYPE gets replaced the actual content type in question. You also need to set the input format to PHP code, so it’ll get executed, rather than just displayed.

Then, view this page in the browser. This will cause one node to be deleted and, thanks to the Devel module, all of the SQL statements executed during page generation to be listed at the bottom of the page. Among the list of SELECT statements executed by the page, you’ll also see a bunch of DELETE statements, like for example:

DELETE FROM node_revisions WHERE nid = 1234;

Once I gathered up all of these delete statements, I then rewrote each one to delete not just one record but all records related to a specific content type. The above DELETE statement got rewritten as:

DELETE t.* FROM node_revisions t JOIN node n on (t.nid = n.nid) WHERE n.type = 'CONTENT_TYPE';

I took all of these rewritten DELETE statements, put them into a file and executed the file using the MySQL command line tool, via the command:

mysql>source delete_all_airings.sql;

Voila! I repeated this for each content type that I wanted to clean out. All told, using this method I was able to delete all those nodes in under an hour.

Obviously, I don’t plan on doing this very often. But it sure was fun!