How Do You Spell Relief? M-O-S-H-E!

As you regular readers know, we’ve been struggling with a little memory problem here as of late. Basically, under several different circumstances, PHP would quit and tell us it had used up all of its allocated memory, which we had jacked all the way up to 512MB.

At first, I suspected it was due to some server configuration issues, since we had just built an entirely new environment from the ground up. After a while we figured out it was definitely a code issue, and possibly more than one. After examining our own code, poking around contributed module code and some trial and error, I hadn’t been able to pinpoint the problem. Things were starting to look … worrisome.

But, as they say, it’s always darkest before the dawn for the other day what to my wondering eyes should appear but a comment on this blog from Moshe Weitzman. Moshe is one of the original Drupal developers and remains one the key core and contributed module maintainers to this day. Apparently, he is also a local resident and WGBH fan.

He’s also a very nice guy.

Moshe had discovered this blog and offered up his considerable help. Right away he fixed a memory leak in the Devel module and released a new version, which solved our problem of sometimes running out of memory when invoking the theme editor.

But he wasn’t done helping us with just that tidbit. Oh me oh my no.

I gave him a rundown on our problem of running out of memory when doing our nightly TV schedule data import and he was able top quickly suggest some possible culprits. Sure enough, after some tinkering around, I found that the problem went away when I disabled the Pathauto and Token modules. I tinkered with the FeedAPI cron function to disable these modules at the start and reenable them at the end of the process. As a result, memory usage during the import of our schedule data dropped by ten-fold.

Whew!

This fix, however, did introduce one new wrinkle: we used the Pathauto module to set URL aliases for new TV program and episode nodes that are created during the nightly import. By disabling Pathauto, I then had to write my own bit of code to set these aliases during import. Not a huge deal and, really, quite a small price to pay.

The bottom line of all this is that I am now sleeping just a little bit better each night and we’ve been able to ratchet down the maximum amount of memory assigned to PHP from 512MB to 128MB.

Thanks again, Moshe!

Deleting is Fun!

I have come up with a solution for deleting large numbers of nodes that bypasses the normal node_delete PHP function, which can be quite slow. Below is a description of what I did which is not necessarily for the faint of heart. Use this method at your ow risk…

The Problem

As I described in our last post, I’ve now got a nice working version of our TV programs and schedules ingestor. In the course of development of this code we’ve previously ingested lots of TV program/episode/airing records creating thousands of nodes. Now that that code is in a somewhat complete and clean state, I wanted to delete all of the old content and start fresh, so we can proceed with the front end development.

The problem was I had well over 30,000 nodes, representing three different content types, that needed to be deleted. The usual method of deleting content in Drupal - the node_delete function - proved to be about as slow as my morning commute into Boston, and so was all but unusable in this situation.

What I needed was a way to delete all nodes of a given content type using SQL statements, rather than the standard Drupal/PHP functionality.

The Solution

Deleting records from the node table by content type is obviously pretty easy. The tricky part is that there can be all sorts of related database records in any number of tables generated by various modules that also need to be deleted. So, I needed an easy way to identify all database records for a node of a given content type.

The solution was to use the Devel module to identify everything that needs to be deleted. First, with this module installed and enabled I turned on the Collect query info and Display query log options. With these in place, each page called will display a list of the queries executed (along with number of executions and execution times) at the bottom.

Next, I created a page in Drupal that would delete a single record of the content type in question. The body of the page was a chunk of PHP code to use node_delete to delete a record, like so:

$res = db_query("SELECT n.nid FROM {node} n WHERE n.type = 'CONTENT_TYPE' LIMIT 1");
while ($n = db_fetch_object($res)) {
node_delete($n->nid);
}

In the above CONTENT_TYPE gets replaced the actual content type in question. You also need to set the input format to PHP code, so it’ll get executed, rather than just displayed.

Then, view this page in the browser. This will cause one node to be deleted and, thanks to the Devel module, all of the SQL statements executed during page generation to be listed at the bottom of the page. Among the list of SELECT statements executed by the page, you’ll also see a bunch of DELETE statements, like for example:

DELETE FROM node_revisions WHERE nid = 1234;

Once I gathered up all of these delete statements, I then rewrote each one to delete not just one record but all records related to a specific content type. The above DELETE statement got rewritten as:

DELETE t.* FROM node_revisions t JOIN node n on (t.nid = n.nid) WHERE n.type = 'CONTENT_TYPE';

I took all of these rewritten DELETE statements, put them into a file and executed the file using the MySQL command line tool, via the command:

mysql>source delete_all_airings.sql;

Voila! I repeated this for each content type that I wanted to clean out. All told, using this method I was able to delete all those nodes in under an hour.

Obviously, I don’t plan on doing this very often. But it sure was fun!