Archive for the ‘SQL’ Category
* Drupal Vacation Week
Posted on April 18th, 2008 by Phil. Filed under CCK, CVS, Drupal, Install Profiles, MySQL, SQL, SVN, Views.
I don’t know about where you live, but this week coming up is school vacation week around here. As the father of two of bundles of joy, I’ll be taking next week off have some Q.T. with the family as we parade around northern New England, doing what families do (e.g. goofing off, yelling, getting chocolate out of the kids’ hair, etc.).
That means, sadly, that I’ll have to tear myself away from Drupal 6 for a week (really, not a problem, believe me). In preparation for the week off I’ve been busy as all get out trying to reach the milestone that we’ve dubbed the first code merge. By that I mean it’s the first chance for Pete and I to merge the work we’ve been doing on the first phase of our rebuild, the new TV Programs and Schedule module.
Up until now, we’ve working separately (a separate peace, if you will), developing on our own Drupal installs on our development server. Pete has been creating a theme to match the current WGBH.org template (look and feel changes to the site will come in phase 2). I’ve been busy getting the PBS/TV Guide XML data import running, as well as working up the basic functionality of our Programs A-Z page and the Full Day Schedules by Channel pages.
Anyway, it’s high time that we finally incorporated each other’s changes into our own working install. So, this week was devoted to working out a process, not just for a one time code merge, but for rolling out our development work to each other, as well as to a staging area and to live installs going forward.
Currently, we’ve basically got a theme and a custom module for TV schedules. Managing the code is easy; everything is managed in a local SVN repository. Core Drupal and contributed modules and themes are pulled down via CVS and checked into SVN. So the entire code tree can be easily managed across multiple developers and pushed up to staging and live environments via SVN commands.
The tricky part - scratch that - the headache inducing part - no, sorry, scratch that too - the bang-your-head-against-the-wall-in-frustration part - is managing all of the information that’s stored in MySQL. How do we track changes to database structures (DDL) and the actual content that needs to be shared across installs (DML)?
Good question. The answer? Well, there is no easy answer that I’m aware of. The first guess is do a database dump (that’s what it’s called, I swear) of all of the structures and content. In theory that file can be managed via SVN or CVS. Problem there is you usually have lots of junk in your development database that you don’t want to replicate to staging or live environments, or even other development installs. So then you have to start weeding stuff out of the dump, which gets messy fast.
What to do, then? Here’s the strategy we’re going to use, and which i spent most of the week working on:
(1) Create a custom Drupal installation profile. This is used when first setting up a Drupal site. The WGBH profile installs a bunch of modules, sets the default theme (our WGBH.org theme) as well the administration theme (currently Zen Classic, a sub-theme of Zen - nice and clean and simple).
(2) TV Schedules module installation file. This is where most of the action is. Here we do a whole of bunch things like:
- Create CCK content types (e.g. TV Channel, TV Program, etc.)
- Create nodes for certain content types (e.g. channels)
- Create Views (e.g. Programs A-Z)
- Create a number of blocks and position them on our themes
- Add menu links
- Tweak a number of variable settings
I’d like to use the module install and upgrade files to manage changes to database objects and content. The install and upgrade files can be managed in SVN and used to roll out changes to development/staging/live environments. The tricky part is extracting the relevant changes from the database. That’s where it gets messy.
For this initial code merge, I relied mainly on the CCK export/import functionality to get the content type definitions. In theory, this is pretty straightforward. I used the export function on my development install to generate the code to recreate the content types. I added the code to module install file. That should have been it for the content types.
Except that it wasn’t.
CCK (like Views) on Drupal 6 is still in an alpha state, meaning there are unfinished bits, bugs, and general unforeseen shenanigans involved in using it. I found that the export/import process was a bit lacking. It copied the basic content type and field definitions just fine. However, field groupings were lost, and certain field attributes didn’t survive the transfer.
In the end what I did was perform the export/import in the module install file to create the basic types and fields (and related database objects) and then overwrite the data in the CCK definition tables using a database dump of that content from my development database. That part was put into a SQL file which, for now, has to be manually run in MySQL after the module is loaded.
It’s a bit clunky and not ideal, but I figure that part can go away once CCK for D6 is finished.
The good news is the Views export/import process worked much more smoothly. I was able to take the export code and run it from the module install file and the views were created as expected. No (or minimal) data tweaking was necessary. Yay!
Aside from that, creating actual nodes, blocks and menu links in the install file was all pretty straightforward. All in all, though, it took the better of the week to work it all out and get the process in good enough shape to hand off the Pete for testing.
Going forward, it will continue to be a bit of work to continue this process of tracking database changes. But, for now, it’s the best way I know of to proceed. If anybody has a better suggestion, I’m all ears! Please share.
Oh yeah, one annoying issue I’ve had with Views 2 is the lack of support for argument handling. Where the heck did that go? Am I missing something obvious? I found that the views object has a view_php attribute/method which seems to function the same way (I put my D5 view handling code in there), but there is no place in Views UI that I could find to add that code to a view. I had to do it manually.
Again, if you know more about this, let me know! That little problem alone gave 5 or 6 new gray hairs.
There you have it! For now, I need a break from the code and all that cursing.
* Deleting is Fun!
Posted on February 24th, 2008 by Phil. Filed under Devel, MySQL, PHP, SQL, Tools.
I have come up with a solution for deleting large numbers of nodes that bypasses the normal node_delete PHP function, which can be quite slow. Below is a description of what I did which is not necessarily for the faint of heart. Use this method at your ow risk…
The Problem
As I described in our last post, I’ve now got a nice working version of our TV programs and schedules ingestor. In the course of development of this code we’ve previously ingested lots of TV program/episode/airing records creating thousands of nodes. Now that that code is in a somewhat complete and clean state, I wanted to delete all of the old content and start fresh, so we can proceed with the front end development.
The problem was I had well over 30,000 nodes, representing three different content types, that needed to be deleted. The usual method of deleting content in Drupal - the node_delete function - proved to be about as slow as my morning commute into Boston, and so was all but unusable in this situation.
What I needed was a way to delete all nodes of a given content type using SQL statements, rather than the standard Drupal/PHP functionality.
The Solution
Deleting records from the node table by content type is obviously pretty easy. The tricky part is that there can be all sorts of related database records in any number of tables generated by various modules that also need to be deleted. So, I needed an easy way to identify all database records for a node of a given content type.
The solution was to use the Devel module to identify everything that needs to be deleted. First, with this module installed and enabled I turned on the Collect query info and Display query log options. With these in place, each page called will display a list of the queries executed (along with number of executions and execution times) at the bottom.
Next, I created a page in Drupal that would delete a single record of the content type in question. The body of the page was a chunk of PHP code to use node_delete to delete a record, like so:
$res = db_query("SELECT n.nid FROM {node} n WHERE n.type = 'CONTENT_TYPE' LIMIT 1");
while ($n = db_fetch_object($res)) {
node_delete($n->nid);
}
In the above CONTENT_TYPE gets replaced the actual content type in question. You also need to set the input format to PHP code, so it’ll get executed, rather than just displayed.
Then, view this page in the browser. This will cause one node to be deleted and, thanks to the Devel module, all of the SQL statements executed during page generation to be listed at the bottom of the page. Among the list of SELECT statements executed by the page, you’ll also see a bunch of DELETE statements, like for example:
DELETE FROM node_revisions WHERE nid = 1234;
Once I gathered up all of these delete statements, I then rewrote each one to delete not just one record but all records related to a specific content type. The above DELETE statement got rewritten as:
DELETE t.* FROM node_revisions t JOIN node n on (t.nid = n.nid) WHERE n.type = 'CONTENT_TYPE';
I took all of these rewritten DELETE statements, put them into a file and executed the file using the MySQL command line tool, via the command:
mysql>source delete_all_airings.sql;
Voila! I repeated this for each content type that I wanted to clean out. All told, using this method I was able to delete all those nodes in under an hour.
Obviously, I don’t plan on doing this very often. But it sure was fun!
* One Problem Solved, New Problem Found
Posted on February 22nd, 2008 by Phil. Filed under Drupal, PHP, SQL, cron.
Not like I needed it, but the impending snow storm is again proving that winter stinks (to put it in family-friendly terms). I blame that over sized rat in Pennsylavania.
I’ve (temporarily) solved the timeout problem I’ve been having with importing TV schedule feeds using FeedAPI. Ideally, due to the time it takes to run them, I’d like to be able to schedule these feed imports separately from the rest of the Drupal cron jobs. The multi-threaded cron patch looked promising, but it was written for Drupal 5.1 and we’re currently building on 5.7. At this point I don’t want to get too deep into engineering some custom cron functionality. Instead, I’ll settle for a shortcut to allow my feeds to be imported nightly via the normal Drupal cron routine without timing out.
After digging a little deeper into the FeedAPI code I found that it’s implementation of hook_cron calls the function feedapi_cron_time before processing each row in a feed. That function basically checks that the total run time of the feed import hasn’t exceed the allotted percentage of total cron run time. For now, I’ve decided to simply have that function aways return true, so a timeout won’t happen.
Problem solved - for now. Hopefully I can revisit this later and find a better solution, like maybe multi-threaded cron will be more fully fleshed out and available.
Now that the entire feed import process is working I have a new problem.
I’d like to clean out the program and schedule data that I’ve accumulated in our development database and start importing data using the new code. This is actually quite a bit of data, as it includes data imported by an earlier version of the feed ingestor that we wrote last year. In fact, as of right now we have:
- 539 programs
- 4,926 episodes
- 27,731 airings
This is far too much data to delete using the clunky delete content functionality (i.e. the node_delete function) in Drupal. That functionality is also pretty damned slow, as it has to churn through lots of PHP code and functions for each node to delete.
We’ve got a custom module for the TV schedules data that implements hook_nodeapi. If an episode gets deleted, this implementation makes sure all child airings are deleted. Likewise, if a program gets deleted, all child episodes (and hence all of their airings) get deleted. This ensures we should never have orphaned airings or episodes.
Last night I tried to delete 10 programs using the standard node_delete functionality and it took at least 30 minutes.
Obviously, deleting all program and schedule data isn’t something we’ll be doing often. But I do need to do it at least once and I may need to do it again at some point during the build and I don’t want to wait several days for it to get done. So I need a faster solution for deleting large amounts of data.
I did find this delete all module, which claimed to it allowed you delete content by content type and to choose either the standard (and safest) node_delete option or a faster (and riskier) bulk delete just using SQL statements. However, when I installed it I found all it offers is the option to delete all content and only using the usual PHP functions.
Soooo, I’m about to dig in and see if I can write my own SQL statements to delete this content cleanly and quickly.
In between round of shoveling my driveway, of course…
Archives:
- February 2009
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
Categories:
- Apache
- Architecture
- Boost
- caching
- CCK
- CMS
- cron
- CVS
- database
- Date
- Devel
- Drupal
- Drupalcon
- FeedAPI
- Flickr
- Image Assist
- Images
- Install Profiles
- MacBook
- Memcache
- MySQL
- NPR
- Pathauto
- PBS
- PHP
- Preview
- Protrack
- Public Media
- search
- Social Media
- SQL
- SVN
- tags
- Television
- Testing
- theme
- TinyMCE
- Token
- Tools
- TV Guide
- Uncategorized
- Views
- WordPress
Disclaimer
- The opinions expressed in here are those of the writers/contributors and do not necessarily represent the views or opinions of the WGBH Educational Foundation.










