* Not So Fast, Mr. Smarty Pants

Posted on February 20th, 2008 by Phil. Filed under Drupal, PHP, Television, cron.


OK, that last post may have been premature.

I’m still having a timeout problem with the TV schedules import routine. Before I get into the latest issue, let’s take a step back and review what we’re doing here.

As I’ve said before, we’re importing TV program and schedule data from PBS via XML. These data are generated by TV Guide and provided to us as RSS feeds, one for each of our channels, with each daily file containing schedule data two weeks into the future.

In order to support the publication of a schedule grid, daily channel schedules and program/series/episode pages, we’re using CCK to create the following custom content types:

  • TV Channel
  • TV Program
  • TV Episode - Related to one program via node reference
  • TV Airing - Related to one episode and one channel via node references

We’re coding a routine to import the program and schedule XML data and create program/episode/airing nodes. The import is done using FeedAPI, the SimplePie parser and a custom feed processor of our own construction. We were originally thinking of using the FeedAPI Node module to process the data and create the nodes but didn’t because (1) we need to create or update up to three different nodes for each line of a feed (program/episode/airing nodes) and (2) the logic to match feed content to existing nodes is, due to the nature of the PBS data, quite messy.
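To make point (1) concrete, here is a rough Python sketch of what handling a single feed line involves. It is not our actual PHP processor: the field names (`program_id`, `episode_id`, `channel`, `start`) and the in-memory `NodeStore` are made up for illustration, and the real PBS data does not key this cleanly, which is point (2).

```python
from dataclasses import dataclass, field


@dataclass
class NodeStore:
    """Hypothetical in-memory stand-in for the program/episode/airing nodes."""
    programs: dict = field(default_factory=dict)
    episodes: dict = field(default_factory=dict)
    airings: dict = field(default_factory=dict)


def process_item(store, item):
    """Create or update up to three nodes for one line of a schedule feed."""
    # 1. Program: looked up by an assumed program_id, created if missing.
    program = store.programs.setdefault(item["program_id"], {})
    program["title"] = item["program_title"]

    # 2. Episode: references exactly one program.
    episode_key = (item["program_id"], item["episode_id"])
    episode = store.episodes.setdefault(episode_key, {})
    episode.update(title=item["episode_title"], program=item["program_id"])

    # 3. Airing: references one episode and one channel.
    airing_key = (item["channel"], item["start"])
    store.airings[airing_key] = {
        "episode": episode_key,
        "channel": item["channel"],
    }


store = NodeStore()
item = {
    "program_id": "p1",
    "program_title": "Example Program",
    "episode_id": "e1",
    "episode_title": "Example Episode",
    "channel": "channel-1",
    "start": "2008-02-20T20:00",
}
process_item(store, item)
process_item(store, item)  # re-importing the same line updates, not duplicates
print(len(store.programs), len(store.episodes), len(store.airings))
```

Three lookups and up to three writes per line is why the per-item cost adds up quickly once matching gets messy.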

We’ve got the import, parsing and processing code working. However, as I mentioned in my previous post, we’re running into issues with process timeouts during the feed refresh. There’s quite a bit of processing going on for each line in a schedule feed. Given that and the fact that each feed has 400+ items and that we’re working with six schedule feeds (one for each of our channels), the nightly feed refresh process is not quick. At this point, no real application or database tuning has occurred, which is part of the problem, but I think that even after everything is tuned to the hilt for performance we’ll still have this issue.

I first ran into this last week when I tried to refresh a feed using the Refresh link on the Feed Administration page. As I said earlier, I realized that this process was restricted by PHP’s max_execution_time setting, which defaults to 30 seconds. I was able to get around this by calling set_time_limit to restart the timer as each row in the feed was processed. That worked great for refreshes initiated on the Feed Administration screen.

However, things broke down again when I tried to have the feeds refresh via cron. I found that drupal_cron_run, the function in includes/common.inc which executes cron jobs, sets the cron timeout to 240 seconds using set_time_limit. This should still be overridden by my code’s calls to set_time_limit.

After digging into the workings of FeedAPI, I found that the module’s implementation of hook_cron checks after (or before, I forget which) each row of a feed is processed to make sure the process has not used more than its allotted percentage of the total cron run time, as specified on the FeedAPI administration screen. I’ve got that percentage set to 75%, which, against the 240-second cron limit, works out to 180 seconds. Not enough.
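The difference between the two limits is easier to see in a small simulation. The Python sketch below is only a model, not FeedAPI's code: the 240 seconds, 75% and 30 seconds come from this post, while the 0.25 seconds per item is an invented figure, the check is assumed to happen before each row, and treating all six feeds as one run is a simplification.

```python
CRON_LIMIT = 240       # seconds, set by drupal_cron_run
FEEDAPI_SHARE = 0.75   # share of the cron run allotted to FeedAPI
PHP_LIMIT = 30         # seconds; the timer is restarted for every row


def refresh(item_costs, budget=None):
    """Return how many items finish before a limit stops the run.

    item_costs: simulated seconds per item (nothing actually sleeps).
    budget: cumulative cap in seconds, or None for no cumulative cap.
    """
    elapsed = 0.0
    done = 0
    for cost in item_costs:
        # Cumulative check: total time used so far against the budget.
        if budget is not None and elapsed >= budget:
            break
        # Per-row check: because the timer restarts on each row, only
        # this row's own cost counts against the PHP limit.
        if cost > PHP_LIMIT:
            break
        elapsed += cost
        done += 1
    return done


budget = CRON_LIMIT * FEEDAPI_SHARE   # 180.0 seconds
items = [0.25] * (6 * 400)            # 6 feeds x 400 items, assumed 0.25 s each

print(budget)                  # 180.0
print(refresh(items))          # 2400: per-row resets alone never trip
print(refresh(items, budget))  # 720: the cumulative budget stops the run
print(budget / len(items))     # 0.075 s per item needed to finish in time
```

Restarting the timer per row only helps with a per-row limit. A cumulative budget keeps counting, so at the assumed pace fewer than a third of the items get processed, and every item would need to finish in about 75 milliseconds to fit.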

The bottom line here is that we’ve got an issue with refreshing our feeds nightly via cron without being restricted by these timeout conditions. Ideally, I’d like to be able to schedule each feed refresh separately, outside of the global Drupal cron job. I’d also like them to run in a multi-threaded fashion.
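As a rough picture of what that wish list amounts to, here is a Python sketch of refreshing each feed independently and in parallel. This is not something Drupal cron provides, and everything in it is hypothetical: the feed names, the `refresh_feed` stand-in, and the 600-second wait are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

PER_FEED_WAIT = 600  # seconds to wait for each feed; an arbitrary placeholder


def refresh_feed(name, item_count):
    """Hypothetical stand-in for refreshing one channel's schedule feed."""
    processed = 0
    for _ in range(item_count):
        processed += 1  # real work (parse, match, save nodes) would go here
    return name, processed


# Six channels, roughly 400 items per feed, as described earlier.
feeds = {f"channel-{n}": 400 for n in range(1, 7)}

with ThreadPoolExecutor(max_workers=len(feeds)) as pool:
    futures = [
        pool.submit(refresh_feed, name, count) for name, count in feeds.items()
    ]
    # result() raises TimeoutError if a feed takes longer than the wait,
    # but it does not stop the worker thread itself.
    results = dict(future.result(timeout=PER_FEED_WAIT) for future in futures)

print(results)
```

The point is that each feed gets its own worker and its own time allowance, so one slow feed no longer eats into a budget shared by all six.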

It seems to me that the Drupal cron functionality needs a bit of work. It’d be nice, in general, to be able to schedule cron jobs for different modules at different times and at different intervals and even give them different timeout limits. The only thing I see out there which may help is this patch for multi-threaded cron jobs. I haven’t tried it yet, but will.

If you have a brilliant idea for how to crack this nut let me know by leaving a comment. Also, if you have any hot stock tips, leave those also.

3 Responses to “Not So Fast, Mr. Smarty Pants”

  1. Kerri Says:

    Hi Phil, we are looking into upgrading our site using Drupal, so thanks for the blog. I am sure I will need all the help I can get.

    Kerri
    KMUW Wichita Public Radio

  2. Larry Goldberg Says:

    Hi Phil,
    Does the TV Guide data via PBS contain the information as to whether a program has been video described? The DVS indicator appears in some program guides (Yahoo TV for one) and I assume everyone is using the same data. I expect we will surface that info on wgbh.org schedules, right?

    … Larry …

  3. Phil Says:

    Hi Larry,

    The TV Guide data does *not* currently give us the DVS flag. In fact, if you look at our schedules on tvguide.com you’ll see it’s not there either.

    We’d like to (and are planning to) indicate whether a program offers DVS on the new schedules. As it stands, we’ll have to manually curate that flag.

  • Disclaimer

  • The opinions expressed in here are those of the writers/contributors and do not necessarily represent the views or opinions of the WGBH Educational Foundation.