It’s Go Time!

After weeks and months of working on something you can easily forget that - at some point - it actually has to get done! Well, I’m not able to say that our new TV Programs and Schedules module is completed yet, but we have reached a big milestone: principal development is done!

What does that mean? Well, it means that Pete and I have coded up everything that we know about to the specifications provided and now it’s ready to be fully tested. You could basically call it an alpha release of the front and back end code.

Exciting? That’s one word for it.

The last piece of the puzzle was a custom search module that Pete has coded up, based on the core Drupal search module. Basically, it’s search scoped to our TV programs and episodes (that’s all the content we have in Drupal just now anyways). Down the line, as we rebuild the whole rest of WGBH.org, we also envision a scoped search for each major subsite (e.g. one for radio, one for web only content, etc.), plus some sort of global search across everything.

We’ll write up more about how search was implemented later but for now, here’s a sneak peek:

TV Programs Search

Our goal in the next few days is to get the code base installed on our test/staging servers, document how things work and then let our content producers and functional testers have at it!

Obviously, I don’t anticipate there being any bugs, issues or blemishes of any sort. But we’ll go through the charade of testing anyways, just to make everybody feel good!

Imagine All The Images

Today is the first day of summer, so you know what that means: it’s officially the longest day of the year! It’s also the day to break out the “longest day of the year” jokes (e.g. “It’s the longest day of the year - other than Thanksgiving with my family,” etc.).

Thank you! You’re a great audience! I’ll be doing two shows nightly all week. Try the fish!

Back in the real world, principal development on our new TV Programs and Schedules module is almost done! Pete is busy crossing the i’s and dotting the T’s (wait, stop, reverse that) on the search module. Everything else is basically ready for full blown functional testing and tuning. Next week we’ll be getting a real staging environment set up and ready to go.

Today, class, I wanted to show you how we’re going to handle images on the new site. A site like ours will certainly make use of lots of graphics (e.g. show logos and images). We need a content producer-friendly way to upload, find and reuse images, with minimal hassle. Our current CMS makes image reuse next to impossible, not to mention the fact that all images get written to the database (shudder). That sort of thing is not what we’re looking for; we want our images in the new world to get written to the file system and we want to be able to reuse images.

First up, the requirements. Check it out, yo:

Episode Page Image

As you can see, an episode page can display up to two graphics: the series logo, and an optional image. The logo is attached to the series and will display on all of the series’ episode pages. Below that there could be an image. There may be an image related to the episode, or there could be an image attached to the series. In this case, an image differs from a logo in that the image could have (a) a caption and/or (b) a larger version that would appear in a new window if the image is clicked on. If the episode has an image, that would appear under the logo. If there is no episode image, the program image would display, if there is one. It could also be there is no logo and no images of any sort.
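The display rules above boil down to a small fallback chain. Here is a minimal sketch of that logic in plain PHP, not our actual theme code; the function name and the `logo` and `image` array keys are assumptions made up for illustration.

```php
<?php
// Hypothetical sketch: decide which graphics an episode page shows.
// $series and $episode are plain arrays; the 'logo' and 'image' keys
// are assumed names, not real CCK field names.
function episode_page_graphics(array $series, array $episode) {
  $graphics = array();
  // The logo belongs to the series and shows on all of its episode pages.
  if (!empty($series['logo'])) {
    $graphics['logo'] = $series['logo'];
  }
  // An episode image wins; otherwise fall back to the series image.
  if (!empty($episode['image'])) {
    $graphics['image'] = $episode['image'];
  }
  elseif (!empty($series['image'])) {
    $graphics['image'] = $series['image'];
  }
  // With no logo and no images at all, this is simply an empty array.
  return $graphics;
}
```

An episode with no image of its own would come back with the series image under `image`, and a series with nothing attached yields an empty array.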

Now for the fun part: the implementation!

I wrote in March about our plans for handling images. Our approach hasn’t changed: we’re using the Image Assist module (which, in turn, uses the Image module) to allow content producers to easily find and reuse existing images or upload new ones. Images ultimately get stored as nodes themselves, which can be tagged, and those tags are then used for finding images later. We did, however, make a few small tweaks.

Oh yeah, and thanks to the Image module, images get automatically resized (i.e. to create thumbnails or previews), during upload! We love that.

Here’s how it all works:

Image Upload Field

The image field of an episode is a text area. For all text areas on the node edit pages, we’ve enabled the TinyMCE WYSIWYG editor. This will allow content producers to add some simple HTML markup, like bolding and alignment and whatnot, without having to add tags! Also, Image Assist can be configured to work with the TinyMCE editor (which, given the unfinished state of some contributed modules for Drupal 6, took a little monkeying around). Clicking on that little camera icon in the bottom row launches…

Image Assist Popup

…the Image Assist popup window! Using this window you can search for and select an existing image or you can upload a new one, all without leaving the episode edit page. By default Image Assist lets you narrow your image search to images you uploaded or using any tags you’ve applied to images, all using one drop down. We also added the ability to search against the title of the image. Sweet!

Once an image is either selected or uploaded, you see something like this:

Set Image Properties

Here you can set the title of the image (which is used as the alt attribute of the image tag), choose which size to display on the page (the original or a smaller version) and whether or not to link to the full sized version of the image (and whether to open that in a new window). Once you specify all of this, the HTML to display the image gets generated and written back to the image field on the node edit page, like so:

Uploaded Image

Here is where content producers can add an optional caption. They just type it under the image; they can also choose to link the caption using the link button.

One last thing: if the image is linked to a larger version, the front end page code will automatically generate an Enlarge this image link and place it under the caption.

That’s it!

Who says building web pages has to be hard?! Not me. No sir.

Sidebar Buildin’ Blues

Like Big Brown coming down the stretch at the Belmont Stakes last week, us coders here at WGBH Online are bearing down on the finish line that is the rebuild of the TV Programs and Schedules module, well ahead of the pack… er, on second thought, maybe Big Brown at Belmont isn’t the best analogy.

In any case, deadlines are looming! We’re scheduled for a database freeze at the end of the month, by which time we need to have principal development completed and the code ready for some serious testing and tuning. At this point Pete is busy working on the search functionality and I’m tying up as many loose ends as possible.

I’d like to use today’s post to discuss one particularly funky and challenging bit of functionality that is finally working - the episode page sidebar!

Now tell me the name alone - episode page sidebar - doesn’t send a chill down your spine! It does mine, at least.

Let’s take a look-see at this little puppy to see what we’re talking about:

Episode Page Sidebar

As you can see, episode page sidebars will display information related to the series or episode (e.g. related programs or events), as well as a search box and some promotional content (e.g. shop, support). Nothing out of the ordinary here. We do the same thing on our current episode pages.

So how would this be implemented in Drupal? Easy, you say! Each item could be managed as a block. Voila! Hardly any technical expertise at all required here.

Not so fast, partner. There are a few additional requirements.

(1) There needs to be a default set of these items, that would display on all episode pages, unless…

(2) An editor chooses to create a series-specific version of a particular sidebar item, in which case s/he needs to be able to create it, relate it to the series and have that override the default item on all episode pages for that series, unless…

(3) An editor chooses to create an episode-specific sidebar item, in which case s/he needs to be able to create it, relate it to the episode and have it override any default or series-specific version of that item.

OK, maybe it’s not so simple.

Here’s what we’ve noodled up to make this all happen.

(1) Since the search box is the one sidebar item which will always be there at the top and cannot be overridden, that is indeed managed as a stand alone Drupal block and positioned at the top of the heap.

(2) All other items in the sidebar are managed using a custom (CCK) content type called a content box. Content boxes are pretty simple and have the following attributes: title (for internal use only), a body (which contains all of the content to display, including any optional title or header), and an optional parent content item (like a TV program or episode), managed as a node reference. Also, using the Scheduler module, each content box can have specific publish and expiration dates. Content boxes will be able to have images, which will be added to the body using the Image Assist module.

(3) In order to determine where a content box displays in the sidebar we defined a fixed taxonomy vocabulary (Content Box Location), which is a hierarchical set of terms (e.g. TV -> Programs -> Sidebar -> Events). The display order of the sidebar items is determined by the order of the terms within the vocabulary. The idea here is that content boxes will eventually be used on other parts of the site once it’s all on Drupal. At that point we’ll be able to define additional terms in the vocabulary for other locations on the site (e.g. Radio Home Page -> Coming Up).

(4) Default sidebar content boxes are then defined as content boxes that are assigned a TV -> Programs -> Sidebar tag, are not explicitly related to a series or episode node and which have the word default in their (internal) title.

(5) Series sidebar content boxes are defined as content boxes that are assigned a TV -> Programs -> Sidebar tag and are explicitly related to a series node.

(6) Episode sidebar content boxes are defined as content boxes that are assigned a TV -> Programs -> Sidebar tag and are explicitly related to an episode node.

Got all that? Good, because this will be on the final.

Given that set up, here’s how we actually construct the sidebar for a given episode.

We have a block view that selects all of the default episode sidebar content boxes. In the theme file for that view display, we then call a second block display for the sidebar view that takes a parent node id as an argument. We call this first to get any content boxes related to the parent series and then again to get any content boxes related to the episode. All of the content boxes that we’ve gotten back are then merged according to the rules above and the final list is displayed.
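For the curious, the merge step can be sketched in a few lines of plain PHP. This is an illustration of the override rules only, not the code in our theme file: the function name, the `location` and `body` keys, and passing the term order in as a simple array are all assumptions.

```php
<?php
// Sketch only. Each content box is an array with a 'location' (the
// Content Box Location term that places it in the sidebar) and a 'body'.
// $location_order lists the terms in vocabulary order.
function merge_sidebar_boxes(array $location_order, array $defaults, array $series_boxes, array $episode_boxes) {
  $by_location = array();
  // Later layers overwrite earlier ones for the same location:
  // defaults first, then series-specific, then episode-specific.
  foreach (array($defaults, $series_boxes, $episode_boxes) as $layer) {
    foreach ($layer as $box) {
      $by_location[$box['location']] = $box;
    }
  }
  // Emit the surviving boxes in the order of the vocabulary terms.
  $sidebar = array();
  foreach ($location_order as $location) {
    if (isset($by_location[$location])) {
      $sidebar[] = $by_location[$location];
    }
  }
  return $sidebar;
}
```

Keying the boxes by location is what makes the override fall out for free: whichever layer writes last for a given location is the one that gets displayed.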

Almost forgot, the Related Programs sidebar content box adds one additional layer of complexity. The contents of that box are generated using a related programs view, which finds up to three related series using another taxonomy vocabulary (TV Program Genres). It’s pretty straightforward, unless…

See, easy peasy lemon squeezy, as my kids say!

Series Page, We Hardly Knew Ye

Feels like we’re already well into summer mode, here at WGBH Online. The temperature is heating up, people are starting to take long vacations (welcome back from Hawaii, Pete!) and - the surest sign that summer is almost upon us - I’m in full blown shorts-wearing mode.

I'm in shorts mode!

But don’t think the work on the WGBH.org rebuild has slowed at all. Oh no. My fingers are already sore from all the coding going on and we’re getting near the home stretch of the first phase of our rebuild.

Before I give the latest update, note the newest feature on the site over there to the right - a Flickr badge! Based on the recommendation of my friend - and a friend of WGBH - Steve Garfield, I’ve created a Flickr account to go along with this blog. After all, when you think web site development you think photos! Tell your friends!

Anyways, back to the rebuild of WGBH.org, I’ve been busy building out our episode page, which is fairly complex but is coming along nicely. So for those keeping track, once that’s complete we’ll have built a Programs A-Z page, Schedule Grid, Full Day by Channel schedules and now the episode page. That’ll really just leave a search page to be built and then we can prepare to test and launch!

Wait, but what about the series page, you say?

An excellent question, class. Somebody’s been paying attention!

Well, that’s no longer a concern since we recently decided to do away with building our own series pages. How about them apples?

In case it’s not clear what we’re talking about, a series page would, for example, be a page dedicated to an episodic program, such as Nova, Frontline, Masterpiece, etc. Such a page could have a description of the series, a list of upcoming episodes, other related series, etc. An episode page, on the other hand, would be dedicated to, well, a specific episode of a series.

We currently do publish series pages on WGBH.org, however most people never see them, as we generally try to direct people to the next upcoming episode of a series. But they are there and we do curate them. However, after initially planning to build them and support them in the new world, we decided to no longer produce them.

There were basically three reasons to forgo series pages:

(1) Series descriptions do not come through the PBS/TV Guide data feeds. The data contain episodic descriptions, but not descriptions for a series. So, if we want to publish series pages we’d have to enter these descriptions ourselves, which defeats one of the main reasons we’re going to this new data feed in the first place.

(2) The episodic/non-episodic issue. Obviously, some programs are non-episodic, meaning they are not series. Since the schedule data that we’ll be getting does not explicitly distinguish between episodic and non-episodic programs, all programs coming into the system will be treated the same way and have at least one episode. A non-episodic program in our system will therefore just be a series with one episode. Dealing with this wasn’t going to be trivial. For example, for non-episodic programs, what page do we send users to? The series page or the episode page? Clearly we would have to either manually flag non-episodic series as they came in or work up some convoluted logic to properly display the information for such shows. Neither option sounded good.

(3) Why even bother? Each show presumably already has its own web site anyways, so why should we reinvent the wheel? Episode pages at least provide airing information specific to our channels, plus we can use them for promotion of DVDs and whatnot or cross-promotion of other shows, so we do want to have those pages. But a series page really wouldn’t be adding much value to anybody.

So, there you have it: we punted the whole idea. This was fine with me, as it makes development a bit easier. No complaints here.

Time to put some sunblock on my pasty legs.

Love is in the Air!

As I write this, my buddy and co-developer Pete Bull is gallivanting around Hawaii, enjoying a well deserved vacation in paradise. So, until our own Don Ho gets back next week - no doubt tan, refreshed and mellowed out - I’m on my own coding up the new version of WGBH.org.

We’re making good progress with the new TV programs and schedules module. As mentioned earlier, the Programs A-Z list, schedule grid and full day by channel schedules are pretty much good to go. I’m currently up to my neck coding up the episode pages.

The big news as of late is that we’ve decided to forgo program pages (or, rather, series pages) and go only with episode pages. I’ll have more on that in an upcoming post. That, folks, is what they call a cliffhanger.

I’d actually like to take today’s post to introduce you to the newest member of the WGBH Online development squad (I prefer to think of us as a “squad” more than a “team”; please note there is still no I in “squad”).

This member joined the team just about a month or so ago and has already proven to be an impact player. This member has also proven to be an invaluable cog in our machinery. I’m already having a hard time remembering what life was like before this member came on board.

So, without further ado…

allow me to present…

the newest WGBH Online technical squad member…

my new MacBook Pro!

MacBook Pro

After years - actually, technically, decades - of working primarily on a PC (mainly out of convenience, not any religious thing), this little cupcake worked her charms on me and has caused me to switch sides!

It’s not that the MacBook is making me any more efficient; the tools aren’t all that different, in many ways. My main development tools went from Macromedia HomeSite, Firefox, and PuTTY to ActiveState Komodo, Firefox and the Mac terminal.

What it comes down to, I suppose, is that the thing is just so dang … sexy. I just want to use it! Silvery smooth, lightweight, a keyboard that’s a delight to tap, it’s got all sorts of sensory goodness going on. You gotta give Steve Jobs and his squad mad props (whatever that means): they sure know about industrial design.

Anyhow, the MacBook has become my new official workstation of choice. I really like it. In fact, truth be told, I’ve become quite attached to it in a very short amount of time. Actually, I think I’m even developing feelings for it.

Ahh, heck, who are we kidding? I’m in love!

I love my MacBook Pro!

OK, time for a meeting, which means I must leave my MacBook (ever so briefly) … I’ll miss you sweetie! XOXOXOXOXO!

How Do You Spell Relief? M-O-S-H-E!

As you regular readers know, we’ve been struggling with a little memory problem here as of late. Basically, under several different circumstances, PHP would quit and tell us it had used up all of its allocated memory, which we had jacked all the way up to 512MB.

At first, I suspected it was due to some server configuration issues, since we had just built an entirely new environment from the ground up. After a while we figured out it was definitely a code issue, and possibly more than one. After examining our own code, poking around contributed module code and some trial and error, I hadn’t been able to pinpoint the problem. Things were starting to look … worrisome.

But, as they say, it’s always darkest before the dawn. The other day, what to my wondering eyes should appear but a comment on this blog from Moshe Weitzman. Moshe is one of the original Drupal developers and remains one of the key core and contributed module maintainers to this day. Apparently, he is also a local resident and WGBH fan.

He’s also a very nice guy.

Moshe had discovered this blog and offered up his considerable help. Right away he fixed a memory leak in the Devel module and released a new version, which solved our problem of sometimes running out of memory when invoking the theme editor.

But he wasn’t done helping us with just that tidbit. Oh me oh my no.

I gave him a rundown on our problem of running out of memory when doing our nightly TV schedule data import and he was able to quickly suggest some possible culprits. Sure enough, after some tinkering around, I found that the problem went away when I disabled the Pathauto and Token modules. I tinkered with the FeedAPI cron function to disable these modules at the start and reenable them at the end of the process. As a result, memory usage during the import of our schedule data dropped tenfold.

Whew!

This fix, however, did introduce one new wrinkle: we used the Pathauto module to set URL aliases for new TV program and episode nodes that are created during the nightly import. By disabling Pathauto, I then had to write my own bit of code to set these aliases during import. Not a huge deal and, really, quite a small price to pay.
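To give a flavor of what “my own bit of code” might look like, here is a rough sketch of building an alias from node titles in plain PHP. It is not the code we run: the function names and the `tv/programs/...` pattern are invented for illustration, and in Drupal 6 the resulting string would presumably then be saved with the core `path_set_alias()` function.

```php
<?php
// Hypothetical helper: turn a title into a URL-safe slug.
function tv_alias_slug($title) {
  $slug = strtolower(trim($title));
  // Collapse every run of non-alphanumeric characters into one hyphen.
  $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
  // Drop any hyphens left dangling at either end.
  return trim($slug, '-');
}

// Hypothetical alias pattern; the real URL scheme is not described here.
function tv_episode_alias($program_title, $episode_title) {
  return 'tv/programs/' . tv_alias_slug($program_title) . '/' . tv_alias_slug($episode_title);
}
```

The trade-off is that a hand-rolled slug like this skips the niceties Pathauto handles for you, such as configurable patterns, in exchange for not loading Pathauto and Token during the import.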

The bottom line of all this is that I am now sleeping just a little bit better each night and we’ve been able to ratchet down the maximum amount of memory assigned to PHP from 512MB to 128MB.

Thanks again, Moshe!

It’s Alive!

This week brought good news and bad news. Actually, more like very good news and pretty annoying news. First, the very good news:

We have officially posted some pages for internal testing! Behold…

TV Programs A-Z (click to enlarge)

Programs A-Z

Nothing too fancy here. On our current site the A-Z list is a popup. But now it’s a full grown page! One note: the Search TV Programs form does not yet work, on this page or any other.

TV Schedule Grid (click to enlarge)

Schedule Grid

Now we’re talking! The grid is the big magilla of this project. It finally allows us to display schedule information for all of our channels at once. Basically, we’re finally catching up to the rest of the world.

As you can see, the grid displays schedule information in three hour blocks. The user can navigate forward or backward or jump to a specific block of time using the Pick a Time form at the top. There is also the calendar selector, which lets users view the schedules for a given day. Note how the calendar highlights the current day (or the day of the schedule that you’re looking at), as well as the schedule data window, the period of time for which we display schedule data which, as of now, is one week back, two weeks forward (almost).
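The block and window arithmetic behind the grid is simple enough to sketch in plain PHP. Treat this as an illustration under assumptions rather than our page code: the function names are made up, the blocks are assumed to start at midnight, 3am, 6am and so on, and the window is rounded to exactly seven days back and fourteen forward.

```php
<?php
// Sketch: start hour of the three-hour grid block containing $hour (0-23),
// assuming blocks are aligned at 0, 3, 6, ... 21.
function grid_block_start_hour($hour) {
  return (int) (floor($hour / 3) * 3);
}

// Sketch: is $day inside the schedule data window around $today?
// Both arguments are 'Y-m-d' strings; both ends of the window are inclusive.
function in_schedule_window($day, $today) {
  $start = strtotime($today . ' -7 days');
  $end = strtotime($today . ' +14 days');
  $t = strtotime($day);
  return $t >= $start && $t <= $end;
}
```

The calendar selector could use a check like `in_schedule_window()` to decide which days to highlight as having schedule data.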

We still need to play around with limiting the number of characters in the program or episode title that we display on the grid. There’s always something…

Full Day TV Schedules by Channel (click to enlarge)

Full Day Schedule by Channel

As you can see, the full day schedule shares the calendar selector with the grid, and replaces the Pick a Time selector with a Pick a Channel form. Nice!

Now we can proceed with some preliminary testing, while Pete and I get to work on the program/series, episode, search and other pages.

Ok, on to the pretty annoying news. This… (click to enlarge)

Out of Memory

…is still happening.

Under a couple of different scenarios, the underlying PHP process uses up its allotted memory and then - like one of my kids - holds its breath and refuses to continue until it gets what it wants (more memory!). The above error was generated simply by trying to enable the theme developer. It can also happen during our nightly schedule data ingest, though our current allocation of 512MB is enough to prevent this, thank goodness.

So far all I’ve been able to confirm is that it’s not a server configuration issue. It seems to be a code leak and there may be more than one culprit out there. It’s starting to give me real headaches and needs to be resolved in the not too distant future.

However, for now, I refuse to let it ruin my weekend!

Code, Test, Fix, Rinse, Repeat

Been a low-key week here at WGBH Online World Headquarters, or as we call it WOW HQ.

The big event of the week: I got a haircut. All that goofy hair on my head was making it hard to think properly, so I had quite a bit of it lopped off. Things should really start picking up now.

Pete and I have been finishing up our first pass at the TV schedule pages (a multi-channel grid, full day schedules by channel, and a Programs A-Z list). Doesn’t sound like much, just three pages (basically), but there’s quite a bit involved, as you might imagine.

The plan is to get these pages in shape and then post to our test suite for a preliminary round of testing. This testing will be done by people other than Pete and me and the purpose is really just to evaluate how well the data is being imported from PBS. We haven’t worked up a formal test plan yet, but it will most likely involve comparing the WGBH schedules on PBS.org (which use the exact same TV Guide data) to the schedules we’re producing with the new code. We won’t yet be testing the ability to modify this information locally.

Needless to say, testing is just an annoying formality. The odds of there being any bugs in our code are minimal.

I’ve also been working with our new project manager Louise (Hi, Louise! I know you’re reading this…) to help fully flesh out the project tasks and schedule. That’s been a long process but I think we finally have all of the pieces laid out and scheduled.

In addition to the schedule page testing, we’ve also blocked out a longer period of time for full functional testing after we complete all of the development for this module (i.e. schedule pages and program and episode pages), as you would expect. Again, that’s outside testing to be done by others, distinct from the unit testing that Pete and I do during coding. We thought about trying to break out functional testing into more discrete chunks, but given the relatively small size of this project, it didn’t seem to make sense.

OK, I gotta go wrap up these schedule pages. Don’t forget to call mom on Sunday! I mean your mom, not mine. I’ll handle her.

Not So Fast, Mr. Smarty Pants

OK, that last post may have been premature.

I’m still having a timeout problem with the TV schedules import routine. Before I get into the latest issue, let’s take a step back and review what we’re doing here.

As I’ve said before, we’re importing TV program and schedule data from PBS via XML. These data are generated by TV Guide and provided to us as RSS feeds, one for each of our channels, with each daily file containing schedule data two weeks into the future.

In order to support the publication of a schedule grid, daily channel schedules and program/series/episode pages, we’re using CCK to create the following custom content types:

  • TV Channel
  • TV Program
  • TV Episode - Related to one program via node reference
  • TV Airing - Related to one episode and one channel via node references

We’re coding a routine to import the program and schedule XML data and create program/episode/airing nodes. The import is done using FeedAPI, the SimplePie parser and a custom feed processor of our own construction. We were originally thinking of using the FeedAPI Node module to process the data and create the nodes but didn’t because (1) we need to create or update up to three different nodes for each line of a feed (program/episode/airing nodes) and (2) the logic to match feed content to existing nodes is, due to the nature of the PBS data, quite messy.
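To make the “up to three nodes per feed line” point concrete, here is a toy sketch in plain PHP that uses in-memory arrays in place of Drupal nodes. It is not our feed processor: the function name and the item fields (`program_title`, `episode_title`, `channel`, `start`) are assumptions, and the exact-match keys stand in for the much messier matching logic that the PBS data really requires.

```php
<?php
// Sketch: one feed item can create or update a program, an episode and an
// airing. $store is a stand-in for the node tables, keyed by record type.
function process_feed_item(array $item, array &$store) {
  $program_key = $item['program_title'];
  $episode_key = $program_key . '|' . $item['episode_title'];
  $airing_key = $item['channel'] . '|' . $item['start'];

  // Create the program and episode only if they are not already known.
  if (!isset($store['program'][$program_key])) {
    $store['program'][$program_key] = array('title' => $item['program_title']);
  }
  if (!isset($store['episode'][$episode_key])) {
    $store['episode'][$episode_key] = array(
      'title' => $item['episode_title'],
      'program' => $program_key, // stands in for the node reference
    );
  }
  // An airing references one episode and one channel; a repeat import of
  // the same channel and start time overwrites rather than duplicates.
  $store['airing'][$airing_key] = array(
    'episode' => $episode_key,
    'channel' => $item['channel'],
    'start' => $item['start'],
  );
}
```

Two airings of the same episode therefore produce one program, one episode and two airings, which is the shape the grid and episode pages need.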

We’ve got the import, parsing and processing code working. However, as I mentioned in my previous post, we’re running into issues with process timeouts during the feed refresh. There’s quite a bit of processing going on for each line in a schedule feed. Given that and the fact that each feed has 400+ items and that we’re working with six schedule feeds (one for each of our channels), the nightly feed refresh process is not quick. At this point, no real application or database tuning has occurred, which is part of the problem, but I think that even after everything is tuned to the hilt for performance we’ll still have this issue.

I first ran into this last week when I tried to refresh a feed using the Refresh link on the Feed Administration page. As I said earlier, I realized that this process was restricted by the maximum execution time PHP variable, which defaulted to 30 seconds. I was able to get around this by using the set_time_limit function to restart the timer as each row in the feed was processed. That worked great for refreshes initiated on the Feed Administration screen.

However, things broke down again when I tried to have the feeds refresh via cron. I found that drupal_cron_run, the function in includes/common.inc which executes cron jobs, sets the cron timeout to 240 seconds using set_time_limit. This should still be overridden by my code’s calls to set_time_limit.

After digging into the workings of FeedAPI I found that that module’s implementation of hook_cron checks after (or before, I forget which) each row of a feed is processed to make sure that the process is not using up more than the allotted percentage of the total cron run time as specified on the FeedAPI administration screen. I’ve got that percentage set to 75%, which works out to 180 seconds. Not enough.
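The arithmetic here is worth spelling out. The following is a small sketch in plain PHP of the kind of budget check described above; it is my own paraphrase and not FeedAPI’s actual code, and both function names are made up.

```php
<?php
// Seconds a feed refresh may use, given the cron time limit and the
// percentage configured on the FeedAPI administration screen.
function feed_time_budget($cron_limit_seconds, $percent) {
  return (int) floor($cron_limit_seconds * $percent / 100);
}

// Sketch of a per-row check: stop once the elapsed time reaches the budget.
// $started_at and $now are Unix timestamps in seconds.
function budget_exhausted($started_at, $now, $budget_seconds) {
  return ($now - $started_at) >= $budget_seconds;
}
```

With a 240 second cron limit and a 75% share, `feed_time_budget(240, 75)` gives the 180 seconds mentioned above. The catch is that a check like this counts wall-clock time, so resetting PHP’s own timer with set_time_limit does nothing to get around it.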

The bottom line here is that we’ve got an issue with refreshing our feeds nightly via cron without being restricted by these timeout conditions. Ideally, I’d like to be able to schedule each feed refresh separately, outside of the global Drupal cron job. I’d also like them to run in a multi-threaded fashion.

It seems to me that the Drupal cron functionality needs a bit of work. It’d be nice, in general, to be able to schedule cron jobs for different modules at different times and at different intervals and even give them different timeout limits. The only thing I see out there which may help is this patch for multi-threaded cron jobs. I haven’t tried it yet, but will.

If you have a brilliant idea for how to crack this nut let me know by leaving a comment. Also, if you have any hot stock tips, leave those also.

set_time_limit: Learn It, Know It, Live It

I’ve got the basic code for importing our TV programs and schedule data working. In a nutshell, PBS provides us an XML feed of schedule data provided by TV Guide for each of our channels. Using FeedAPI and a custom module we’re importing these data and creating various nodes. It basically works as intended! Gotta like that.

More later on the exact implementation details, but I was running into one problem: each time I tried to ingest a feed I was running into the following errors:

Fatal error: Maximum execution time of 30 seconds exceeded in… blah blah blah

Based on my years of application development experience and a very keen gut instinct I quickly surmised that a fatal error is bad. I then dug in to see what was doing here.

As the error message said, the code was timing out; it was taking longer than the maximum execution time as defined by the PHP setting max_execution_time. One possible solution here is to increase this value in the settings.php file. For example, we could double it to 60 seconds via:

ini_set('max_execution_time', '60');

This would increase the maximum execution time for all Drupal processes. Rather than do that, I chose option B, which involves the PHP function set_time_limit. You can call this function in a PHP script and it will restart the timeout counter, effectively increasing the maximum execution time on the fly.

So, I added the following call to a routine in the feed processing code, which gets called each time a record in the feed is processed:

set_time_limit(30);
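In context, the idea looks roughly like the sketch below. This is a simplified stand-in rather than our feed processor: the function name and the record loop are assumptions, and the real per-record work is left as a comment.

```php
<?php
// Sketch: restart PHP's execution timer before each record, so that only a
// single slow record, not the feed as a whole, can hit the limit.
function process_feed(array $records, $seconds_per_record = 30) {
  $processed = 0;
  foreach ($records as $record) {
    // Restarts the timeout counter from zero with a fresh allowance.
    set_time_limit($seconds_per_record);
    // ... the real per-record work (parsing, node creation) would go here ...
    $processed++;
  }
  return $processed;
}
```

Note that this leaves max_execution_time alone for every other Drupal request, which is the reason for choosing it over the settings.php change.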

Voila! Problem solved. Time for a beer.
