Series Page, We Hardly Knew Ye

Feels like we’re already well into summer mode here at WGBH Online. The temperature is heating up, people are starting to take long vacations (welcome back from Hawaii, Pete!) and - the surest sign that summer is almost upon us - I’m in full-blown shorts-wearing mode.

I'm in shorts mode!

But don’t think the work on the WGBH.org rebuild has slowed at all. Oh no. My fingers are already sore from all the coding going on and we’re getting near the home stretch of the first phase of our rebuild.

Before I give the latest update, note the newest feature on the site over there to the right - a Flickr badge! Based on the recommendation of my friend - and a friend of WGBH - Steve Garfield, I’ve created a Flickr account to go along with this blog. After all, when you think web site development you think photos! Tell your friends!

Anyways, back to the rebuild of WGBH.org: I’ve been busy building our episode page, which is fairly complex but is coming along nicely. So for those keeping track, once that’s complete we’ll have built a Programs A-Z page, Schedule Grid, Full Day by Channel schedules and now the episode page. That’ll really just leave a search page to be built and then we can prepare to test and launch!

Wait, but what about the series page, you say?

An excellent question, class. Somebody’s been paying attention!

Well, that’s no longer a concern since we recently decided to do away with building our own series pages. How about them apples?

In case it’s not clear what we’re talking about, a series page would, for example, be a page dedicated to an episodic program, such as Nova, Frontline, Masterpiece, etc. Such a page could have a description of the series, a list of upcoming episodes, other related series, etc. An episode page, on the other hand, would be dedicated to, well, a specific episode of a series.

We currently do publish series pages on WGBH.org; however, most people never see them, as we generally try to direct people to the next upcoming episode of a series. But they are there and we do curate them. After initially planning to build and support them in the new world, we decided to no longer produce them.

There were basically three reasons to forgo series pages:

(1) Series descriptions do not come through the PBS/TV Guide data feeds. The data contain episodic descriptions, but not descriptions for a series. So, if we want to publish series pages we’d have to enter these descriptions ourselves, which defeats one of the main reasons we’re going to this new data feed in the first place.

(2) The episodic/non-episodic issue. Obviously, some programs are non-episodic, meaning they are not series. Since the schedule data that we’ll be getting does not explicitly distinguish between episodic and non-episodic programs, all programs coming into the system will be treated the same way and have at least one episode. That means a non-episodic program in our system will just be a series with one episode, and dealing with that wasn’t going to be trivial. For example, for non-episodic programs, what page do we send users to? The series page or the episode page? Clearly we would have to either manually flag non-episodic programs as they came in or work up some convoluted logic to properly display the information for such shows. Neither option sounded good.

(3) Why even bother? Each show presumably already has its own web site anyways, so why should we reinvent the wheel? Episode pages at least provide airing information specific to our channels, plus we can use them for promotion of DVDs and whatnot or cross-promotion of other shows, so we do want to have those pages. But a series page really wouldn’t be adding much value to anybody.
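To make reason (2) a bit more concrete, here is a rough sketch of the kind of guesswork a series page would have required. This is not code from our system: the function name landing_page and the array shape are made up for illustration, and counting episodes is just one possible heuristic.

```php
<?php
// Hypothetical heuristic: pick a landing page using only the number of
// episodes imported so far, because the feed carries no episodic flag.
function landing_page(array $series) {
  return count($series['episodes']) > 1 ? 'series' : 'episode';
}

// A one-off special and a brand-new series with a single imported
// episode have exactly the same shape at import time.
$special = array('title' => 'One-Off Special', 'episodes' => array('One-Off Special'));
$new_series = array('title' => 'New Series', 'episodes' => array('Pilot'));

$special_page = landing_page($special);
$new_series_page = landing_page($new_series);
```

Both calls return the same answer, which is the problem: nothing in the data says which of the two is really a series, so someone would have to flag them by hand.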

So, there you have it: we punted the whole idea. This was fine with me, as it makes development a bit easier. No complaints here.

Time to put some sunblock on my pasty legs.

set_time_limit: Learn It, Know It, Live It

I’ve got the basic code for importing our TV programs and schedule data working. In a nutshell, PBS provides us an XML feed of schedule data provided by TV Guide for each of our channels. Using FeedAPI and a custom module we’re importing these data and creating various nodes. It basically works as intended! Gotta like that.

More later on the exact implementation details, but I was running into one problem: each time I tried to ingest a feed I got the following error:

Fatal error: Maximum execution time of 30 seconds exceeded in… blah blah blah

Based on my years of application development experience and a very keen gut instinct I quickly surmised that a fatal error is bad. I then dug in to see what was doing here.

As the error message said, the code was timing out; it was taking longer than the maximum execution time as defined by the PHP setting max_execution_time. One possible solution here is to increase this value in the settings.php file. For example, we could double it to 60 seconds via:

ini_set('max_execution_time', '60');

This would increase the maximum execution time for all Drupal processes. Rather than do that, I chose option B, which involves the PHP function set_time_limit. You can call this function in a PHP script and it will restart the timeout counter, effectively increasing the maximum execution time on the fly.

So, I added the following call to a routine in the feed processing code, which gets called each time a record in the feed is processed:

set_time_limit(30);
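For the curious, here is a minimal sketch of where that call sits. The function names process_record and process_feed are placeholders I made up for this post, not the actual FeedAPI or module code, and the uppercase step just stands in for the real node-creation work.

```php
<?php
// Hypothetical per-record handler; stands in for the real node-creation code.
function process_record(array $record) {
  return strtoupper($record['title']);
}

// Restart the timeout counter before each record, so the limit applies
// to a single record rather than to the whole feed.
function process_feed(array $records, $seconds_per_record = 30) {
  $results = array();
  foreach ($records as $record) {
    set_time_limit($seconds_per_record);
    $results[] = process_record($record);
  }
  return $results;
}

$titles = process_feed(array(
  array('title' => 'Nova'),
  array('title' => 'Frontline'),
));
```

The nice part of doing it this way is that the 30 second limit still catches a single record that hangs, while a long feed made up of many quick records can run to completion.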

Voila! Problem solved. Time for a beer.

Rebuild Phase 1: TV Schedules and Programs

Early on in the plans for the rebuild of WGBH.org we made a strategic decision (or, as a certain well-known someone might say, a strategeric decision): to rebuild and relaunch the site in phases. The main reason was that we knew a complete overhaul of the site (i.e. new look and feel, new information architecture, new back end, etc.) was a meaty task that would take some time. However, we had one pressing problem that needed to be addressed more quickly. That problem was the publication of TV schedules and program information.

Ahh, TV schedules. Just the mere mention of them to those involved with getting them on the web site often evokes a quick intake of breath, a wince or - in the more extreme cases - a curse word (or two). To put it mildly, our current process for publishing TV schedules and their related program information is painful. Not only painful, but very time consuming. Not only painful and time consuming but also annoying, frustrating, headache inducing, stomach churning, etc. etc.

I think you get my point.

What is that process, you ask? In a nutshell, here’s how it currently works:

(1) Through a series of complicated and mysterious processes, television schedule information - as well as program and episode descriptions - for our various channels makes it into a piece of software called Protrack, which is used by WGBH staff to actually manage what goes on the air. Protrack is designed for use by television programmers, engineers and technicians; the data in it is not meant for general public consumption.

(2) A home grown piece of Java code (which we call the ingestor) runs once an hour to export scheduling and program information from Protrack and import it into our current CMS. Now, bear in mind that the database schemas for Protrack and our CMS were developed independently and for different purposes. This Java code is trying to do the impossible: translate the data in Protrack into our CMS so it is ready for display to the public. There are two real problems here:

(a) The way the data are modeled in each system is very different. One program in Protrack can often have several different titles and versions (e.g. one version for regular airings, one for airings during pledge drives). For our purposes on the web, we only want one version of the program. The ingestor has to try and reconcile these differences programmatically, which is no easy task, due to the nature of the data. In my 5+ years here at WGBH this code has undergone at least two major revisions (i.e. reengineered from the ground up) and, due to the inconsistent and unrigorous nature of the upstream data, it still produces regular errors and needs constant babysitting. The end result is that human intervention is regularly required to clean up errors at ingest time.

(b) The bigger problem is that the data coming from Protrack is not meant for public consumption. Program titles and descriptions in that system can often contain information only meant for internal staff (e.g. “Great show for pledge!”). Or sometimes descriptions just aren’t there. So, all of the data coming in from Protrack needs to be copy edited - or just completely rewritten - by WGBH Online staff. This is very time consuming.

For these reasons we decided early on in the rebuild process that the old way of building schedules had to go ASAFP. After investigating our options (which included talking to other PBS stations and even hiring a consultant) we decided on a new method for publishing TV schedules and programs to WGBH.org. The first thing we needed was a new data source.

Luckily, PBS offers member stations free XML feeds of TV Guide schedule data. This is a relatively new offering by PBS. It’s one feed for each of our channels, providing airing and descriptive program/episode information two weeks into the future. The main advantage of these data is that the information, being curated by TV Guide, is ready for public consumption. In theory, each feed could be pulled, transformed via XSLT and displayed right on the site as is. The drawback is that the data feed is updated only once a day and won’t reflect last minute schedule changes (unlike the Protrack data).
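To give a flavor of what reading such a feed involves, here is a small sketch using PHP’s SimpleXML. Big caveat: the element and attribute names (schedule, airing, start, program, episode) and the sample values are invented for illustration, since the real PBS/TV Guide feed layout isn’t shown here, and parse_airings is a made-up helper name.

```php
<?php
// Invented sample; the real feed's element names and values may differ.
$xml = <<<XML
<schedule channel="WGBH 2">
  <airing start="2008-06-01T20:00:00">
    <program>Nova</program>
    <episode>Sample Episode A</episode>
  </airing>
  <airing start="2008-06-01T21:00:00">
    <program>Frontline</program>
    <episode>Sample Episode B</episode>
  </airing>
</schedule>
XML;

// Turn the feed into plain arrays, one per airing.
function parse_airings($xml) {
  $feed = simplexml_load_string($xml);
  if ($feed === false) {
    return array(); // Malformed feed: nothing to import.
  }
  $airings = array();
  foreach ($feed->airing as $airing) {
    $airings[] = array(
      'channel' => (string) $feed['channel'],
      'start' => (string) $airing['start'],
      'program' => (string) $airing->program,
      'episode' => (string) $airing->episode,
    );
  }
  return $airings;
}

$airings = parse_airings($xml);
```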

We decided that the savings in editorial effort (not to mention the fact that the feeds are free to us) made this data source the one for us. However, due to the potential last minute schedule changes that wouldn’t be reflected in the data we also decided that some programming muscle would still be required to make these data work for us. So, that has led us to decide on the following new method for publishing TV schedules and program information to WGBH.org:

(1) Import the PBS XML schedule feeds into Drupal. During import create airing, episode and program nodes, from which we can produce a schedule grid, program A-Z list and program and episode description pages. Also, since we can design the database schema in Drupal around the structure of the PBS/TV Guide data, ingestor errors should be reduced.

(2) While the imported PBS data will be published by default, WGBH editorial staff will be able to create or modify schedule and program information as they see fit in our new CMS.
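As a rough illustration of step (1), the sketch below splits a list of feed rows into program, episode and airing records. Plain PHP arrays stand in for Drupal nodes, and the function name import_airings, the row fields and the sample values are all assumptions made for this example, not our actual schema.

```php
<?php
// Hypothetical import step: arrays stand in for Drupal nodes.
function import_airings(array $rows) {
  $store = array('programs' => array(), 'episodes' => array(), 'airings' => array());
  foreach ($rows as $row) {
    $program_key = $row['program'];
    $episode_key = $row['program'] . '|' . $row['episode'];
    // Create each program and episode once, however often it airs.
    if (!isset($store['programs'][$program_key])) {
      $store['programs'][$program_key] = array('title' => $row['program']);
    }
    if (!isset($store['episodes'][$episode_key])) {
      $store['episodes'][$episode_key] = array(
        'program' => $program_key,
        'title' => $row['episode'],
      );
    }
    // Every row in the feed is one airing of one episode.
    $store['airings'][] = array(
      'episode' => $episode_key,
      'channel' => $row['channel'],
      'start' => $row['start'],
    );
  }
  return $store;
}

$store = import_airings(array(
  array('program' => 'Nova', 'episode' => 'Sample A', 'channel' => 'Channel 1', 'start' => '2008-06-01T20:00:00'),
  array('program' => 'Nova', 'episode' => 'Sample A', 'channel' => 'Channel 2', 'start' => '2008-06-02T20:00:00'),
  array('program' => 'Nova', 'episode' => 'Sample B', 'channel' => 'Channel 1', 'start' => '2008-06-08T20:00:00'),
));
```

Here three feed rows yield one program, two episodes and three airings, which is roughly the shape needed to drive a schedule grid, an A-Z list and the description pages.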

This new process will still involve some heavy lifting, development-wise, but should result in some significant time savings, particularly on the editorial side of things, allowing us to focus on other types of content curation and creation for WGBH.org.

This means that the rebuild of WGBH.org will take place in at least two distinct stages:

Phase 1: Replace the current engine behind TV schedules and program information with the new system. Keep the existing look and feel, site architecture and all other related content and systems.

Phase 2: Complete site redesign and rebuild, retaining the same process for TV programs and schedules. It’s likely that this phase will be divided into smaller phases itself, but that is TBD.

We are currently in active development on Phase 1! Next time I will provide more details on the actual implementation of this phase.

Stay warm!