February 12, 2008
Rebuild Phase 1: TV Schedules and Programs
Filed by Phil at 11:21 am under Television, PBS, TV Guide, Protrack
Early on in the plans for the rebuild of WGBH.org we made a strategic decision (or, as a certain well-known someone might say, a strategeric decision): to rebuild and relaunch the site in phases. The main reason was that we knew a complete overhaul of the site (i.e. new look and feel, new information architecture, new back end, etc.) was a meaty task that would take some time. However, we had one pressing problem that needed to be addressed more quickly. That problem was the publication of TV schedules and program information.
Ahh, TV schedules. Just the mere mention of them to those involved with getting them on the web site often evokes a quick intake of breath, a wince or - in the more extreme cases - a curse word (or two). To put it mildly, our current process for publishing TV schedules and their related program information is painful. Not only painful, but very time consuming. Not only painful and time consuming but also annoying, frustrating, headache inducing, stomach churning, etc. etc.
I think you get my point.
What is that process, you ask? In a nutshell, here’s how it currently works:
(1) Through a series of complicated and mysterious processes, television schedule information - as well as program and episode descriptions - for our various channels makes it into a piece of software called Protrack, which is used by WGBH staff to actually manage what goes on the air. Protrack is designed for use by television programmers, engineers and technicians; the data in it is not meant for general public consumption.
(2) A homegrown piece of Java code (which we call the ingestor) runs once an hour to export scheduling and program information from Protrack and import it into our current CMS. Now, bear in mind that the database schemas for Protrack and our CMS were developed independently and for different purposes. This Java code is trying to do the impossible: translate the data in Protrack into our CMS so it is ready for display to the public. There are two real problems here:
(a) The way the data are modeled in each system is very different. One program in Protrack can often have several different titles and versions (e.g. one version for regular airings, one for airings during pledge drives). For our purposes on the web, we only want one version of the program. The ingestor has to try to reconcile these differences programmatically, which is no easy task, due to the nature of the data. In my 5+ years here at WGBH this code has undergone at least two major revisions (i.e. reengineered from the ground up) and, due to the inconsistent and unrigorous nature of the upstream data, it still produces regular errors and needs constant babysitting. The end result is that human intervention is regularly required to clean up errors at ingest time.
(b) The bigger problem is that the data coming from Protrack is not meant for public consumption. Program titles and descriptions in that system can often contain information only meant for internal staff (e.g. “Great show for pledge!”). Or sometimes descriptions just aren’t there. So, all of the data coming in from Protrack needs to be copy edited - or just completely rewritten - by WGBH Online staff. This is very time consuming.
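To make problem (a) a bit more concrete, here is a minimal Python sketch of the kind of reconciliation the ingestor has to attempt. It is not our actual Java code: the `ProtrackVersion` fields and the "prefer the non-pledge version that has a description" rule are assumptions made up for illustration, and the real Protrack data is a good deal messier than this.

```python
from dataclasses import dataclass


# Hypothetical record shape; the real Protrack schema is not shown in this post.
@dataclass
class ProtrackVersion:
    program_id: str
    title: str
    description: str
    is_pledge: bool


def pick_public_version(versions):
    """Collapse several versions of one program to a single web-facing one.

    Assumed rule: prefer a non-pledge version with a description,
    then any non-pledge version, then whatever is left.
    """
    ranked = sorted(
        versions,
        # False sorts before True, so non-pledge and non-empty come first.
        key=lambda v: (v.is_pledge, not v.description.strip()),
    )
    return ranked[0]


def reconcile(versions):
    """Group versions by program_id and keep one version per program."""
    by_program = {}
    for version in versions:
        by_program.setdefault(version.program_id, []).append(version)
    return {pid: pick_public_version(vs) for pid, vs in by_program.items()}
```

Even a tidy rule like this does nothing about problem (b): a description such as “Great show for pledge!” passes straight through, which is why a human still has to read everything.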
For these reasons we decided early on in the rebuild process that the old way of building schedules had to go ASAFP. After investigating our options (which included talking to other PBS stations and even hiring a consultant) we decided on a new method for publishing TV schedules and programs to WGBH.org. The first thing we needed was a new data source.
Luckily, PBS offers member stations free XML feeds of TV Guide schedule data. This is a relatively new offering by PBS. There is one feed for each of our channels, providing airing and descriptive program/episode information two weeks into the future. The main advantage of these data is that the information, being curated by TV Guide, is ready for public consumption. In theory, each feed could be pulled, transformed via XSLT and displayed right on the site as is. The drawback is that each feed is updated only once a day and won’t reflect last minute schedule changes (unlike the Protrack data).
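For a rough idea of what pulling airings out of such a feed might look like, here is a short Python sketch. The XML below is an invented sample: the element and attribute names (`schedule`, `airing`, `start`, and so on) are assumptions for illustration and not the actual PBS/TV Guide feed format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Invented sample; the actual PBS/TV Guide feed layout is not shown here.
SAMPLE_FEED = """\
<schedule channel="WGBH 2">
  <airing start="2008-02-12T20:00:00">
    <program>Example Program</program>
    <episode>Episode One</episode>
    <description>A description that is ready for the public.</description>
  </airing>
  <airing start="2008-02-12T21:00:00">
    <program>Another Program</program>
    <episode>Episode Two</episode>
    <description></description>
  </airing>
</schedule>
"""


def parse_feed(xml_text):
    """Return one dict per airing found in a (hypothetical) channel feed."""
    root = ET.fromstring(xml_text)
    channel = root.get("channel")
    airings = []
    for node in root.findall("airing"):
        airings.append({
            "channel": channel,
            "start": datetime.fromisoformat(node.get("start")),
            # findtext returns "" for an empty element, default for a missing one.
            "program": node.findtext("program", default="").strip(),
            "episode": node.findtext("episode", default="").strip(),
            "description": node.findtext("description", default="").strip(),
        })
    return airings
```

Calling `parse_feed(SAMPLE_FEED)` gives a list of two airings, each carrying its channel, start time, program, episode and description.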
We decided that the savings in editorial effort (not to mention the fact that the feeds are free to us) made this data source the one for us. However, due to the potential last minute schedule changes that wouldn’t be reflected in the data we also decided that some programming muscle would still be required to make these data work for us. So, that has led us to decide on the following new method for publishing TV schedules and program information to WGBH.org:
(1) Import the PBS XML schedule feeds into Drupal. During import, create airing, episode and program nodes, from which we can produce a schedule grid, a program A-Z list, and program and episode description pages. Also, since we can design the database schema in Drupal around the structure of the PBS/TV Guide data, ingestor errors should be reduced.
(2) While the imported PBS data will be published by default, WGBH editorial staff will be able to create or modify schedule and program information as they see fit in our new CMS.
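The two steps above can be sketched roughly as follows. This is plain Python, not Drupal code, and it is only one possible design: the `ScheduleStore` class, its method names and its keys are all hypothetical. The idea it illustrates is keeping staff edits separate from imported feed data, so that the daily import can refresh the feed data without wiping out editorial changes.

```python
class ScheduleStore:
    """Toy stand-in for the CMS: imported nodes plus staff overrides."""

    def __init__(self):
        self.tables = {
            "program": {},  # program title -> imported fields
            "episode": {},  # (program, episode) -> imported fields
            "airing": {},   # (channel, start) -> imported fields
        }
        self.overrides = {}  # (kind, key) -> fields edited by staff

    def import_airing(self, channel, start, program, episode, description):
        """Create or refresh the three node types; never touches staff edits."""
        self.tables["program"][program] = {"title": program}
        self.tables["episode"][(program, episode)] = {
            "program": program,
            "title": episode,
            "description": description,
        }
        self.tables["airing"][(channel, start)] = {
            "channel": channel,
            "start": start,
            "program": program,
            "episode": episode,
        }

    def edit(self, kind, key, **fields):
        """Record a staff edit (or a brand-new entry) for one node."""
        self.overrides.setdefault((kind, key), {}).update(fields)

    def published(self, kind, key):
        """Feed data by default, with any staff edits layered on top."""
        merged = dict(self.tables[kind].get(key, {}))
        merged.update(self.overrides.get((kind, key), {}))
        return merged
```

With this layout, an editor who rewrites an episode description keeps that rewrite even after the next import brings the feed's original text back in.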
This new process will still involve some heavy lifting, development-wise, but should result in some significant time savings, particularly on the editorial side of things, allowing us to focus on other types of content curation and creation for WGBH.org.
This means that the rebuild of WGBH.org will take place in at least two distinct stages:
Phase 1: Replace the current engine behind TV schedules and program information with the new system. Keep the existing look and feel, site architecture and all other related content and systems.
Phase 2: Complete site redesign and rebuild, retaining the same process for TV programs and schedules. It’s likely that this phase will be divided into smaller phases itself, but that is TBD.
We are actively working on the development of Phase 1 right now! Next time I will provide more details on the actual implementation of this phase.
Stay warm!
