Archive for the ‘PBS’ Category
* All Systems Go!!
Posted on August 1st, 2008 by Phil. Filed under Boost, Drupal, Memcache, MySQL, PBS, TV Guide.
As they say at NASA, we here at WGBH Online are in official launch mode! I can almost count on both hands the number of days until we light the candle under our new TV Programs and Schedules. The clock is ticking and we’re very busy trying to make sure this rocket won’t explode on the launch pad.
Our new production system is up (well, mostly) and ready for our content producers to begin entering content in preparation for launch. This mainly involves adding information to TV programs that we don’t get through our PBS/TV Guide data feed (series descriptions, photos, related sidebar items, etc.).
The only piece of the puzzle that we’ll be unable to have in place in time for launch is memcache integration. As of now, there isn’t a version of the module for Drupal 6. I asked Robert Douglass - one of the memcache module maintainers, somebody we’ve worked with in the past and a good guy - about the D6 version of memcache and he said it’s coming soon. In the meantime, we’re preparing to launch with traditional Drupal database caching.
What could go wrong?
So the plan here is to get the content producers producing, well, content on the new system next week while Pete and I and our beloved buddies in the IT department work on system and application benchmarking and tuning. We’re having a MySQL consultant come in next week to help tune that end of things. I’ve been running benchmarking tests using Apache’s ab utility, and Pete has been tinkering with getting the Boost module (a file-based caching mechanism) running on D6. I was aware of Boost before, though we hadn’t been planning on using it at launch; now that memcache is on hold, we’re giving it a go.
Again, really, what could go wrong?
Oh yeah, plus I have a bunch of work to do to integrate the new TV schedules with the rest of WGBH.org, which is not yet being ported to Drupal (like, um, the home page).
By far the biggest news of this week, however, is that we had a visit by a very special guest - Drupal core developer (and local resident) Moshe Weitzman! You know somebody is important in the Drupal world when they have a URL like drupal.org/NAME.
We made the connection with Moshe a few weeks back and he’s been kind enough to offer some Drupal help and advice that’s already saved us a lot of headaches. So, we wanted to meet him in person to say thanks.
Moshe came by and we gave him a tour of the new WGBH facilities, and then he, Pete and I went across the street and had a nice lunch. Moshe is a very nice guy and we had a great time meeting him and talking Drupal (amongst other things). In fact, Moshe was the one who encouraged us to give Boost a try. Thanks Moshe!
There you have it! Now, if you’ll excuse me, it’s time for me to get back to mission control.
* Rebuild Phase 1: TV Schedules and Programs
Posted on February 12th, 2008 by Phil. Filed under PBS, Protrack, TV Guide, Television.
Early on in the plans for the rebuild of WGBH.org we made a strategic decision (or, as a certain well-known someone might say, a strategeric decision): to rebuild and relaunch the site in phases. The main reason was that we knew a complete overhaul of the site (i.e. new look and feel, new information architecture, new back end, etc.) was a meaty task that would take some time. However, we had one pressing problem that needed to be addressed more quickly. That problem was the publication of TV schedules and program information.
Ahh, TV schedules. The mere mention of them to those involved with getting them on the web site often evokes a quick intake of breath, a wince or, in the more extreme cases, a curse word (or two). To put it mildly, our current process for publishing TV schedules and their related program information is painful. Not only painful, but very time consuming. Not only painful and time consuming but also annoying, frustrating, headache inducing, stomach churning, etc. etc.
I think you get my point.
What is that process, you ask? In a nutshell, here’s how it currently works:
(1) Through a series of complicated and mysterious processes, television schedule information - as well as program and episode descriptions - for our various channels makes it into a piece of software called Protrack, which is used by WGBH staff to actually manage what goes on the air. Protrack is designed for use by television programmers, engineers and technicians; the data in it is not meant for general public consumption.
(2) A home grown piece of Java code (which we call the ingestor) runs once an hour to export scheduling and program information from Protrack and import it into our current CMS. Now, bear in mind that the database schemas for Protrack and our CMS were developed independently and for different purposes. This Java code is trying to do the impossible: translate the data in Protrack into our CMS so it is ready for display to the public. There are two real problems here:
(a) The way the data are modeled in each system is very different. One program in Protrack can often have several different titles and versions (e.g. one version for regular airings, one for airings during pledge drives). For our purposes on the web, we only want one version of the program. The ingestor has to try to reconcile these differences programmatically, which is no easy task given the nature of the data. In my 5+ years here at WGBH this code has undergone at least two major revisions (i.e. reengineered from the ground up) and, due to the inconsistent and unrigorous nature of the upstream data, it still produces regular errors and needs constant babysitting. The end result is that human intervention is regularly required to clean up errors at ingest time.
(b) The bigger problem is that the data coming from Protrack is not meant for public consumption. Program titles and descriptions in that system can often contain information only meant for internal staff (e.g. “Great show for pledge!”). Or sometimes descriptions just aren’t there. So, all of the data coming in from Protrack needs to be copy edited - or just completely rewritten - by WGBH Online staff. This is very time consuming.
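To give a feel for problem (a), here is a minimal Python sketch of collapsing several versions of one program into a single entry. It is an illustration only, not the ingestor's actual logic (which is Java and not shown here): the sample titles and the "(Pledge)" suffix convention are invented for the example, and real Protrack data is presumably much less tidy.

```python
import re

# Invented examples of one program showing up under several titles.
RAW_TITLES = ["Nova", "NOVA (Pledge)", "Nova  ", "Frontline"]


def canonical_title(raw):
    """Normalize a title: drop a hypothetical '(Pledge)' suffix,
    collapse whitespace and ignore case."""
    title = re.sub(r"\s*\(pledge\)\s*$", "", raw.strip(), flags=re.IGNORECASE)
    return " ".join(title.split()).casefold()


def group_versions(titles):
    """Group raw titles that normalize to the same canonical title."""
    groups = {}
    for raw in titles:
        groups.setdefault(canonical_title(raw), []).append(raw)
    return groups


if __name__ == "__main__":
    for key, versions in group_versions(RAW_TITLES).items():
        print(key, "<-", versions)
```

Any rule like this only works as long as the upstream data follows a convention, which is exactly what cannot be relied on here, hence the constant babysitting.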
For these reasons we decided early on in the rebuild process that the old way of building schedules had to go ASAFP. After investigating our options (which included talking to other PBS stations and even hiring a consultant) we decided on a new method for publishing TV schedules and programs to WGBH.org. The first thing we needed was a new data source.
Luckily, PBS offers member stations free XML feeds of TV Guide schedule data. This is a relatively new offering by PBS. There is one feed for each of our channels, providing airing and descriptive program/episode information two weeks into the future. The main advantage of these data is that the information, being curated by TV Guide, is ready for public consumption. In theory, each feed could be pulled, transformed via XSLT and displayed right on the site as is. The drawback is that the data feed is updated only once a day and won’t reflect last minute schedule changes (unlike the Protrack data).
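As a rough illustration of what reading such a feed involves, here is a minimal Python sketch using the standard library's XML parser. The feed layout is an assumption: the element and attribute names (schedule, airing, program, episode, description, channel, start) and the sample values are made up for the example, since the real PBS/TV Guide schema is not described in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real schema will differ.
SAMPLE_FEED = """
<schedule channel="Channel A">
  <airing start="2008-02-12T20:00:00">
    <program>Example Program</program>
    <episode>Example Episode One</episode>
    <description>A public-ready description.</description>
  </airing>
  <airing start="2008-02-12T21:00:00">
    <program>Another Program</program>
    <episode>Example Episode Two</episode>
    <description>Another public-ready description.</description>
  </airing>
</schedule>
"""


def parse_feed(xml_text):
    """Turn one channel's feed into a flat list of airing rows."""
    root = ET.fromstring(xml_text.strip())
    channel = root.get("channel")
    rows = []
    for airing in root.findall("airing"):
        rows.append({
            "channel": channel,
            "start": airing.get("start"),
            "program": airing.findtext("program", default=""),
            "episode": airing.findtext("episode", default=""),
            "description": airing.findtext("description", default=""),
        })
    return rows


if __name__ == "__main__":
    for row in parse_feed(SAMPLE_FEED):
        print(row["start"], row["program"], "-", row["episode"])
```

Because the feed is refreshed only once a day, rows parsed this way are a snapshot; a last minute schedule change would not show up until the next update.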
We decided that the savings in editorial effort (not to mention the fact that the feeds are free to us) made this data source the one for us. However, due to the potential last minute schedule changes that wouldn’t be reflected in the data we also decided that some programming muscle would still be required to make these data work for us. So, that has led us to decide on the following new method for publishing TV schedules and program information to WGBH.org:
(1) Import the PBS XML schedule feeds into Drupal. During import, create airing, episode and program nodes, from which we can produce a schedule grid, a program A-Z list, and program and episode description pages. Also, since we can design the database schema in Drupal around the structure of the PBS/TV Guide data, ingestor errors should be reduced.
(2) While the imported PBS data will be published by default, WGBH editorial staff will be able to create or modify schedule and program information as they see fit in our new CMS.
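The two steps above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the Drupal implementation: the row fields, the dictionary-based "nodes" and the apply_overrides helper are all hypothetical, and stand in for whatever the real import code and node types end up looking like.

```python
# Hypothetical rows, as they might come out of a feed parser.
ROWS = [
    {"channel": "Channel A", "start": "2008-02-12T20:00:00",
     "program": "Example Program", "episode": "Episode One",
     "description": "Feed description."},
    {"channel": "Channel A", "start": "2008-02-14T02:00:00",
     "program": "Example Program", "episode": "Episode One",
     "description": "Feed description."},
]


def import_airings(rows):
    """Step 1: split flat feed rows into program, episode and airing records.

    Programs and episodes are created once; every row becomes an airing.
    """
    programs, episodes, airings = {}, {}, []
    for row in rows:
        program_key = row["program"]
        episode_key = (row["program"], row["episode"])
        programs.setdefault(program_key, {"title": row["program"]})
        episodes.setdefault(episode_key, {
            "title": row["episode"],
            "description": row["description"],
        })
        airings.append({
            "episode": episode_key,
            "channel": row["channel"],
            "start": row["start"],
        })
    return programs, episodes, airings


def apply_overrides(imported, overrides):
    """Step 2: imported values are the default; non-empty editorial
    values replace them."""
    merged = dict(imported)
    merged.update({key: value for key, value in overrides.items() if value})
    return merged


if __name__ == "__main__":
    programs, episodes, airings = import_airings(ROWS)
    episode = episodes[("Example Program", "Episode One")]
    print(apply_overrides(episode, {"description": "Rewritten by an editor."}))
```

Keeping editorial changes as a separate layer over the imported values means a later feed import can refresh the defaults without wiping out what staff have written.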
This new process will still involve some heavy lifting, development-wise, but should result in some significant time savings, particularly on the editorial side of things, allowing us to focus on other types of content curation and creation for WGBH.org.
This means that the rebuild of WGBH.org will take place in at least two distinct stages:
Phase 1: Replace the current engine behind TV schedules and program information with the new system. Keep the existing look and feel, site architecture and all other related content and systems.
Phase 2: Complete site redesign and rebuild, retaining the same process for TV programs and schedules. It’s likely that this phase will be divided into smaller phases itself, but that is TBD.
We are actively working on the development for Phase 1 right now! Next time I will provide more details on the actual implementation of this phase.
Stay warm!
Disclaimer
- The opinions expressed in here are those of the writers/contributors and do not necessarily represent the views or opinions of the WGBH Educational Foundation.











