Archive for August, 2008
* Keep on Searchin’
Posted on August 22nd, 2008 by Pete. Filed under Drupal, Television, search.
Visitors to WGBH.org often come to the site seeking out information about a specific TV program. Maybe they enjoy Frontline, and they want to find upcoming episodes. Or they caught the last 5 minutes of a show about hot dogs, but they can’t remember the name of the program and they want to know if it will be airing again.
So one of the requirements of the TV Programs and Schedules section was that we implement a scoped search – an advanced search page that would only return a list of TV episodes in the results.

In the past, I’ve used some interesting modules that modify or expand upon the core Drupal search. The Views Fast Search module offers the flexibility to define the content you perform a search on, which really enhances the searching capabilities. Unfortunately, the module is only available for Drupal 5 (although parts of Views Fast Search made it into Drupal 6). Drupal’s built-in advanced search form is also capable of limiting a search query to specific content types — there are several different approaches to achieving this. And the Restricted Search module allows administrators to exclude content types from the search index entirely.
But simply blocking other content types from the query won’t quite cut it for several reasons, and excluding content from the search index would not be a good long-term solution, because eventually we will need to make use of the full site search in addition to this scoped search. Also, just to spice things up a bit, the additional criteria for the TV search specified that:
- The search should only return TV Episode nodes. The Airing and Program nodes do not show up in the search results, although they do play a factor in the indexing of the Episode nodes and the ordering of the results.
- Search results should include the program and episode title, a brief description, a link to the episode page on WGBH.org, and a link to the program web site, if there is one.
- In ordering the search results, keyword relevance is the most important factor, but upcoming airings are a close second. For example, a search for “Curious George” would yield a long list of episodes for that program, but the episode that is airing this afternoon would be at the top of the list, followed by the episode airing tomorrow morning, and so on.
The real heavy lifting of Drupal’s search mechanism can be broken down into two areas: the indexing of the nodes (hook_update_index()) and the search query (hook_search()), both of which involve some code that quickly made my head hurt. But as luck would have it, I pulled out our copy of Pro Drupal Development and discovered a whole chapter dedicated to search. That, combined with Robert Douglass’ very detailed blog post, Drupal Search: How indexing works, worked wonders like a big bottle of ibuprofen.
Indexing
When cron runs, Drupal will index any new nodes, and reindex nodes that have changed since the last run. The title and body of a node, with all HTML tags intact, are parsed — Drupal uses the HTML tags to give additional weight to words. Text in an H1 tag must be important, so those words would carry a very high score, while linked text would carry a lower score (although higher than plain text). Words that are bolded, italicized, or underlined also get a small boost.
This is why a node with “Nova” in the title scores higher than a node with “bossa-nova” in the description, when the search term is “Nova”.
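To make the idea concrete, here is a rough sketch of tag-weighted scoring. It is written in Python purely for illustration and is not Drupal's indexing code; the weights in TAG_WEIGHTS, the WeightedIndexer class, and the score_words() helper are all made up for this example, and taking the largest weight for nested tags is a simplifying assumption.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re

# Illustrative weights only (assumed for this sketch, not taken from Drupal).
TAG_WEIGHTS = {"h1": 25, "a": 10, "b": 3, "strong": 3, "i": 3, "em": 3, "u": 3}


class WeightedIndexer(HTMLParser):
    """Accumulates a score per word, boosted by the enclosing HTML tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []             # stack of currently open weighted tags
        self.scores = defaultdict(int)  # word -> accumulated score

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Drop the most recently opened tag of this name.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # Plain text scores 1; text inside weighted tags takes the largest weight.
        weight = max([TAG_WEIGHTS[t] for t in self.open_tags], default=1)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[word] += weight


def score_words(html):
    """Return a dict mapping each word in the HTML to its weighted score."""
    indexer = WeightedIndexer()
    indexer.feed(html)
    indexer.close()
    return dict(indexer.scores)


title_hit = score_words("<h1>Nova</h1><p>A science series.</p>")
body_hit = score_words("<p>An evening of bossa-nova music.</p>")
print(title_hit["nova"], body_hit["nova"])
```

With these made-up weights, the node with “Nova” in an H1 gets a far higher score for that word than the node that only mentions “bossa-nova” in plain text.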
Overriding the Index
For our purposes, when we index an Episode node, we also want to include the title and description of the parent Program in that index. It is entirely possible that an episode of Nova, for example, might not even mention the word “Nova” in the title or description, so we must include the Program title and description.
To achieve this we use hook_update_index() to loop through any new Episodes. We load both the Episode node and the parent Program node, and then build a string with both the Program and Episode titles in H1 tags, and append the body of each node with all HTML tags intact. That string is then passed off to search_index() where each term is counted, scored, and added to the index.
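The string-building step can be sketched roughly as follows. Again this is Python for illustration only, not the PHP we run inside hook_update_index(); the build_episode_index_text() name and the dictionary shape with "title" and "body" keys are assumptions made for the example.

```python
from html import escape


def build_episode_index_text(program, episode):
    """Combine Program and Episode text into one HTML string for indexing.

    Both titles go in H1 tags so they carry the most weight; the bodies
    are appended with their HTML left intact.
    """
    parts = [
        f"<h1>{escape(program['title'])}</h1>",
        f"<h1>{escape(episode['title'])}</h1>",
        program["body"],
        episode["body"],
    ]
    return "\n".join(parts)


# Hypothetical sample data for the sketch.
program = {"title": "Nova", "body": "<p>The PBS <b>science</b> series.</p>"}
episode = {"title": "Secrets of the Deep", "body": "<p>Exploring the ocean floor.</p>"}
print(build_episode_index_text(program, episode))
```

The point is simply that the Episode's index entry now contains the word “Nova” in an H1, even though the Episode's own title and description never mention it.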
Search Query: Ordering the Results
As the requirements specified, the results of the search query should be weighted with keyword relevance and upcoming airing date being the primary factors in determining the order.
Keyword relevance, of course, is a standard part of the Drupal search ranking mechanism, but to affect the score based on upcoming airings, we construct an additional ranking query. That query, which returns the difference of the upcoming airing timestamp and the end of the data window (or 1 if there are no upcoming airings), is passed to Drupal’s do_search() function. An array of node IDs is returned and passed off to the theme level.
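Here is one way to picture how the two factors could combine. This is a Python sketch under stated assumptions, not the SQL we pass to do_search(): the airing_score() and rank_episodes() names, the result dictionaries, and the choice to sort on relevance first and use the airing score to break ties are all assumptions for illustration.

```python
def airing_score(next_airing, window_end):
    """Seconds from the next airing to the end of the data window.

    Sooner airings leave a bigger gap and so score higher; episodes with
    no upcoming airing get the floor value of 1.
    """
    if next_airing is None:
        return 1
    return window_end - next_airing


def rank_episodes(results, window_end):
    """Order results by keyword relevance, then by how soon they air."""
    return sorted(
        results,
        key=lambda r: (r["relevance"], airing_score(r["next_airing"], window_end)),
        reverse=True,
    )


# Hypothetical results: timestamps are plain integers in seconds.
WINDOW_END = 1_000_000
results = [
    {"nid": 1, "relevance": 5, "next_airing": None},
    {"nid": 2, "relevance": 5, "next_airing": 186_400},  # tomorrow morning
    {"nid": 3, "relevance": 5, "next_airing": 100_000},  # this afternoon
]
print([r["nid"] for r in rank_episodes(results, WINDOW_END)])
```

In the “Curious George” case, every episode matches the keyword about equally through the Program title, so the airing score ends up deciding the order: this afternoon's episode first, tomorrow's next, and episodes with no upcoming airing last.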
One very nice thing about Drupal’s search is that this custom search was developed without impacting the existing full site search capability. No core code needed to be touched, and in the future we can add scoped search to other areas (like Radio) by replicating several functions and adding a few case statements.
* The Eagle Has Landed!
Posted on August 15th, 2008 by Phil. Filed under Drupal, Television.
I don’t want to belabor the whole space travel analogy, but, what the heck - life is short! Let’s belabor-away!
Earlier this week we took one small step for TV schedules and one giant leap for WGBH.org by officially launching the new Drupal-based TV Programs and Schedules section of WGBH.org!
Aside from finding a significant bug literally 10 minutes before launch, it has all gone quite smoothly! Drupal is behaving like a champ and everything is humming along.
In fact, for a worry-wart like me, you could almost say it’s going too well.
So, now I’ve jinxed everything. Oh well. Like I said, life is short.
The list of people to thank for making this all happen so smoothly is long. At the top of the list is my friend and co-developer Pete Bull who did a great job from day one and has saved my bacon on a bunch of occasions already (including tracking down and fixing that aforementioned last minute pre-launch bug). Our good buddies in the IT department, especially, did a ton of work to get a whole new development, testing and production infrastructure in place for us, so a big thanks to Peter M., Bruce D., Sarah, Larissa and all those folks. You guys and gals rock.
Also, our project manager Louise, designer Tyler, WGBH Online Director Darleen and all of our patient content producers were great to work with under the sometimes-trying circumstances. Finally, but not leastly, our former director Bruce K. and my old buddy and our former designer Peter L. (note: WGBH apparently has a rule about employing a large number of guys named “Peter”) played big roles early on in the process.
There were also any number of outside folks who helped out at different stages, including Drupal-gods Robert Douglass and Moshe Weitzman and the fine folks at Lullabot. Of course, there’s also all the faceless folks in the Drupal community who contribute modules, create patches and document all sorts of helpful things. We love open source!
Whew! See, lots of people have contributed here.
Now, of course, the real fun begins: a full blown overhaul of all of WGBH.org, including new information architecture, look and feel and, of course, a complete port of, well, everything to Drupal.
Should be no sweat!
Sadly, though, unlike for the Apollo astronauts, there will be no ticker tape parade. Maybe next time…!
* Almost Time To Start The Eggs!
Posted on August 8th, 2008 by Phil. Filed under Boost, Drupal, Television, caching.
Hours before they were launched into space, and a few days before two of them became the first men to walk on the moon, the crew of Apollo 11 sat down for a hearty breakfast of eggs, toast, coffee and - of course - Tang!

Well we here at WGBH Online are getting ready for our own launch in just a few days. We’re not quite ready for the pre-launch breakfast, but we should probably go shopping for all the ingredients real soon.
A whole bunch of people are working feverishly right now to dot all the i’s and cross all the t’s on our new TV Programs and Schedules module. Everything is going smoothly - knock on wood - and I think we may pull this off as planned. Suffice it to say, lots of coffee is being consumed.
The biggest development of late is that Pete has worked wonders to get the Boost module working for us on Drupal 6. Initial benchmarking tests indicate that Boost has increased our throughput by about ten-fold over the traditional Drupal database caching.
Boy, we love that.
Anyhow, keep your eyes peeled here for the big launch announcement - and get ready to start the eggs!
* All Systems Go!!
Posted on August 1st, 2008 by Phil. Filed under Boost, Drupal, Memcache, MySQL, PBS, TV Guide.
As they say at NASA, we here at WGBH Online are in official launch mode! I can almost count on both hands the number of days until we light the candle under our new TV Programs and Schedules. The clock is ticking and we’re very busy trying to make sure this rocket won’t explode on the launch pad.
Our new production system is up (well, mostly) and ready for our content producers to begin entering content in preparation for launch. This mainly involves adding information to TV programs that we don’t get through our PBS/TV Guide data feed (series descriptions, photos, related sidebar items, etc.).
The only piece of the puzzle that we’ll be unable to have in place in time for launch is memcache integration. As of now, there isn’t a version of the module yet for Drupal 6. I asked Robert Douglass - one of the memcache module maintainers and somebody we’ve worked with in the past and a good guy - about the D6 version of memcache and he said it’s coming soon. In the meantime, we’re preparing to launch with traditional Drupal database caching.
What could go wrong?
So the plan here is to get the content producers producing, well, content on the new system next week while Pete and I and our beloved buddies in the IT department work on system and application benchmarking and tuning. We’re having a MySQL consultant come in next week to help tune that end of things. I’ve been running benchmarking tests using Apache’s ab utility, and Pete has been tinkering with getting the Boost module (a file-based caching mechanism) running on D6. I was aware of Boost before but we hadn’t been planning on using it at launch, but now that memcache is on hold we’re giving it a go.
Again, really, what could go wrong?
Oh yeah, plus I have a bunch of work to do to integrate the new TV schedules with the rest of WGBH.org which is not yet being ported to Drupal (like, um, the home page).
The biggest news of this week, however and by far, is that we had a visit by a very special guest - Drupal core developer (and local resident) Moshe Weitzman! You know somebody is important in the Drupal world when they have a URL like drupal.org/NAME.
We made the connection with Moshe a few weeks back and he’s been kind enough to offer some Drupal help and advice that’s already saved us a lot of headaches. So, we wanted to meet him in person to say thanks.
Moshe came by and we gave him a tour of the new WGBH facilities and then he, Pete and I went across the street and had a nice lunch. Moshe is a very nice guy and we had a great time meeting him and talking Drupal (amongst other things). In fact, Moshe was the one who encouraged us to give Boost a try. Thanks Moshe!
There you have it! Now, if you’ll excuse me, it’s time for me to get back to mission control.
Disclaimer
- The opinions expressed in here are those of the writers/contributors and do not necessarily represent the views or opinions of the WGBH Educational Foundation.












