Archive for October, 2008
* Pass the Aspirin
Posted on October 24th, 2008 by Phil. Filed under Drupal, PHP, Television, Views, tags.
For those of us in the northern hemisphere, fall has arrived! In between raking up and burning piles of leaves (and useless 401K statements), we here at WGBH Online have continued to fine tune our new(ish) TV Programs and Schedules pages.
As you may recall, not long after launch in August, we began to revisit the whole notion of how we’re tagging our TV programs and episodes. The main reason was to improve the way we generate lists of related programs, so as to suggest to visitors other shows they might like. Our initial approach was simple: just tag the programs (not individual episodes) and use a Drupal view to generate a list of up to three related programs.
But this soon proved restrictive. Sure, Frontline is a News and Public Affairs program, but individual shows in the series can be about different things (technology, politics, science). So, we wanted to be able to capture this more detailed level of information and use it to generate more useful lists of related programs for our visitors.
After much thought and discussion (not to mention headaches), we came up with an expanded tagging scheme and more sophisticated program matching logic, which has now been implemented on the site. Here’s what we did:
We renamed our existing TV Program Genre vocabulary to TV Program Primary Genres.
The terms remained the same (a small set of high level classifications) and these are still only applied at the program level.
We then added a new vocabulary that can be applied to both TV programs and episodes: TV Program/Episode Secondary Genres.
This secondary list has many more terms that now allow for a more sophisticated level of classification. tags applied at the program level apply to all episodes in a series. Tags applied at the episode level are only applicable to that particular episode.
Once we had that in place we then had to think about how, using these tags and given a single program episode, we would define rules for identifying “related” programs and episodes.
This is where the aforementioned headaches started to kick in.
Once you started to think about it, all sorts of questions cropped up, like, which carries more weight, matching primary genre tags or secondary genre tags (or should they count equally)? Or, assuming two related programs have the same tags as the target episode, how to break the tie? Or, do we match an episode within one series to other episodes in that series or restrict it to episodes of other series?
Pass the aspirin, because I’m getting a headache just thinking about it again.
Luckily, we have some fine folks working here who sat down and really noodled through this to come up with some matching logic. When written out, the matching rules looked something like this:
1. Match at the episode level
2. Cull only from upcoming or recently-aired episodes
3. Look for most tag matches, with all tags equally weighted
4. Only allow one episode per program/series to appear in “You Might Also Like” box
5. In a tie, give priority to episodes with same “Program Primary” tag
6. If still a tie, give priority to episodes with exact same tag makeup (i.e. both have only one Primary tag)
7. If still a tie, give priority to the episode with soonest upcoming airing.
The idea was then to use the tags and these rules to generate up to three matches for each episode to display in the “You might also like” block in the right hand rail.
Well, up to three matches, unless there were more than three episodes with the exact same tag structure as the target episode. In that case, we will display up to five such matches.
No sweat!
In order to actually implement this, we could no longer just spit out the results from a view. Nope. Instead, we had to jump through a whole bunch of hoops. Here’s the thumbnail sketch of the implementation:
1. Given the tags for a target episode, query a view of TV programs, fetching all programs that match at least one Program Primary or Secondary tag.
2. Filter this list of programs, including only programs with an airing in our schedule data window (one week back, two weeks ahead).
3. Then count the exact number of tag matches and calculate a matching score for each program, based on the above rules. Then store the program in an array.
NOTE: I won’t go into the exact matching score formula here. Suffice it to say we came up with a formula that encapsulates the above matching and ordering rules. Please pass the aspirin again…
4. Next query a view of TV episodes, fetching all episodes that match at least one Episode Secondary genre of the episode in question.
5. Filter this list of episodes, including only those with an airing in our schedule data window. For each one count the exact number of matching genre tags for the episode and calculate the matching score. See if the episode’s parent program is already in the array of matching programs. If so, replace it with this episode if the matching score is higher.
6. Given the final array of matching episodes, reorder the array by the matching scores and display the top three (or five) entries!
The resulting PHP code to implement all of this ran to about 240 lines and looked a little something like this:
All that just to generate this on the front end:
Anybody know the limit on the number of aspirin you can take in one day?
* Site Maintenance Maintenance
Posted on October 10th, 2008 by Phil. Filed under Apache, Drupal, MySQL, SVN, database.
One great thing about Drupal is being able to easily put your site into site maintenance mode. For example, if you need to perform some site maintenance work (like installing core or contributed module code updates) you can easily put the site into this mode by clicking a button on the following form:
What this will do is then direct all anonymous and non-admin users to a page using your chosen theme with a message that you write, which you plug into the message box. For WGBH.org, the page looks like this:
So - easy!
However, this page can’t be used for some types of site maintenance, like, for example, maintenance work that takes the database offline. No database, no Drupal-generated site maintenance page. Bummer.
We recently faced this problem when our fine friends in the WGBH IT department needed to do some MySQL maintenance work (they wanted to do a little cleanup and reorganization of the database files on the production server). Since this type of work comes up now and again, I wanted to devise an easy way to post the same site down page that all requests for the site would get directed to, with the minimal amount of work, so IT could incorporate it into their process for future maintenance work. I wanted a simple process to make everybody’s life as easy as possible.
My first thought was to have an alternate Apache config file for the site, which would point to a different document root that stored the site down code and graphics. This would work well enough, but would require stopping Apache and then restarting it using the new configuration file. Not too complicated, but still more steps then I wanted.
After some coffee and deep thinking the solution popped right out me: symbolic links!
The document root for the site is actually a symbolic link to the real directory of Drupal code. So, I figured, if we just change that link to point to a new document root, containing the site down code, then - bingo - we’d be done! That’s even easier than using an alternative Apache conf file.
So, this is what we did. There was just one other fine point here: where to put the site down directory?
Initially, I figured on a directory completely separate from the Drupal tree. However, that would then mean we’d need to copy all the required images and style sheets from the Drupal tree to the site down tree, making future maintenance a bit more work (we do tweak the site down page according to what’s going on at the time). Kind of a pain.
What we did instead was put the site down directory within the Drupal directory tree. That way the page code could reference the appropriate images and style sheets. Plus, that code then gets managed via SVN as part of our Drupal code base. The final approach, then, involved the following:
* Create a site_down directory under the top-level Drupal directory.
* Create an index.html file in that directory that contains the source code from the Drupal-generated site maintenance page.
* Tweak the source code to use absolute, rather than relative, links to images and CSS files.
* Create an .htaccess file to make sure all page requests get redirected to the index.html page.
Voila! Using this method, all IT had to do before performing their database maintenance was change the docroot’s symbolic link to point to the site down directory. Then, when the work was done, change the link back. No need to even stop/restart Apache.
So - once again - easy!
If only fixing the economy were so simple…
Do you have a different method for handling this sort of thing (or for fixing the economy)? Talk amongst yourselves then please share!
Archives:
- February 2009
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
Categories:
- Apache
- Architecture
- Boost
- caching
- CCK
- CMS
- cron
- CVS
- database
- Date
- Devel
- Drupal
- Drupalcon
- FeedAPI
- Flickr
- Image Assist
- Images
- Install Profiles
- MacBook
- Memcache
- MySQL
- NPR
- Pathauto
- PBS
- PHP
- Preview
- Protrack
- Public Media
- search
- Social Media
- SQL
- SVN
- tags
- Television
- Testing
- theme
- TinyMCE
- Token
- Tools
- TV Guide
- Uncategorized
- Views
- WordPress
Disclaimer
- The opinions expressed in here are those of the writers/contributors and do not necessarily represent the views or opinions of the WGBH Educational Foundation.
















