Archive for the ‘tags’ Category

* Pass the Aspirin

Posted on October 24th, 2008 by Phil. Filed under Drupal, PHP, Television, Views, tags.


For those of us in the northern hemisphere, fall has arrived! In between raking up and burning piles of leaves (and useless 401K statements), we here at WGBH Online have continued to fine tune our new(ish) TV Programs and Schedules pages.

As you may recall, not long after launch in August, we began to revisit the whole notion of how we’re tagging our TV programs and episodes. The main reason was to improve the way we generate lists of related programs, so as to suggest to visitors other shows they might like. Our initial approach was simple: just tag the programs (not individual episodes) and use a Drupal view to generate a list of up to three related programs.

But this soon proved restrictive. Sure, Frontline is a News and Public Affairs program, but individual shows in the series can be about different things (technology, politics, science). So, we wanted to be able to capture this more detailed level of information and use it to generate more useful lists of related programs for our visitors.

After much thought and discussion (not to mention headaches), we came up with an expanded tagging scheme and more sophisticated program matching logic, which has now been implemented on the site. Here’s what we did:

We renamed our existing TV Program Genre vocabulary to TV Program Primary Genres.

TV Program Primary Genres

The terms remained the same (a small set of high level classifications) and these are still only applied at the program level.

We then added a new vocabulary that can be applied to both TV programs and episodes: TV Program/Episode Secondary Genres.

TV Program/Episode Secondary Genres

This secondary list has many more terms, which allow for a more sophisticated level of classification. Tags applied at the program level apply to all episodes in a series. Tags applied at the episode level apply only to that particular episode.
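To make the inheritance concrete, here is a minimal sketch in Python (our real code is PHP inside Drupal, so treat this as illustration only). The function name and the sample tag names are made up for the example:

```python
def effective_secondary_tags(program_tags, episode_tags):
    """Secondary tags that apply to one episode.

    Program-level tags are inherited by every episode in the series;
    episode-level tags belong to that episode alone.
    """
    return set(program_tags) | set(episode_tags)


# Hypothetical tags: the series is tagged "Politics", one episode adds "Technology".
series_tags = {"Politics"}
episode_tags = {"Technology"}
print(sorted(effective_secondary_tags(series_tags, episode_tags)))
```

An episode with no tags of its own simply ends up with whatever its parent program has.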

Once we had that in place we then had to think about how, using these tags and given a single program episode, we would define rules for identifying “related” programs and episodes.

This is where the aforementioned headaches started to kick in.

Once you start to think about it, all sorts of questions crop up. Which carries more weight, matching primary genre tags or secondary genre tags (or should they count equally)? If two related programs have the same tags as the target episode, how do we break the tie? Do we match an episode within one series to other episodes in that series, or restrict it to episodes of other series?

Pass the aspirin, because I’m getting a headache just thinking about it again.

Luckily, we have some fine folks working here who sat down and really noodled through this to come up with some matching logic. When written out, the matching rules looked something like this:

1. Match at the episode level
2. Cull only from upcoming or recently-aired episodes
3. Look for most tag matches, with all tags equally weighted
4. Only allow one episode per program/series to appear in “You Might Also Like” box
5. In a tie, give priority to episodes with same “Program Primary” tag
6. If still a tie, give priority to episodes with exact same tag makeup (i.e. both have only one Primary tag)
7. If still a tie, give priority to the episode with soonest upcoming airing.
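One way to picture rules 3 through 7 is as a sort key. The sketch below is Python rather than our actual PHP, and it is not the scoring formula we ended up with. It compares a tuple field by field instead of computing a single score. The `Episode` class and its field names are invented for the example, rule 2 is assumed to have been applied already, and "exact same tag makeup" is read here as "identical sets of tags", which is only one possible interpretation:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    program: str          # parent series
    title: str
    primary: frozenset    # Program Primary tags
    secondary: frozenset  # Secondary tags (program-level plus episode-level)
    next_airing: datetime


def rank_key(target, cand):
    # Rule 3: count shared tags, all weighted equally.
    shared = len(target.primary & cand.primary) + len(target.secondary & cand.secondary)
    # Rule 5: does the candidate share a Program Primary tag?
    same_primary = bool(target.primary & cand.primary)
    # Rule 6 (assumed reading): identical tag sets in both vocabularies.
    same_makeup = (cand.primary, cand.secondary) == (target.primary, target.secondary)
    # False sorts before True, so "not x" puts preferred candidates first.
    # Rule 7: the earlier airing wins the final tie.
    return (-shared, not same_primary, not same_makeup, cand.next_airing)


def rank(target, candidates):
    # Drop candidates that share no tags at all, then sort best-first.
    matching = [c for c in candidates if rank_key(target, c)[0] < 0]
    ordered = sorted(matching, key=lambda c: rank_key(target, c))
    # Rule 4: keep only the best-ranked episode from each program.
    seen, result = set(), []
    for cand in ordered:
        if cand.program not in seen:
            seen.add(cand.program)
            result.append(cand)
    return result
```

Comparing tuples keeps each tie-breaker separate and readable. Folding everything into one number, as we did, takes more care to get the weights right.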

The idea was then to use the tags and these rules to generate up to three matches for each episode to display in the “You might also like” block in the right hand rail.

Well, up to three matches, unless more than three episodes have the exact same tag structure as the target episode. In that case, we display up to five such matches.
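The three-or-five rule is easy to get wrong, so here is a small Python sketch of it (again, not our production PHP). It assumes the candidates arrive already sorted best-first as hypothetical `(title, tags)` pairs, and it treats "exact same tag structure" as "identical set of tags", which is an assumption on my part:

```python
def pick_for_display(target_tags, ranked):
    """Choose which ranked matches to show in the block.

    ranked: list of (title, tags) pairs, best match first.
    """
    exact = [(title, tags) for title, tags in ranked if tags == target_tags]
    if len(exact) > 3:
        # More than three exact matches: show up to five of them.
        return exact[:5]
    # Otherwise show the usual top three.
    return ranked[:3]
```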

No sweat!

In order to actually implement this, we could no longer just spit out the results from a view. Nope. Instead, we had to jump through a whole bunch of hoops. Here’s the thumbnail sketch of the implementation:

1. Given the tags for a target episode, query a view of TV programs, fetching all programs that match at least one Program Primary or Secondary tag.

2. Filter this list of programs, including only programs with an airing in our schedule data window (one week back, two weeks ahead).

3. Then count the exact number of tag matches and calculate a matching score for each program, based on the above rules. Then store the program in an array.

NOTE: I won’t go into the exact matching score formula here. Suffice it to say we came up with a formula that encapsulates the above matching and ordering rules. Please pass the aspirin again…

4. Next, query a view of TV episodes, fetching all episodes that match at least one Episode Secondary genre of the episode in question.

5. Filter this list of episodes, including only those with an airing in our schedule data window. For each one count the exact number of matching genre tags for the episode and calculate the matching score. See if the episode’s parent program is already in the array of matching programs. If so, replace it with this episode if the matching score is higher.

6. Given the final array of matching episodes, reorder the array by the matching scores and display the top three (or five) entries!
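Steps 2, 5 and 6 can be sketched roughly as follows. This is illustrative Python, not the PHP we actually run, and the scores are taken as given since I am not showing the formula. The dictionary keys (`program`, `title`, `score`) and both function names are invented for the example. Adding an episode whose parent program is not yet in the array is a guess, because the steps above only describe the replace case:

```python
from datetime import datetime, timedelta


def in_schedule_window(airing, now):
    """True if an airing falls within one week back and two weeks ahead."""
    return now - timedelta(weeks=1) <= airing <= now + timedelta(weeks=2)


def merge_matches(program_matches, episode_matches):
    """Combine program-level and episode-level matches, one entry per program.

    Each match is a dict with 'program', 'title' and a numeric 'score'.
    """
    best = {m["program"]: m for m in program_matches}
    for ep in episode_matches:
        current = best.get(ep["program"])
        # Replace the program entry only when the episode scores higher.
        # Assumption: an episode whose program is missing gets added as-is.
        if current is None or ep["score"] > current["score"]:
            best[ep["program"]] = ep
    # Step 6: highest score first; the caller takes the top three (or five).
    return sorted(best.values(), key=lambda m: m["score"], reverse=True)
```

Keying the array by program is what guarantees that only one entry per series can make it into the final list.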

The resulting PHP code to implement all of this ran to about 240 lines and looked a little something like this:

Related Program Code

All that just to generate this on the front end:

Related Programs Block

Anybody know the limit on the number of aspirin you can take in one day?




* Tag, You’re It!

Posted on September 8th, 2008 by Phil. Filed under Television, Views, tags.


The new WGBH TV Programs and Schedules module has been up and running on Drupal for almost a month now and - knock on wood - everything is working great! In fact, things have been going so well, operationally, at least, that all has been … quiet!

Quiet is good.

Now that this phase of the project is all done we are turning our full attention towards the real goal: porting all of WGBH.org to Drupal and completely overhauling the information architecture and user interface. We’re currently busy doing content audits, wireframes, schedules and all that sort of fun site redesign stuff. Nothing is ready yet for actual development.

In the meantime, we’re also addressing a few small functionality changes to TV programs and schedules that we chose not to address during the build. At the top of the list is the way we generate the list of related (i.e. “You might also like”) programs on our episode pages.

You Might Also Like

Currently, this list is generated automatically using tags applied at the program (series) level. We developed a simple TV Program Genre vocabulary to apply to each program.

TV Program Genres

Content producers ultimately have the ability to override the automatically generated list if they like, but, for the most part, what you see is generated on the fly based on the tags.

The upshot here is that by only applying tags at the program/series level, each episode of a given series (e.g. all NOVA episodes) will display the same set of related programs. So while NOVA may generally be a science program, a given episode may be focused on, say, a physics problem, but this isn’t reflected in the related programs list. Our tagging scheme doesn’t currently allow us to relate programs on a more granular level than the simple genres we’ve defined.

We had originally planned to support tagging at the episode level in the first build, for use in generating the related program list. However, when we sat down to hash out how it should work, it quickly became clear that using tags at both the program and episode level made things far more complex.

For example, right now, with tags only at the program level, it’s pretty simple. On a given episode page, we fetch the tags on the parent program, then using a view, generate a list of other programs with the same tag(s) and display three of those. Easy-peasy!
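In rough terms, the current logic boils down to something like the Python sketch below. The real thing is a Drupal view, not hand-written code. The function name and the dictionary of programs are made up for illustration, and "same tag(s)" is read as "shares at least one genre tag":

```python
def related_programs(target, programs, limit=3):
    """Return up to `limit` other programs sharing a genre tag with `target`.

    programs: dict mapping program name -> set of genre tags.
    """
    target_tags = programs[target]
    matches = [
        name
        for name, tags in programs.items()
        if name != target and tags & target_tags
    ]
    return matches[:limit]


# Hypothetical data: only NOVA and Frontline are real program names here.
catalog = {
    "NOVA": {"Science"},
    "Frontline": {"News and Public Affairs"},
    "Show A": {"Science"},
    "Show B": {"Science"},
}
print(related_programs("NOVA", catalog))
```

Because the lookup starts from the parent program's tags, every episode of a series gets the same answer.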

But once you throw episodes into the mix, you have to make decisions. Do matches on program-level or episode-level tags count more? Should we weigh episodes with matching tags that are part of the same series more - or less - than similarly tagged episodes from other series? And so on.

So, we tabled the issue for the first release and just went with program level tags. Now we want to start tagging episodes and come up with rules to generate a more granular list of related programs. That’s something we’re hoping to work out this week.

Have you run into a similar problem? Any thoughts on how best to do this? Speak now!





  • Disclaimer

  • The opinions expressed in here are those of the writers/contributors and do not necessarily represent the views or opinions of the WGBH Educational Foundation.