The Summary
This summary is just about the process behind developing the formula. For more on the general purpose behind the number, I'll point you to the intro from about a month ago.
First, I broke the external context of a TV rating into three categories: overall TV viewing, timeslot competition, and lead-in rating.
I began the overall viewing level portion by taking a general look at viewing level tendencies across the week (specifically the five weeknights vs. the two weekend nights) and also by half-hour across each evening of primetime.
Then I looked at overall viewing across the entire season and compared it to ratings changes across the entire season (for a selection of shows that didn't undergo drastic competition/lead-in changes). The two correlated well for a big chunk of the season, but there were two big exceptions.
1) Ratings in early fall were noticeably bigger than the viewing levels suggested they should be. This problem had no clear cause aside from the considerable hype surrounding the beginning of the season, so I devised an "Early Fall Hype" adjustment for the first six weeks of the season.
2) Ratings in the late spring were noticeably smaller than the viewing levels suggested they should be. The main cause for this was a late-season change in the way Nielsen calculated viewing levels. So I came up with a way to convert New Methodology viewing levels to a reasonably reliable Old Methodology extrapolation.
As a last way of checking my viewing level work, I looked at viewing changes during particularly down times like weekends and holidays and compared them with rating changes during those down times.
I began competition with some general thoughts on the project's definition of competition and how that part of the formula would develop.
To get a baseline adjustment for competition, I looked at how a variety of shows performed in the face of drastically increased competition.
Again, a couple key problems arose.
2) My measures of "competition" were unfair to programs in the 10:00 hour since Fox does not program that hour nationally. Their numbers are (usually) unknown and thus not counted, but they're still a very real part of the competitive picture. So I came up with a constant number to add to competition figures at 10:00 to try to give the hour a fairer shake.
2) As first supported in the viewing level portion, the audience for sporting events is less inclined toward other primetime programming and thus significantly different from regular competition. So I came up with a way to have sports audiences count for less than their full rating when devising competition counts.
To finish off the competition portion of the formula, I came up with a baseline for "normal" competition that, in turn, helps figure out when shows have "heavy" or "light" competition.
I started lead-ins by laying out the same two fundamental issues as in the other two categories: how much does a lead-in affect a show, and what's a "normal" lead-in?
The first issue is affected by the length of the show, so I took a look at big lead-in changes for half-hour shows and hour-long shows to develop a baseline adjustment for each case.
So I could apply this to all shows, I had to approximate a lead-in value for shows at the beginning of primetime; 7:00 numbers for the aggregation of local programming are not consistently available. I also came up with a separate approximation for the CW.
Finally, I devised a rather simple solution to the question of what a "normal lead-in" is.
The Full Formula, v1.0
True Strength = A18-49 rating * PUT Adjustment * Competition Adjustment + Lead-in Adjustment
PUT Adjustment = Early Fall Hype * 34.12 / (PUT * Old Methodology Adjustment)
Early Fall Hype = 0.948 (if in the first six weeks of the season)
Old Methodology Adjustment = 0.968 in the 8:00 hour, 0.929 in the 9:00 hour, 0.901 in the 10:00 hour (if date is 3/28/2011 or later)
The A18-49-rating-divided-by-PUT part is explained here. PUT = an estimate of the percentage of TV-owning adults 18-49 using TV in the timeslot.
The 34.12 constant is an estimated average Old Methodology PUT. It forms a ratio with the actual PUT so the adjusted result stays relatively close to an actual rating. This constant was 33.805 originally, then was recalculated as 33.75; a recent tweak to the methodology adjustment pushed it up to its current 34.12.
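For illustration, here's a minimal Python sketch of the PUT Adjustment piece using the constants above. The function and parameter names are mine, and I'm assuming the hype and methodology multipliers simply default to 1 when they don't apply.

# Minimal sketch of the PUT Adjustment (names are mine; constants are from above).
# Assumes the hype and methodology multipliers default to 1 when they don't apply.

AVG_OLD_METHOD_PUT = 34.12   # estimated average Old Methodology PUT
EARLY_FALL_HYPE = 0.948      # only in the first six weeks of the season
OLD_METHODOLOGY_ADJ = {8: 0.968, 9: 0.929, 10: 0.901}  # by hour, for 3/28/2011 or later

def put_adjustment(put, hour, early_fall=False, new_methodology=False):
    """Multiplier applied to the A18-49 rating for overall viewing levels."""
    hype = EARLY_FALL_HYPE if early_fall else 1.0
    old_method = OLD_METHODOLOGY_ADJ[hour] if new_methodology else 1.0
    return hype * AVG_OLD_METHOD_PUT / (put * old_method)

# e.g. a 9:00 show in early fall with a PUT of 36.0:
# put_adjustment(36.0, hour=9, early_fall=True)  ->  about 0.90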
Competition Adjustment = 1 + (Competition Baseline * (bcPUT - Expected bcPUT))
Competition Baseline = 0.0375
bcPUT = Sum of known broadcast PUT-adjusted A18-49 ratings in the timeslot (plus an additional 3.54 in the 10:00 hour, and sports shows count as 1/2 their PUT-adjusted A18-49 rating)
Expected bcPUT = 0.30 * PUT (if Sunday to Thursday) OR 0.23 * PUT (if Friday or Saturday)
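And a similar sketch of the Competition Adjustment, again with my own function names; bcPUT is assumed to already be tallied as described above (PUT-adjusted ratings in the slot, the 3.54 add-on at 10:00, sports at half weight).

# Sketch of the Competition Adjustment (names mine). bcPUT is assumed to be
# tallied already: PUT-adjusted ratings in the slot, plus the 3.54 add-on at
# 10:00, with sports counted at half weight.

COMPETITION_BASELINE = 0.0375

def expected_bcput(put, weekend=False):
    """'Normal' broadcast viewing for the slot: 30% of PUT Sun-Thu, 23% Fri/Sat."""
    return (0.23 if weekend else 0.30) * put

def competition_adjustment(bcput, put, weekend=False):
    """Above 1 when competition is heavier than normal, below 1 when it's lighter."""
    return 1 + COMPETITION_BASELINE * (bcput - expected_bcput(put, weekend))

# e.g. bcPUT of 12.0 against a PUT of 36.0 on a Tuesday:
# expected bcPUT = 0.30 * 36.0 = 10.8, so the adjustment is 1 + 0.0375 * 1.2 = 1.045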
Lead-In Adjustment = Lead-in Baseline * (Com-TS - Lead-In)
Lead-In Baseline = 1 / (4 + 2 * Program Length in hours)
Com-TS = Program's A18-49 rating with PUT and Competition Adjustments above
Lead-In = Lead-in's A18-49 with PUT and Competition Adjustments above; beginning-of-primetime shows on the big four get a 1.81, while those shows on the CW get a 0.67
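Finally, a sketch of the lead-in piece and the full True Strength combination; the put_adj and comp_adj inputs would come from the sketches above, adjusted_lead_in is the lead-in's Com-TS (or the 1.81/0.67 stand-ins at the start of primetime), and the example numbers are made up just to show the arithmetic.

# Sketch of the Lead-In Adjustment and the final combination (names mine).

def lead_in_baseline(length_hours):
    """1/5 (20%) for a half-hour show, 1/6 (16.7%) for an hour-long show."""
    return 1 / (4 + 2 * length_hours)

def true_strength(rating, put_adj, comp_adj, adjusted_lead_in, length_hours):
    """True Strength = rating * PUT adj * Competition adj + Lead-in Adjustment."""
    com_ts = rating * put_adj * comp_adj  # Com-TS: the rating after the first two steps
    return com_ts + lead_in_baseline(length_hours) * (com_ts - adjusted_lead_in)

# e.g. an hour-long show with a 2.5 rating, put_adj 0.95, comp_adj 1.04,
# coming off an adjusted 3.2 lead-in:
# true_strength(2.5, 0.95, 1.04, 3.2, 1.0)  ->  about 2.35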
Here's an image of the whole thing all in one place:
Weaknesses/Outstanding Issues
As with the summary, I'm keeping the talk about the general functions of the number to the intro; this will mostly be about the integrity of the process explained above.
I don't have the mathematical background that it seems a lot of the sports world's sabermetricians do; I was an English major! So while I feel I've come up with something that works somewhat well, there are a few things that my relatively basic math skillz can't tackle (or at least haven't so far). I still think it's a good idea to throw some of the problems out there, if nothing else just so I can articulate them to myself. Here goes:
- One of the ongoing annoyances about creating an objective analysis of TV ratings is that the foundations are not solid. More people are using other sources to watch TV. Even a perfect formula based on 2010-11 may be obsolete by next year, which is why I didn't dig back into the 2009-10 season or earlier for more data. The numbers in blue above are those that I feel are particularly susceptible to a changing TV landscape. Perhaps lead-ins and competition will continue to matter less as DVR usage increases. Perhaps overall viewing will change. So those are things I'll have to stay on top of even if I think the stat ends up being awesome.
- A big obstacle this summer has been the same thing that plagued the daily posts: a lack of access to breakdowns within shows. This means shows contribute the same value to a PUT calculation in every half-hour and the same rating to a competition calculation in every half-hour, and that's usually not the case. I think (but am not 100% sure) that in 2011-12 I will have half-hour breakdowns for every show (more on the logistics of this in tomorrow's post), and that better info may teach us some new things or show that some of the old things were wrong. My hope is that what we have so far isn't hugely wrong because many of the discrepancies cancel out; for example, there may be less overall viewing at 8:00 than is credited, but there's also less competition because the broadcast shows are weaker then.
- I switched from using "Competition" (all OTHER broadcast ratings) to total "Broadcast Persons Using TV" (ALL broadcast ratings) at the beginning of the competition posts because I wanted each show in a given timeslot to get credit for being in the same competitive environment. A nice idea, but it ends up greatly rewarding a really dominant show like American Idol which gets credit for "facing itself" so to speak. I thought the other way (going with "Competition") would be unfair to an Idol because it would not account for the fact that Idol's mere presence is lowering the ratings of all the competition. Either way, I think Idol is currently the only regularly scheduled show big enough for this to be a really noticeable problem, and its True Strength is gonna be way out ahead of anything else anyway. Just a matter of degree. But it's something I probably ought to figure out at some point.
- There are always going to be shows that are "underadjusted" or "overadjusted," and trying to get shows to have the exact same True Strength in every single situation is a pretty hopeless endeavor. However, I have noticed that episodes do tend to be consistently "overadjusted" when the situation is drastically pointing in the same direction in all three of the steps. The most egregious example of this is the post-DWTS episode of Better with You, which had a 2.6 A18-49 but was dragged way down by all three of the adjustments to a point where it actually performed much worse in True Strength than its usual Wednesday episodes (which were getting around a 1.6 A18-49). There seems to be a "whole is less than the sum of the parts" situation in play both there and with a few other random episodes. I haven't come up with a mathematically legit way of explaining it, but there's just this sense that when all three things drag in one direction, perhaps we shouldn't take those drastic adjustments all three times. Will continue to think on it.
- Of all the posts linked above, the only one where I really feel I didn't come up with a fairly decent solution is the lead-ins for hour-long shows. It seemed like I was sort of hinting at a gulf between how new shows are affected by lead-in changes (typically upper-10%s to 20%+ of the lead-in change) and how veteran shows are affected (closer to 10%), and it matched up reasonably well with the shows in the local programming post. The problem with pulling the trigger on that is that there's no sign of similar behavior in the half-hour shows, and I don't have an explanation that doesn't sound like "look at the results first, then come up with explanation." I could say something like: "It's always 20% of the lead-in in the first half-hour regardless. That 20% is maintained or close to it in the second half-hour for a new show getting sampled, but with veteran shows, the influence is almost completely wiped away by the second half-hour." It may be an accurate description; it just feels a little overly fanwanky/pulled-from-my-ass to me. So I'll see if I can come up with something. If not, I'll just go with that middle-ground number (16.667%), which doesn't seem to work too well in either case, but isn't absolutely awful in either case either.
I will likely add to (or hopefully subtract from!) this portion going forward. For now, those are the biggies. In the grand scheme, I hope they aren't that big, but they should indicate that this is still a fairly fluid process and this number is not exactly at a point where it stands up to intense scrutiny. I'm sure there are others I haven't thought of, but I think it's time I stop coming up with reasons to discredit my own statistic!
I may make tweaks to the formula between now and the unofficial start of the regular season (September 13), though I think at that point I'll want to lock something in at least till the networks go on hiatus in late December. If I'm changing the formula every week, it'll be even less useful. Probably the best thing that can happen is to have some new data to apply it all to; I feel like I've been looking at these same 2010-11 ratings forever.
Tomorrow, one more wrap-up post as I explain how I'll implement this number going forward.