Sampling, panels, and online survey data quality

The Advertising Research Foundation (ARF) announced a new “Research Quality Super Council” on October 15, 2010, a follow-on to the large, protracted Online Research Quality Council (ORQC) that concluded its work in 2009. (Disclosure: the author was a member of the ORQC from 2008 to 2009.)

The original ORQC effort had its genesis in a much-publicized incident in which the senior research buyer for a large consumer packaged goods company announced that she had concerns about online surveying. She had observed that some of the company’s recent commissioned online surveys, featuring parallel questions and sampling specifications, produced results that varied more than expected. She suspected sampling error due to the almost exclusive use of panels in online research. In theory, at least, parallel studies should not produce disparate results if correct, “probability-based” sampling techniques are used. In reality, true probability sampling in market research applications has not been achievable for decades: not in mail surveying, not in telephone surveying, and not in the increasingly popular realm of online survey research, for reasons I will explore in a later post. However, there are techniques that can appropriately enhance data quality, in sampling as well as in design and administration. Here, I want to comment briefly on the much-maligned “sample blending” practices of online survey research, which were one of the principal emphases of the original ORQC investigation.

Migrating the highly segmented, lengthy, granular surveys common to the RDD telephone surveying era to the internet has been a somewhat bumpy process. Deploying such surveys, which may have dozens or even hundreds of sampling cells and multiple layers of nested criteria, requires huge sampling frames. Although the web has an estimated 2 billion users, only a small percentage belong to a research panel. Even the largest panels top out at about a million members, and only a small slice of those are active. To fulfill the demands of most modern, complex surveys, research companies must “blend” panels, meaning that they work with other panel companies and sample aggregators to pull together enough sample to meet the criteria. Many studies require a “sample-to-complete” ratio of up to 100:1; sampling plans targeting more rarefied segments (such as super-affluent professionals, young males, etc.) may need ratios as high as 250:1 or even 500:1. To put this in perspective, a highly targeted study looking for a final sample size of n=5,000 could need an initial sampling frame of 2,500,000 panelists to reach that number of completes. The only way researchers can meet that goal is by blending panels.
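To make the sample-to-complete arithmetic concrete, here is a minimal sketch in Python; the ratios are the illustrative figures from the text above, not fixed industry constants.

```python
def required_frame_size(target_completes: int, ratio: int) -> int:
    """Initial sampling frame needed to yield a target number of completes
    at a given sample-to-complete ratio."""
    return target_completes * ratio

# A typical complex study at 100:1:
print(required_frame_size(5_000, 100))  # 500,000 panelists
# A hard-to-reach segment (e.g., super-affluent professionals) at 500:1:
print(required_frame_size(5_000, 500))  # 2,500,000 panelists, beyond any single panel
```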

Although panel blending is an almost universal practice in online research, it lacks a basis in classic sampling theory. Moreover, different panels are recruited from different sources, may attract different members depending on the appeals used, and are likely to be managed in different ways. All of these differences could affect data quality when the sources are combined in a blended “mashup.” The ORQC recommended that researchers be transparent about the sources of their sample, and that we be able to demonstrate whether, and how, panel variances affect results for any given study. Sampling theory itself may need to be re-examined to incorporate the reality of the modern research environment, in which unified, comprehensive sampling frames are no longer readily available.
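Where respondent records are tagged by panel source, one way to demonstrate (or rule out) a panel effect is a simple contingency test. Here is a minimal sketch in Python with invented counts, using scipy’s chi-square test; this is one illustrative approach, not an ORQC-prescribed method.

```python
from scipy.stats import chi2_contingency

# Invented data: rows are panel sources, columns are counts per answer choice.
observed = [
    [120,  80, 50],  # Panel A
    [100,  95, 55],  # Panel B
    [ 60,  70, 40],  # Panel C
]

chi2, p_value, dof, expected = chi2_contingency(observed)
if p_value < 0.05:
    print(f"Answer distributions appear to differ by panel source (p = {p_value:.3f})")
else:
    print(f"No detectable panel-source effect at this sample size (p = {p_value:.3f})")
```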

The news isn’t all bad, however: some studies have shown that as samples grow larger, whether “probability-based” or not, they tend to exhibit qualities more like a random distribution. In other words, larger samples help overcome possible sampling bias due to blending, and very large sample sizes can be achieved at generally lower cost through online survey sampling than through any other modality.
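A toy simulation illustrates the point: three hypothetical panels answer “yes” at different underlying rates, and the blended estimate settles down as the total sample grows. All rates and mix proportions here are invented. One caveat worth noting: growing the sample shrinks random noise, while any systematic bias in the blend composition itself remains.

```python
import random

random.seed(42)
panel_yes_rates = [0.48, 0.55, 0.60]  # hypothetical per-panel response biases
panel_mix = [0.40, 0.35, 0.25]        # blend proportions across Panels A, B, C

def blended_estimate(n: int) -> float:
    """Simulate n blended interviews and return the observed 'yes' share."""
    yes, total = 0, 0
    for rate, share in zip(panel_yes_rates, panel_mix):
        k = round(n * share)
        yes += sum(random.random() < rate for _ in range(k))
        total += k
    return yes / total

for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: estimate = {blended_estimate(n):.3f}")
# Estimates converge toward the mixture mean (about 0.535) as n grows.
```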

Other research, including the ORQC’s own study, has shown that variance can be reduced in a tracking-study scenario by meticulously repeating the sample distribution by panel source from wave to wave. If a survey pulls 40% of its sample from Panel A in wave 1, 35% from Panel B, and 25% from Panel C, wave 2 should replicate those proportions as closely as possible.
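A minimal sketch of that quota-replication logic, using the illustrative proportions above:

```python
# Wave 1's realized panel mix, per the example above.
wave1_mix = {"Panel A": 0.40, "Panel B": 0.35, "Panel C": 0.25}

def wave_quotas(total_completes: int, mix: dict) -> dict:
    """Allocate per-panel completes so the next wave mirrors the prior wave's mix."""
    return {panel: round(total_completes * share) for panel, share in mix.items()}

# Wave 2 targeting 2,000 completes:
print(wave_quotas(2_000, wave1_mix))
# {'Panel A': 800, 'Panel B': 700, 'Panel C': 500}
```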

Further research is needed on the quality implications of pulling sample from traffic aggregators (an approach also known as “river sampling”). Although their methods vary, aggregators do not “empanel” people; instead, they typically capture internet users as they travel through a network of sites, targeting users by the search keywords they used, by the demography of the site they are visiting (attributing similar characteristics to the user, by assumption), or by applying other types of behavioral targeting to “prescreen” potential survey candidates against the needs of various research requirements. The aggregator then passes these visitors through to the survey organization. This method has potential in the sense that it re-introduces some aspects of random sampling. The downside is that aggregators cannot know as much about each individual as panel companies know about their panelists, most of whom are thoroughly profiled over time, whose response and cooperation rates are known, and whose engagement in the survey process can be measured and tracked.
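To make the prescreening mechanism concrete, here is a hypothetical sketch of how an aggregator might route a captured visitor; the field names, criteria, and study IDs are invented for illustration and do not reflect any actual aggregator’s system.

```python
# What little the aggregator can infer about a passing visitor (invented fields).
visitor = {"site_category": "personal_finance", "inferred_age_band": "25-34", "country": "US"}

# Open studies, each with the criteria it needs matched (invented).
open_studies = [
    {"id": "S-101", "needs": {"site_category": "personal_finance", "country": "US"}},
    {"id": "S-102", "needs": {"site_category": "gaming", "country": "US"}},
]

def prescreen(visitor: dict, studies: list) -> list:
    """Return IDs of studies whose known criteria the visitor appears to satisfy."""
    return [s["id"] for s in studies
            if all(visitor.get(field) == value for field, value in s["needs"].items())]

print(prescreen(visitor, open_studies))  # ['S-101'] -> route visitor to that survey
```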

The “Research Quality Super Council,” assembled five years ago, promised to extend the initial ORQC work by considering how social media and market research will intersect, and by continuing to explore the many areas in which research and data quality can improve.

Online research is evolving at a rapid pace. It cannot be overstated, though, how important it is to make certain your research vendor is thoroughly up to date on the latest methodologies, techniques, and best practices in the field. Ask as many questions as you need about any aspect of the research process proposed for your study. A qualified researcher will be very comfortable explaining exactly how the firm expects to proceed, and what is done to optimize data quality and the applicability of the resulting information to your decision needs.

* The focus on sample as the sole source of survey error is probably misplaced. There are many potential sources of error in all survey research modalities, which must be anticipated and minimized.
— Dr. Cheryl Harris
