Respondent Driven Sampling
What is Respondent Driven Sampling?
Respondent Driven Sampling (RDS) is a method for sampling from hidden or hard-to-reach human populations. First developed in 1997 by Douglas Heckathorn, it is commonly used in HIV research, where groups at 'high risk' of disease exposure, such as sex workers and illicit drug users, are hard to reach. It has also been used in public health research to sample other groups, such as homeless people and street youth. RDS is a link-tracing network sampling technique: data are collected through a peer referral process over social networks. Because it can be used in difficult settings, it has been adopted by a number of well-known public health organisations, such as the World Health Organisation (WHO) and the US Centers for Disease Control.
RDS essentially combines "snowball sampling" (individuals refer people they know, who in turn refer people they know, and so on) with a mathematical model that weights the sample to compensate for the fact that it was collected in a non-random way. Respondents recruit their peers, and researchers keep track of who recruited whom and of each respondent's number of social contacts. The model of the recruitment process, which is based on a synthesis and extension of two areas of mathematics (Markov chain theory and biased network theory), then supplies the weights. This means the results can be analysed using statistical methods and conclusions can be drawn about the target population. RDS thereby addresses the problem of sampling from a very small population whose members are difficult to identify and for which no sampling or population lists exist. It combines the breadth of coverage of network-based methods with the statistical validity of standard probability sampling methods, which makes it possible to draw statistically valid samples of groups that were previously out of reach.
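The role of Markov chain theory can be illustrated with a minimal sketch. It assumes, purely for illustration, a target population split into two groups, A and B, and a hypothetical table of recruitment probabilities between them (the numbers and the function name `stationary_composition` are invented here and are not taken from any RDS software). Under those assumptions, the group make-up of successive waves settles to the same equilibrium whether the seeds all come from A or all from B.

```python
def stationary_composition(transition, start, steps=50):
    """Propagate a group composition through `steps` recruitment waves."""
    composition = list(start)
    n = len(composition)
    for _ in range(steps):
        # Next wave's share of group j = sum over recruiter groups i of
        # (share of i in this wave) * (chance that i recruits from j).
        composition = [
            sum(composition[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return composition


# Hypothetical recruitment probabilities (illustrative values only).
# Row i: chance that a recruiter in group i recruits from A or from B.
transition = [
    [0.7, 0.3],  # recruiters in group A
    [0.4, 0.6],  # recruiters in group B
]

from_all_a = stationary_composition(transition, [1.0, 0.0])  # seeds all in A
from_all_b = stationary_composition(transition, [0.0, 1.0])  # seeds all in B
print(from_all_a, from_all_b)
```

With these illustrative numbers both starting points end at a share of 4/7 (about 57%) for group A. This equilibrium describes the composition of the sample, not of the population; the weighting step is still needed to turn it into a population estimate.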
How does it work?
Researchers select a small number of members of the target population, typically 5-10, in an ad hoc manner to serve as 'seeds'. Each seed is interviewed and given a fixed number of coupons (three is common practice) to use in recruiting other members of the target population. These recruits are in turn given coupons to recruit others. In this way the sample grows in 'waves', producing what are termed 'recruitment trees'. Respondents are encouraged to participate and to recruit through financial and other incentives. It is recommended that respondents be given a week to recruit, as there is evidence that 92-95% of participants distribute their coupons within the first week. The majority of participants are recruited by other respondents, not by the researchers.
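The bookkeeping behind waves and recruitment trees can be sketched as follows. The sketch assumes that coupon tracking yields one record per respondent naming their recruiter (with `None` for seeds); the respondent identifiers, the `assign_waves` helper and the data are hypothetical.

```python
from collections import Counter


def assign_waves(recruiter_of):
    """Map each respondent to a wave number; seeds are wave 0."""
    waves = {}

    def wave(person):
        if person not in waves:
            parent = recruiter_of[person]
            # A recruit sits one wave below the person who recruited them.
            waves[person] = 0 if parent is None else wave(parent) + 1
        return waves[person]

    for person in recruiter_of:
        wave(person)
    return waves


# Hypothetical coupon records: respondent -> recruiter (None marks a seed).
recruiter_of = {
    "s1": None, "s2": None,
    "a": "s1", "b": "s1", "c": "s2",
    "d": "a", "e": "a", "f": "c",
    "g": "d",
}

waves = assign_waves(recruiter_of)
wave_sizes = Counter(waves.values())

# Check the coupon limit: nobody should have more recruits than coupons.
COUPONS_PER_PERSON = 3
recruits_per_person = Counter(r for r in recruiter_of.values() if r is not None)
within_limit = max(recruits_per_person.values()) <= COUPONS_PER_PERSON
print(dict(wave_sizes), within_limit)
```

Here two seeds give rise to three further waves. The same records also allow a simple consistency check that no respondent redeemed more coupons than they were issued.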
RDS is designed to begin as a convenience sample, to select each subsequent group of respondents in a way that depends on those already sampled, and then to treat the final sample as a probability sample. Analysis can be done using dedicated software such as the RDS Analysis Tool (RDSAT) and RDS-Analyst. RDSAT estimates parameters such as proportions. Analysis with RDSAT requires a separate set of weights for each variable, even for a single individual, which makes running regression models difficult. RDS-Analyst offers three estimation methods: the successive sampling method, RDS I and RDS II. Both packages are still being refined, and specialist skills are needed to perform the analysis. There is a growing body of work to simplify this process (e.g. Selvaraj et al, cited below).
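To make the weighting idea concrete, the following is a minimal sketch of an inverse-degree weighted proportion, the form usually associated with the RDS II (Volz-Heckathorn) estimator. It is not the implementation used by RDSAT or RDS-Analyst; the function name and the five-respondent data set are hypothetical. Each respondent is weighted by one over their reported number of contacts, on the reasoning that well-connected people are more likely to be recruited.

```python
def rds_ii_proportion(has_trait, degrees):
    """Inverse-degree weighted proportion of respondents with a trait."""
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent needs a positive network size")
    weights = [1.0 / d for d in degrees]
    # Sum the weights of respondents who have the trait, then normalise.
    trait_weight = sum(w for w, y in zip(weights, has_trait) if y)
    return trait_weight / sum(weights)


# Hypothetical sample: trait indicator and reported number of contacts.
has_trait = [True, True, False, False, True]
degrees = [10, 20, 5, 4, 10]

estimate = rds_ii_proportion(has_trait, degrees)
naive = sum(has_trait) / len(has_trait)
print(round(estimate, 3), naive)
```

In this made-up sample the respondents with the trait report larger networks, so the weighted estimate (about 0.36) is well below the unweighted sample proportion (0.6).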
Researchers assume reciprocity, i.e. that the recruiter and recruit know each other and that each would be willing to recruit the other. Researchers should therefore collect information about the relationship between recruiters and recruits. They should also be aware that respondents' decision making affects the sampling process. Respondents are not allowed to recruit people who have already participated. Finally, researchers have to watch for any small, well-connected subgroup being sampled at a high rate, as this influences the future referral choices of other subgroup members.
Researchers are advised to test the following assumptions:
- finite population effects on sampling;
- recruitment bias;
- validity of the timeframe used.
Assumptions can be tested through computer simulation or by analytical methods that detect violation of the assumptions in practice.
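One simple analytical check of this kind is to follow how an estimate changes as respondents accumulate in recruitment order; an estimate that is still drifting at the end of data collection suggests that the sample has not yet settled. The sketch below is a simplified illustration, not a published diagnostic procedure. It assumes inverse-degree weights, and the function name and data are hypothetical.

```python
def running_estimates(has_trait, degrees):
    """Weighted proportion after each successive respondent."""
    estimates = []
    trait_weight = 0.0
    total_weight = 0.0
    for y, d in zip(has_trait, degrees):
        w = 1.0 / d  # inverse-degree weight (an assumption of this sketch)
        total_weight += w
        if y:
            trait_weight += w
        estimates.append(trait_weight / total_weight)
    return estimates


# Hypothetical respondents, listed in the order they were enrolled.
has_trait = [True, True, False, True, False, False, True, False]
degrees = [5, 10, 5, 10, 5, 10, 10, 5]

path = running_estimates(has_trait, degrees)
# Compare the final estimate with the estimate at the halfway point.
drift = abs(path[-1] - path[len(path) // 2 - 1])
print([round(p, 2) for p in path], round(drift, 2))
```

Plotting `path` against sample size gives a quick visual impression of whether the estimate has levelled off.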
Advantages of Respondent Driven Sampling
- It can generate large samples of a wide variety of hard-to-reach populations.
- It is designed to reduce the biases of network-based, snowball or chain-based sampling, such as those arising from the choice of initial participants, volunteerism, and masking.
Issues with Respondent Driven Sampling
- The sampling design is beyond the control of the researcher and not fully observable. Researchers may not know the size of the personal networks of recruits.
- It requires researchers to make assumptions about the recruitment process and the social network that connects the target population. There is an unknown dependency between recruiters and recruits.
- It can be difficult to attain the target sample size, as the available population may be smaller than the target population or participants may be unable to recruit additional members of the target population. Other risks are insufficient incentives, inadequate network connections in the population and a negative perception of the study among the target population.
- It may be difficult to recruit, as respondents may be influenced by people they know who have already participated in the study.
- The recruitment process is affected by three different types of decision making:
  - the decision of the recruiter to pass on the coupons;
  - the decision of the recruit to accept a coupon;
  - the decision of the recruit to participate in the study.
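The combined effect of the three decisions can be sketched with simple arithmetic. The sketch assumes, for illustration only, that each coupon passes through the three decisions independently and with fixed probabilities; the probabilities and the function name are hypothetical. If the expected number of new participants per respondent falls below one, recruitment chains tend to die out, which is one way in which the sample size problem noted above can arise.

```python
def expected_recruits(coupons, p_pass, p_accept, p_participate):
    """Expected new participants generated by one respondent."""
    for p in (p_pass, p_accept, p_participate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    # A coupon yields a participant only if all three decisions go ahead.
    return coupons * p_pass * p_accept * p_participate


# Hypothetical probabilities for the three decisions (illustrative only).
sustained = expected_recruits(3, p_pass=0.8, p_accept=0.7, p_participate=0.6)
fading = expected_recruits(3, p_pass=0.5, p_accept=0.7, p_participate=0.6)
print(round(sustained, 3), round(fading, 3))
```

With three coupons, the first set of probabilities gives just over one expected recruit per respondent (1.008), whereas the second gives 0.63, so chains would be expected to shrink from wave to wave.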
Gile, K., Johnston, L.G. & Salganik, M.J. 2015. "Diagnostics for respondent-driven sampling". Journal of the Royal Statistical Society, 178:241-269.