Extended Abstract. Sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen. To draw a sample from the underlying population, a variety of sampling methods can be employed, individually or in combination.
Cut-off sampling is a procedure commonly used by national statistical institutes to select samples. Several types of cut-off sampling are employed in practice. In its simplest form, part of the target population is deliberately excluded from selection. For example, in business statistics it is not unusual to cut off (very) small enterprises from the sampling frame. Indeed, it may be tempting not to spend resources on enterprises that contribute little to the overall results of the survey. In this case, the frame and the sample are typically restricted to enterprises of at least a given size, e.g. a certain number of employees. It is assumed that the contribution of the excluded part of the population is, if not negligible, at least small in comparison with that of the remaining population.
In particular, cut-off sampling is used when the distribution of the values Y1, ..., YN is highly skewed and no reliable frame exists for the small elements. As explained above, such populations are often found in business surveys. A considerable portion of the population may consist of small business enterprises whose contribution to the total of a variable of interest (for example, sales) is modest or negligible. At the other extreme, such a population often contains some giant enterprises whose inclusion in the sample is virtually mandatory in order not to risk a large error in an estimated total. One may decide in such a case to cut off (exclude from the frame, and thus from sample selection) the enterprises with few employees, say five or fewer. The procedure is not recommended if a good frame for the whole population can be constructed without excessive cost.
This method may reduce the response burden on these small enterprises. On the other hand, this elementary form of cut-off sampling, which we refer to as type I cut-off sampling, may be considered a "dirty" method because (i) the sampling probability is set to zero for some sampling units, so that it can be regarded as a type of non-probability sampling design, and (ii) it leads to biased estimates.
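To make point (ii) concrete, the following Python sketch simulates a type I cut-off on a synthetic skewed population. Everything in it is an illustrative assumption rather than part of the paper: the lognormal size distribution, the threshold of five, the sample size, and the helper name `type1_cutoff_total` are all hypothetical. Under simple random sampling from the retained units, the expansion estimator targets the retained total, so its bias for the full population total should be close to minus the total of the excluded units.

```python
import random


def type1_cutoff_total(y, size, threshold, n, rng):
    """Estimate the total of y when units with size < threshold are cut off.

    Excluded units have inclusion probability zero; a simple random sample
    of n retained units is drawn and expanded to the retained population.
    """
    kept = [yi for yi, si in zip(y, size) if si >= threshold]
    sample = rng.sample(kept, n)
    return len(kept) * sum(sample) / n


rng = random.Random(0)

# Hypothetical skewed population: "employees" are lognormal and the study
# variable (e.g. sales) is roughly proportional to size.
size = [rng.lognormvariate(1.5, 1.2) for _ in range(5000)]
y = [s * rng.uniform(0.8, 1.2) for s in size]

true_total = sum(y)
excluded_total = sum(yi for yi, si in zip(y, size) if si < 5)

# Monte Carlo approximation of the estimator's expectation.
estimates = [type1_cutoff_total(y, size, 5, 200, rng) for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
bias = mean_estimate - true_total

print(f"true total      : {true_total:.0f}")
print(f"mean estimate   : {mean_estimate:.0f}")
print(f"estimated bias  : {bias:.0f}")
print(f"excluded total  : {excluded_total:.0f}")
```

In this setting the bias does not shrink as the sample size grows; it is determined by the threshold alone, which is why the choice of threshold matters.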
However, the use of cut-off sampling and its modified versions can be justified by several arguments. Among others, one can argue that
- It would cost too much, in relation to a small gain in accuracy, to construct and maintain a reliable frame for the entire population;
- Excluding the population units that contribute little to the aggregates to be estimated usually implies a large decrease in the number of units that have to be surveyed in order to reach a predefined accuracy level for the estimates;
- Constraining the frame population and, as a consequence, the sample reduces the problem of empty strata;
- The bias caused by the cut-off is deemed negligible.
In this paper we discuss different types of cut-off sampling methods, with emphasis on type III cut-off sampling, which combines take-all, take-some, and take-none criteria. Roughly speaking, in the methods discussed, the population is partitioned into two or three strata whose units are treated differently; in particular, a part of the target population is usually excluded a priori from sample selection. We discuss when cut-off sampling should be considered a permitted method and how to estimate the population mean or total under it using model-based, model-assisted, and design-based strategies. Theoretical results are given to show how the cut-off thresholds and the sample size should be chosen. Different error sources and their effects on the overall accuracy of the presented estimates are also addressed.
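As a rough illustration of the three-stratum partition, the Python sketch below splits a synthetic population into take-none, take-some, and take-all strata and forms two estimates of the total. It is not the estimator developed in the paper. The thresholds `lower` and `upper`, the auxiliary size measure `x` (assumed here to be known for every unit, including the take-none units), the simple random sample within the take-some stratum, and the ratio prediction for the take-none part are all assumptions made for illustration.

```python
import random


def type3_cutoff_estimate(x, y, lower, upper, n_some, rng):
    """Sketch of a type III cut-off estimate of the total of y.

    x is an auxiliary size measure assumed known for all units;
    lower and upper are hypothetical thresholds.
    """
    take_all = [i for i, xi in enumerate(x) if xi >= upper]
    take_some = [i for i, xi in enumerate(x) if lower <= xi < upper]
    take_none = [i for i, xi in enumerate(x) if xi < lower]

    # Simple random sample without replacement from the take-some stratum.
    sample = rng.sample(take_some, n_some)

    total_all = sum(y[i] for i in take_all)  # completely enumerated
    total_some = len(take_some) * sum(y[i] for i in sample) / n_some

    # Model-based ratio prediction for the unobserved take-none stratum,
    # assuming y is roughly proportional to x.
    observed = take_all + sample
    ratio = sum(y[i] for i in observed) / sum(x[i] for i in observed)
    total_none_pred = ratio * sum(x[i] for i in take_none)

    return {
        "ignoring_take_none": total_all + total_some,
        "ratio_adjusted": total_all + total_some + total_none_pred,
        "stratum_sizes": (len(take_none), len(take_some), len(take_all)),
    }


rng = random.Random(1)
x = [rng.lognormvariate(1.5, 1.2) for _ in range(5000)]  # hypothetical sizes
y = [xi * rng.uniform(0.8, 1.2) for xi in x]              # hypothetical sales
true_total = sum(y)

result = type3_cutoff_estimate(x, y, lower=5, upper=50, n_some=200, rng=rng)
print(result["stratum_sizes"])
print(f"true total          : {true_total:.0f}")
print(f"ignoring take-none  : {result['ignoring_take_none']:.0f}")
print(f"ratio-adjusted      : {result['ratio_adjusted']:.0f}")
```

The first estimate simply ignores the take-none stratum, while the second predicts its contribution from the auxiliary variable; how well the second works depends on how closely the proportionality assumption holds among the small units.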
The outline of the paper is as follows. In section 2, we briefly discuss different types of cut-off sampling designs and some of their properties. In section 3, we first introduce our notation and motivate the use of type III cut-off sampling. We then discuss estimation of the population mean (or total), either by ignoring the population units in the "take none" strata or by modeling them using auxiliary information. We study ratio estimation of the population mean and type III sample size determination (for a given precision of estimation) using design-based, model-based, and model-assisted strategies. In this section, we also study the calculation of the thresholds and their approximation using different methods and under different conditions. Finally, in section 4, we present a simulation study and compare our results with those obtained under the commonly used type I cut-off sampling and its modification.