| Title: | Robust Survey Statistics Estimation |
|---|---|
| Description: | Robust (outlier-resistant) estimators of finite population characteristics like of means, totals, ratios, regression, etc. Available methods are M- and GM-estimators of regression, weight reduction, trimming, and winsorization. The package extends the 'survey' <https://CRAN.R-project.org/package=survey> package. |
| Authors: | Beat Hulliger [aut] (ORCID: <https://orcid.org/0000-0001-5252-8606>), Tobias Schoch [aut, cre] (ORCID: <https://orcid.org/0000-0002-1640-3395>), Martin Sterchi [ctr, com], R-core [ctb, cph] (for zeroin2.c) |
| Maintainer: | Tobias Schoch <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.7-3 |
| Built: | 2026-06-04 07:25:28 UTC |
| Source: | https://github.com/tobiasschoch/robsurvey |
A key design pattern of the package is that the majority of the estimating methods is available in two "flavors":
bare-bone methods
survey methods
Bare-bone methods are stripped-down versions of the survey methods in terms of functionality and informativeness. These functions may serve users and package developers as building blocks. In particular, bare-bone functions cannot compute variances.
The survey methods are much more capable and depend, for variance estimation, on the survey package.
Bare-bone methods: weighted_mean_trimmed and
weighted_total_trimmed
Survey methods: svymean_trimmed and
svytotal_trimmed
Bare-bone methods:
Survey methods:
Bare-bone methods: weighted_mean_dalen and
weighted_total_dalen
Survey methods: svymean_dalen and
svytotal_dalen
Bare-bone methods:
huber2 (weighted Huber Proposal 2
estimator)
Survey methods:
mer (minimum estimated risk estimator)
Regression M-estimators: svyreg_huberM and
svyreg_tukeyM
Regression GM-estimators (Mallows and Schweppe):
svyreg_huberGM and svyreg_tukeyGM
Ratio M-estimators:
svyratio_huber and svyratio_tukey
Note: The functions svyreg_huber and
svyreg_tukey are deprecated, use instead
svyreg_huberM and svyreg_tukeyM, respectively;
see also robsurvey-deprecated.
Regression predictors: svymean_reg and
svytotal_reg
Ratio predictors: svymean_ratio and
svytotal_ratio
Methods and utility functions for objects of class svyreg_rob.
## S3 method for class 'svyreg_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' summary(object, mode = c("design", "model", "compound"), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' coef(object, ...) ## S3 method for class 'svyreg_rob' vcov(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' SE(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' residuals(object, ...) ## S3 method for class 'svyreg_rob' fitted(object, ...) ## S3 method for class 'svyreg_rob' robweights(object) ## S3 method for class 'svyreg_rob' plot(x, which = 1L:4L, hex = FALSE, caption = c("Standardized residuals vs. Fitted Values", "Normal Q-Q", "Response vs. Fitted values", "Sqrt of abs(Residuals) vs. Fitted Values"), panel = if (add.smooth) function(x, y, ...) panel.smooth(x, y, iter = iter.smooth, ...) else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, add.smooth = getOption("add.smooth"), iter.smooth = 3, label.pos = c(4, 2), cex.caption = 1, cex.oma.main = 1.25)## S3 method for class 'svyreg_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' summary(object, mode = c("design", "model", "compound"), digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svyreg_rob' coef(object, ...) ## S3 method for class 'svyreg_rob' vcov(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' SE(object, mode = c("design", "model", "compound"), ...) ## S3 method for class 'svyreg_rob' residuals(object, ...) ## S3 method for class 'svyreg_rob' fitted(object, ...) ## S3 method for class 'svyreg_rob' robweights(object) ## S3 method for class 'svyreg_rob' plot(x, which = 1L:4L, hex = FALSE, caption = c("Standardized residuals vs. Fitted Values", "Normal Q-Q", "Response vs. Fitted values", "Sqrt of abs(Residuals) vs. Fitted Values"), panel = if (add.smooth) function(x, y, ...) panel.smooth(x, y, iter = iter.smooth, ...) else points, sub.caption = NULL, main = "", ask = prod(par("mfcol")) < length(which) && dev.interactive(), ..., id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75, qqline = TRUE, add.smooth = getOption("add.smooth"), iter.smooth = 3, label.pos = c(4, 2), cex.caption = 1, cex.oma.main = 1.25)
x |
object of class |
digits |
|
... |
additional arguments passed to the method. |
object |
object of class |
mode |
|
which |
|
hex |
|
caption |
|
panel |
panel function. The useful alternative to
|
sub.caption |
|
main |
|
ask |
|
id.n |
|
labels.id |
|
cex.id |
|
qqline |
|
add.smooth |
|
iter.smooth |
|
label.pos |
|
cex.caption |
|
cex.oma.main |
|
Package survey must be attached to the search path in order to use
the functions (see library or require).
For variance estimation (summary, vcov, and
SE) three modes are available:
"design": design-based variance estimator using
linearization; see Binder (1983)
"model": model-based weighted variance estimator
(the sampling design is ignored)
"compound": design-model-based variance
estimator; see Rubin-Bleuer and Schiopu-Kratina (2005)
and Binder and Roberts (2009)
The following utility functions are available:
summary gives a summary of the estimation
properties
plot shows diagnostic plots for the estimated
regression model
robweights extracts the robustness weights
(if available)
coef extracts the estimated regression coefficients
vcov extracts the (estimated) covariance matrix
residuals extracts the residuals
fitted extracts the fitted values
Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review 51, 279–292. doi:10.2307/1402588
Binder, D. A. and Roberts, G. (2009). Design- and Model-Based Inference for Model Parameters. In: Sample Surveys: Inference and Analysis ed. by Pfeffermann, D. and Rao, C. R. Volume 29B of Handbook of Statistics, Amsterdam: Elsevier, Chap. 24, 33–54 doi:10.1016/S0169-7161(09)00224-7
Rubin-Bleuer, S. and Schiopu-Kratina, I. (2005). On the Two-phase framework for joint model and design-based inference. The Annals of Statistics 33, 2789–2810. doi:10.1214/009053605000000651
Weighted least squares: svyreg; robust weighted regression
svyreg_huberM, svyreg_huberGM,
svyreg_tukeyM and svyreg_tukeyGM
head(workplace) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 14) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Utility functions summary(m) coef(m) SE(m) vcov(m) residuals(m) fitted(m) robweights(m)head(workplace) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 14) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Utility functions summary(m) coef(m) SE(m) vcov(m) residuals(m) fitted(m) robweights(m)
Methods and utility functions for objects of class svystat_rob.
mse(object, ...) ## S3 method for class 'svystat_rob' mse(object, ...) ## S3 method for class 'svystat' mse(object, ...) ## S3 method for class 'svystat_rob' summary(object, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svystat_rob' coef(object, ...) ## S3 method for class 'svystat_rob' SE(object, ...) ## S3 method for class 'svystat_rob' vcov(object, ...) ## S3 method for class 'svystat_rob' scale(x, ...) ## S3 method for class 'svystat_rob' residuals(object, ...) ## S3 method for class 'svystat_rob' fitted(object, ...) robweights(object) ## S3 method for class 'svystat_rob' robweights(object) ## S3 method for class 'svystat_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svystat_rob' confint(object, parm, level = 0.95, df = Inf, ...)mse(object, ...) ## S3 method for class 'svystat_rob' mse(object, ...) ## S3 method for class 'svystat' mse(object, ...) ## S3 method for class 'svystat_rob' summary(object, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svystat_rob' coef(object, ...) ## S3 method for class 'svystat_rob' SE(object, ...) ## S3 method for class 'svystat_rob' vcov(object, ...) ## S3 method for class 'svystat_rob' scale(x, ...) ## S3 method for class 'svystat_rob' residuals(object, ...) ## S3 method for class 'svystat_rob' fitted(object, ...) robweights(object) ## S3 method for class 'svystat_rob' robweights(object) ## S3 method for class 'svystat_rob' print(x, digits = max(3L, getOption("digits") - 3L), ...) ## S3 method for class 'svystat_rob' confint(object, parm, level = 0.95, df = Inf, ...)
df |
|
digits |
|
level |
|
object |
object of class |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
x |
object of class |
... |
additional arguments passed to the method. |
Package survey must be attached to the search path in order to use
the functions (see library or require).
Utility functions:
mse computes the estimated risk (mean square
error) in presence of representative outliers; see also
mer
summary gives a summary of the estimation properties
robweights extracts the robustness weights
coef extracts the estimate of location
SE extracts the (estimated) standard error
vcov extracts the (estimated) covariance matrix
residuals extracts the residuals
fitted extracts the fitted values
svymean_dalen, svymean_huber,
svymean_ratio, svymean_reg,
svymean_tukey, svymean_trimmed,
svymean_winsorized
svytotal_dalen, svytotal_huber,
svytotal_ratio, svytotal_reg,
svytotal_tukey, svytotal_trimmed,
svytotal_winsorized
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated one-sided k winsorized population total (i.e., k = 2 observations # are winsorized at the top of the distribution) wtot <- svytotal_k_winsorized(~employment, dn, k = 2) # Show summary statistic of the estimated total summary(wtot) # Estimated mean square error (MSE) mse(wtot) # Estimate, std. err., variance, and the residuals coef(wtot) SE(wtot) vcov(wtot) residuals(wtot) # M-estimate of the total (Huber psi-function; tuning constant k = 3) mtot <- svytotal_huber(~employment, dn, k = 45) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(mtot), robweights(mtot))head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated one-sided k winsorized population total (i.e., k = 2 observations # are winsorized at the top of the distribution) wtot <- svytotal_k_winsorized(~employment, dn, k = 2) # Show summary statistic of the estimated total summary(wtot) # Estimated mean square error (MSE) mse(wtot) # Estimate, std. err., variance, and the residuals coef(wtot) SE(wtot) vcov(wtot) residuals(wtot) # M-estimate of the total (Huber psi-function; tuning constant k = 3) mtot <- svytotal_huber(~employment, dn, k = 45) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(mtot), robweights(mtot))
Data from a simple random sample (without replacement) of 100 of the 3141 counties in the United Stated (U.S. Bureau of the Census, 1994).
data(counties)data(counties)
A data.frame with 100 observations on the following variables:
statestate, [character].
countycounty, [character].
landarealand area, 1990 (square miles),
[double].
totpoppopulation total, 1992, [double].
unempnumber of unemployed persons, 1991,
[double].
farmpopfarm population, 1990, [double].
numfarmnumber of farms, 1987, [double].
farmacreacreage in farms, 1987, [double].
weightssampling weight, [double].
fpcfinite population corretion, [double].
The data (and 10 additional variables) are published in Lohr (1999, Appendix C).
Lohr, S. L. (1999). Sampling: Design and Analysis, Pacific Grove (CA): Duxbury Press.
head(counties) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties) }head(counties) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weights, data = counties) }
Measurement of copper content in wholemeal flour (measured in parts per million).
data(flour)data(flour)
A data.frame with 24 observations (sorted in ascending order)
on the following variables:
coppercopper content [double].
weightweight [double].
The data are published in Maronna et al. (2019, p. 2).
Maronna, R. A., Martin, R. D., Yohai, V. J. and Salibián-Barrera, M. (2019). Robust Statistics: Theory and Methods (with R), Hoboken (NJ): John Wiley and Sons, 2nd edition. doi:10.1002/9781119214656
head(flour)head(flour)
Weighted Huber Proposal 2 estimator of location and scatter.
huber2(x, w, k = 1.5, na.rm = FALSE, maxit = 50, tol = 1e-04, info = FALSE, k_Inf = 1e6, df_cor = TRUE)huber2(x, w, k = 1.5, na.rm = FALSE, maxit = 50, tol = 1e-04, info = FALSE, k_Inf = 1e6, df_cor = TRUE)
x |
|
w |
|
k |
|
na.rm |
|
maxit |
|
tol |
|
info |
|
k_Inf |
|
df_cor |
|
Function huber2 computes the weighted Huber (1964) Proposal 2
estimates of location and scale.
The method is initialized by the weighted median (location) and the weighted interquartile range (scale).
The return value depends on info:
info = FALSE:estimate of mean or total [double]
info = TRUE:a [list] with items:
characteristic [character],
estimator [character],
estimate [double],
variance (default: NA),
robust [list],
residuals [numeric vector],
model [list],
design (default: NA),
[call]
The huber2 estimator is initialized by the weighted median and
the weighted (scaled) interquartile range. For unweighted data, this
estimator differs from hubers in MASS,
which is initialized by mad.
The difference between the estimators is usually negligible (for
sufficiently small values of tol). See examples.
Huber, P. J. (1964). Robust Estimation of a Location Parameter. Annals of Mathematical Statistics 35, 73–101. doi:10.1214/aoms/1177703732
head(workplace) # Weighted "Proposal 2" estimator of the mean huber2(workplace$employment, workplace$weight, k = 8) # More information on the estimate, i.e., info = TRUE m <- huber2(workplace$employment, workplace$weight, k = 8, info = TRUE) # Estimate of scale m$scale # Comparison with MASS::hubers (without weights). We make a copy of MASS::hubers library(MASS) hubers_mod <- hubers # Then we replace mad by the (scaled) IQR as initial scale estimator body(hubers_mod)[[7]][[3]][[2]] <- substitute(s0 <- IQR(y, type = 2) * 0.7413) # Define the numerical tolerance TOLERANCE <- 1e-8 # Comparison m1 <- huber2(workplace$payroll, rep(1, 142), tol = TOLERANCE) m2 <- hubers_mod(workplace$payroll, tol = TOLERANCE)$mu m1 / m2 - 1 # The absolute relative difference is < 4.0-09 (smaller than TOLERANCE)head(workplace) # Weighted "Proposal 2" estimator of the mean huber2(workplace$employment, workplace$weight, k = 8) # More information on the estimate, i.e., info = TRUE m <- huber2(workplace$employment, workplace$weight, k = 8, info = TRUE) # Estimate of scale m$scale # Comparison with MASS::hubers (without weights). We make a copy of MASS::hubers library(MASS) hubers_mod <- hubers # Then we replace mad by the (scaled) IQR as initial scale estimator body(hubers_mod)[[7]][[3]][[2]] <- substitute(s0 <- IQR(y, type = 2) * 0.7413) # Define the numerical tolerance TOLERANCE <- 1e-8 # Comparison m1 <- huber2(workplace$payroll, rep(1, 142), tol = TOLERANCE) m2 <- hubers_mod(workplace$payroll, tol = TOLERANCE)$mu m1 / m2 - 1 # The absolute relative difference is < 4.0-09 (smaller than TOLERANCE)
A simple random sample of 70 patients in inpatient hospital treatment.
data(losdata)data(losdata)
A data.frame with data on the following variables:
loslength of stay (days) [integer].
weightsampling weight [double].
fpcfinite population correction [double].
The losdata are a simple random sample without replacement (SRSWOR)
of size patients from the (fictive) population of
patients in inpatient hospital treatment. We have
constructed the losdata as a showcase; though, the LOS
measurements are real data that we have taken from the 201 observations
in Ruffieux et al. (2000). The original LOS data of Ruffieux et al.
(2000) are available in the R package robustbase; see
robustbase::data(los). Our losdata are a SRSWOR of
size from the 201 original observations.
Ruffieux et al. (2000) and data.frame los in the R
package robustbase.
Ruffieux, C., Paccaud, F. and Marazzi, A. (2000). Comparing rules for truncating hospital length of stay. Casemix Quarterly 2.
head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) }head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) }
mer is an adaptive M-estimator of the weighted mean or total. It
is defined as the estimator that minimizes the estimated mean square error,
mse, of the estimator under consideration.
mer(object, verbose = TRUE, max_k = 10, init = 1, method = "Brent", optim_args = list())mer(object, verbose = TRUE, max_k = 10, init = 1, method = "Brent", optim_args = list())
object |
an object of class |
verbose |
|
init |
|
method |
|
max_k |
|
optim_args |
|
Package survey must be attached to the search path in order to use
the functions (see library or require).
MER-estimators are available for the methods svymean_huber,
svytotal_huber, svymean_tukey and
svytotal_tukey.
Object of class svystat_rob
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
Overview (of all implemented functions)
head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) } # M-estimator of the total with tuning constant k = 8 m <- svymean_huber(~los, dn, type = "rhj", k = 8) # MER estimator mer(m)head(losdata) library(survey) # Survey design for simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata, calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~1, fpc = ~fpc, weights = ~weight, data = losdata) } # M-estimator of the total with tuning constant k = 8 m <- svymean_huber(~los, dn, type = "rhj", k = 8) # MER estimator mer(m)
Probability-proportional-to-size sample (PPS) without replacement of municipalities from the MU284 population in Särndal et al. (1992). The sample inclusion probabilities are proportional to the population size in 1975 (variable P75).
data(MU284pps)data(MU284pps)
A data.frame with 60 observations on the following variables:
LABELidentifier variable, [integer].
P851985 population size (in thousands),
[double].
P751975 population size (in thousands),
[double].
RMT85Revenues from the 1985 municipal taxation
(in millions of kronor), [double].
CS82number of Conservative seats in municipal council,
[double].
SS82number of Social-Democrat seats in municipal
council (1982), [double].
S82total number of seats in municipal council (1982),
[double].
ME84number of municipal employees in 1984,
[double].
REV84real estate values according to 1984 assessment
(in millions of kronor), [double].
REGgeographic region indicator, [integer].
CLcluster indicator (a cluster consists of a set of
neighbouring municipalities), [integer].
weightssampling weights, [double].
pisample inclusion probability, [double].
The MU284 population of Särndal et al. (1992, Appendix B) is a
dataset with observations on the 284 municipalities in Sweden in the
late 1970s and early 1980s. The MU284 population data
are available in the sampling package of Tillé and Matei (2021).
The data frame MU284pps is a probability-proportional-to-size
sample (PPS) without replacement from the MU284 population.
The sample inclusion probabilities are proportional to the
population size in 1975 (variable P75). The sample has been
selected by Brewer’s method; see Tillé (2006, Chap. 7).
The sampling weight (inclusion probabilities) are calibrated to
the population size and the population total of P75.
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Tillé, Y. and Matei, A. (2021). sampling: Survey Sampling. R package version 2.9. https://CRAN.R-project.org/package=sampling
Tillé, Y. (2006). Sampling Algorithms. New York: Springer-Verlag.
head(MU284pps) library(survey) # Survey design with inclusion probabilities proportional to size dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer", calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer") }head(MU284pps) library(survey) # Survey design with inclusion probabilities proportional to size dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer", calibrate.formula = ~1) } else { # legacy mode svydesign(ids = ~LABEL, fpc = ~pi, data = MU284pps, pps = "brewer") }
Stratified simple random sample (without replacement) of municipalities from the MU284 population in Särndal et al. (1992). Stratification is by geographic region and a take-all stratum (by 1975 population size), which includes the big cities Stockholm, Göteborg, and Malmö.
data(MU284strat)data(MU284strat)
A data.frame with 60 observations on the following variables:
LABELidentifier variable, [integer].
P851985 population size (in thousands),
[double].
P751975 population size (in thousands),
[double].
RMT85Revenues from the 1985 municipal taxation
(in millions of kronor), [double].
CS82number of Conservative seats in municipal council,
[double].
SS82number of Social-Democrat seats in municipal
council (1982), [double].
S82total number of seats in municipal council (1982),
[double].
ME84number of municipal employees in 1984,
[double].
REV84real estate values according to 1984 assessment
(in millions of kronor), [double].
CLcluster indicator (a cluster consists of a set of
neighbouring municipalities), [integer].
REGgeographic region indicator, [integer].
Stratumstratum indicator, [integer].
weightssampling weights, [double].
fpcfinite population correction, [double].
The MU284 population of Särndal et al. (1992, Appendix B) is a
dataset with observations on the 284 municipalities in Sweden in the
late 1970s and early 1980s. The MU284 population data
are available in the sampling package of Tillé and Matei (2021).
The population is divided into two parts based on 1975 population
size (P75):
the MU281 population, which consists of the 281 smallest municipalities;
the MU3 population of the three biggest municipalities/ cities in Sweden (Stockholm, Göteborg, and Malmö).
The three biggest cities take exceedingly large values (representative
outliers) on almost all of the variables. To account for this, a stratified
sample has been drawn from the MU284 population using a take-all stratum.
The sample data, MU284strat, (of size ) consists of
a stratified simple random sample (without replacement)
from the MU281 population, where stratification is by
geographic region (REG) with proportional sample
size allocation;
a take-all stratum that includes the three biggest cities/ municipalities (population M3).
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Tillé, Y. and Matei, A. (2021). sampling: Survey Sampling. R package version 2.9. https://CRAN.R-project.org/package=sampling
head(MU284strat) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat, calibrate.formula = ~-1 + Stratum) } else { # legacy mode svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat) }head(MU284strat) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat, calibrate.formula = ~-1 + Stratum) } else { # legacy mode svydesign(ids = ~LABEL, strata = ~Stratum, fpc = ~fpc, weights = ~weights, data = MU284strat) }
Methods to compute the first-order sample inclusion probabilities (given a measure of size) and sampling mechanisms to draw samples with probabilities proportional to size (pps).
pps_probabilities(size, n) pps_draw(x, method = "brewer", sort = TRUE) ## S3 method for class 'prob_pps' print(x, ...)pps_probabilities(size, n) pps_draw(x, method = "brewer", sort = TRUE) ## S3 method for class 'prob_pps' print(x, ...)
size |
|
n |
|
x |
object of class |
method |
|
sort |
|
... |
additional arguments. |
Function pps_probabilities computes the first-order sample inclusion
probabilities for a given sample size n; see e.g., Särndal et
al., 1992 (p. 90). The probabilities (and additional attributes) are
returned as a vector, more precisely as an object of class prob_pps.
To get the probabilities, use the function as.numeric() on the
object.
For an object of class prob_pps (inclusion probabilities and
additional attributes), function pps_draw draws a pps sample
without replacement and returns the indexes of the population elements.
Only the method of Brewer (1963, 1975) is currently implemented.
Function pps_probabilities returns the probabilities (an object
of class prob_pps).
Function pps_draw returns a pps sample of indexes from the
population elements.
Brewer, K. W. R. (1963). A Model of Systematic Sampling with Unequal Probabilities. Australian Journal of Statistics 5, 93–105. doi:10.1111/j.1467-842X.1963.tb00132.x
Brewer, K. W. R. (1975). A simple procedure for pswor,
Australian Journal of Statistics 17, 166–172.
doi:10.1111/j.1467-842X.1975.tb00954.x
Särndal, C.-E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
# We are going to pretend that the workplace sample is our population. head(workplace) # The population size is N = 142. We want to draw a pps sample (without # replacement) of size n = 10, where the variable employment is the measure of # size. The first-order sample inclusion probabilities are calculated as # follows p <- pps_probabilities(workplace$employment, n = 10) # Extract the probabilities as.numeric(p) # Now, we draw a pps sample using Brewer's method. pps_draw(p, method = "brewer")# We are going to pretend that the workplace sample is our population. head(workplace) # The population size is N = 142. We want to draw a pps sample (without # replacement) of size n = 10, where the variable employment is the measure of # size. The first-order sample inclusion probabilities are calculated as # follows p <- pps_probabilities(workplace$employment, n = 10) # Extract the probabilities as.numeric(p) # Now, we draw a pps sample using Brewer's method. pps_draw(p, method = "brewer")
These functions are provided for compatibility with older versions of the package only, and may be defunct as soon as the next release.
Use instead:
Internal function to call the robust survey regression GM-estimator; this function is only intended for internal use. The function does not check or validate the arguments. In particular, missing values in the data may make the function crash.
robsvyreg(x, y, w, k, psi, type, xwgt, var = NULL, verbose = TRUE, ...) svyreg_control(tol = 1e-5, maxit = 100, k_Inf = 1e6, init = NULL, mad_center = TRUE, ...)robsvyreg(x, y, w, k, psi, type, xwgt, var = NULL, verbose = TRUE, ...) svyreg_control(tol = 1e-5, maxit = 100, k_Inf = 1e6, init = NULL, mad_center = TRUE, ...)
x |
|
y |
|
w |
|
k |
|
psi |
|
type |
|
xwgt |
|
var |
|
verbose |
|
tol |
|
maxit |
|
k_Inf |
|
init |
either |
mad_center |
|
... |
additional arguments passed to the method
(see |
Not documented
[list]
Dalen's estimators Z2 and Z3 of the population mean and total; see
weighted_mean_dalen for further details.
svymean_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...) svytotal_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...)svymean_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...) svytotal_dalen(x, design, censoring, type = "Z2", na.rm = FALSE, verbose = TRUE, ...)
x |
a one-sided |
design |
an object of class |
censoring |
|
type |
|
na.rm |
|
verbose |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library or require).
type = "Z2" or type = "Z3"; see
weighted_mean_dalen for more details.
See weighted_mean_dalen and
weighted_total_dalen.
Object of class svystat_rob
Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations. R & D Report U/STM 1987:32, Statistics Sweden, Stockholm.
Overview (of all implemented functions)
svymean_trimmed, svytotal_trimmed,
svymean_winsorized, svytotal_winsorized,
svymean_huber and svytotal_huber
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Dalen's estimator Z3 of the population total svytotal_dalen(~employment, dn, censoring = 20000, type = "Z3") # Dalen's estimator Z3 of the population mean m <- svymean_dalen(~employment, dn, censoring = 20000, type = "Z3") # Summarize summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Dalen's estimator Z3 of the population total svytotal_dalen(~employment, dn, censoring = 20000, type = "Z3") # Dalen's estimator Z3 of the population mean m <- svymean_dalen(~employment, dn, censoring = 20000, type = "Z3") # Summarize summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m)
Robust ratio predictor (M-estimator) of the population mean and total with Huber and Tukey biweight (bisquare) psi-function.
svytotal_ratio(object, total, variance = "wu", keep_object = TRUE, ...) svymean_ratio(object, total, N = NULL, variance = "wu", keep_object = TRUE, N_unknown = FALSE, ...)svytotal_ratio(object, total, variance = "wu", keep_object = TRUE, ...) svymean_ratio(object, total, N = NULL, variance = "wu", keep_object = TRUE, N_unknown = FALSE, ...)
object |
an object of class |
total |
|
N |
|
variance |
|
keep_object |
|
N_unknown |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library or require).
The (robust) ratio predictor of the population total or mean is computed in two steps.
Step 1: Fit the ratio model associated with the predictor
by one of the functions svyratio_huber
or svyratio_tukey. The fitted model is called
object.
Step 2: Based on the fitted model obtained in the first step,
we predict the population total and mean, respectively, by
the predictors svytotal_ratio and svymean_ratio,
where object is the fitted ratio model.
Two types of auxiliary variables are distinguished: (1)
population size and (2) the population total of the
auxiliary variable (denominator) used in the ratio model.
The option N_unknown = TRUE can be used in the predictor
of the population mean if is unknown.
Three variance estimators are implemented (argument
variance): "base", "wu", and "hajek".
These estimators correspond to the estimators v0, v1,
and v2 in Wu (1982).
The return value is an object of class svystat_rob.
Thus, the utility functions summary,
coef,
SE,
vcov,
residuals,
fitted, and
robweights are available.
Object of class svystat_rob
Wu, C.-F. (1982). Estimation of Variance of the Ratio Estimator. Biometrika 69, 183–189.
Overview (of all implemented functions)
svymean_reg and svytotal_reg for (robust) GREG
regression predictors
svyreg_huberM, svyreg_huberGM,
svyreg_tukeyM and svyreg_tukeyGM for robust
regression - and -estimators
svymean_huber, svytotal_huber,
svymean_tukey and svytotal_tukey for
-estimators
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust ratio M-estimator with Huber psi-function rat <- svyratio_huber(~payroll, ~ employment, dn, k = 5) # Summary of the ratio estimate summary(rat) # Diagnostic plots of the ration/regression M-estimate (e.g., # standardized residuals against fitted values) plot(rat, which = 1L) # Plot of the robustness weights of the ratio/regression M-estimate # against its residuals plot(residuals(rat), robweights(rat)) # Robust ratio predictor of the population mean m <- svymean_ratio(rat, total = 1001233, N = 90840) m # Summary of the ratio estimate of the population mean summary(m) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust ratio M-estimator with Huber psi-function rat <- svyratio_huber(~payroll, ~ employment, dn, k = 5) # Summary of the ratio estimate summary(rat) # Diagnostic plots of the ration/regression M-estimate (e.g., # standardized residuals against fitted values) plot(rat, which = 1L) # Plot of the robustness weights of the ratio/regression M-estimate # against its residuals plot(residuals(rat), robweights(rat)) # Robust ratio predictor of the population mean m <- svymean_ratio(rat, total = 1001233, N = 90840) m # Summary of the ratio estimate of the population mean summary(m) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)
Generalized regression estimator (GREG) predictor of the mean and total,
and robust GREG -estimator predictor
svytotal_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, ...) svymean_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, N_unknown = FALSE, ...)svytotal_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, ...) svymean_reg(object, totals, N = NULL, type, k = NULL, check.names = TRUE, keep_object = TRUE, N_unknown = FALSE, ...)
object |
an object of class |
totals |
|
N |
|
type |
|
k |
|
check.names |
|
keep_object |
|
N_unknown |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library or require).
The (robust) GREG predictor of the population total or mean is computed in two steps.
Step 1: Fit the regression model associated with the GREG
predictor by one of the functions svyreg,
svyreg_huberM, svyreg_huberGM,
svyreg_tukeyM or svyreg_tukeyGM.
The fitted model is called object.
Step 2: Based on the fitted model obtained in the first step,
we predict the population total and mean, respectively, by
the predictors svytotal_reg and svymean_reg,
where object is the fitted regression model.
The following GREG predictors are available:
k = NULL)The following non-robust GREG predictors are available:
type = "projective" ignores the bias correction
term of the GREG predictor; see Särndal and Wright (1984).
type = "ADU" is the "standard" GREG,
which is an asymptotically design unbiased (ADU)
predictor; see Särndal et al.(1992, Chapter 6).
If the fitted regression model (object) does include
a regression intercept, the predictor types "projective"
and "ADU" are identical because the bias correction
of the GREG is zero by design.
The following robust GREG predictors are available:
type = "huber" and type = "tukey" are,
respectively, the robust GREG predictors with Huber
and Tukey bisquare (biweight) psi-function. The tuning
constant must satisfy 0 < k <= Inf.
We can use the Huber-type GREG predictor although the
model has been fitted by the regression estimator
with Tukey psi-function (and vice versa).
type = "BR" is the bias-corrected robust GREG
predictor of Beaumont and Rivest (2009), which is
inspired by the bias-corrected robust predictor of
Chambers (1986). The tuning constant must satisfy
0 < k <= Inf.
type = "lee" is the bias-corrected predictor
of Lee (1991; 1992). Tthe tuning constant k must
satisfy 0 <= k <= 1.
type = "duchesne" is the bias-corrected,
calibration-type estimator/ predictor of Duchesne (1999).
The tuning constant k must be specified as a
vector k = c(a, b), where a and b
are the tuning constants of Duchesne's modified Huber
psi-function (default values: a = 9 and
b = 0.25).
Two types of auxiliary variables are distinguished: (1)
population size and (2) population totals of the
auxiliary variables used in the regression model (i.e.,
non-constant explanatory variables).
The option N_unknown = TRUE can be used in the predictor
of the population mean if is unknown.
The names of the entries of totals are checked against
the names of the regression fit (object), unless we specify
check.names = FALSE.
The return value is an object of class svystat_rob.
Thus, the utility functions summary,
coef,
SE,
vcov,
residuals,
fitted, and
robweights are available.
Object of class svystat_rob
Beaumont, J.-F. and Rivest, L.-P. (2009). Dealing with outliers in survey data. In: Sample Surveys: Theory, Methods and Inference ed. by Pfeffermann, D. and Rao, C. R. Volume 29A of Handbook of Statistics, Amsterdam: Elsevier, Chap. 11, 247–280. doi:10.1016/S0169-7161(08)00011-4
Chambers, R. (1986). Outlier Robust Finite Population Estimation. Journal of the American Statistical Association 81, 1063–1069. doi:10.1080/01621459.1986.10478374
Duchesne, P. (1999). Robust calibration estimators, Survey Methodology 25, 43–56.
Gwet, J.-P. and Rivest, L.-P. (1992). Outlier Resistant Alternatives to the Ratio Estimator. Journal of the American Statistical Association 87, 1174–1182. doi:10.1080/01621459.1992.10476275
Lee, H. (1991). Model-Based Estimators That Are Robust to Outliers, in Proceedings of the 1991 Annual Research Conference, Bureau of the Census, 178–202. Washington, DC, Department of Commerce.
Lee, H. (1995). Outliers in business surveys. In: Business survey methods ed. by Cox, B. G., Binder, D. A., Chinnappa, B. N., Christianson, A., Colledge, M. J. and Kott, P. S. New York: John Wiley and Sons, Chap. 26, 503–526. doi:10.1002/9781118150504.ch26
Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer.
Särndal, C.-E. and Wright, R. L. (1984). Cosmetic Form of Estimators in Survey Sampling. Scandinavian Journal of Statistics 11, 146–156.
Overview (of all implemented functions)
svymean_ratio and svytotal_ratio for (robust)
ratio predictors
svymean_huber, svytotal_huber,
svymean_tukey and svytotal_tukey for
-estimators
svyreg, svyreg_huberM, svyreg_huberGM,
svyreg_tukeyM and svyreg_tukeyGM for robust
regression - and -estimators
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust regression M-estimator with Huber psi-function reg <- svyreg_huberM(payroll ~ employment, dn, k = 3) # Summary of the regression M-estimate summary(reg) # Diagnostic plots of the regression M-estimate (e.g., standardized # residuals against fitted values) plot(reg, which = 1L) # Plot of the robustness weights of the regression M-estimate against # its residuals plot(residuals(reg), robweights(reg)) # ADU (asymptotically design unbiased) estimator m <- svytotal_reg(reg, totals = 1001233, 90840, type = "ADU") m # Robust GREG estimator of the mean; the population means of the auxiliary # variables are from a register m <- svymean_reg(reg, totals = 1001233, 90840, type = "huber", k = 20) m # Summary of the robust GREG estimate summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m) # Approximation of the estimated mean square error mse(m)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust regression M-estimator with Huber psi-function reg <- svyreg_huberM(payroll ~ employment, dn, k = 3) # Summary of the regression M-estimate summary(reg) # Diagnostic plots of the regression M-estimate (e.g., standardized # residuals against fitted values) plot(reg, which = 1L) # Plot of the robustness weights of the regression M-estimate against # its residuals plot(residuals(reg), robweights(reg)) # ADU (asymptotically design unbiased) estimator m <- svytotal_reg(reg, totals = 1001233, 90840, type = "ADU") m # Robust GREG estimator of the mean; the population means of the auxiliary # variables are from a register m <- svymean_reg(reg, totals = 1001233, 90840, type = "huber", k = 20) m # Summary of the robust GREG estimate summary(m) # Extract estimate coef(m) # Extract estimated standard error SE(m) # Approximation of the estimated mean square error mse(m)
Weighted trimmed population mean and total.
svymean_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...) svytotal_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...)svymean_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...) svytotal_trimmed(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, ...)
x |
a one-sided |
design |
an object of class |
LB |
|
UB |
|
na.rm |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library or require).
Population mean or total. Let denote
the estimated trimmed population mean; then, the estimated trimmed
total is given by with
, where
summation is over all observations in the sample.
The methods trims the LB
of the smallest observations and the (1 - UB)
of the largest observations from the data.
Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994).
Object of class svystat_rob
Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697
Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 976–967. doi:10.1214/aos/1176325505
Overview (of all implemented functions)
weighted_mean_trimmed and weighted_total_trimmed
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated trimmed population total (5% symmetric trimming) svytotal_trimmed(~employment, dn, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) svymean_trimmed(~employment, dn, UB = 0.95)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated trimmed population total (5% symmetric trimming) svytotal_trimmed(~employment, dn, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) svymean_trimmed(~employment, dn, UB = 0.95)
Weighted winsorized mean and total
svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...) svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)svymean_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svymean_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...) svytotal_winsorized(x, design, LB = 0.05, UB = 1 - LB, na.rm = FALSE, trim_var = FALSE, ...) svytotal_k_winsorized(x, design, k, na.rm = FALSE, trim_var = FALSE, ...)
x |
a one-sided |
design |
an object of class |
LB |
|
UB |
|
na.rm |
|
trim_var |
|
k |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library or require).
Population mean or total. Let
denote the estimated winsorized population mean; then, the
estimated winsorized total is given by
with
, where summation
is over all observations in the sample.
The amount of winsorization can be specified in relative or absolute terms:
Relative: By specifying LB and UB,
the method winsorizes the LB
of the smallest observations and the
(1 - UB) of the largest
observations from the data.
Absolute: By specifying argument k in the
functions with the "infix" _k_ in their name (e.g.,
svymean_k_winsorized), the
largest observations are winsorized, ,
where denotes the sample size. E.g., k = 2
implies that the largest and the second largest observation
are winsorized.
Large-sample approximation based on the influence function; see Huber and Ronchetti (2009, Chap. 3.3) and Shao (1994). Two estimators are available:
simple_var = FALSEVariance estimator of
the winsorized mean/ total. The estimator depends on
the estimated probability density function evaluated at
the winsorization thresholds, which can be – depending
on the context – numerically unstable. As a remedy,
a simplified variance estimator is available by
setting simple_var = TRUE.
simple_var = TRUEVariance is approximated using the variance estimator of the trimmed mean/ total.
See:
Object of class svystat_rob
Huber, P. J. and Ronchetti, E. (2009). Robust Statistics, New York: John Wiley and Sons, 2nd edition. doi:10.1002/9780470434697
Shao, J. (1994). L-Statistics in Complex Survey Problems. The Annals of Statistics 22, 976–967. doi:10.1214/aos/1176325505
Overview (of all implemented functions)
weighted_mean_winsorized,
weighted_mean_k_winsorized,
weighted_total_winsorized and
weighted_total_k_winsorized
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated winsorized population mean (5% symmetric winsorization) svymean_winsorized(~employment, dn, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) svytotal_k_winsorized(~employment, dn, k = 2)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Estimated winsorized population mean (5% symmetric winsorization) svymean_winsorized(~employment, dn, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) svytotal_k_winsorized(~employment, dn, k = 2)
Weighted Huber and Tukey M-estimator of the population mean and total (robust Horvitz-Thompson estimator)
svymean_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svytotal_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svymean_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...) svytotal_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...)svymean_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svytotal_huber(x, design, k, type = "rwm", asym = FALSE, na.rm = FALSE, verbose = TRUE, ...) svymean_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...) svytotal_tukey(x, design, k, type = "rwm", na.rm = FALSE, verbose = TRUE, ...)
x |
a one-sided |
design |
an object of class |
k |
|
type |
|
asym |
|
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library or require).
type = "rht" or type = "rwm"; see
weighted_mean_huber or
weighted_mean_tukey for more details.
Taylor linearization (residual variance estimator).
See weighted_mean_huber
weighted_mean_tukey,
weighted_total_huber, and
weighted_total_tukey.
Object of class svystat_rob
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control.
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
Overview (of all implemented functions)
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust Horvitz-Thompson M-estimator of the population total svytotal_huber(~employment, dn, k = 9, type = "rht") # Robust weighted M-estimator of the population mean m <- svymean_huber(~employment, dn, k = 12, type = "rwm") # Summary statistic summary(m) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Robust Horvitz-Thompson M-estimator of the population total svytotal_huber(~employment, dn, k = 9, type = "rht") # Robust weighted M-estimator of the population mean m <- svymean_huber(~employment, dn, k = 12, type = "rwm") # Summary statistic summary(m) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m)) # Extract estimate coef(m) # Extract estimate of scale scale(m) # Extract estimated standard error SE(m)
svyratio_huber and svyratio_tukey compute the robust
-estimator of the ratio of two variables with, respectively,
Huber and Tukey biweight (bisquare) psi-function.
svyratio_huber(numerator, denominator, design, k, var = denominator, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyratio_tukey(numerator, denominator, design, k, var = denominator, na.rm = FALSE, verbose = TRUE, ...)svyratio_huber(numerator, denominator, design, k, var = denominator, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyratio_tukey(numerator, denominator, design, k, var = denominator, na.rm = FALSE, verbose = TRUE, ...)
numerator |
a one-sided |
denominator |
a one-sided |
design |
an object of class |
k |
|
var |
a |
na.rm |
|
asym |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library or require).
The functions svyratio_huber and svyratio_tukey are
implemented as wrapper functions of the regression estimators
svyreg_huberM and svyreg_tukeyM. See
the help files of these functions (e.g., on how additional
parameters can be passed via ... or on the usage of the
var argument).
Object of class svyreg.rob and ratio
Overview (of all implemented functions)
summary, coef,
residuals, fitted,
SE and vcov
plot for regression diagnostic plot methods
svyreg_huberM, svyreg_huberGM,
svyreg_tukeyM and svyreg_tukeyGM for robust
regression estimators
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyratio_huber(~payroll, ~employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract estimated standard error SE(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyratio_huber(~payroll, ~employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract estimated standard error SE(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
Weighted least squares estimator of regression
svyreg(formula, design, var = NULL, na.rm = FALSE, ...)svyreg(formula, design, var = NULL, na.rm = FALSE, ...)
formula |
a |
design |
an object of class |
var |
a one-sided |
na.rm |
|
... |
additional arguments (currently not used). |
Package survey must be attached to the search path in order to use
the functions (see library or require).
svyreg computes the regression coefficients by weighted least
squares.
Models for svyreg_rob are specified symbolically. A typical
model has the form response ~ terms where response is
the (numeric) response vector and terms is a series of terms
which specifies a linear predictor for response; see
formula and lm.
A formula has an implied intercept term. To remove this use either
y ~ x - 1 or y ~ 0 + x; see formula for more
details of allowed formulae.
Object of class svyreg_rob.
Overview (of all implemented functions)
summary, coef,
residuals, fitted,
SE and vcov
plot for regression diagnostic plot methods
Robust estimating methods svyreg_huberM,
svyreg_huberGM, svyreg_tukeyM and
svyreg_tukeyGM.
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute the regression estimate (weighted least squares) m <- svyreg(payroll ~ employment, dn) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., Normal Q-Q-plot) plot(m, which = 2L)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute the regression estimate (weighted least squares) m <- svyreg(payroll ~ employment, dn) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., Normal Q-Q-plot) plot(m, which = 2L)
The function svyreg_huber is deprecated; use instead
svyreg_huberM.
svyreg_huber(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)svyreg_huber(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
asym |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
See svyreg_huberM.
Object of class svyreg.rob
svyreg_huberM and svyreg_huberGM compute, respectively,
a survey weighted M- and GM-estimator of regression using
the Huber psi-function.
svyreg_huberM(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyreg_huberGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)svyreg_huberM(formula, design, k, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...) svyreg_huberGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, asym = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
asym |
|
verbose |
|
type |
|
xwgt |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library or require).
svyreg_huberM and svyreg_huberGM compute, respectively,
M- and GM-estimates of regression by iteratively re-weighted
least squares (IRWLS). The estimate of regression scale is (by default)
computed as the (normalized) weighted median of absolute deviations
from the weighted median (MAD; see weighted_mad) for
each IRWLS iteration. If the weighted MAD is zero (or nearly so),
the scale is computed as the (normalized) weighted interquartile
range (IQR).
The regression M-estimator svyreg_huberM is robust against
residual outliers (granted that the tuning constant k is
chosen appropriately).
Function svyreg_huberGM implements the Mallows and Schweppe
regression GM-estimator (see argument type).
The regression GM-estimators are robust against residual outliers
and outliers in the model's design space (leverage
observations; see argument xwgt).
See svyreg_control.
Models for svyreg_rob are specified symbolically. A typical
model has the form response ~ terms, where response
is the (numeric) response vector and terms is a series of
terms which specifies a linear predictor for response; see
formula and lm.
A formula has an implied intercept term. To remove this use
either y ~ x - 1 or y ~ 0 + x; see
formula for more details of allowed formulae.
Object of class svyreg.rob
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control.
Overview (of all implemented functions)
Overview (of all implemented functions)
summary, coef,
residuals, fitted,
SE and vcov
plot for regression diagnostic plot methods
Other robust estimating methods svyreg_tukeyM and
svyreg_tukeyGM
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Huber psi-function m <- svyreg_huberM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
The function svyreg_tukey is deprecated; use instead
svyreg_tukeyM.
svyreg_tukey(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...)svyreg_tukey(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g., |
See svyreg_tukeyM.
Object of class svyreg.rob
svyreg_tukeyM and svyreg_tukeyGM compute, respectively,
a survey weighted M- and GM-estimator of regression using
the biweight Tukey psi-function.
svyreg_tukeyM(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...) svyreg_tukeyGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, verbose = TRUE, ...)svyreg_tukeyM(formula, design, k, var = NULL, na.rm = FALSE, verbose = TRUE, ...) svyreg_tukeyGM(formula, design, k, type = c("Mallows", "Schweppe"), xwgt, var = NULL, na.rm = FALSE, verbose = TRUE, ...)
formula |
a |
design |
an object of class |
k |
|
var |
a one-sided |
na.rm |
|
verbose |
|
type |
|
xwgt |
|
... |
additional arguments passed to the method (e.g., |
Package survey must be attached to the search path in order to use
the functions (see library or require).
svyreg_tukeyM and svyreg_tukeyGM compute, respectively,
M- and GM-estimates of regression by iteratively re-weighted least
squares (IRWLS). The estimate of regression scale is (by default)
computed as the (normalized) weighted median of absolute deviations
from the weighted median (MAD; see weighted_mad) for
each IRWLS iteration. If the weighted MAD is zero (or nearly so),
the scale is computed as the (normalized) weighted
interquartile range (IQR).
The regression M-estimator svyreg_tukeyM is robust against
residual outliers (granted that the tuning constant k is
chosen appropriately).
Function svyreg_huberGM implements the Mallows and Schweppe
regression GM-estimator (see argument type).
The regression GM-estimators are robust against residual outliers
and outliers in the model's design space (leverage
observations; see argument xwgt).
See svyreg_control.
Models for svyreg_rob are specified symbolically. A typical
model has the form response ~ terms, where response
is the (numeric) response vector and terms is a series of
terms which specifies a linear predictor for response; see
formula and lm.
A formula has an implied intercept term. To remove this use
either y ~ x - 1 or y ~ 0 + x; see
formula for more details of allowed formulae.
Object of class svyreg.rob
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control.
Overview (of all implemented functions)
summary, coef,
residuals, fitted,
SE and vcov
plot for regression diagnostic plot methods.
Other robust estimating methods svyreg_huberM and
svyreg_huberGM
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Tukey bisquare psi-function m <- svyreg_tukeyM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } # Compute regression M-estimate with Tukey bisquare psi-function m <- svyreg_tukeyM(payroll ~ employment, dn, k = 8) # Regression inference summary(m) # Extract the coefficients coef(m) # Extract variance/ covariance matrix vcov(m) # Diagnostic plots (e.g., standardized residuals against fitted values) plot(m, which = 1L) # Plot of the robustness weights of the M-estimate against its residuals plot(residuals(m), robweights(m))
Weighted five-number summary used for survey.design and
survey.design2 objects (similar to base::summary
for [numeric vectors]).
svysummary(object, design, na.rm = FALSE, ...)svysummary(object, design, na.rm = FALSE, ...)
object |
one-sided |
design |
an object of class |
na.rm |
|
... |
additional arguments. |
A weighted five-number summary (numeric variable) or a frequency table (factor variable).
head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } svysummary(~payroll, dn)head(workplace) library(survey) # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) } svysummary(~payroll, dn)
Weighted (normalized) interquartile range
weighted_IQR(x, w, na.rm = FALSE, constant = 0.7413)weighted_IQR(x, w, na.rm = FALSE, constant = 0.7413)
x |
|
w |
|
na.rm |
|
constant |
|
By default, the weighted IQR is normalized to be an unbiased estimate of
scale at the Gaussian core model. If normalization is not wanted, put
constant = 1.
Weighted IQR
Overview (of all implemented functions)
head(workplace) # normalized weighted IQR (default) weighted_IQR(workplace$employment, workplace$weight) # weighted IQR (without normalization) weighted_IQR(workplace$employment, workplace$weight, constant = 1)head(workplace) # normalized weighted IQR (default) weighted_IQR(workplace$employment, workplace$weight) # weighted IQR (without normalization) weighted_IQR(workplace$employment, workplace$weight, constant = 1)
weighted_line fits a robust line and allows weights.
weighted_line(x, y = NULL, w, na.rm = FALSE, iter = 1) ## S3 method for class 'medline' print(x, ...) ## S3 method for class 'medline' coef(object, ...) ## S3 method for class 'medline' residuals(object, ...) ## S3 method for class 'medline' fitted(object, ...)weighted_line(x, y = NULL, w, na.rm = FALSE, iter = 1) ## S3 method for class 'medline' print(x, ...) ## S3 method for class 'medline' coef(object, ...) ## S3 method for class 'medline' residuals(object, ...) ## S3 method for class 'medline' fitted(object, ...)
x |
|
y |
|
w |
|
na.rm |
|
iter |
|
object |
object of class |
... |
additional arguments passed to the method. |
weighted_line uses different quantiles for splitting the sample
than stats::line().
intercept and slope of the fitted line
Overview (of all implemented functions)
head(cars) # compute weighted line weighted_line(cars$speed, cars$dist, w = rep(1, length(cars$speed))) m <- weighted_line(cars$speed, cars$dist, w = rep(1:10, each = 5)) m coef(m) residuals(m) fitted(m)head(cars) # compute weighted line weighted_line(cars$speed, cars$dist, w = rep(1, length(cars$speed))) m <- weighted_line(cars$speed, cars$dist, w = rep(1:10, each = 5)) m coef(m) residuals(m) fitted(m)
Weighted median of the absolute deviations from the weighted median
weighted_mad(x, w, na.rm = FALSE, constant = 1.482602)weighted_mad(x, w, na.rm = FALSE, constant = 1.482602)
x |
|
w |
|
na.rm |
|
constant |
|
The weighted MAD is computed as the (normalized) weighted median of the
absolute deviation from the weighted median; see
weighted_median. The weighted MAD is normalized to be
an unbiased estimate of scale at the Gaussian core model. If
normalization is not wanted, put constant = 1.
Weighted median absolute deviation from the (weighted) median
Overview (of all implemented functions)
head(workplace) # normalized weighted MAD (default) weighted_mad(workplace$employment, workplace$weight) # weighted MAD (without normalization) weighted_mad(workplace$employment, workplace$weight, constant = 1)head(workplace) # normalized weighted MAD (default) weighted_mad(workplace$employment, workplace$weight) # weighted MAD (without normalization) weighted_mad(workplace$employment, workplace$weight, constant = 1)
Weighted total and mean (Horvitz-Thompson and Hajek estimators)
weighted_mean(x, w, na.rm = FALSE) weighted_total(x, w, na.rm = FALSE)weighted_mean(x, w, na.rm = FALSE) weighted_total(x, w, na.rm = FALSE)
x |
|
w |
|
na.rm |
|
weighted_total and weighted_mean compute, respectively,
the Horvitz-Thompson estimator of the population total and the Hajek
estimator of the population mean.
Estimated population mean or total
Overview (of all implemented functions)
head(workplace) # Horvitz-Thompson estimator of the total weighted_total(workplace$employment, workplace$weight) # Hajek estimator of the mean weighted_mean(workplace$employment, workplace$weight)head(workplace) # Horvitz-Thompson estimator of the total weighted_total(workplace$employment, workplace$weight) # Hajek estimator of the mean weighted_mean(workplace$employment, workplace$weight)
Dalén's estimators of the population mean and the population total (bare-bone functions with limited functionality).
weighted_mean_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE) weighted_total_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE)weighted_mean_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE) weighted_total_dalen(x, w, censoring, type = "Z2", info = FALSE, na.rm = FALSE, verbose = TRUE)
x |
|
w |
|
censoring |
|
type |
|
info |
|
na.rm |
|
verbose |
|
Let denote the expansion
estimator of the -total (summation is over all elements
in sample ). The estimators Z2 and Z3 of Dalén (1987) are
defined as follows.
The estimator Z2 of the population total sums over
; hence, it
censors the products to the
censoring constant (censoring). The estimator of
the population -mean is is defined as the total divided
by the population size.
The estimator Z3 of the population total is defined as the sum
over the elements , which is equal to
if and
otherwise.
The return value depends on info:
info = FALSE:estimate of mean or total [double]
info = TRUE:a [list] with items:
characteristic [character],
estimator [character],
estimate [double],
variance (default: NA),
robust [list],
residuals [numeric vector],
model [list],
design (default: NA),
[call]
Dalén, J. (1987). Practical Estimators of a Population Total Which Reduce the Impact of Large Observations. R & D Report U/STM 1987:32, Statistics Sweden, Stockholm.
Overview (of all implemented functions)
head(workplace) # Dalen's estimator of the total (with censoring threshold: 100000) weighted_total_dalen(workplace$employment, workplace$weight, 100000)head(workplace) # Dalen's estimator of the total (with censoring threshold: 100000) weighted_total_dalen(workplace$employment, workplace$weight, 100000)
Weighted trimmed mean and total (bare-bone functions with limited
functionality; see svymean_trimmed and
svytotal_trimmed for more capable methods)
weighted_mean_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE)weighted_mean_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_trimmed(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE)
x |
|
w |
|
LB |
|
UB |
|
info |
|
na.rm |
|
Population mean or total. Let denote
the estimated trimmed population mean; then, the estimated trimmed
population total is given by with
, where
summation is over all observations in the sample.
The methods trims the LB
of the smallest observations and the
(1 - UB) of the largest observations
from the data.
See survey methods:
The return value depends on info:
info = FALSE:estimate of mean or total [double]
info = TRUE:a [list] with items:
characteristic [character],
estimator [character],
estimate [double],
variance (default: NA),
robust [list],
residuals [numeric vector],
model [list],
design (default: NA),
[call]
Overview (of all implemented functions)
svymean_trimmed and svytotal_trimmed
head(workplace) # Estimated trimmed population total (5% symmetric trimming) weighted_total_trimmed(workplace$employment, workplace$weight, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) weighted_mean_trimmed(workplace$employment, workplace$weight, UB = 0.95)head(workplace) # Estimated trimmed population total (5% symmetric trimming) weighted_total_trimmed(workplace$employment, workplace$weight, LB = 0.05, UB = 0.95) # Estimated trimmed population mean (5% trimming at the top of the distr.) weighted_mean_trimmed(workplace$employment, workplace$weight, UB = 0.95)
Weighted winsorized mean and total (bare-bone functions with limited
functionality; see svymean_winsorized and
svytotal_winsorized for more capable methods)
weighted_mean_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_mean_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE) weighted_total_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE)weighted_mean_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_mean_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE) weighted_total_winsorized(x, w, LB = 0.05, UB = 1 - LB, info = FALSE, na.rm = FALSE) weighted_total_k_winsorized(x, w, k, info = FALSE, na.rm = FALSE)
x |
|
w |
|
LB |
|
UB |
|
info |
|
na.rm |
|
k |
|
Population mean or total. Let
denote the estimated winsorized population mean; then, the
estimated population total is given by
with , where summation
is over all observations in the sample.
The amount of winsorization can be specified in relative or absolute terms:
Relative: By specifying LB and UB,
the methods winsorizes the LB
of the smallest observations and the
(1 - UB) of the largest
observations from the data.
Absolute: By specifying argument k in the
functions with the "infix" _k_ in their name, the
largest observations are winsorized, ,
where denotes the sample size. E.g., k = 2
implies that the largest and the second largest
observation are winsorized.
See survey methods:
The return value depends on info:
info = FALSE:estimate of mean or total [double]
info = TRUE:a [list] with items:
characteristic [character],
estimator [character],
estimate [double],
variance (default: NA),
robust [list],
residuals [numeric vector],
model [list],
design (default: NA),
[call]
Overview (of all implemented functions)
svymean_winsorized, svymean_k_winsorized,
svytotal_winsorized and svytotal_k_winsorized
head(workplace) # Estimated winsorized population mean (5% symmetric winsorization) weighted_mean_winsorized(workplace$employment, workplace$weight, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) weighted_total_k_winsorized(workplace$employment, workplace$weight, k = 2)head(workplace) # Estimated winsorized population mean (5% symmetric winsorization) weighted_mean_winsorized(workplace$employment, workplace$weight, LB = 0.05) # Estimated one-sided k winsorized population total (2 observations are # winsorized at the top of the distribution) weighted_total_k_winsorized(workplace$employment, workplace$weight, k = 2)
Weighted population median.
weighted_median(x, w, na.rm = FALSE)weighted_median(x, w, na.rm = FALSE)
x |
|
w |
|
na.rm |
|
Weighted sample median; see weighted_quantile for more
information.
Weighted estimate of the population median
Overview (of all implemented functions)
head(workplace) weighted_median(workplace$employment, workplace$weight)head(workplace) weighted_median(workplace$employment, workplace$weight)
Robust simple linear regression based on medians: two methods are available:
"slopes" and "product".
weighted_median_line(x, y = NULL, w, type = "slopes", na.rm = FALSE)weighted_median_line(x, y = NULL, w, type = "slopes", na.rm = FALSE)
x |
|
y |
|
w |
|
type |
|
na.rm |
|
Robust simple linear regression based on medians
Two methods/ types are available. Let
denote the weighted median of variable x with weights
w:
type = "slopes":The slope is computed as
type = "products":The slope is computed as
A vector with two components: intercept and slope
Overview (of all implemented functions)
line, weighted_line and
weighted_median_ratio
x <- c(1, 2, 4, 5) y <- c(3, 2, 7, 4) weighted_line(y ~ x, w = rep(1, length(x))) weighted_median_line(y ~ x, w = rep(1, length(x))) m <- weighted_median_line(y ~ x, w = rep(1, length(x)), type = "prod") m coef(m) fitted(m) residuals(m) # cars data head(cars) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)))) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)), type = "prod")) # weighted w <- c(rep(1,20), rep(2,20), rep(5, 10)) with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in y cars$dist[49] <- 360 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in x data(cars) cars$speed[49] <- 72 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))x <- c(1, 2, 4, 5) y <- c(3, 2, 7, 4) weighted_line(y ~ x, w = rep(1, length(x))) weighted_median_line(y ~ x, w = rep(1, length(x))) m <- weighted_median_line(y ~ x, w = rep(1, length(x)), type = "prod") m coef(m) fitted(m) residuals(m) # cars data head(cars) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)))) with(cars, weighted_median_line(dist ~ speed, w = rep(1, length(dist)), type = "prod")) # weighted w <- c(rep(1,20), rep(2,20), rep(5, 10)) with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in y cars$dist[49] <- 360 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod")) # outlier in x data(cars) cars$speed[49] <- 72 with(cars, weighted_median_line(dist ~ speed, w = w)) with(cars, weighted_median_line(dist ~ speed, w = w, type = "prod"))
A weighted median of the ratios y/x determines the slope of a regression through the origin.
weighted_median_ratio(x, y = NULL, w, na.rm = FALSE)weighted_median_ratio(x, y = NULL, w, na.rm = FALSE)
x |
|
y |
|
w |
|
na.rm |
|
A vector with two components: intercept and slope
Overview (of all implemented functions)
line, weighted_line and
weighted_median_line
x <- c(1,2,4,5) y <- c(1,0,5,2) m <- weighted_median_ratio(y ~ x, w = rep(1, length(y))) m coef(m) fitted(m) residuals(m)x <- c(1,2,4,5) y <- c(1,0,5,2) m <- weighted_median_ratio(y ~ x, w = rep(1, length(y))) m coef(m) fitted(m) residuals(m)
Weighted population quantile.
weighted_quantile(x, w, probs, na.rm = FALSE)weighted_quantile(x, w, probs, na.rm = FALSE)
x |
|
w |
|
probs |
|
na.rm |
|
weighted_quantile computes the weighted
sample quantiles; argument probs allows vector inputs.
The function is based on a weighted version of the quickselect/Find algorithm with the Bentley and McIlroy (1993) 3-way partitioning scheme. For very small arrays, we use insertion sort.
For equal weighting, i.e., when all elements in
w are equal, weighted_quantile is identical with
type = 2 of stats::quantile; see also
Hyndman and Fan (1996).
Weighted estimate of the population quantiles
Bentley, J. L. and McIlroy, D. M. (1993). Engineering a Sort Function, Software - Practice and Experience 23, 1249–1265. doi:10.1002/spe.4380231105
Hyndman, R.J. and Fan, Y. (1996). Sample Quantiles in Statistical Packages, The American Statistician 50, 361–365. doi:10.1080/00031305.1996.10473566
Overview (of all implemented functions)
head(workplace) # Weighted 25% quantile (1st quartile) weighted_quantile(workplace$employment, workplace$weight, 0.25)head(workplace) # Weighted 25% quantile (1st quartile) weighted_quantile(workplace$employment, workplace$weight, 0.25)
Weighted Huber and Tukey M-estimator of the mean and total
(bare-bone function with limited functionality; see
svymean_huber, svymean_tukey,
svytotal_huber, and svytotal_tukey for more
capable methods)
weighted_mean_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_mean_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...)weighted_mean_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_huber(x, w, k, type = "rwm", asym = FALSE, info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_mean_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...) weighted_total_tukey(x, w, k, type = "rwm", info = FALSE, na.rm = FALSE, verbose = TRUE, ...)
x |
|
w |
|
k |
|
type |
|
asym |
|
info |
|
na.rm |
|
verbose |
|
... |
additional arguments passed to the method (e.g.,
|
Population mean or total. Let
denote the estimated population mean; then, the estimated
total is given by with
, where
summation is over all observations in the sample.
Two methods/types are available for estimating the
location :
type = "rwm" (default):robust weighted
M-estimator of the population mean and total,
respectively. This estimator is recommended for sampling
designs whose inclusion probabilities are not
proportional to some measure of size. [Legacy note: In an
earlier version, the method type = "rwm" was called
"rhj"; the type "rhj" is now silently
converted to "rwm"]
type = "rht":robust Horvitz-Thompson M-estimator of the population mean and total, respectively. This estimator is recommended for proportional-to-size sampling designs.
See the related but more capable functions:
By default, the Huber or Tukey
psi-function are used in the specification of the M-estimators. For
the Huber estimator, an asymmetric version of the Huber
psi-function can be used by setting the argument
asym = TRUE in the function call.
The return value depends on info:
info = FALSE:estimate of mean or total [double]
info = TRUE:a [list] with items:
characteristic [character],
estimator [character],
estimate [double],
variance (default: NA),
robust [list],
residuals [numeric vector],
model [list],
design (default: NA),
[call]
By default, the method assumes a maximum number of maxit = 100
iterations and a numerical tolerance criterion to stop the iterations of
tol = 1e-05. If the algorithm fails to converge, you may
consider changing the default values; see svyreg_control.
Hulliger, B. (1995). Outlier Robust Horvitz-Thompson Estimators. Survey Methodology 21, 79–87.
Overview (of all implemented functions)
head(workplace) # Robust Horvitz-Thompson M-estimator of the population total weighted_total_huber(workplace$employment, workplace$weight, k = 9, type = "rht") # Robust weighted M-estimator of the population mean weighted_mean_huber(workplace$employment, workplace$weight, k = 12, type = "rwm")head(workplace) # Robust Horvitz-Thompson M-estimator of the population total weighted_total_huber(workplace$employment, workplace$weight, k = 9, type = "rht") # Robust weighted M-estimator of the population mean weighted_mean_huber(workplace$employment, workplace$weight, k = 12, type = "rwm")
Weight functions associated with the Huber and the Tukey biweight psi-functions; and the weight function of Simpson et al. (1992) for GM-estimators.
huberWgt(x, k = 1.345) tukeyWgt(x, k = 4.685) simpsonWgt(x, a, b)huberWgt(x, k = 1.345) tukeyWgt(x, k = 4.685) simpsonWgt(x, a, b)
x |
|
k |
|
a |
|
b |
|
The functions huberWgt and tukeyWgt return the weights
associated with the respective psi-function.
The function simpsonWgt is used (in regression GM-estimators)
to downweight leverage observations (i.e., outliers in the model's design
space). Let denote the (robust) squared Mahalanobis
distance of the i-th observation. The Simpson et al. (1992) type of
weight is defined as
, where
a and b are tuning constants.
By default, a = 1; this choice implies that the weights
are computed on the basis of the robust Mahalanobis distances.
Alternative: a = Inf implies a weight of zero for all
observations whose (robust) squared Mahalanobis is larger than
b.
The tuning constants b is a threshold on the distances.
Numerical vector of weights
Simpson, D. G., Ruppert, D. and Carroll, R.J. (1992). On One-Step GM Estimates and Stability of Inferences in Linear Regression. Journal of the American Statistical Association 87, 439–450. doi:10.2307/2290275
Overview (of all implemented functions)
svyreg_huberM, svyreg_huberGM,
svyreg_tukeyM and svyreg_tukeyGM
head(flour) # standardized distance from median (copper content in wholemeal flour) x <- flour$copper z <- abs(x - median(x)) / mad(x) # plot of weight functions vs. distance plot(z, huberWgt(z, k = 3), ylim = c(0, 1), xlab = "distance", ylab = "weight") points(z, tukeyWgt(z, k = 6), pch = 2, col = 2) points(z, simpsonWgt(z, a = Inf, b = 3), pch = 3, col = 4) legend("topright", c("huberWgt(k = 3)", "tukeyWgt(k = 6)", "simpsonWgt(a = Inf, b = 3)"), pch = 1:3, col = c(1, 2, 4))head(flour) # standardized distance from median (copper content in wholemeal flour) x <- flour$copper z <- abs(x - median(x)) / mad(x) # plot of weight functions vs. distance plot(z, huberWgt(z, k = 3), ylim = c(0, 1), xlab = "distance", ylab = "weight") points(z, tukeyWgt(z, k = 6), pch = 2, col = 2) points(z, simpsonWgt(z, a = Inf, b = 3), pch = 3, col = 4) legend("topright", c("huberWgt(k = 3)", "tukeyWgt(k = 6)", "simpsonWgt(a = Inf, b = 3)"), pch = 1:3, col = c(1, 2, 4))
The function flags observations that fall within the tolerance interval. Observations that fall outside the interval are regarded as (potential) outliers.
within_tolerance(x, w, method = c("quartile", "modified", "boxplot"), constants, lambda = 0.05, info = FALSE, boxplot_coef = 1.5)within_tolerance(x, w, method = c("quartile", "modified", "boxplot"), constants, lambda = 0.05, info = FALSE, boxplot_coef = 1.5)
x |
|
w |
|
method |
[character] one of the methods: |
constants |
|
lambda |
|
info |
|
boxplot_coef |
|
Three methods are available.
"quartile")For the quartile method, the tolerance interval is given by
with
where denotes the (weighted) median; and
are, respectively, the first and third (weighted)
quartiles. The tuning constants and
are combined into the vector , which is
available as argument constants; both constants must be
nonnegative numbers.
The quartiles are calculated using design weights.
"modified")For the modified quartile method (Lee, 1995), the tolerance
interval is given by replacing and
with, respectively,
and
The tuning constant can only take values in
the closed unit interval and is available as argument lambda.
The quartiles are calculated using design weights.
"boxplot")The tolerance interval for the boxplot method extends from the
lower whisker to the upper whisker. By default, the length of the
whiskers is set to 1.5 times the interquartile range; see argument
boxplot_coef. For more details, see
boxplot.
The quartiles, and therefore the interquartile range, are calculated using design weights.
A vector of logicals, where TRUE indicates that an observation is within
the tolerance limits and FALSE indicates a (potential) outlier.
If info = TRUE, the function prints the tolerance interval. The
endpoints of the interval can be numbers or the symbols ‘min.’ and
‘max.’, which denote the minimum and maximum values in the data,
respectively.
Lee, H. (1995). Outliers in Business Surveys, in: Cox, B. G. et al. (eds.), Business Survey Methods, p. 503–526. New York: John Wiley and Sons.
Overview (of all implemented functions)
head(workplace) attach(workplace) # Show the tolerance limits within_tolerance(payroll, weight, method = "boxplot", info = TRUE) # Observations that fall outside the tolerance limits are (potential) outliers outlier <- !within_tolerance(payroll, weight, method = "boxplot") outlier[1:10]head(workplace) attach(workplace) # Show the tolerance limits within_tolerance(payroll, weight, method = "boxplot", info = TRUE) # Observations that fall outside the tolerance limits are (potential) outliers outlier <- !within_tolerance(payroll, weight, method = "boxplot") outlier[1:10]
The workplace data are from Fuller (2009, pp. 366–367).
data(workplace)data(workplace)
A data.frame with a sample of 142 workplaces on the following
variables
IDidentifier variable [integer].
weightsampling weight [double].
employmentemployment total [double].
payrollpayroll total (1000 dollars)[double].
stratstratum identifier[integer].
fpcfinite population correction [integer].
The workplace data represent a sample of workplaces in the
retail sector in a Canadian province. The data are not those
collected by Statistics Canada, but have been generated by Fuller
(2009, Example 3.1.1) to display similar characteristics to the
original 1999 Canadian Workplace and Employee Survey (WES).
The WES target population is defined as all workplaces operating in Canada with paid employees. The sampling frame is stratified by industry, geographic region, and size (size is defined using estimated employment). A sample of workplaces has been drawn independently in each stratum using simple random sample without replacement (the stratum-specific sample sizes are determined by Neyman allocation). Several strata containing very large workplaces were sampled exhaustively; see Patak et al (1998). The original sampling weights were adjusted for nonresponse.
The original weights of WES were about 2200 for the stratum of small workplaces, about 750 for medium-sized, and about 35 for large workspaces.
The data workplace is from Table 6.3 in Fuller (2009, pp. 366–367).
Fuller, W. A. (2009). Sampling Statistics, Hoboken (NJ): John Wiley and Sons. doi:10.1002/9780470523551
Patak, Z., Hidiroglou, M. and Lavallée, P. (1998). The methodology of the Workplace and Employee Survey. Proceedings of the Survey Research Methods Section, American Statistical Association, 83–91.
head(workplace) library("survey") # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) }head(workplace) library("survey") # Survey design for stratified simple random sampling without replacement dn <- if (packageVersion("survey") >= "4.2") { # survey design with pre-calibrated weights svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { # legacy mode svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) }