product of the indicator variable for the South Census region and the log
proportion Hispanic.
For further information on these variables, see the information about data
inputs.
Using counties in the CPS ASEC sample
Our use of the CPS ASEC implicitly assumes that the counties in the survey sample are
representative of those not selected. The CPS was designed so that Primary
Sampling Units (PSUs) are representative of their strata, primarily for
unemployment, but the degree to which the CPS ASEC sample is representative for
health insurance coverage is unknown.
The characteristics of some counties guarantee they are included, e.g., most
counties in large metropolitan areas and counties with large populations. More
generally, while all counties have a nonzero probability of being included in
the sample, some have higher probabilities than others. Further, the probability
of selecting a county is related to its income and poverty level which, in turn,
are related to the level of health insurance coverage. In the related Small Area
Income and Poverty Estimates (SAIPE) program, comparison of regression equations
based on census data for counties in the CPS ASEC sample and equations based on
all counties indicates remarkably similar results, providing some assurance that
the CPS ASEC counties are largely representative of all counties for poverty.
Unfortunately, the analogous test is unavailable for health insurance coverage,
since there are no health insurance questions on the decennial census.
The survey weights used in estimation at the national level are not
appropriate for county-level estimates. The CPS ASEC sample design selects some
PSUs (usually a county or group of
counties) to represent a set of counties in the same stratum. The sum of the
weights for sample households from such a county estimates the total population
of the entire set of counties it represents. Because we want each county in the
CPS ASEC sample to stand for itself, we have adjusted the weights to make the
direct estimate for each county approximately unbiased.
Estimation of the model equation
CPS ASEC sampling variances are not constant over all counties. We avoid
giving observations with a great deal of uncertainty (larger variances) the same
influence on the regression as observations with less uncertainty (smaller
variances) by, in effect, weighting each observation by the inverse of its
variance. Representing this uncertainty requires recognizing that it arises from
two sources:
- uncertainty about where the estimates lie relative to the true values for
each county (sampling error), and
- uncertainty about where the true county values lie with respect to the
regression surface (lack of fit).
To estimate the two components of variance, we model them as having different
forms. We model the sampling error variance as depending on the sample size and
on the proportion insured. The lack-of-fit component, on the other hand, is
modeled as constant across all counties. Because the two forms differ, our
Bayesian estimation method can distinguish the components.
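The weighting idea can be illustrated with a minimal sketch. It is not the
program's Bayesian procedure: it treats both variance components as already
known, assumes a binomial-like form for the sampling variance (the text says
only that it depends on sample size and proportion insured), and fits a single
hypothetical predictor. The names total_variance, weighted_line_fit, and
design_factor are illustrative assumptions.

```python
def total_variance(p, n, lack_of_fit_var, design_factor=1.0):
    """Total variance of a county's direct estimate.

    ASSUMPTION: the sampling variance has a binomial-like form,
    design_factor * p * (1 - p) / n. The lack-of-fit variance is a
    constant shared by all counties, as described in the text.
    """
    sampling_var = design_factor * p * (1.0 - p) / n
    return sampling_var + lack_of_fit_var


def weighted_line_fit(x, y, variances):
    """Fit y = intercept + slope * x, weighting each point by 1 / variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

Counties with small samples get large total variances and therefore small
weights, so they pull the fitted line less than well-sampled counties do.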
Model-based county-level estimates
The estimated insured rate from the modeling is the posterior mean insured rate
conditioned on the CPS ASEC data. The effect of this is similar to that of the
empirical Bayes method used in the SAIPE program's estimates. The final estimate
for a county with no sample is the same as the regression estimate, while the
estimates for counties with large samples or very high insured rates and, thus,
low variance tend to be closer to the direct estimates.
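The behavior described above resembles the standard precision-weighted
(shrinkage) compromise between a direct estimate and a regression prediction.
The sketch below uses that textbook formula as a stand-in for the full posterior
mean, which it is not. The function name and arguments are hypothetical.

```python
def shrinkage_estimate(direct, regression, sampling_var, lack_of_fit_var):
    """Precision-weighted compromise between direct and regression estimates.

    ASSUMPTION: a simple normal-normal shrinkage formula stands in for the
    posterior mean. `direct` is None for a county with no sample.
    """
    if direct is None:
        # No sample: fall back entirely on the regression estimate.
        return regression
    # Low sampling variance pushes the weight on the direct estimate toward 1.
    weight_on_direct = lack_of_fit_var / (lack_of_fit_var + sampling_var)
    return weight_on_direct * direct + (1.0 - weight_on_direct) * regression
```

For example, with a direct estimate of 0.90 and a regression estimate of 0.80,
a county with a small sampling variance lands near 0.90, while one with a large
sampling variance stays near 0.80.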
The estimated number of insured in a county is the estimated insured rate
times an estimate of the CPS universe. We create the estimate of the CPS
universe by subtracting unpublished demographic estimates of the group quarters
population, by age and the appropriate type of group quarters, from the estimate
of the total resident population. The number of uninsured, then, is that
estimated CPS universe minus the estimated number of insured. The reported
confidence intervals are based on the posterior standard deviation of the
insured rate, conditioned on the CPS ASEC data.
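The arithmetic in this paragraph can be sketched as follows. The function names
and the example figures are hypothetical, and the group quarters adjustment is
collapsed into a single number rather than broken out by age and type.

```python
def cps_universe(resident_population, excluded_group_quarters):
    """CPS universe: resident population minus the excluded group quarters
    population (a single total here, as a simplifying ASSUMPTION)."""
    return resident_population - excluded_group_quarters


def insured_and_uninsured(insured_rate, universe):
    """Return (insured, uninsured) counts before any rounding or raking."""
    insured = insured_rate * universe
    uninsured = universe - insured
    return insured, uninsured


# Hypothetical county: 10,000 residents, 500 in excluded group quarters,
# and a model-based insured rate of 0.8.
universe = cps_universe(10_000, 500)
insured, uninsured = insured_and_uninsured(0.8, universe)
```

By construction, the insured and uninsured counts always sum to the universe
estimate.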
Controlling to the national CPS ASEC estimate and forming the state-level estimates
The last steps in the production process are controlling the county estimates
to the national CPS ASEC estimates and forming the state-level estimates. The
model-based numbers of uninsured are aggregated to the state and national
levels, and the ratio of the national CPS ASEC direct estimate to the aggregated
national model-based estimate is formed; this ratio is the raking factor. All of
the county- and state-level numbers of uninsured are multiplied by the raking
factor to get the controlled numbers of uninsured. These are subtracted from the
state and county CPS ASEC universe estimates, yielding the estimated numbers of
insured. Finally, all estimates are rounded to integers.
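A minimal sketch of the raking step for counties, under stated assumptions: the
function name and dictionary layout are hypothetical, and Python's built-in
round (which rounds exact halves to the nearest even integer) stands in for
whatever rounding rule the program applies. State totals would be controlled
with the same factor.

```python
def rake_counties(county_uninsured, county_universe, national_direct_uninsured):
    """Control model-based county uninsured counts to a national direct total.

    county_uninsured: {county: model-based number of uninsured}
    county_universe:  {county: CPS universe estimate}
    Returns (raking_factor, {county: {"uninsured": int, "insured": int}}).
    """
    model_total = sum(county_uninsured.values())
    raking_factor = national_direct_uninsured / model_total

    controlled = {}
    for county, uninsured in county_uninsured.items():
        raked = raking_factor * uninsured
        controlled[county] = {
            # ASSUMPTION: built-in round() approximates the final rounding.
            "uninsured": round(raked),
            "insured": round(county_universe[county] - raked),
        }
    return raking_factor, controlled


# Hypothetical two-county "nation" whose model total (400) is raked to 440.
factor, result = rake_counties(
    {"A": 100, "B": 300}, {"A": 1000, "B": 2000}, 440
)
```

Here the raking factor is 440 / 400 = 1.1, so every county's uninsured count
is scaled up by 10 percent before the insured count is obtained by subtraction.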
Standard errors and confidence intervals
One goal of our small area work is to provide measures of the uncertainty
surrounding the estimates. The model-based estimates shown in the tables are
accompanied by their 90-percent confidence intervals constructed from estimated
standard errors.
We assume that the variance at the national level and the variance of the
CPS ASEC universe estimates are negligible. The posterior standard deviations
of the aggregated state-level estimates need only be adjusted for correlations
between the counties, which is handled by the estimation procedure, and
multiplied by the raking factors. Confidence interval half-widths for estimated
numbers are rounded up to preserve coverage probabilities. Note also that the
widths of the confidence intervals are the same for the number of insured and
uninsured. This follows from the fact that the two must add up to the national
CPS ASEC estimate, which has negligible variance.
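As an illustration of how such an interval for an estimated number might be
formed, the sketch below assumes a normal approximation with the usual 1.645
multiplier for 90-percent coverage; the text does not state the exact
construction. The function name and the example figures are hypothetical.

```python
import math

# ASSUMPTION: normal approximation; 1.645 is the approximate standard normal
# quantile for a two-sided 90-percent interval.
Z_90 = 1.645


def count_interval(estimate, posterior_sd, raking_factor=1.0):
    """90-percent interval for an estimated number.

    The posterior standard deviation is scaled by the raking factor, and the
    half-width is rounded up so that rounding does not reduce coverage.
    """
    half_width = math.ceil(Z_90 * posterior_sd * raking_factor)
    return estimate - half_width, estimate + half_width


# Hypothetical county: 1,900 uninsured out of a universe of 9,500.
low_unins, high_unins = count_interval(1900, 100)
# The insured interval is the universe minus the uninsured interval,
# so both intervals have the same width.
low_ins, high_ins = 9500 - high_unins, 9500 - low_unins
```

Because the universe is treated as fixed, subtracting the uninsured interval
from it gives an insured interval of identical width.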
Source: U.S. Census Bureau, Housing and Household Economic
Statistics Division, Small Area Estimates Branch
Last Revised: July 21, 2005
For assistance, please contact our information line at 301-763-3242.