This appendix describes the technical details of applying differential privacy in flexible table generators, using the exponential mechanism with a Discrete Laplace distribution (Rinott et al., 2018). The mechanism M(·) is defined as follows: given a count a ∈ A, select a count b ∈ B (where B is the range of the output b) with probability proportional to exp(ε u(a, b)/Δu), where u(a, b) is the utility of releasing b when the true count is a, and Δu is the sensitivity, defined as the maximum of |u(a, b) − u(a', b)| over all b ∈ B and all neighboring a~a' ∈ A.
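Perturbations need to be capped, so that |ak − bk| ≤ m for all k. With this cap, an output b can have positive probability under a but zero probability under a neighboring a'. If, for all neighboring a~a' ∈ A, P(M(a) = b) < δ whenever P(M(a') = b) = 0, then M(·) satisfies (ε, δ)-differential privacy.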
Table C-1 displays examples of Discrete Laplace perturbation probability vectors when the sensitivity Δu is 1 for the internal cells of the table. For each level of ε and δ, the table shows the amount of cell perturbation, the probability of that perturbation, and the cumulative probability against which a random uniform draw is compared. For example, for ε = 0.5, δ = 0.008, a random uniform draw of 0.8 falls between the cumulative probabilities 0.77 and 0.87, so the cell value is perturbed by adding 2 to the cell total. Note that if the mechanism produces negative perturbed counts, they can be set to zero without invalidating the property of differential privacy.
| Amount of cell perturbation | Probability of perturbation (ε = 1.5, δ = 0.00002) | Cumulative probability (ε = 1.5, δ = 0.00002) | Probability of perturbation (ε = 0.5, δ = 0.008) | Cumulative probability (ε = 0.5, δ = 0.008) |
|---|---|---|---|---|
| −7 | 0.00002 | 0.00002 | 0.0076 | 0.00760 |
| −6 | 0.00008 | 0.00010 | 0.0125 | 0.02010 |
| −5 | 0.00035 | 0.00045 | 0.0206 | 0.04070 |
| −4 | 0.00157 | 0.00202 | 0.0339 | 0.07460 |
| −3 | 0.00706 | 0.00908 | 0.0559 | 0.13050 |
| −2 | 0.03162 | 0.04070 | 0.0922 | 0.22270 |
| −1 | 0.14172 | 0.18242 | 0.1520 | 0.37470 |
| 0 | 0.63516 | 0.81758 | 0.2506 | 0.62530 |
| 1 | 0.14172 | 0.95930 | 0.1520 | 0.77730 |
| 2 | 0.03162 | 0.99092 | 0.0922 | 0.86950 |
| 3 | 0.00706 | 0.99798 | 0.0559 | 0.92540 |
| 4 | 0.00157 | 0.99955 | 0.0339 | 0.95930 |
| 5 | 0.00035 | 0.99990 | 0.0206 | 0.97990 |
| 6 | 0.00008 | 0.99998 | 0.0125 | 0.99240 |
| 7 | 0.00002 | 1.00000 | 0.0076 | 1.00000 |
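The probability vectors in Table C-1 can be approximated with a short sketch. It assumes the perturbation k = b − a has probability proportional to exp(−ε|k|/Δu), truncated to {−m, ..., m} and renormalized, with m = 7 inferred from the rows of the table. The function names are hypothetical and this is an illustration, not the implementation behind the table.

```python
import math
from bisect import bisect_left

def perturbation_vector(eps, m, sensitivity=1.0):
    """Truncated Discrete Laplace probabilities on {-m, ..., m} (assumed form)."""
    support = list(range(-m, m + 1))
    weights = [math.exp(-eps * abs(k) / sensitivity) for k in support]
    total = sum(weights)
    return support, [w / total for w in weights]

def cumulative(probs):
    """Running sum of the probability vector."""
    out, running = [], 0.0
    for p in probs:
        running += p
        out.append(running)
    return out

def perturbation_for_draw(u, support, cum):
    """Return the perturbation whose cumulative interval contains the draw u."""
    idx = bisect_left(cum, u)
    # Guard against floating-point round-off in the last cumulative value.
    return support[min(idx, len(support) - 1)]

support, probs = perturbation_vector(eps=0.5, m=7)
cum = cumulative(probs)
print(round(probs[7], 4))                        # probability of zero perturbation
print(perturbation_for_draw(0.8, support, cum))  # perturbation for a draw of 0.8
```

Under these assumptions, the ε = 0.5 vector gives a probability of about 0.2506 for zero perturbation, and a draw of 0.8 maps to a perturbation of +2, in line with the worked example in the text.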
A single privacy budget is maintained by fixing the “seed” that determines the perturbation amount whenever a cell is aggregated in any table; that is, the “same cell-same perturbation” rule is applied. This is carried out by assigning to each individual in the microdata a random number, the microdata key. When individuals are aggregated into a cell, their microdata keys are also aggregated, and the result forms the seed (the “Cell-Key”) of the perturbation. Thus, the same cell will always receive the same perturbation. Define the following:
Attaining cell consistency offers less protection than attaining query consistency. For instance, in the extreme scenario of table differencing for explicit tables that differ by one case, cell consistency means that the true attribute can potentially be identified, because all cells but one have a zero sum of weights in the implicit tables obtained from differencing. This is why fixed building blocks (hypercubes) are needed as the input to the table generator: they do not allow this extreme scenario.
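The “same cell-same perturbation” rule described earlier can be sketched as follows. The source does not specify how microdata keys are aggregated or mapped to a perturbation, so this sketch makes several assumptions: keys are integers, they are summed modulo a hypothetical `KEY_SPACE`, the result is rescaled to a uniform number in [0, 1), and that number is looked up in a truncated Discrete Laplace cumulative vector. All names are illustrative.

```python
import math
from bisect import bisect_left

KEY_SPACE = 2 ** 32  # hypothetical range of the integer microdata keys

def cumulative_vector(eps, m, sensitivity=1.0):
    """Support and cumulative probabilities of an assumed truncated Discrete Laplace."""
    support = list(range(-m, m + 1))
    weights = [math.exp(-eps * abs(k) / sensitivity) for k in support]
    total = sum(weights)
    cum, running = [], 0.0
    for w in weights:
        running += w / total
        cum.append(running)
    return support, cum

def cell_key(record_keys):
    """Aggregate microdata keys into a cell key; integer sums are order-independent."""
    return sum(record_keys) % KEY_SPACE

def cell_perturbation(record_keys, eps=0.5, m=7):
    """Deterministic perturbation for the set of records that forms a cell."""
    u = cell_key(record_keys) / KEY_SPACE  # pseudo-uniform number in [0, 1)
    support, cum = cumulative_vector(eps, m)
    idx = bisect_left(cum, u)
    return support[min(idx, len(support) - 1)]

keys = [123456789, 987654321, 555555555]  # made-up keys for three individuals
print(cell_perturbation(keys) == cell_perturbation(list(reversed(keys))))
```

Because the cell key depends only on which records fall in the cell, the same cell receives the same perturbation whichever table it appears in. Integer arithmetic is used here so that the order of aggregation cannot change the result.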
When applying differential privacy to a flexible table builder, one needs to consider which lower-level margins will be available. In general, if one allows four dimensions in the tables, this leads to 1 four-way table, 4 three-way margins, 6 two-way margins, and 4 one-way margins, meaning that an individual can appear multiple times in the table. The sensitivity is therefore now d = 2⁴ − 1 = 15, that is, the sensitivity is scaled by a factor of d. In general, this means that one needs to define an overall privacy budget ε as (or at least ensure that the different ε’s across margins add up to the overall privacy budget according to the Composition theorem). This can lead to a rather large overall privacy budget. Therefore, preliminary work needs to be undertaken to decide which margins will be released, thus lowering the sensitivity of the privacy budget. Alternatively, one can direct more research toward using correlated noise to preserve marginal distributions (in expectation), as evidenced in the early disclosure avoidance literature, by changing from the Laplace distribution to the Normal distribution (Shlomo & De Waal, 2008) or by placing the property of invariance on the perturbation vectors (Shlomo & Young, 2008).
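The count of 15 tables and margins can be checked by enumerating every non-empty subset of the four dimensions. The sketch below also shows one way to satisfy the requirement that the per-margin ε values add up to the overall budget: an even split. The even split and the dimension names are assumptions made for illustration, not a recommendation from the source.

```python
from itertools import combinations

def margins(dimensions):
    """All non-empty subsets of the dimensions: the full table plus every margin."""
    return [
        subset
        for order in range(1, len(dimensions) + 1)
        for subset in combinations(dimensions, order)
    ]

def even_split(total_eps, n_margins):
    """Assumed allocation: equal epsilon per margin, summing to the overall budget."""
    return [total_eps / n_margins] * n_margins

dims = ["age", "sex", "region", "income"]  # hypothetical dimension names
all_margins = margins(dims)
print(len(all_margins))                    # 2**4 - 1 tables and margins
budgets = even_split(1.5, len(all_margins))
print(round(sum(budgets), 10))             # adds back up to the overall budget
```

Restricting the list of released margins before splitting the budget leaves a larger ε for each remaining margin, which is the motivation for deciding in advance which margins to release.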
Because the Survey of Income and Program Participation is a probability sample with survey weights, one can adjust the weighted counts in the tables, as shown in Shlomo et al. (2019). The perturbation p is applied to the sample counts. Then p × w is added to the weighted sample count (or subtracted, when p is negative), where w is the average weight.
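A minimal sketch of this weighted adjustment follows. It assumes that w is the average of the survey weights of the records in the cell, which the text does not state explicitly; the function name and the example weights are hypothetical.

```python
def perturbed_weighted_count(weights, p):
    """Add p times the average weight to the weighted count of a cell.

    weights: survey weights of the sample records in the cell
    p: integer perturbation drawn for the unweighted sample count
    """
    if not weights:
        raise ValueError("the cell has no records, so no average weight is defined")
    weighted_count = sum(weights)
    average_weight = weighted_count / len(weights)  # assumed: average within the cell
    return weighted_count + p * average_weight

cell_weights = [100.0, 150.0, 250.0]  # made-up weights for three respondents
print(perturbed_weighted_count(cell_weights, p=2))
print(perturbed_weighted_count(cell_weights, p=-1))
```

With this choice of w, a perturbation of p on the sample count moves the weighted count by the weight of p average respondents, so the weighted and unweighted tables stay roughly in step.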
For continuous variables, such as sums, averages, quantiles, and correlations, one can use the same concept of the microdata keys to obtain the same perturbations, but more research is required on how this is actually implemented in a differentially private setting.
In a non-differentially private setting, one can add multiplicative noise by multiplying the statistic by (1 + p), where p is determined by the microdata keys and the perturbation vector lies in a pre-set range, for example [−0.2, . . ., +0.2]. As an example, a weighted sum Ŷ is perturbed as follows: Ŷ + Ŷ × p × w (note that p can take a positive or negative value). Similarly, one can use this approach for averages, where the denominator is now ŵ + p × w (Shlomo et al., 2019).
For more advanced modeling in the remote analysis server, one can add noise p to the estimating equations; that is, instead of setting the score functions to 0, one sets them equal to s = p × max(residual). For a simple regression model, the solution for the perturbed regression coefficient is βPert = βOrig + (X′X)⁻¹s (Shlomo, 2020).
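The perturbed-coefficient formula can be illustrated for a regression with an intercept and one slope, where X′X is a 2 × 2 matrix that can be inverted by hand. The sketch assumes that max(residual) is the largest absolute residual of the original fit and that a separate noise value p is supplied for each of the two estimating equations; neither detail is specified in the text, and the function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (beta, inverse of X'X)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # X'X = [[n, sx], [sx, sxx]], so its inverse has the closed form below.
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    beta = [inv[0][0] * sy + inv[0][1] * sxy,
            inv[1][0] * sy + inv[1][1] * sxy]
    return beta, inv

def perturb_coefficients(x, y, p):
    """Return (beta_pert, beta_orig) with beta_pert = beta_orig + (X'X)^-1 s."""
    beta, inv = fit_simple_ols(x, y)
    residuals = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]
    max_res = max(abs(r) for r in residuals)  # assumed meaning of max(residual)
    s = [pj * max_res for pj in p]            # one noise value per score equation
    shift = [inv[0][0] * s[0] + inv[0][1] * s[1],
             inv[1][0] * s[0] + inv[1][1] * s[1]]
    return [b + d for b, d in zip(beta, shift)], beta

x = [0.0, 1.0, 2.0, 3.0]
y = [1.2, 2.9, 5.1, 6.8]  # made-up data
beta_pert, beta_orig = perturb_coefficients(x, y, p=(0.1, -0.05))
print(beta_orig, beta_pert)
```

Setting p to zero recovers the original coefficients, and the size of the shift grows with the largest residual, so poorly fitting models receive more noise under these assumptions.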