A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation (2024)

Chapter: Appendix C: Technical Details for Differential Privacy Table Builder



Suggested Citation: "Appendix C: Technical Details for Differential Privacy Table Builder." National Academies of Sciences, Engineering, and Medicine. 2024. A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation. Washington, DC: The National Academies Press. doi: 10.17226/27169.

Appendix C

Technical Details for Differential Privacy Table Builder

This appendix describes the technical details of using differential privacy in flexible table generators, based on the exponential mechanism with a Discrete Laplace distribution (Rinott et al., 2018). The mechanism M(.) is defined as follows: given a count a ∈ A, select a count b ∈ B (where B is the range of the output b) with probability proportional to $\exp\left(\frac{ε \, u(a, b)}{2 Δu}\right)$, where u(a, b) is the utility function and Δu is its sensitivity, defined as

$Δu = \max_{b \in B} \max_{a \sim a' \in A} | u(a, b) - u(a', b) | .$

Perturbations need to be capped, so that |a_k − b_k| ≤ m for all k. Under this cap, if for all neighboring a ~ a' ∈ A, P(M(a') = b) = 0 implies P(M(a) = b) < δ, then M(.) satisfies (ε, δ)-differential privacy.

Table C-1 displays examples of Discrete Laplace perturbation probability vectors when the sensitivity Δu is 1 for the internal cells of the table. For each pair of ε and δ, the table lists each possible amount of cell perturbation, its probability, and the cumulative probability. The perturbation applied to a cell is determined by where a random uniform draw falls in the cumulative vector. For example, for ε = 0.5, δ = 0.008, a random uniform draw of 0.8 falls between the cumulative probabilities 0.7773 and 0.8695, so the cell value is perturbed by adding 2 to the cell total. Note that if the mechanism produces negative perturbed counts, they can be set to zero without invalidating the property of differential privacy.


TABLE C-1 Examples of Two Discrete Laplace Perturbation Vectors for ε = 1.5, δ = 0.00002 and ε = 0.5, δ = 0.008

Amount of cell perturbation	ε = 1.5, δ = 0.00002		ε = 0.5, δ = 0.008
Amount of cell perturbation	Probability of perturbation	Cumulative probability	Probability of perturbation	Cumulative probability
−7	0.00002	0.00002	0.0076	0.00760
−6	0.00008	0.00010	0.0125	0.02010
−5	0.00035	0.00045	0.0206	0.04070
−4	0.00157	0.00202	0.0339	0.07460
−3	0.00706	0.00908	0.0559	0.13050
−2	0.03162	0.04070	0.0922	0.22270
−1	0.14172	0.18242	0.1520	0.37470
0	0.63516	0.81758	0.2506	0.62530
1	0.14172	0.95930	0.1520	0.77730
2	0.03162	0.99092	0.0922	0.86950
3	0.00706	0.99798	0.0559	0.92540
4	0.00157	0.99955	0.0339	0.95930
5	0.00035	0.99990	0.0206	0.97990
6	0.00008	0.99998	0.0125	0.99240
7	0.00002	1.00000	0.0076	1.00000
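
The lookup described above can be sketched in a few lines of Python. This is an illustrative sketch, not the report's implementation. It assumes that the perturbation probabilities are proportional to exp(−ε|k|) for |k| ≤ 7 and then renormalized; this form is inferred from the values in Table C-1 and is not stated in the text. The function names are hypothetical.

```python
import math
from bisect import bisect_left


def perturbation_vector(epsilon, cap=7):
    """Return (amounts, probs, cumulative) for a capped Discrete Laplace.

    Assumption: P(k) is proportional to exp(-epsilon * |k|) for |k| <= cap.
    """
    amounts = list(range(-cap, cap + 1))
    weights = [math.exp(-epsilon * abs(k)) for k in amounts]
    total = sum(weights)
    probs = [w / total for w in weights]
    cumulative, running = [], 0.0
    for p in probs:
        running += p
        cumulative.append(running)
    cumulative[-1] = 1.0  # guard against floating-point drift
    return amounts, probs, cumulative


def perturbation_for_draw(u, amounts, cumulative):
    """Inverse-CDF lookup: first amount whose cumulative probability is >= u."""
    return amounts[bisect_left(cumulative, u)]


amounts, probs, cumulative = perturbation_vector(0.5)
# The draw of 0.8 used in the text's example for epsilon = 0.5.
print(perturbation_for_draw(0.8, amounts, cumulative))
```

Under these assumptions the computed probabilities agree with the tabulated ones to the precision shown, and the draw of 0.8 maps to a perturbation of +2, as in the worked example.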

A single privacy budget is ensured by fixing the “seed” that determines the perturbation amount whenever a cell is aggregated in any table; that is, the “same cell-same perturbation” rule is applied. This is carried out by assigning each individual in the microdata a random number, the microdata key. When individuals are aggregated into a cell, their microdata keys are aggregated as well, and the result forms the seed (the “Cell-Key”) of the perturbation. Thus, the same cell will always have the same perturbation. Define the following:

Cell consistency — Across multiple users, if the same set of records contributes to a table cell, the same results are attained. This is attained by a simple sum of the microdata key among cell members.
Query consistency — Across multiple users using the same query path (e.g., same specification for universe definition and requested table), the same results are attained. This is attained by a function of the simple sum of the microdata key for the cell, marginals associated with the cell, and the table universe.
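
As an illustration of cell consistency, the following minimal sketch sums the microdata keys of a cell's members and uses the result as the lookup value in a cumulative perturbation vector. Several details are assumptions made for the example and do not come from the text: keys are uniform on [0, 1), the Cell-Key is the fractional part of the key sum, and the perturbation probabilities are proportional to exp(−ε|k|) with a cap of 7. All function names are hypothetical.

```python
import math
import random
from bisect import bisect_left


def cumulative_vector(epsilon, cap=7):
    """Cumulative probabilities for perturbations -cap..cap (assumed form)."""
    weights = [math.exp(-epsilon * abs(k)) for k in range(-cap, cap + 1)]
    total = sum(weights)
    cumulative, running = [], 0.0
    for w in weights:
        running += w / total
        cumulative.append(running)
    cumulative[-1] = 1.0  # guard against floating-point drift
    return cumulative


def assign_microdata_keys(record_ids, rng_seed=2024):
    """Give every record a fixed uniform key in [0, 1)."""
    rng = random.Random(rng_seed)
    return {rid: rng.random() for rid in record_ids}


def cell_key(member_ids, microdata_keys):
    """Sum the members' keys; keep the fractional part (an assumption)."""
    # math.fsum is exactly rounded, so the result does not depend on order.
    return math.fsum(microdata_keys[rid] for rid in member_ids) % 1.0


def perturbed_count(member_ids, microdata_keys, epsilon, cap=7):
    """Perturb the cell count using the Cell-Key as the uniform draw."""
    cumulative = cumulative_vector(epsilon, cap)
    key = cell_key(member_ids, microdata_keys)
    amount = bisect_left(cumulative, key) - cap
    return max(len(member_ids) + amount, 0)  # negative counts are set to zero


keys = assign_microdata_keys(range(1000))
cell = [3, 17, 256, 42, 980]
print(perturbed_count(cell, keys, epsilon=0.5))
print(perturbed_count(list(reversed(cell)), keys, epsilon=0.5))  # same value
```

Because the seed depends only on which records are in the cell, two users who build the same cell through different tables obtain the same perturbed value. Query consistency would additionally feed the marginals and the table universe into the seed, which this sketch does not attempt.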


Cell consistency offers less protection than query consistency. For instance, in the extreme scenario of table differencing for explicit tables that differ by one case, cell consistency allows the true attribute to be identified, because all cells but one have a zero sum of weights in the implicit tables obtained from differencing. This is why one needs to use building blocks (hypercubes) as the input of the table generator, which rules out this extreme scenario.

When applying differential privacy to a flexible table builder, one needs to consider which lower-level margins will be available. In general, if one allows four dimensions in the tables, this leads to 1 four-way table, 4 three-way margins, 6 two-way margins, and 4 one-way margins, meaning that an individual can appear multiple times across these tables. An individual therefore contributes to d = 2⁴ − 1 = 15 tables, which multiplies the sensitivity by d. In general, this means that for an overall privacy budget ε, each table must be allocated a budget of $\frac{ε}{d}$ (or at least that the different ε’s across margins add up to the overall privacy budget according to the Composition theorem). This can lead to a rather large overall privacy budget. Therefore, preliminary work needs to be undertaken to decide which margins will be released, thus lowering the sensitivity. Alternatively, one can direct more research toward using correlated noise to preserve marginal distributions (in expectation), as evidenced in the early disclosure avoidance literature, by changing from the Laplace distribution to the Normal distribution (Shlomo & De Waal, 2008) or by placing the property of invariance on the perturbation vectors (Shlomo & Young, 2008).
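
The arithmetic behind d and the equal split of the budget can be checked with a short sketch. The function names are hypothetical, and the equal split is only the simplest allocation that satisfies sequential composition.

```python
from math import comb


def count_tables(dimensions):
    """Number of tables and margins of order 1..dimensions: 2**dimensions - 1."""
    return sum(comb(dimensions, k) for k in range(1, dimensions + 1))


def per_table_budget(total_epsilon, dimensions):
    """Equal share of the overall budget for each table or margin."""
    return total_epsilon / count_tables(dimensions)


d = count_tables(4)
print(d)                         # 1 + 4 + 6 + 4 tables and margins
print(per_table_budget(1.5, 4))  # share of an overall budget of 1.5
```

Releasing fewer margins lowers d, so each released table receives a larger share of the same overall budget.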

Since the Survey of Income and Program Participation is a probability sample with survey weights, the perturbation can be carried over to the weighted counts in the tables, as shown in Shlomo et al. (2019). The perturbation p is first applied to the sample count. Then p × w is added to the weighted sample count (which amounts to a subtraction when p is negative), where w is the average weight.
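
A minimal sketch of this adjustment follows. It assumes that w is the average weight of the records in the cell; the text says only "the average weight", so this reading is an assumption. The function name is hypothetical.

```python
def perturb_weighted_count(weights, p):
    """Add p times the average weight to the weighted count of a cell.

    weights: survey weights of the records in the cell
    p: integer perturbation drawn for the unweighted sample count
    Assumption: the average is taken over the records in the cell.
    """
    if not weights:
        return 0.0
    weighted_count = sum(weights)
    average_weight = weighted_count / len(weights)
    return weighted_count + p * average_weight


cell_weights = [100.0, 200.0, 300.0]
print(perturb_weighted_count(cell_weights, 2))   # 600 + 2 * 200
print(perturb_weighted_count(cell_weights, -1))  # 600 - 1 * 200
```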

For continuous variables, such as sums, averages, quantiles, and correlations, one can use the same concept of the microdata keys to obtain the same perturbations, but more research is required on how this is actually implemented in a differentially private setting.

In a non-differentially private setting, one can add multiplicative noise to the statistic by multiplying it by (1 + p), where p is determined by the microdata keys and the perturbation vector lies in a pre-set range, for example [−0.2, ..., +0.2]. As an example, for a weighted sum Ŷ, one perturbs as follows: Ŷ + Ŷ × p × w (note that p can take a positive or negative value). Similarly, one can use this approach for averages, where the denominator is now ŵ + p × w (Shlomo et al., 2019).
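
The basic (1 + p) multiplier can be illustrated as follows. This sketch covers only the simple multiplicative form and not the weighted variants. It assumes a Cell-Key in [0, 1) and an evenly spaced grid of nine multipliers from −0.2 to +0.2 chosen uniformly by the key. The grid, the uniform choice, and the function names are assumptions made for the example.

```python
def multiplier_from_key(cell_key, low=-0.2, high=0.2, steps=9):
    """Map a Cell-Key in [0, 1) to a value p on an evenly spaced grid."""
    grid = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    index = min(int(cell_key * steps), steps - 1)
    return grid[index]


def perturb_statistic(value, cell_key):
    """Multiply the statistic by (1 + p), with p fixed by the Cell-Key."""
    return value * (1.0 + multiplier_from_key(cell_key))


print(perturb_statistic(100.0, 0.0))    # smallest multiplier on the grid
print(perturb_statistic(100.0, 0.5))    # middle of the grid, p = 0
print(perturb_statistic(100.0, 0.999))  # largest multiplier on the grid
```

As with counts, the same key always yields the same multiplier, so repeated requests for the same statistic return the same perturbed value.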

For more advanced modeling in the remote analysis server, one can add noise p to the estimating equations; that is, instead of setting the score functions to 0, one sets them equal to s = p × max(residual). For a simple regression model, the solution for the perturbed regression coefficient is β^Pert = β^Orig + (X′X)⁻¹s (Shlomo, 2020).
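
The regression formula can be checked numerically with a small pure-Python sketch for a model with an intercept and one slope. Several choices here are assumptions and do not come from the text: the score is written as X′(Xβ − y), so that the perturbed solution carries a plus sign as in the formula above; one noise value p is supplied per estimating equation; and max(residual) is taken as the largest absolute residual of the original fit. The function names and data are hypothetical.

```python
def ols_simple(x, y):
    """Return (beta, xtx_inv) for y = b0 + b1 * x by ordinary least squares."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    beta = [xtx_inv[0][0] * sy + xtx_inv[0][1] * sxy,
            xtx_inv[1][0] * sy + xtx_inv[1][1] * sxy]
    return beta, xtx_inv


def score(x, y, beta):
    """Score written as X'(X beta - y); it is zero at the OLS solution."""
    diff = [beta[0] + beta[1] * xi - yi for xi, yi in zip(x, y)]
    return [sum(diff), sum(xi * d for xi, d in zip(x, diff))]


def perturbed_beta(x, y, p):
    """Return (beta_pert, s) with beta_pert = beta_orig + (X'X)^-1 s."""
    beta, xtx_inv = ols_simple(x, y)
    residuals = [yi - (beta[0] + beta[1] * xi) for xi, yi in zip(x, y)]
    scale = max(abs(r) for r in residuals)  # assumed meaning of max(residual)
    s = [p[0] * scale, p[1] * scale]
    shift = [xtx_inv[0][0] * s[0] + xtx_inv[0][1] * s[1],
             xtx_inv[1][0] * s[0] + xtx_inv[1][1] * s[1]]
    return [beta[0] + shift[0], beta[1] + shift[1]], s


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta_pert, s = perturbed_beta(x, y, p=(0.1, -0.05))
print(beta_pert)
print(score(x, y, beta_pert), s)  # the score at beta_pert matches s
```

Evaluating the score at the perturbed coefficients returns s up to rounding error, which is the sense in which the estimating equations are solved to s instead of 0.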