Maximum Spanning Tree (MST)

MST achieves state-of-the-art results for marginals over categorical data and performs well even with small source data. From McKenna et al., “Winning the NIST Contest: A scalable and general approach to differentially private synthetic data”.

Before using MST, install Private-PGM:

pip install git+https://github.com/ryan112358/private-pgm.git

Then create and fit the synthesizer like this:

import pandas as pd
from snsynth import Synthesizer

pums = pd.read_csv("PUMS.csv")

synth = Synthesizer.create("mst", epsilon=3.0, verbose=True)
synth.fit(pums, preprocessor_eps=1.0)
pums_synth = synth.sample(1000)

For more, see the sample notebook.

Parameters

class snsynth.mst.mst.MSTSynthesizer(epsilon=0.1, delta=1e-09, *ignore, verbose=False)

Maximum Spanning Tree synthesizer, uses Private PGM.

Parameters
  • epsilon (float) – privacy budget for the synthesizer

  • delta (float) – privacy parameter. Should be small, on the order of 1/(n * sqrt(n)), where n is the number of rows in the source data

  • verbose (bool) – print diagnostic information during processing
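
The guidance for delta can be turned into a quick calculation. The following is a minimal sketch, assuming n is the number of rows in the source data; the helper name suggest_delta is hypothetical and not part of snsynth.

```python
import math


def suggest_delta(n_rows: int) -> float:
    """Hypothetical helper (not part of snsynth): return 1 / (n * sqrt(n))."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return 1.0 / (n_rows * math.sqrt(n_rows))


# For 10,000 rows: 10_000 * sqrt(10_000) = 10_000 * 100 = 1_000_000
delta = suggest_delta(10_000)
print(delta)
```

A value computed this way could then be supplied as the delta argument shown in the class signature above, in place of the default of 1e-09.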

Reuses and lightly modifies code from https://github.com/ryan112358/private-pgm/blob/master/mechanisms/mst.py. Awesome work, McKenna et al.!