Maximum Spanning Tree (MST)

MST achieves state-of-the-art results for marginals over categorical data, and does well even with small source data. From McKenna et al., “Winning the NIST Contest: A scalable and general approach to differentially private synthetic data”.

Before using MST, install Private-PGM:

pip install git+

Then call it like this:

import pandas as pd
from snsynth import Synthesizer

pums = pd.read_csv("PUMS.csv")

synth = Synthesizer.create("mst", epsilon=3.0, verbose=True)
synth.fit(pums, preprocessor_eps=1.0)
pums_synth = synth.sample(1000)
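
Since MST targets marginals over categorical data, one quick sanity check is to compare the one-way marginals of the source and synthetic frames. The following is a minimal sketch using plain pandas, not part of snsynth; the helper name `marginal_l1_error` and the toy frames (standing in for `pums` and `pums_synth`) are hypothetical.

```python
import pandas as pd


def marginal_l1_error(real: pd.DataFrame, synth: pd.DataFrame, column: str) -> float:
    """L1 distance between the normalized one-way marginals of one column."""
    p = real[column].value_counts(normalize=True)
    q = synth[column].value_counts(normalize=True)
    # Align on the union of categories; a category missing on one side counts as 0.
    p, q = p.align(q, fill_value=0.0)
    return float((p - q).abs().sum())


# Toy frames standing in for `pums` and `pums_synth` (made-up values).
real = pd.DataFrame({"sex": [0, 0, 1, 1], "married": [1, 0, 1, 1]})
fake = pd.DataFrame({"sex": [0, 1, 1, 1], "married": [1, 1, 1, 1]})

errors = {col: marginal_l1_error(real, fake, col) for col in real.columns}
print(errors)
```

A value of 0 means the two marginals match exactly and 2 means they share no categories. With differentially private output some error is expected, and it typically shrinks as epsilon grows.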

For more, see the sample notebook.


class snsynth.mst.mst.MSTSynthesizer(epsilon=0.1, delta=1e-09, *ignore, verbose=False)

Maximum Spanning Tree synthesizer, built on Private-PGM.

  • epsilon (float) – privacy budget for the synthesizer

  • delta (float) – privacy parameter. Should be small, in the range of 1/(n * sqrt(n)), where n is the number of records in the source data

  • verbose (bool) – print diagnostic information during processing
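
The delta guideline above can be turned into a small helper. This is a sketch only: `suggested_delta` is a hypothetical name, not part of snsynth, and it assumes n is the number of rows in the source data.

```python
import math


def suggested_delta(n_rows: int) -> float:
    """Return 1 / (n * sqrt(n)) for a dataset with n_rows records (assumed meaning of n)."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return 1.0 / (n_rows * math.sqrt(n_rows))


delta = suggested_delta(1000)
print(f"{delta:.3e}")
```

The result could then be supplied as the `delta` argument shown in the signature above. For small datasets the default of 1e-09 is already stricter than this guideline.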

Lightly modified reuse of the original code by McKenna et al. Awesome work!
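
The name refers to how the method picks which pairs of columns to measure: roughly, column pairs are scored by how strongly they are related, and a maximum spanning tree over those scores decides which pairs to keep. The sketch below illustrates only that tree-building step with Kruskal's algorithm. It is non-private, the weights are made up, and it is not the library's or the paper's implementation, which selects edges under differential privacy.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def maximum_spanning_tree(nodes: List[str], weights: Dict[Edge, float]) -> List[Edge]:
    """Kruskal's algorithm, taking the heaviest edges first (no privacy noise)."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Follow parent links to the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree: List[Edge] = []
    for (a, b), _w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so it creates no cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# Hypothetical columns and pairwise dependence scores, for illustration only.
columns = ["age", "income", "educ", "married"]
weights = {
    ("age", "income"): 0.9,
    ("income", "educ"): 0.7,
    ("age", "married"): 0.4,
    ("income", "married"): 0.3,
    ("age", "educ"): 0.2,
    ("educ", "married"): 0.1,
}

edges = maximum_spanning_tree(columns, weights)
print(edges)
```

A tree over k columns always has k - 1 edges, so the number of two-way marginals to measure grows linearly with the number of columns rather than quadratically.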