This 3-course training series provides an overview of data privacy and synthetic data methods, with the goal of protecting privacy while maintaining data utility. The trainings cover the theory and concepts behind synthetic data and equip you with tools to apply these data privacy techniques to datasets.
This training series is provided in partnership with Allegheny County and the Western Pennsylvania Regional Data Center (WPRDC). While synthetic data has been used at the federal level, local governments and organizations often do not have the human or computational resources required to implement synthetic data as a privacy-preserving technique. As part of a pilot program intended to understand and target the specific privacy-related needs of local governments, the Urban Institute is offering these trainings to any local stakeholders wishing to learn more about the creation, applications, and limitations of synthetic data.
This course assumes some knowledge of general statistical concepts such as summary statistics and basic regression. No coding background is needed, but optional coding exercises will be provided in R and Python for interested users.
Day 1: Intro to Data Privacy and Data Synthesis |
Day 2: Synthetic Data Methods |
Day 3: Disclosure Risk and Utility Metrics and more Case Studies |
Day 1: Intro to Data Privacy and Data Synthesis |
Day 2: Synthetic Data Methods |
Day 3: Disclosure Risk and Utility Metrics and more Case Studies |
Unfortunately, our Day 3 recording was not saved properly, so we have linked a recording from another training session we gave with mostly identical content.
In order to run the R scripts in lessons_follow_along_code, you will need the following R packages installed:
remotes::install_github("UrbanInstitute/urbnthemes")
In order to run the Python scripts in lessons_follow_along_code, you will need the following Python modules installed: