CORE Data Pipeline Guide
The nccs-data-core pipeline produces NCCS’s CORE Series: harmonized panels of select Form 990 / 990-EZ / 990-PF fields, built from IRS Statistics of Income (SOI) extracts (2012-present) and raw legacy NCCS files (1989-2011).
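The year split above (SOI extracts from 2012 on, legacy NCCS files for 1989-2011) can be sketched as a simple lookup. This is a minimal illustration that assumes the stated ranges are tax years. The names `source_for`, `SOI_FIRST_YEAR`, and `LEGACY_FIRST_YEAR` are hypothetical and do not come from the pipeline's code.

```python
# Hypothetical sketch: choose the upstream source for a tax year.
# Assumes the year ranges in the overview are tax years; all names here are
# illustrative and are not part of the nccs-data-core API.
from typing import Optional

SOI_FIRST_YEAR = 2012     # IRS SOI extracts: 2012-present
LEGACY_FIRST_YEAR = 1989  # raw legacy NCCS files: 1989-2011


def source_for(tax_year: int) -> Optional[str]:
    """Return "soi", "legacy", or None when no source covers the year."""
    if tax_year >= SOI_FIRST_YEAR:
        return "soi"
    if tax_year >= LEGACY_FIRST_YEAR:
        return "legacy"
    return None  # years before 1989 are not covered
```

For example, `source_for(2011)` returns `"legacy"` and `source_for(2012)` returns `"soi"`.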
Note
TODO: Overview, audience, and how to navigate this book.
1.1 Outputs
For each (tax_year, form) pair, the pipeline writes a CSV, plus a data dictionary and a quality report for that output. The form outputs are:
- 990: full 990 schedule, 990 filers only.
- 990ez: full 990-EZ schedule, 2012-present only (no pre-2012 source).
- 990pf: full 990-PF schedule, private foundations.
- 990combined: 990 and 990-EZ filings stacked on their 53 shared harmonized columns.
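"Stacked on the shared harmonized columns" means that each input keeps only the columns both forms have in common, and the 990-EZ rows are appended after the 990 rows. The sketch below illustrates this with plain dictionaries. The function name and the column names in the example are hypothetical. The actual 53 shared columns are defined in the Output Schema, and the pipeline's implementation may differ.

```python
# Hypothetical sketch of stacking 990 and 990-EZ rows on their shared columns.
# Function and column names are illustrative assumptions, not the pipeline's API.
from typing import Dict, List

Row = Dict[str, object]


def stack_on_shared(rows_990: List[Row], rows_990ez: List[Row],
                    shared_cols: List[str]) -> List[Row]:
    """Project every row onto shared_cols and append 990-EZ rows after 990 rows."""
    combined: List[Row] = []
    for rows in (rows_990, rows_990ez):
        for row in rows:
            # Drop form-specific columns; a KeyError here would signal that a
            # supposedly shared column is missing from one of the inputs.
            combined.append({col: row[col] for col in shared_cols})
    return combined
```

Raising on a missing shared column, instead of silently filling it with a blank, is a deliberate choice in this sketch. It surfaces harmonization gaps early.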
1.2 Where to start
- New contributor → Developer Guide
- Looking up a column → Output Schema or Crosswalks
- Running the pipeline → Configuration and EC2 Batch Processing