Overview
The nccsdata package provides access to nonprofit
organization data from the National Center for Charitable Statistics
(NCCS). It reads IRS Business Master File (BMF) data stored as parquet
files in a public S3 bucket, with support for efficient filtering by
state, county, NTEE subsector, and exempt organization type.
The package requires no API keys or authentication; the data is publicly accessible.
Exploring the Data Dictionary
Before querying data, you can explore the 97 available columns using
nccs_dictionary():
library(nccsdata)
# See all available columns
nccs_dictionary()
# Find geocoding-related columns
nccs_dictionary("geo")
# Find NTEE classification columns
nccs_dictionary("ntee")
Discovering Filter Values
Use nccs_catalog() to see the valid values for each
filter before querying:
# NTEE v2 subsector codes
nccs_catalog("ntee_subsector")
#> [1] "ART" "EDU" "ENV" "HEL" "HMS" "HOS" "IFA" "MMB" "PSB" "REL" "UNI" "UNU"
# State and territory codes
nccs_catalog("state")
# Exempt organization types (e.g., 501(c)(3), 501(c)(4), etc.)
nccs_catalog("exempt_org_type")
Reading Data
The core function is nccs_read(), which reads BMF data
from S3 with predicate-pushdown filtering for efficient reads.
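A minimal sketch of a filtered read, using only the state and county arguments that appear in the examples later in this vignette. The arguments for NTEE subsector and exempt organization type are not shown here, so they are left out rather than guessed. Running it needs network access to the public bucket.

```r
library(nccsdata)

# Read all Pennsylvania organizations; the state filter is applied
# during the read rather than after loading the whole file
pa <- nccs_read(state = "PA")

# Narrow further to specific counties within the state
nepa <- nccs_read(
  state = "PA",
  county = c("Lackawanna County", "Luzerne County")
)

# The county-filtered result should be no larger than the state-wide one
nrow(pa)
nrow(nepa)
```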
Selecting Columns
The BMF parquet file contains 97 columns and is over 400 MB. By
default, nccs_read() returns a curated subset of commonly
needed columns. You can customize this:
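One way to narrow the columns, sketched with calls that appear elsewhere in this vignette: request a lazy query with collect = FALSE and pick columns with dplyr::select() before collecting. This is not nccs_read()'s own column-selection argument, and it assumes the lazy query exposes geo_county and nteev2_subsector, the two columns used in the summary examples.

```r
library(nccsdata)
library(dplyr)

# Assumption: the lazy query returned by collect = FALSE includes
# the geo_county and nteev2_subsector columns
pa_slim <- nccs_read(state = "PA", collect = FALSE) |>
  select(geo_county, nteev2_subsector) |>
  collect()  # only the two selected columns are materialized

names(pa_slim)
```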
Lazy Evaluation
Set collect = FALSE to get a lazy Arrow query instead of
a tibble. This is useful for building custom dplyr chains before
collecting:
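A sketch of such a chain, assuming the nteev2_subsector column holds the subsector codes listed by nccs_catalog("ntee_subsector") (for example "ART"). The filter and count are added to the lazy query, and nothing is materialized until collect() is called.

```r
library(nccsdata)
library(dplyr)

# Lazy Arrow query: no rows are pulled into memory yet
pa_lazy <- nccs_read(state = "PA", collect = FALSE)

# Build the chain lazily, then collect the small aggregated result
# Assumption: nteev2_subsector stores codes such as "ART"
art_by_county <- pa_lazy |>
  filter(nteev2_subsector == "ART") |>
  count(geo_county) |>
  collect()

art_by_county
```

Because the aggregation happens before collect(), only the per-county counts are returned as a tibble, not the underlying rows.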
Summarizing
nccs_summary() produces grouped count summaries:
pa <- nccs_read(state = "PA")
# Total count
nccs_summary(pa)
# Count by county
nccs_summary(pa, group_by = "geo_county")
# Count by county and NTEE subsector
nccs_summary(pa, group_by = c("geo_county", "nteev2_subsector"))
Saving Results
Write summary results to CSV:
pa <- nccs_read(
state = "PA",
county = c("Lackawanna County", "Luzerne County", "Wayne County")
)
nccs_summary(
pa,
group_by = c("geo_county", "nteev2_subsector"),
output_csv = "nepa_nonprofit_counts.csv"
)