Overview
The get_data() function from nccsdata
downloads NCCS legacy data sets hosted on publicly accessible S3 buckets
and processes them for the user.
In this vignette, we provide several examples of how this function can be used to retrieve this legacy data.
Downloading Core Data
We can define the type of data, range of data (in years),
organization type, and form type using the arguments
dsname, time, scope.orgtype, and
scope.formtype respectively.
core <- get_data(dsname = "core",
                 time = "2015",
                 scope.orgtype = "NONPROFIT",
                 scope.formtype = "PZ")
#> Valid inputs detected. Retrieving data.
#> Downloading core data
#> Requested files have a total size of 115 MB. Proceed
#> with download? Enter Y/N (Yes/no/cancel)
#> Core data downloaded
The function downloads NCCS core data for the year 2015 for all nonprofits that file either full Form 990s or 990EZs. Other possible argument values are:
- scope.orgtype
  - CHARITIES: All charities
  - NONPROFIT: All nonprofits
  - PRIVFOUND: All private foundations
- scope.formtype
  - PC: Nonprofits that file the full IRS Form 990
  - EZ: Nonprofits that file 990EZs only
  - PZ: Nonprofits that file both full Form 990s and 990EZs
  - PF: Private foundation filings
The data is available for the years 1989 to 2019.
Before downloading, get_data() reports the total size of the
requested files and prompts for confirmation.
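The accepted values listed above can be checked before a call is made. The following is a minimal sketch in base R, not the package's own validation: the helper check_inputs() and the two lookup vectors are hypothetical names introduced only for illustration.

```r
# Accepted values, copied from the lists above.
valid_orgtypes  <- c("CHARITIES", "NONPROFIT", "PRIVFOUND")
valid_formtypes <- c("PC", "EZ", "PZ", "PF")

# Hypothetical helper (not part of nccsdata): stop early on bad inputs.
check_inputs <- function(time, orgtype, formtype) {
  years <- suppressWarnings(as.integer(time))
  if (anyNA(years) || !all(years %in% 1989:2019)) {
    stop("time must fall between 1989 and 2019")
  }
  list(
    time     = years,
    # match.arg() errors if the value is not one of the choices
    orgtype  = match.arg(orgtype, valid_orgtypes),
    formtype = match.arg(formtype, valid_formtypes)
  )
}

checked <- check_inputs("2015", "NONPROFIT", "PZ")
```

Note that match.arg() also accepts unambiguous partial matches, so a stricter check would compare with %in% instead.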
Filtering Data Using NTEE Codes
We can also pull only a subset of the data based on NTEE
classifications using the ntee-related arguments
in get_data().
core_art <- get_data(dsname = "core",
                     time = "2015",
                     scope.orgtype = "NONPROFIT",
                     scope.formtype = "PZ",
                     ntee = c("ART"))
#> Valid inputs detected. Retrieving data.
#> Collecting Matching Industry Groups
#> Collecting Matching Industry Division and Subdivisions
#> Collecting Matching Organization Types
#> Downloading core data
#> Requested files have a total size of 115 MB. Proceed
#> with download? Enter Y/N (Yes/no/cancel)
#> Core data downloaded
In the code above, we pull the same data set but keep only the
rows belonging to nonprofits involved in the Arts, Culture and
Humanities. A full description of NTEE codes is available here.
These descriptions can also be accessed using
ntee_preview().
Filtering Data By Geography
We can subset the data by geographic units with the geo
arguments from get_data().
core_NYC <- get_data(dsname = "core",
                     time = "2015",
                     scope.orgtype = "NONPROFIT",
                     scope.formtype = "PZ",
                     geo.state = "NY",
                     geo.city = "New York City")
#> Valid inputs detected. Retrieving data.
#> Downloading core data
#> Requested files have a total size of 115 MB. Proceed
#> with download? Enter Y/N (Yes/no/cancel)
#> Core data downloaded
The code above returns rows belonging to nonprofits in New York
City, NY. Additional geo arguments can be used to
subset the data by county (geo.county) and region
(geo.region).
To pinpoint a specific place, geo arguments must be used in
conjunction with one another:
- geo.state = "IN", geo.county = "Allen" for "Allen, IN"
- geo.state = "CA", geo.city = "San Francisco" for "San Francisco, CA"
get_data() layers these filters to subset the data by
the desired geographic unit. Using only one argument returns every
geographic unit that matches it. For example, geo.region =
"south" returns all rows from the southern states, and
geo.city = "Lebanon" returns all rows belonging to cities
named "Lebanon".
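The effect of layering filters can be pictured with a toy data frame in base R. This is only a sketch of the idea, not the package's implementation: the data frame, its column names (state, city), and the helper filter_geo() are all hypothetical.

```r
# Toy stand-in for downloaded data; column names are hypothetical.
orgs <- data.frame(
  name  = c("A", "B", "C", "D"),
  state = c("IN", "PA", "NH", "CA"),
  city  = c("Lebanon", "Lebanon", "Lebanon", "San Francisco"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply each supplied filter in turn.
# A NULL argument means "do not filter on this field".
filter_geo <- function(df, state = NULL, city = NULL) {
  if (!is.null(state)) df <- df[df$state %in% state, , drop = FALSE]
  if (!is.null(city))  df <- df[df$city %in% city, , drop = FALSE]
  df
}

# One filter: every city named "Lebanon", in any state.
lebanon_all <- filter_geo(orgs, city = "Lebanon")

# Two layered filters: only Lebanon, PA.
lebanon_pa <- filter_geo(orgs, state = "PA", city = "Lebanon")
```

With a single city filter, the toy data keeps the rows for all three states that have a Lebanon; adding the state filter narrows the result to one place.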
Appending BMF Data to Core Data
get_data() automatically appends NTEE metadata to the
requested data set. Appending metadata from the IRS Business Master File
(BMF) requires an additional download of 185 MB and
can be toggled on or off with append.bmf.
corebmf <- get_data(dsname = "core",
                    time = "2015",
                    scope.orgtype = "NONPROFIT",
                    scope.formtype = "PZ",
                    append.bmf = TRUE)
#> Valid inputs detected. Retrieving data.
#> Downloading core data
#> Requested files have a total size of 305 MB. Proceed
#> with download? Enter Y/N (Yes/no/cancel)
#> Core data downloaded
#> Downloading bmf data
#> bmf data downloaded. Appending bmf
BMF metadata is now appended to the downloaded Core data set.
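One common way to append metadata of this kind is a left join on a shared identifier. The sketch below uses base R merge() on toy data. The column names (EIN, TOTREV, NTEECC) and the choice of join key are assumptions made for illustration, not a description of what get_data() does internally.

```r
# Toy core and BMF tables; values and column names are made up.
core_toy <- data.frame(
  EIN    = c("001", "002", "003"),
  TOTREV = c(1000, 2500, 400),
  stringsAsFactors = FALSE
)
bmf_toy <- data.frame(
  EIN    = c("001", "003", "004"),
  NTEECC = c("A20", "B11", "P30"),
  stringsAsFactors = FALSE
)

# Left join: keep every core row, add BMF columns where the EIN matches.
# Core rows without a BMF match get NA in the BMF columns.
core_with_bmf <- merge(core_toy, bmf_toy, by = "EIN", all.x = TRUE)
```

A left join keeps the core data set as the unit of analysis, so organizations missing from the BMF are retained rather than dropped.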
Downloading BMF Data
The geo and ntee arguments mentioned above
can also be used to download and filter BMF data.
bmf <- get_data(dsname = "bmf",
                ntee = c("ART"),
                geo.state = c("CA"))
#> Valid inputs detected. Retrieving data.
#> Collecting Matching Industry Groups
#> Collecting Matching Industry Division and Subdivisions
#> Collecting Matching Organization Types
#> Requested files have a total size of 190 MB. Proceed
#> with download? Enter Y/N (Yes/no/cancel)
#> Downloading bmf data
#> bmf data downloaded