The `educationdata` package allows the user to retrieve data from the Urban Institute’s Education Data API as a `data.frame` for analysis. The package contains one major function, `get_education_data`, which retrieves data from a specified API endpoint and returns it to the user as a `data.frame`.
NOTE: By downloading and using this programming package, you agree to abide by the Data Policy and Terms of Use of the Education Data Portal. For more information, see https://educationdata.urban.org/documentation/#terms
The `get_education_data` function returns a `data.frame` from a call to the Education Data API.

```r
library(educationdata)
get_education_data(level, source, topic, by, filters, add_labels, csv)
```
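where:

- `level`: API data level to query.
- `source`: API data source to query.
- `topic`: API data topic to query.
- `by`: Optional `list` of grouping parameters for an API call. Recent versions of the package deprecate this argument in favor of `subtopic`.
- `filters`: Optional `list` query to filter the results from an API call.
- `add_labels`: Map integer codes to their factor labels (when applicable)? Defaults to `FALSE`.
- `csv`: Download the full .csv file? Defaults to `FALSE`.

This simple example will obtain ‘college-university’ level data from the ‘ipeds’ source for the ‘student-faculty-ratio’ topic: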
```r
library(educationdata)
df <- get_education_data(
  level = 'college-university',
  source = 'ipeds',
  topic = 'student-faculty-ratio'
)
head(df)
#>   unitid year fips student_faculty_ratio
#> 1 100654 2009    1                    14
#> 2 100663 2009    1                    17
#> 3 100690 2009    1                    10
#> 4 100706 2009    1                    17
#> 5 100724 2009    1                    17
#> 6 100751 2009    1                    20
```
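Because the result is an ordinary `data.frame`, standard base R tools apply to it directly. As a small offline sketch, the six rows printed above are rebuilt by hand (so no API call is made) and then summarised; the object name `ratios` is only illustrative.

```r
# The six rows printed above, rebuilt by hand so the sketch runs offline.
ratios <- data.frame(
  unitid = c(100654, 100663, 100690, 100706, 100724, 100751),
  year = 2009,
  fips = 1,
  student_faculty_ratio = c(14, 17, 10, 17, 17, 20)
)

# Mean ratio across these six institutions.
mean(ratios$student_faculty_ratio)

# IDs of institutions with more than 15 students per faculty member.
ratios$unitid[ratios$student_faculty_ratio > 15]
```

In practice the same operations would be applied to the `df` returned by `get_education_data`.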
A somewhat more complex example will obtain ‘schools’ level data from the ‘ccd’ source for the ‘enrollment’ topic, broken out by ‘race’ and ‘sex’. The API query is subset with filters for the ‘year’ 2008, ‘grade’ 9 through 12, and an ‘ncessch’ code of 340606000122. Finally, the `add_labels` flag will map integer codes to their factor labels (‘race’ and ‘sex’ in this instance).
```r
library(educationdata)
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 2008,
                                        grade = 9:12,
                                        ncessch = '340606000122'),
                         add_labels = TRUE)
#> Warning in get_education_data(level = "schools", source = "ccd", topic = "enrollment", : The `by` argument has been deprecated in favor of `subtopic`.
#> Please update your script to use `subtopic` instead.
head(df)
#>   year      ncessch ncessch_num grade                             race    sex
#> 1 2008 340606000122 3.40606e+11     9                            Black   Male
#> 2 2008 340606000122 3.40606e+11     9                         Hispanic   Male
#> 3 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native Female
#> 4 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native   Male
#> 5 2008 340606000122 3.40606e+11     9                            Black Female
#> 6 2008 340606000122 3.40606e+11     9                            Asian Female
#>   enrollment       fips   leaid
#> 1         41 New Jersey 3406060
#> 2         39 New Jersey 3406060
#> 3          0 New Jersey 3406060
#> 4          0 New Jersey 3406060
#> 5         46 New Jersey 3406060
#> 6         32 New Jersey 3406060
```
Level | Source | Topic | By | Main Filters | Years Available |
---|---|---|---|---|---|
college-university | fsa | 90-10-revenue-percentages | NA | year | 2014–2017 |
college-university | fsa | campus-based-volume | NA | year | 2001–2017 |
college-university | fsa | financial-responsibility | NA | year | 2006–2016 |
college-university | fsa | grants | NA | year | 1999–2018 |
college-university | fsa | loans | NA | year | 1999–2018 |
college-university | ipeds | academic-libraries | NA | year | 2013–2020 |
college-university | ipeds | academic-year-room-board-other | NA | year | 1999–2021 |
college-university | ipeds | academic-year-tuition-prof-program | NA | year | 1986–2008, 2010–2021 |
college-university | ipeds | academic-year-tuition | NA | year | 1986–2021 |
college-university | ipeds | admissions-enrollment | NA | year | 2001–2021 |
college-university | ipeds | admissions-requirements | NA | year | 1990–2021 |
college-university | ipeds | completers | NA | year | 2011–2021 |
college-university | ipeds | completions-cip-2 | NA | year | 1991–2021 |
college-university | ipeds | completions-cip-6 | NA | year | 1983–2021 |
college-university | ipeds | directory | NA | year | 1980, 1984–2021 |
college-university | ipeds | enrollment-full-time-equivalent | NA | year, level_of_study | 1997–2018 |
college-university | ipeds | enrollment-headcount | NA | year, level_of_study | 1996–2021 |
college-university | ipeds | fall-enrollment | age, sex | year, level_of_study | 1991, 1993, 1995, 1997, 1999–2020 |
college-university | ipeds | fall-enrollment | race, sex | year, level_of_study | 1986–2020 |
college-university | ipeds | fall-enrollment | residence | year | 1986, 1988, 1992, 1994, 1996, 1998, 2000–2020 |
college-university | ipeds | fall-retention | NA | year | 2003–2020 |
college-university | ipeds | finance | NA | year | 1979, 1983–2017 |
college-university | ipeds | grad-rates-200pct | NA | year | 2007–2017 |
college-university | ipeds | grad-rates-pell | NA | year | 2015–2017 |
college-university | ipeds | grad-rates | NA | year | 1996–2017 |
college-university | ipeds | institutional-characteristics | NA | year | 1980, 1984–2021 |
college-university | ipeds | outcome-measures | NA | year | 2015–2020 |
college-university | ipeds | program-year-room-board-other | NA | year | 1999–2021 |
college-university | ipeds | program-year-tuition-cip | NA | year | 1987–2021 |
college-university | ipeds | salaries-instructional-staff | NA | year | 1980, 1984, 1985, 1987, 1989–1999, 2001–2021 |
college-university | ipeds | salaries-noninstructional-staff | NA | year | 2012–2021 |
college-university | ipeds | sfa-all-undergraduates | NA | year | 2007–2017 |
college-university | ipeds | sfa-by-living-arrangement | NA | year | 2008–2017 |
college-university | ipeds | sfa-by-tuition-type | NA | year | 1999–2017 |
college-university | ipeds | sfa-ftft | NA | year | 1999–2017 |
college-university | ipeds | sfa-grants-and-net-price | NA | year | 2008–2017 |
college-university | ipeds | student-faculty-ratio | NA | year | 2009–2020 |
college-university | nacubo | endowments | NA | year | 2012–2021 |
college-university | nccs | 990-forms | NA | year | 1993–2016 |
college-university | nhgis | census-1990 | NA | year | 1980, 1984–2021 |
college-university | nhgis | census-2000 | NA | year | 1980, 1984–2021 |
college-university | nhgis | census-2010 | NA | year | 1980, 1984–2021 |
college-university | scorecard | default | NA | year | 1996–2020 |
college-university | scorecard | earnings | NA | year | 2003–2014, 2018 |
college-university | scorecard | institutional-characteristics | NA | year | 1996–2020 |
college-university | scorecard | repayment | NA | year | 2007–2016 |
college-university | scorecard | student-characteristics | aid-applicants | year | 1997–2016 |
college-university | scorecard | student-characteristics | home-neighborhood | year | 1997–2016 |
school-districts | ccd | directory | NA | year | 1986–2021 |
school-districts | ccd | enrollment | NA | year, grade | 1986–2021 |
school-districts | ccd | enrollment | race | year, grade | 1986–2021 |
school-districts | ccd | enrollment | race, sex | year, grade | 1986–2021 |
school-districts | ccd | enrollment | sex | year, grade | 1986–2021 |
school-districts | ccd | finance | NA | year | 1991, 1994–2018 |
school-districts | edfacts | assessments | NA | year, grade_edfacts | 2009–2018, 2020 |
school-districts | edfacts | assessments | race | year, grade_edfacts | 2009–2018, 2020 |
school-districts | edfacts | assessments | sex | year, grade_edfacts | 2009–2018, 2020 |
school-districts | edfacts | assessments | special-populations | year, grade_edfacts | 2009–2018, 2020 |
school-districts | edfacts | grad-rates | NA | year | 2010–2019 |
school-districts | saipe | NA | NA | year | 1995, 1997, 1999–2021 |
schools | ccd | directory | NA | year | 1986–2021 |
schools | ccd | enrollment | NA | year, grade | 1986–2021 |
schools | ccd | enrollment | race | year, grade | 1986–2021 |
schools | ccd | enrollment | race, sex | year, grade | 1986–2021 |
schools | ccd | enrollment | sex | year, grade | 1986–2021 |
schools | crdc | algebra1 | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | algebra1 | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | algebra1 | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | ap-exams | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | ap-exams | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | ap-exams | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | ap-ib-enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | ap-ib-enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | ap-ib-enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | chronic-absenteeism | disability, sex | year | 2013, 2015 |
schools | crdc | chronic-absenteeism | lep, sex | year | 2013, 2015 |
schools | crdc | chronic-absenteeism | race, sex | year | 2013, 2015 |
schools | crdc | credit-recovery | NA | year | 2015, 2017 |
schools | crdc | directory | NA | year | 2011, 2013, 2015, 2017 |
schools | crdc | discipline-instances | NA | year | 2015, 2017 |
schools | crdc | discipline | disability, lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | discipline | disability, race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | discipline | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | dual-enrollment | disability, sex | year | 2015, 2017 |
schools | crdc | dual-enrollment | lep, sex | year | 2015, 2017 |
schools | crdc | dual-enrollment | race, sex | year | 2015, 2017 |
schools | crdc | enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | harassment-or-bullying | allegations | year | 2013, 2015, 2017 |
schools | crdc | harassment-or-bullying | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | harassment-or-bullying | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | harassment-or-bullying | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | math-and-science | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | math-and-science | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | math-and-science | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | offenses | NA | year | 2015, 2017 |
schools | crdc | offerings | NA | year | 2011, 2013, 2015, 2017 |
schools | crdc | restraint-and-seclusion | disability, lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | restraint-and-seclusion | disability, race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | restraint-and-seclusion | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | restraint-and-seclusion | instances | year | 2013, 2015, 2017 |
schools | crdc | retention | disability, sex | year, grade | 2011, 2013, 2015, 2017 |
schools | crdc | retention | lep, sex | year, grade | 2011, 2013, 2015, 2017 |
schools | crdc | retention | race, sex | year, grade | 2011, 2013, 2015, 2017 |
schools | crdc | sat-act-participation | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | sat-act-participation | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | sat-act-participation | race, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | school-finance | NA | year | 2011, 2013, 2015, 2017 |
schools | crdc | suspensions-days | disability, sex | year | 2015, 2017 |
schools | crdc | suspensions-days | lep, sex | year | 2015, 2017 |
schools | crdc | suspensions-days | race, sex | year | 2015, 2017 |
schools | crdc | teachers-staff | NA | year | 2011, 2013, 2015, 2017 |
schools | edfacts | assessments | NA | year, grade_edfacts | 2009–2018, 2020 |
schools | edfacts | assessments | race | year, grade_edfacts | 2009–2018, 2020 |
schools | edfacts | assessments | sex | year, grade_edfacts | 2009–2018, 2020 |
schools | edfacts | assessments | special-populations | year, grade_edfacts | 2009–2018, 2020 |
schools | edfacts | grad-rates | NA | year | 2010–2019 |
schools | meps | NA | NA | year | 2013–2018 |
schools | nhgis | census-1990 | NA | year | 1986–2021 |
schools | nhgis | census-2000 | NA | year | 1986–2021 |
schools | nhgis | census-2010 | NA | year | 1986–2021 |
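When choosing an endpoint, it can help to query the table programmatically. The sketch below copies a handful of rows from the table above into a `data.frame` (it is not the full list, and `has_filter` is a hypothetical helper, not part of the package) and asks which of them accept `grade` as a main filter. An exact match is used so that `grade_edfacts` is not mistaken for `grade`.

```r
# A few rows copied from the endpoint table above (not the full list).
endpoints <- data.frame(
  level  = c("schools", "schools", "schools", "schools",
             "school-districts", "college-university"),
  source = c("ccd", "ccd", "crdc", "edfacts", "ccd", "ipeds"),
  topic  = c("enrollment", "enrollment", "enrollment", "assessments",
             "enrollment", "student-faculty-ratio"),
  by     = c(NA, "race, sex", "race, sex", "race", "sex", NA),
  main_filters = c("year, grade", "year, grade", "year",
                   "year, grade_edfacts", "year, grade", "year"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where `name` is one of the comma-separated main filters.
has_filter <- function(main_filters, name) {
  vapply(strsplit(main_filters, ",\\s*"),
         function(f) name %in% f,
         logical(1))
}

# Endpoints (among these rows) that accept `grade` as a main filter.
endpoints[has_filter(endpoints$main_filters, "grade"),
          c("level", "source", "topic", "by")]
```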
Because of the way the API is set up, the variables listed under ‘Main Filters’ are often the fastest way to subset an API call.
In addition to `year`, the other main filters for certain endpoints accept the following values:

Filter Argument | Grade |
---|---|
grade = 'grade-pk' | Pre-K |
grade = 'grade-k' | Kindergarten |
grade = 'grade-1' | Grade 1 |
grade = 'grade-2' | Grade 2 |
grade = 'grade-3' | Grade 3 |
grade = 'grade-4' | Grade 4 |
grade = 'grade-5' | Grade 5 |
grade = 'grade-6' | Grade 6 |
grade = 'grade-7' | Grade 7 |
grade = 'grade-8' | Grade 8 |
grade = 'grade-9' | Grade 9 |
grade = 'grade-10' | Grade 10 |
grade = 'grade-11' | Grade 11 |
grade = 'grade-12' | Grade 12 |
grade = 'grade-13' | Grade 13 |
grade = 'grade-14' | Adult Education |
grade = 'grade-15' | Ungraded |
grade = 'grade-16' | K-12 |
grade = 'grade-20' | Grades 7 and 8 |
grade = 'grade-21' | Grades 9 and 10 |
grade = 'grade-22' | Grades 11 and 12 |
grade = 'grade-99' | Total |
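To translate filter values back into readable grade names, the table can be stored as a named vector. This is a minimal sketch built from the table above; `grade_labels` and `label_grade` are hypothetical names, not functions provided by `educationdata`.

```r
# Filter values and their labels, copied from the grade table above.
grade_labels <- c(
  "grade-pk" = "Pre-K",
  "grade-k"  = "Kindergarten",
  setNames(paste("Grade", 1:13), paste0("grade-", 1:13)),
  "grade-14" = "Adult Education",
  "grade-15" = "Ungraded",
  "grade-16" = "K-12",
  "grade-20" = "Grades 7 and 8",
  "grade-21" = "Grades 9 and 10",
  "grade-22" = "Grades 11 and 12",
  "grade-99" = "Total"
)

# Hypothetical helper: look up labels and fail loudly on unknown values.
label_grade <- function(values) {
  unknown <- setdiff(values, names(grade_labels))
  if (length(unknown) > 0) {
    stop("Unknown grade value(s): ", paste(unknown, collapse = ", "))
  }
  unname(grade_labels[values])
}

label_grade(c("grade-k", "grade-9", "grade-99"))
```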
Let’s build up some examples from the following set of endpoints.
Level | Source | Topic | By | Main Filters | Years Available |
---|---|---|---|---|---|
schools | ccd | enrollment | NA | year, grade | 1986–2021 |
schools | ccd | enrollment | race | year, grade | 1986–2021 |
schools | ccd | enrollment | race, sex | year, grade | 1986–2021 |
schools | ccd | enrollment | sex | year, grade | 1986–2021 |
schools | crdc | enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
schools | crdc | enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
The following will return a `data.frame` across all years and grades:

```r
library(educationdata)
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment')
```
Note that this endpoint can also be broken out `by` certain variables: `race`, `sex`, or both (see the endpoint table above). These variables can be added to the `by` argument:

```r
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'))
```
You may also filter the results of an API call. In this case, `year` and `grade` will provide the most time-efficient subsets, and both can be vectorized:

```r
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8))
```
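A vectorized filter is simply an ordinary R vector, so the call above covers every combination of the listed years and grades. The sketch below enumerates those pairs with `expand.grid`; it does not reproduce how the package builds its requests, and mapping integer grades onto the `grade-N` values from the grade table is an assumption made for illustration.

```r
# The vectorized filters from the call above are plain R vectors.
filters <- list(year = 1988:1990, grade = 6:8)

# Every year-grade pair they cover: 3 years x 3 grades = 9 pairs.
combos <- expand.grid(year = filters$year, grade = filters$grade)
nrow(combos)

# Assumption: integer grades line up with the 'grade-N' values in the grade table.
combos$grade_value <- paste0("grade-", combos$grade)
head(combos, 3)
```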
Additional variables can also be passed to `filters` to subset further:

```r
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'))
```
The `add_labels` flag will map variables to a `factor` using their labels in the API.

```r
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE)
```
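The idea behind `add_labels` can be illustrated with base R's `factor()`, which attaches labels to integer codes. In this sketch the code-to-label mappings are assumptions chosen so that the output matches the six rows printed in the 2008 enrollment example; with `add_labels = TRUE` the package takes the actual labels from the API.

```r
# Assumed integer codes for the six rows printed in the 2008 enrollment example.
race_codes <- c(2, 3, 5, 5, 2, 4)
sex_codes  <- c(1, 1, 2, 1, 2, 2)

# Assumed code-to-label mappings (illustrative, not read from the API).
race <- factor(race_codes,
               levels = c(2, 3, 4, 5),
               labels = c("Black", "Hispanic", "Asian",
                          "American Indian or Alaska Native"))
sex <- factor(sex_codes, levels = c(1, 2), labels = c("Male", "Female"))

data.frame(race, sex)
```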
Lastly, the `csv` flag can be set to download the full .csv file. In general, the `csv` functionality is much faster when retrieving the full data frame (or a large subset) and much slower when retrieving a small subset of a data frame (especially one with many `filters` added). In this example, the full .csv data must be downloaded and then subset to the requested years, grades, and school.

```r
df <- get_education_data(level = 'schools',
                         source = 'ccd',
                         topic = 'enrollment',
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE,
                         csv = TRUE)
```
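With `csv = TRUE`, the filtering happens after the full file is retrieved. The sketch below mimics only that local subsetting step on the six labelled rows printed in the 2008 enrollment example (a subset of their columns, rebuilt by hand), then totals enrollment by sex; it does not show how the package reads the .csv file.

```r
# Six labelled rows from the 2008 enrollment example, rebuilt by hand
# (only some of the columns).
enroll <- data.frame(
  year = 2008,
  ncessch = "340606000122",
  grade = 9,
  race = c("Black", "Hispanic", "American Indian or Alaska Native",
           "American Indian or Alaska Native", "Black", "Asian"),
  sex = c("Male", "Male", "Female", "Male", "Female", "Female"),
  enrollment = c(41, 39, 0, 0, 46, 32),
  stringsAsFactors = FALSE
)

# Local subset, standing in for the filters applied after a full download.
school_rows <- enroll[enroll$ncessch == "340606000122" & enroll$grade %in% 9:12, ]

# Total enrollment by sex across the retained rows.
totals <- aggregate(enrollment ~ sex, data = school_rows, FUN = sum)
totals
```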