4  Transform Reference

5 Transform Functions

This chapter documents all 24 transform functions in the BMF pipeline. Each function follows a consistent pattern:

  1. Input validation
  2. Safe copy of input data
  3. Transformation logic
  4. Output validation
  5. Return transformed data.table

5.1 Function Signature Pattern

transform_<field> <- function(
  dt,                    # data.table with required columns
  input_col = "COLUMN",  # Source column name
  lookup = <lookup_table> # Reference table (if applicable)
)
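
As a rough illustration of the five-step pattern, the sketch below implements a hypothetical transform_example() for a made-up FIELD column. The function name, the column names, and the specific checks are assumptions for illustration only and are not taken from the pipeline source.

```r
library(data.table)

# Hypothetical transform following the five-step pattern (not a pipeline function).
transform_example <- function(dt, input_col = "FIELD") {
  # 1. Input validation
  stopifnot(is.data.table(dt), input_col %in% names(dt))

  # 2. Safe copy, so the caller's table is not modified by reference
  out <- copy(dt)

  # 3. Transformation logic: keep the raw value, add a cleaned one
  out[, field_raw := as.character(out[[input_col]])]
  out[, field := toupper(trimws(field_raw))]

  # 4. Output validation
  stopifnot(nrow(out) == nrow(dt), "field" %in% names(out))

  # 5. Return the transformed data.table
  out[]
}

bmf <- data.table(FIELD = c(" abc ", "Def"))
res <- transform_example(bmf)
```

Because of the copy() in step 2, bmf keeps its original single column while res carries the added field_raw and field columns.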

6 Identity Transforms

6.1 transform_ein()

File: R/ein.R

Formats EIN (Employer Identification Number) to standard XX-XXXXXXX format.

transform_ein(dt, input_col = "EIN")

Parameter   Type         Default    Description
dt          data.table   required   BMF data with EIN column
input_col   character    "EIN"      Source column name

Output Columns:

Column    Type        Description
ein_raw   character   Original EIN value
ein       character   Formatted as XX-XXXXXXX

Validation:

  • Removes non-numeric characters
  • Left-pads to 9 digits
  • Warns if the value does not match the expected format after processing

Format Note: Some data providers use an EIN-XX-XXXXXXX format with an explicit prefix, but this pipeline uses the XX-XXXXXXX format to align with IRS conventions as displayed on Form 990 and official determination letters. This format is also the de facto standard in the nonprofit research ecosystem (used by NCCS, Foundation Center/Candid, and ProPublica), which facilitates cross-dataset linkage without requiring prefix manipulation.
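
The validation steps listed for transform_ein() can be sketched on a plain character vector as follows. The helper name format_ein_sketch() is hypothetical, the real function operates on a data.table, and this version assumes there are no NA values in the input.

```r
# Sketch of the EIN clean-up steps; assumes non-missing character input.
format_ein_sketch <- function(x) {
  digits <- gsub("[^0-9]", "", x)                    # remove non-numeric characters
  pad <- strrep("0", pmax(0, 9 - nchar(digits)))     # zeros needed to reach 9 digits
  digits <- paste0(pad, digits)                      # left-pad
  ein <- sub("^([0-9]{2})([0-9]{7})$", "\\1-\\2", digits)
  bad <- !grepl("^[0-9]{2}-[0-9]{7}$", ein)          # format check after processing
  if (any(bad)) {
    warning(sum(bad), " EIN value(s) do not match XX-XXXXXXX after processing")
  }
  ein
}

ein_out <- format_ein_sketch(c("12-3456789", "1234567", "EIN 987654321"))
# ein_out is "12-3456789" "00-1234567" "98-7654321"
```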


6.2 transform_organization_name()

File: R/organization_name.R

Cleans and standardizes organization names, extracting legal suffixes.

transform_organization_name(dt, input_col = "NAME", name_lookup_path = NULL)

Parameter          Type         Default    Description
dt                 data.table   required   BMF data with NAME column
input_col          character    "NAME"     Source column name
name_lookup_path   character    NULL       Optional path to manual overrides

Output Columns:

Column             Type        Description
org_name_raw       character   Original organization name
org_name_join      character   Cleaned name for matching/joining
org_name_display   character   Title-cased display name
org_parent_name    character   Parent organization name (if matched via GEN or abbreviation)
org_legal_suffix   character   Extracted suffix (INC, CORP, LLC, etc.)

Processing Steps:

  1. Extract legal suffixes using pattern matching
  2. Apply word standardizations (Assn → Association, etc.)
  3. Apply parent organization lookup (matches abbreviations or GEN to parent orgs)
  4. Apply manual overrides (if lookup provided)
  5. Title-case display names
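
A minimal sketch of steps 1, 2, and 5 above is shown below. The helper name, the short suffix list, and the single ASSN rule are illustrative assumptions. The parent-organization lookup and manual overrides (steps 3 and 4) are omitted because they depend on lookup tables not shown here.

```r
clean_org_name_sketch <- function(x) {
  x <- toupper(trimws(x))

  # Step 1: extract a trailing legal suffix (illustrative subset of suffixes)
  suffix_pattern <- "[ ,]+(INC|CORP|LLC|LTD)\\.?$"
  has_suffix <- grepl(suffix_pattern, x)
  suffix <- ifelse(has_suffix,
                   sub(paste0("^.*", suffix_pattern), "\\1", x),
                   NA_character_)
  name <- sub(suffix_pattern, "", x)

  # Step 2: word standardization (one illustrative rule)
  name <- gsub("\\bASSN\\b", "ASSOCIATION", name, perl = TRUE)

  # Step 5: title-case by capitalizing the first letter of each word
  display <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", tolower(name), perl = TRUE)

  data.frame(org_name_join = name, org_name_display = display,
             org_legal_suffix = suffix, stringsAsFactors = FALSE)
}

org_out <- clean_org_name_sketch(c("Friends of the Library Assn Inc",
                                   "Main Street Trust"))
```

Here the first name yields the suffix INC and the display name "Friends Of The Library Association", while the second has no suffix and is left as NA in org_legal_suffix.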

6.3 transform_bmf_ico_name()

File: R/ico_name.R

The In-Care-Of (ICO) field identifies an individual or entity designated to receive correspondence on behalf of the exempt organization. Per IRS instructions for Form 990, this field is used when mail should be directed to a specific person (such as a principal officer, fiduciary, or registered agent) or routed through another organization. In the BMF, the ICO value is prefixed with “%” in the raw data.

Cleans the In-Care-Of name field.

transform_bmf_ico_name(dt, input_col = "ICO")

Parameter   Type         Default    Description
dt          data.table   required   BMF data with ICO column
input_col   character    "ICO"      Source column name

Output Columns:

Column                     Type        Description
in_care_of_name_raw        character   Original ICO value
in_care_of_name_clean      character   Cleaned and title-cased
in_care_of_name_provided   logical     TRUE if ICO was provided
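
The ICO clean-up can be sketched as below, assuming the "%" prefix described above and treating empty strings as not provided. The helper name and the exact cleaning rules are assumptions, not the pipeline's implementation.

```r
clean_ico_sketch <- function(x) {
  clean <- trimws(sub("^%[[:space:]]*", "", x))   # drop the leading "%" marker
  clean[!is.na(clean) & clean == ""] <- NA        # empty strings count as not provided
  clean <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", tolower(clean), perl = TRUE)
  data.frame(in_care_of_name_raw = x,
             in_care_of_name_clean = clean,
             in_care_of_name_provided = !is.na(clean),
             stringsAsFactors = FALSE)
}

ico_out <- clean_ico_sketch(c("% JANE DOE", "", NA))
```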

6.4 transform_bmf_group_exemption_number()

File: R/group_exemption_number.R

A Group Exemption Number (GEN) is a 4-digit identifier assigned by the IRS to a central organization that has obtained a group exemption letter under Revenue Procedure 80-27. This allows the central organization to extend its tax-exempt status to subordinate organizations (such as local chapters, lodges, or posts) without each subordinate filing a separate application. Organizations with GEN = 0 are independent and not part of any group exemption arrangement.

Standardizes Group Exemption Number (GEN).

transform_bmf_group_exemption_number(dt, input_col = "GROUP")

Parameter   Type         Default    Description
dt          data.table   required   BMF data with GROUP column
input_col   character    "GROUP"    Source column name

Output Columns:

Column                       Type        Description
group_exemption_number_raw   character   Original GROUP value
group_exemption_number       character   Padded 4-digit GEN
group_exemption_is_member    logical     TRUE if org is part of group exemption (GEN ≠ 0)
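
A small sketch of the padding and membership flag follows. The helper name is hypothetical, and mapping missing or empty values to "0" is an assumption rather than documented behavior.

```r
standardize_gen_sketch <- function(x) {
  digits <- gsub("[^0-9]", "", as.character(x))
  digits[is.na(digits) | digits == ""] <- "0"     # assumption: missing means no group
  gen <- paste0(strrep("0", pmax(0, 4 - nchar(digits))), digits)  # left-pad to 4 digits
  data.frame(group_exemption_number = gen,
             group_exemption_is_member = as.integer(digits) != 0,
             stringsAsFactors = FALSE)
}

gen_out <- standardize_gen_sketch(c("0", "928", "1234"))
# group_exemption_number: "0000" "0928" "1234"
```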

6.5 transform_bmf_ruling_date()

File: R/ruling_date.R

The ruling date (also called the determination date) is the year and month when the IRS issued the organization’s determination letter formally recognizing its tax-exempt status. This date establishes when the organization’s exemption became effective and is recorded in YYYYMM format in the BMF.

Parses IRS ruling/determination date.

transform_bmf_ruling_date(dt, input_col = "RULING")

Parameter   Type         Default    Description
dt          data.table   required   BMF data with RULING column
input_col   character    "RULING"   Source column name

Output Columns:

Column                   Type        Description
ruling_date_ym_str       character   Original YYYYMM string
ruling_date              Date        Parsed date (YYYY-MM-01)
ruling_date_is_missing   logical     TRUE if missing/invalid

Sentinel Value: Missing dates are set to 1900-01-01

Design Note: Both the original YYYYMM string and a parsed Date object are retained to serve different use cases. The string format preserves the source data exactly as received from the IRS for auditability and is useful for year-month grouping operations. The Date column enables date arithmetic, filtering by date ranges, and integration with time-series analysis tools that expect proper date types.
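
The sketch below shows one way to keep the YYYYMM string, parse it to the first day of the month, and fall back to the 1900-01-01 sentinel. The helper name and the validity rule (four digits followed by a month from 01 to 12) are assumptions for illustration.

```r
parse_yyyymm_sketch <- function(x, sentinel = as.Date("1900-01-01")) {
  x <- as.character(x)
  valid <- grepl("^[0-9]{4}(0[1-9]|1[0-2])$", x)   # NA and malformed values are FALSE
  parsed <- rep(sentinel, length(x))               # start from the sentinel everywhere
  parsed[valid] <- as.Date(paste0(x[valid], "01"), format = "%Y%m%d")
  data.frame(ruling_date_ym_str = x,
             ruling_date = parsed,
             ruling_date_is_missing = !valid,
             stringsAsFactors = FALSE)
}

ruling_out <- parse_yyyymm_sketch(c("199403", "000000", NA, "201513"))
```

Only the first value parses (to 1994-03-01). The other three receive the sentinel date and are flagged in ruling_date_is_missing, so downstream code can filter on the flag instead of comparing against 1900-01-01.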


6.6 transform_dba_name()

File: R/dba_name.R

A “doing business as” (DBA) name, also known as a trade name or fictitious business name, is an alternate name under which an organization operates that differs from its legal name. The IRS SORT_NAME field captures this secondary name when an organization conducts activities or is publicly known by a name other than its registered legal name.

Transforms the SORT_NAME column (doing-business-as name) into cleaned DBA name fields.

transform_dba_name(dt, input_col = "SORT_NAME")

Parameter   Type         Default       Description
dt          data.table   required      BMF data with SORT_NAME column
input_col   character    "SORT_NAME"   Source column name

Output Columns:

Column         Type        Description
dba_name_raw   character   Original SORT_NAME value (empty strings converted to NA)
dba_name       character   Title-cased, whitespace-squished DBA name

Processing Notes:

  • Most organizations (~95%) do not have a secondary/DBA name
  • Empty strings are converted to NA so that they are treated as missing values
  • Title case applied for display consistency

6.7 transform_address()

File: R/address.R

The address fields in the BMF represent the organization’s mailing address as reported to the IRS, typically on Form 990 or the exemption application (Form 1023/1024). This is the address where official IRS correspondence is sent and may differ from the organization’s physical location or principal place of business.

Standardizes address components using USPS conventions and generates geocoding-ready output with quality flags.

transform_address(dt, street_col = "STREET", city_col = "CITY",
                  state_col = "STATE", zip_col = "ZIP")

Parameter    Type         Default    Description
dt           data.table   required   BMF data with address columns
street_col   character    "STREET"   Street address column
city_col     character    "CITY"     City column
state_col    character    "STATE"    State column
zip_col      character    "ZIP"      ZIP code column

Output Columns (17 total):

Column                       Type        Description
org_addr_street_raw          character   Original street value
org_addr_street              character   USPS-standardized street
org_addr_city_raw            character   Original city value
org_addr_city                character   Cleaned city name
org_addr_state_raw           character   Original state value
org_addr_state               character   Validated 2-letter abbreviation
org_addr_zip_raw             character   Original ZIP value
org_addr_zip5                character   5-digit ZIP code
org_addr_zip4                character   ZIP+4 extension (if present)
org_addr_zip                 character   Full ZIP (XXXXX or XXXXX-XXXX)
org_addr_full                character   Full formatted address string
org_addr_is_missing          logical     TRUE if all address fields empty
org_addr_is_po_box           logical     TRUE if P.O. Box address
org_addr_is_rural_route      logical     TRUE if rural route address
org_addr_has_special_chars   logical     TRUE if unusual characters
org_addr_missing_number      logical     TRUE if no street number
org_addr_state_invalid       logical     TRUE if not valid US state/territory

USPS Standardizations Applied:

  • 165+ street type abbreviations (STREET → ST, AVENUE → AVE)
  • Directional standardization (NORTH → N, SOUTHWEST → SW)
  • Unit type abbreviations (APARTMENT → APT, SUITE → STE)
  • State name to abbreviation conversion
  • ZIP code parsing and validation
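
To make the ZIP parsing and the P.O. Box flag concrete, here is a minimal sketch. The helper names and the regular expression are assumptions, and the sketch does not restore leading zeros that may have been lost upstream.

```r
parse_zip_sketch <- function(zip) {
  digits <- gsub("[^0-9]", "", zip)               # keep digits only
  n <- nchar(digits)
  zip5 <- ifelse(n >= 5, substr(digits, 1, 5), NA_character_)
  zip4 <- ifelse(n == 9, substr(digits, 6, 9), NA_character_)
  full <- ifelse(is.na(zip4), zip5, paste0(zip5, "-", zip4))
  data.frame(org_addr_zip5 = zip5, org_addr_zip4 = zip4, org_addr_zip = full,
             stringsAsFactors = FALSE)
}

is_po_box_sketch <- function(street) {
  # Matches PO BOX, P.O. BOX, P O BOX at the start of the street line
  grepl("^P\\.?\\s*O\\.?\\s*BOX\\b", toupper(trimws(street)), perl = TRUE)
}

zip_out <- parse_zip_sketch(c("20037", "20037-1234", "200371234", "123"))
po_out <- is_po_box_sketch(c("PO BOX 12", "P.O. Box 5", "123 MAIN ST"))
```

Values with fewer than five digits, such as "123", end up as NA in all three ZIP columns.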


7 Classification Transforms

7.1 transform_bmf_subsection_classification_codes()

File: R/subsection_classification_codes.R

The subsection code identifies the paragraph of IRC Section 501(c) under which an organization is exempt (e.g., 501(c)(3), 501(c)(4), 501(c)(6)). The classification code is a secondary descriptor indicating the type of organization within that subsection—for example, under 501(c)(3), classification codes distinguish between charitable organizations, educational institutions, religious organizations, and scientific entities. An organization may have multiple classification codes.

Transforms the SUBSECTION and CLASSIFICATION columns and builds the associated dimension table internally.

transform_bmf_subsection_classification_codes(dt, orgtype_lookup = subsection_orgtype_lookup)

Parameter        Type         Default                     Description
dt               data.table   required                    BMF data with SUBSECTION and CLASSIFICATION columns
orgtype_lookup   data.table   subsection_orgtype_lookup   Lookup mapping subsection codes to organization types

Helper Function: create_cl_code_dim_table(dt, lookup, year) creates the SCD Type 2 dimension table internally.

Output Columns:

Column                       Type        Description
subsection_code              character   IRC subsection code
classification_code          character   Raw classification codes
exempt_organization_type     character   Human-readable org type
all_classifications_string   character   Semicolon-separated descriptions

Code Definitions: See the Lookup Tables Reference for complete subsection and classification code combinations.


7.2 transform_bmf_affiliation_code()

File: R/affiliation_code.R

The affiliation code describes an organization’s relationship within a group exemption structure. It distinguishes between central/parent organizations that hold the group exemption, intermediate organizations that coordinate subordinates, independent organizations with no group affiliation, and subordinate organizations covered under a parent’s group ruling.

transform_bmf_affiliation_code(dt, input_col = "AFFILIATION", lookup = affiliation_code_lookup)

Output Columns:

Column                        Type        Description
affiliation_code              integer     Affiliation type code
affiliation_code_definition   character   Affiliation description

Affiliation Codes:

  • 1 = Central organization
  • 2 = Intermediate organization
  • 3 = Independent organization
  • 6 = Central organization (group ruling)
  • 7 = Intermediate organization (group ruling)
  • 8 = Central organization (no group ruling)
  • 9 = Subordinate organization (group ruling)

7.3 transform_bmf_deductibility_code()

File: R/deductibility_code.R

The deductibility code indicates whether contributions to the organization are tax-deductible under IRC Section 170. Not all tax-exempt organizations qualify for deductible contributions—for example, 501(c)(3) organizations generally do, while 501(c)(4) social welfare organizations generally do not. The code also indicates any limitations on deductibility.

transform_bmf_deductibility_code(dt, input_col = "DEDUCTIBILITY", lookup = deductibility_code_lookup)

Output Columns:

Column                          Type        Description
deductibility_code              integer     Deductibility status
deductibility_code_definition   character   Deductibility description

Code Definitions: See the Lookup Tables Reference for complete deductibility code definitions.


7.4 transform_bmf_foundation_code()

File: R/foundation_code.R

The foundation code classifies 501(c)(3) organizations based on their public charity or private foundation status under IRC Section 509(a). Public charities receive broad public support and face fewer restrictions, while private foundations (typically funded by a single source) are subject to additional rules on self-dealing, minimum distributions, and excess business holdings. The code identifies which subsection of 509(a) applies or whether the organization is a private operating foundation.

transform_bmf_foundation_code(dt, input_col = "FOUNDATION", lookup = foundation_code_lookup)

Output Columns:

Column                       Type        Description
foundation_code              integer     Foundation type code
foundation_code_definition   character   Foundation type description

Foundation Code Definitions:

Code   Definition
0      4947(a)(1)
2      Private operating foundation exempt from payment of section 4940 taxes on investment income
3      Private operating foundation
4      Private non-operating foundation
9      Suspense (a specific type not identified)
10     Church - IRC Section 170(b)(1)(A)(i)
11     School - IRC Section 170(b)(1)(A)(ii)
12     Hospital - IRC Section 170(b)(1)(A)(iii)
13     Organizations operated for the benefit of a college or university - IRC Section 170(b)(1)(A)(iv)
14     Federal, State or local government unit - IRC Section 170(b)(1)(A)(v)
15     Organization receiving support from governmental unit or general public - IRC Section 170(b)(1)(A)(vi)
16     General, public charity - IRC Section 509(a)(2)
17     Public charity supporting (FC 09–15) - IRC Section 509(a)(3)
18     Public safety - IRC Section 509(a)(4)
21     Supporting organization - IRC 509(a)(3) - Type I
22     Supporting organization - IRC 509(a)(3) - Type II
23     Supporting organization - IRC 509(a)(3) - Type III functionally integrated
24     Supporting organization - IRC 509(a)(3) - Type III not functionally integrated

7.5 transform_bmf_organization_code()

File: R/organization_code.R

The organization code indicates the legal form or structure of the exempt organization as recognized by the IRS. Common forms include corporation, trust, association, and cooperative. This reflects how the organization was established under state law and affects governance requirements and liability structures.

transform_bmf_organization_code(dt, input_col = "ORGANIZATION", lookup = organization_code_lookup)

Output Columns:

Column                         Type        Description
organization_code              integer     Organization structure code
organization_code_definition   character   Organization type description

Organization Code Definitions

Code   Definition
0      Unassigned
1      Corporation
2      Trust
3      Co-operative
4      Partnership
5      Association
6      Non-exempt Charitable Trust

7.6 transform_bmf_status_code()

File: R/status_code.R

The status code indicates the current standing of an organization’s tax-exempt recognition with the IRS. Values distinguish between unconditional exemption (fully recognized), conditional exemption (provisional status), and various termination states including voluntary termination, merger, and automatic revocation for failure to file required returns for three consecutive years.

transform_bmf_status_code(dt, input_col = "STATUS", lookup = status_code_lookup)

Output Columns:

Column                   Type        Description
status_code              integer     IRS exempt status code
status_code_definition   character   Status description

Status Code Definitions

Code   Definition
1      Unconditional Exemption
2      Conditional Exemption
12     Trust described in section 4947(a)(2) of the Internal Revenue Code
25     Organization terminating its private foundation status under section 507(b)(1)(B) of the Code
25 Organization terminating its private foundation status under section 507(b)(1)(B) of the Code

8 Activity Transforms

8.1 transform_bmf_activity_code()

File: R/activity_code.R

Activity codes are 3-digit codes from the IRS activity code list that describe the primary purposes or activities of an exempt organization. Each organization may report up to three activity codes, which were self-selected on the original exemption application (Form 1023 or 1024). These codes predate the NTEE system and provide a complementary classification based on organizational activities rather than mission area.

Transforms activity codes by creating a dimension table internally and aggregating back to the main table.

transform_bmf_activity_code(dt)

Parameter   Type         Default    Description
dt          data.table   required   BMF data with ein and ACTIVITY columns

Helper Function: create_activity_code_dim_table(dt, lookup, input_col) creates the SCD Type 2 dimension table internally.

Output Columns:

Column                      Type        Description
activity_code_definitions   character   Semicolon-separated activity descriptions
activity_code_categories    character   Semicolon-separated activity categories

Dimension Table Schema:

Column                     Type        Description
ein                        character   Employer ID
activity_code              character   3-character activity code
activity_code_definition   character   Activity description
activity_code_category     character   Activity category

Code Definitions: There are 306 activity codes. See the Lookup Tables Reference for complete definitions.
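
The dimension-table-then-aggregate approach can be sketched as follows. This assumes the ACTIVITY value is a 9-character string holding three 3-character codes, with "000" marking an unused slot. The lookup rows are placeholders, not IRS definitions, and the helper name is hypothetical.

```r
# Hypothetical lookup; the descriptions are placeholders, not IRS definitions.
activity_lookup <- data.frame(
  activity_code = c("001", "002", "003"),
  activity_code_definition = c("Definition A", "Definition B", "Definition C"),
  stringsAsFactors = FALSE
)

split_activity_sketch <- function(ein, activity) {
  # Cut each 9-character string into three 3-character codes
  codes <- lapply(activity, function(a) substring(a, c(1, 4, 7), c(3, 6, 9)))
  dim_tbl <- data.frame(ein = rep(ein, each = 3),
                        activity_code = unlist(codes),
                        stringsAsFactors = FALSE)
  dim_tbl <- dim_tbl[dim_tbl$activity_code != "000", ]  # drop unused slots (assumption)
  idx <- match(dim_tbl$activity_code, activity_lookup$activity_code)
  dim_tbl$activity_code_definition <- activity_lookup$activity_code_definition[idx]
  dim_tbl
}

dim_tbl <- split_activity_sketch(c("12-3456789", "98-7654321"),
                                 c("001003000", "002000000"))

# Aggregate back to one row per EIN with semicolon-separated definitions
agg <- aggregate(activity_code_definition ~ ein, data = dim_tbl,
                 FUN = paste, collapse = "; ")
```

The long dim_tbl has one row per EIN and code, matching the dimension table schema above, while agg has one row per EIN, matching the semicolon-separated output columns.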


8.2 transform_ntee_code()

File: R/transform_ntee_code.R

The National Taxonomy of Exempt Entities (NTEE) is a classification system developed by the National Center for Charitable Statistics (NCCS) to categorize nonprofit organizations by their primary purpose. NTEE codes consist of a letter indicating the major group (e.g., A=Arts, B=Education, E=Health) followed by digits for more specific categorization. The IRS adopted NTEE for the BMF, and NCCS has since developed NTEE Version 2 (NTEEv2) with improved granularity.

Transforms NTEE codes using multiple lookup tables.

transform_ntee_code(
  dt,
  ntee_code_lookup,
  ntee_major_group_lookup,
  activity_code_lookup,
  input_ntee_col = "NTEE_CD",
  year,
  write_scd = FALSE
)

Output Columns:

Column                  Type        Description
ntee_code               character   Cleaned NTEE code
ntee_code_definition    character   NTEE category description
ntee_code_major_group   character   Major group (first character)
naics_code              character   Mapped NAICS industry code
nteev2_code             character   NTEE Version 2 code
nteev2_subsector        character   NTEEv2 subsector
nteev2_org_type         character   NTEEv2 organization type

Code Definitions: See the Lookup Tables Reference for complete NTEE code definitions, major groups, and common codes.


9 Temporal Transforms

9.1 transform_tax_period()

File: R/transform_tax_period.R

The tax period represents the ending date of the organization’s most recent tax year for which the IRS has processed a return. This reflects when the organization last filed Form 990, 990-EZ, 990-N, or 990-PF, and is stored in YYYYMM format. The tax period helps identify organizations that may be delinquent in their filing obligations.

transform_tax_period(dt, input_col = "TAX_PERIOD")

Output Columns:

Column                  Type        Description
tax_period_ymd          Date        Tax period end date
tax_period_is_missing   logical     TRUE if missing/invalid
tax_period_ym_str       character   Original YYYYMM string

Sentinel Value: Missing dates set to 1900-01-01

Design Note: Both the original YYYYMM string and a parsed Date object are retained to serve different use cases. The string format preserves the source data exactly as received from the IRS for auditability and is useful for year-month grouping operations. The Date column enables date arithmetic, filtering by date ranges, and integration with time-series analysis tools that expect proper date types.


9.2 transform_accounting_period()

File: R/accounting_period.R

The accounting period indicates the month in which the organization’s fiscal year ends (1-12, where 12 = December). Most nonprofits use a calendar year (December year-end), but organizations may choose a different fiscal year-end that aligns with their program cycles or funding patterns. This determines when annual Form 990 filings are due (the 15th day of the 5th month after fiscal year-end).

transform_accounting_period(dt, input_col = "ACCT_PD", lookup = lookup_ls$accounting_period)

Output Columns:

Column                         Type        Description
accounting_period_code         integer     Fiscal year end month (1-12)
accounting_period_definition   character   Month name
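
As a worked example of the due-date rule described above (the 15th day of the 5th month after fiscal year-end), the sketch below derives the month name and a nominal due date from the accounting period code. The helper name is hypothetical, and the calculation ignores extensions and weekend or holiday adjustments.

```r
form_990_due_date_sketch <- function(fy_end_year, fy_end_month) {
  stopifnot(all(fy_end_month %in% 1:12))
  shifted <- fy_end_month + 5 - 1            # zero-based month index, five months later
  due_month <- shifted %% 12 + 1
  due_year <- fy_end_year + shifted %/% 12   # roll over into the next calendar year
  as.Date(sprintf("%04d-%02d-15", as.integer(due_year), as.integer(due_month)))
}

period_names <- month.name[c(12, 6)]                       # "December" "June"
due <- form_990_due_date_sketch(c(2023, 2023), c(12, 6))   # 2024-05-15, 2023-11-15
```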

10 Financial Transforms

10.1 transform_bmf_asset_code() / transform_bmf_income_code()

File: R/financial_codes.R

Asset and income codes are single-digit codes (0-9) that place organizations into size categories based on their total assets and total income as reported on Form 990. These codes provide a quick indicator of organizational size without requiring access to the full financial data. The IRS assigns these codes based on the most recently processed return.

transform_bmf_asset_code(dt, lookup = lookup_ls$asset_code)
transform_bmf_income_code(dt, lookup = lookup_ls$income_code)

Output Columns:

Transform   Code Column   Definition Column
Asset       asset_code    asset_code_definition
Income      income_code   income_code_definition

Code Ranges (Asset/Income):

  • 0 = $0
  • 1 = $1 - $9,999
  • 2 = $10,000 - $24,999
  • 3 = $25,000 - $99,999
  • 4 = $100,000 - $499,999
  • 5 = $500,000 - $999,999
  • 6 = $1,000,000 - $4,999,999
  • 7 = $5,000,000 - $9,999,999
  • 8 = $10,000,000 - $49,999,999
  • 9 = $50,000,000 or more
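
For illustration only, the ranges above can be expressed as a vector of lower bounds and applied with findInterval(). The IRS assigns these codes; the pipeline does not derive them from amounts, and the helper name here is hypothetical. Negative amounts would also map to 0 in this sketch, which the documented ranges do not cover.

```r
amount_to_size_code_sketch <- function(amount) {
  # Lower bounds of codes 1 through 9, taken from the list above
  lower_bounds <- c(1, 1e4, 2.5e4, 1e5, 5e5, 1e6, 5e6, 1e7, 5e7)
  findInterval(amount, lower_bounds)
}

size_codes <- amount_to_size_code_sketch(c(0, 9999, 10000, 750000, 5e7))
# size_codes is 0 1 2 5 9
```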

10.2 transform_asset_amount() / transform_income_amount() / transform_revenue_amount()

File: R/asset_amount.R

These fields contain the actual dollar amounts from the organization’s most recently processed Form 990: total assets (end-of-year book value from Part X), total income (gross receipts), and total revenue (Part VIII total). Unlike the coded fields, these amounts provide precise financial figures. Note that income and revenue may be negative due to investment losses or prior-period adjustments.

transform_asset_amount(dt)   # Warns on negative values
transform_income_amount(dt)  # Allows negative (losses)
transform_revenue_amount(dt) # Allows negative (adjustments)

Output Columns:

Transform   Output Column    Allows Negative
Asset       asset_amount     No (warns)
Income      income_amount    Yes
Revenue     revenue_amount   Yes

11 Filing Transforms

11.1 transform_bmf_filing_requirement_code()

File: R/filing_requirement_code.R

The filing requirement code indicates which annual information return the organization is required to file with the IRS. Options include Form 990 (for larger organizations), Form 990-EZ (for mid-sized organizations), Form 990-N (e-Postcard for small organizations with gross receipts ≤$50,000), or no return required (such as churches and certain religious organizations that are exempt from filing under IRC Section 6033).

Transforms the FILING_REQ_CD column into a standardized filing requirement code with definition.

transform_bmf_filing_requirement_code(dt, input_col = "FILING_REQ_CD", lookup = filing_requirement_code_lookup)

Parameter   Type         Default                          Description
dt          data.table   required                         BMF data with FILING_REQ_CD column
input_col   character    "FILING_REQ_CD"                  Source column name
lookup      data.table   filing_requirement_code_lookup   Lookup table with codes and definitions

Output Columns:

Column                               Type        Description
filing_requirement_code              integer     Standardized filing requirement code
filing_requirement_code_definition   character   Human-readable definition

Filing Requirement Code Definitions

Code   Definition
0      990 - Not required to file (all other)
1      990 (all other) or 990EZ return
2      990 - Required to file Form 990-N - Income less than $50,000 per year
3      990 - Group return
4      990 - Required to file Form 990-BL, Black Lung Trusts
6      990 - Not required to file (church)
7      990 - Government 501(c)(1)
13     990 - Not required to file (religious organization)
14     990 - Not required to file (instrumentalities of states or political subdivisions)

11.2 transform_bmf_pf_filing_requirement_code()

File: R/filing_requirement_code.R

The private foundation filing requirement code applies specifically to organizations classified as private foundations under IRC Section 509(a). Private foundations must file Form 990-PF annually regardless of their size, and this code indicates their specific filing obligations including any requirements related to excise taxes on investment income (IRC Section 4940).

Transforms the PF_FILING_REQ_CD column into a standardized private foundation filing requirement code with definition.

transform_bmf_pf_filing_requirement_code(dt, input_col = "PF_FILING_REQ_CD", lookup = pf_filing_requirement_code_lookup)

Parameter   Type         Default                             Description
dt          data.table   required                            BMF data with PF_FILING_REQ_CD column
input_col   character    "PF_FILING_REQ_CD"                  Source column name
lookup      data.table   pf_filing_requirement_code_lookup   Lookup table with codes and definitions

Output Columns:

Column                                  Type        Description
pf_filing_requirement_code              integer     Standardized PF filing requirement code
pf_filing_requirement_code_definition   character   Human-readable definition

PF Filing Requirement Code Definitions

Code   Definition
0      No 990-PF return
1      990-PF return
2      990-PF return
3      990-PF return

11.3 transform_code() (Generic Helper)

File: R/transform_code.R

Generic helper function used internally by the filing requirement transforms.

transform_code(
  dt,
  input_col,
  lookup_key,
  definition_col,
  type_conversion_func = as.integer,
  lookup
)
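
The general idea behind a lookup-driven code transform can be sketched as below. This is a hypothetical stand-in written against a plain data.frame, not the actual body of transform_code(). The argument order, the warning, and the column names inside the lookup are assumptions; the two lookup rows reuse values from the Organization Code Definitions table in section 7.5.

```r
# Hypothetical stand-in for a generic code lookup; uses data.frame for brevity.
lookup_code_sketch <- function(df, input_col, lookup, lookup_key, definition_col,
                               type_conversion_func = as.integer) {
  stopifnot(input_col %in% names(df),
            all(c(lookup_key, definition_col) %in% names(lookup)))
  code <- type_conversion_func(df[[input_col]])            # convert raw codes
  idx <- match(code, type_conversion_func(lookup[[lookup_key]]))
  if (anyNA(idx)) warning(sum(is.na(idx)), " code(s) not found in lookup")
  df[[lookup_key]] <- code                                 # standardized code column
  df[[definition_col]] <- lookup[[definition_col]][idx]    # matched definition
  df
}

organization_code_lookup <- data.frame(
  organization_code = c(1L, 2L),
  organization_code_definition = c("Corporation", "Trust"),
  stringsAsFactors = FALSE
)
bmf <- data.frame(ORGANIZATION = c("2", "1", "2"), stringsAsFactors = FALSE)
code_out <- lookup_code_sketch(bmf, "ORGANIZATION", organization_code_lookup,
                               "organization_code", "organization_code_definition")
```

Passing the conversion function as an argument lets the same helper serve integer-coded fields and character-coded fields (for example, by supplying as.character instead).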