4 Transform Reference
5 Transform Functions
This chapter documents all 24 transform functions in the BMF pipeline. Each function follows a consistent pattern:
- Input validation
- Safe copy of input data
- Transformation logic
- Output validation
- Return transformed data.table
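The five steps above can be sketched as follows. This is a minimal illustration, not the pipeline's code: `transform_example()` and its output column names are hypothetical, and it is written against a plain data.frame so that it runs without dependencies. With data.table, the copy step would typically be `data.table::copy(dt)`, because `:=` modifies its input by reference.

```r
# Hypothetical transform following the five-step pattern (illustration only).
transform_example <- function(dt, input_col = "COLUMN") {
  # 1. Input validation
  stopifnot(is.data.frame(dt), input_col %in% names(dt))

  # 2. Safe copy: base R data frames copy on modify, so assignment is enough
  #    here; a data.table would need data.table::copy(dt) instead.
  out <- dt

  # 3. Transformation logic: keep the raw value and add a cleaned one
  out$example_raw <- as.character(out[[input_col]])
  out$example <- toupper(trimws(out$example_raw))

  # 4. Output validation
  stopifnot(nrow(out) == nrow(dt), is.character(out$example))

  # 5. Return the transformed table
  out
}

input <- data.frame(COLUMN = c(" abc ", "Def"), stringsAsFactors = FALSE)
result <- transform_example(input)
```

The caller's `input` is left unchanged, which is the property the safe-copy step is meant to guarantee.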
5.1 Function Signature Pattern
transform_<field> <- function(
  dt,                      # data.table with required columns
  input_col = "COLUMN",    # Source column name
  lookup = <lookup_table>  # Reference table (if applicable)
)
6 Identity Transforms
6.1 transform_ein()
File: R/ein.R
Formats EIN (Employer Identification Number) to standard XX-XXXXXXX format.
transform_ein(dt, input_col = "EIN")
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with EIN column |
| input_col | character | "EIN" | Source column name |
Output Columns:
| Column | Type | Description |
|---|---|---|
| ein_raw | character | Original EIN value |
| ein | character | Formatted as XX-XXXXXXX |
Validation:
- Removes non-numeric characters
- Left-pads to 9 digits
- Warns if format mismatch after processing
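The validation steps above could look roughly like this on a character vector. It is a sketch under assumptions: `format_ein()` is a hypothetical helper name, not a function from R/ein.R, and the real transform may handle edge cases differently.

```r
# Hypothetical helper: normalize EINs to XX-XXXXXXX (illustration only).
format_ein <- function(x) {
  # Remove non-numeric characters
  digits <- gsub("[^0-9]", "", as.character(x))

  # Left-pad to 9 digits (no padding for NA or values already 9+ digits long)
  n_missing <- pmax(0L, 9L - nchar(digits))
  n_missing[is.na(n_missing)] <- 0L
  padded <- paste0(strrep("0", n_missing), digits)

  # Insert the hyphen after the second digit
  ein <- sub("^([0-9]{2})([0-9]{7})$", "\\1-\\2", padded)

  # Warn if any value still does not match the expected format
  bad <- is.na(digits) | !grepl("^[0-9]{2}-[0-9]{7}$", ein)
  if (any(bad)) {
    warning(sum(bad), " EIN value(s) do not match XX-XXXXXXX after processing")
  }
  ein[is.na(digits)] <- NA_character_
  ein
}

eins <- format_ein(c("123456789", "12-3456789", "1234567"))
```

Values with more than nine digits are left unformatted and trigger the warning rather than being truncated.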
Format Note: Some data providers use an EIN-XX-XXXXXXX format with an explicit prefix, but this pipeline uses the XX-XXXXXXX format to align with IRS conventions as displayed on Form 990 and official determination letters. This format is also the de facto standard in the nonprofit research ecosystem (used by NCCS, Foundation Center/Candid, and ProPublica), which facilitates cross-dataset linkage without requiring prefix manipulation.
6.2 transform_organization_name()
File: R/organization_name.R
Cleans and standardizes organization names, extracting legal suffixes.
transform_organization_name(dt, input_col = "NAME", name_lookup_path = NULL)
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with NAME column |
| input_col | character | "NAME" | Source column name |
| name_lookup_path | character | NULL | Optional path to manual overrides |
Output Columns:
| Column | Type | Description |
|---|---|---|
| org_name_raw | character | Original organization name |
| org_name_join | character | Cleaned name for matching/joining |
| org_name_display | character | Title-cased display name |
| org_parent_name | character | Parent organization name (if matched via GEN or abbreviation) |
| org_legal_suffix | character | Extracted suffix (INC, CORP, LLC, etc.) |
Processing Steps:
- Extract legal suffixes using pattern matching
- Apply word standardizations (Assn → Association, etc.)
- Apply parent organization lookup (matches abbreviations or GEN to parent orgs)
- Apply manual overrides (if lookup provided)
- Title-case display names
6.3 transform_bmf_ico_name()
File: R/ico_name.R
The In-Care-Of (ICO) field identifies an individual or entity designated to receive correspondence on behalf of the exempt organization. Per IRS instructions for Form 990, this field is used when mail should be directed to a specific person (such as a principal officer, fiduciary, or registered agent) or routed through another organization. In the BMF, the ICO value is prefixed with “%” in the raw data.
Cleans the In-Care-Of name field.
transform_bmf_ico_name(dt, input_col = "ICO")
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with ICO column |
| input_col | character | "ICO" | Source column name |
Output Columns:
| Column | Type | Description |
|---|---|---|
| in_care_of_name_raw | character | Original ICO value |
| in_care_of_name_clean | character | Cleaned and title-cased |
| in_care_of_name_provided | logical | TRUE if ICO was provided |
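As a rough sketch of how the three output columns relate to the raw value, the following strips the leading "%" prefix, title-cases the remainder, and derives the provided flag. `clean_ico_name()` is a hypothetical helper, and the simple title-casing rule used here is an assumption rather than the pipeline's actual logic.

```r
# Hypothetical helper: clean the In-Care-Of field (illustration only).
clean_ico_name <- function(x) {
  raw <- as.character(x)

  # Drop the leading "%" marker, then collapse repeated whitespace
  stripped <- trimws(sub("^\\s*%\\s*", "", raw))
  stripped <- gsub("\\s+", " ", stripped)
  stripped[!is.na(stripped) & stripped == ""] <- NA_character_

  # Simple title case: uppercase the first letter of each word
  clean <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", tolower(stripped), perl = TRUE)

  data.frame(
    in_care_of_name_raw = raw,
    in_care_of_name_clean = clean,
    in_care_of_name_provided = !is.na(clean),
    stringsAsFactors = FALSE
  )
}

ico <- clean_ico_name(c("% JOHN  SMITH", "", NA))
```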
6.4 transform_bmf_group_exemption_number()
File: R/group_exemption_number.R
A Group Exemption Number (GEN) is a 4-digit identifier assigned by the IRS to a central organization that has obtained a group exemption letter under Revenue Procedure 80-27. This allows the central organization to extend its tax-exempt status to subordinate organizations (such as local chapters, lodges, or posts) without each subordinate filing a separate application. Organizations with GEN = 0 are independent and not part of any group exemption arrangement.
Standardizes Group Exemption Number (GEN).
transform_bmf_group_exemption_number(dt, input_col = "GROUP")
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with GROUP column |
| input_col | character | "GROUP" | Source column name |
Output Columns:
| Column | Type | Description |
|---|---|---|
| group_exemption_number_raw | character | Original GROUP value |
| group_exemption_number | character | Padded 4-digit GEN |
| group_exemption_is_member | logical | TRUE if org is part of group exemption (GEN ≠ 0) |
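The padding and membership rule can be illustrated with a short sketch. `standardize_gen()` is a hypothetical name, and treating non-numeric input as NA is an assumption made for this example.

```r
# Hypothetical helper: pad GEN to 4 digits and flag group members.
standardize_gen <- function(x) {
  raw <- trimws(as.character(x))
  num <- suppressWarnings(as.integer(raw))  # non-numeric input becomes NA

  data.frame(
    group_exemption_number_raw = raw,
    group_exemption_number = ifelse(is.na(num), NA_character_, sprintf("%04d", num)),
    group_exemption_is_member = !is.na(num) & num != 0L,  # GEN of 0 means independent
    stringsAsFactors = FALSE
  )
}

gen <- standardize_gen(c("0", "928", "1234", NA))
```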
6.5 transform_bmf_ruling_date()
File: R/ruling_date.R
The ruling date (also called the determination date) is the year and month when the IRS issued the organization’s determination letter formally recognizing its tax-exempt status. This date establishes when the organization’s exemption became effective and is recorded in YYYYMM format in the BMF.
Parses IRS ruling/determination date.
transform_bmf_ruling_date(dt, input_col = "RULING")
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with RULING column |
| input_col | character | "RULING" | Source column name |
Output Columns:
| Column | Type | Description |
|---|---|---|
| ruling_date_ym_str | character | Original YYYYMM string |
| ruling_date | Date | Parsed date (YYYY-MM-01) |
| ruling_date_is_missing | logical | TRUE if missing/invalid |
Sentinel Value: Missing dates are set to 1900-01-01
Design Note: Both the original YYYYMM string and a parsed Date object are retained to serve different use cases. The string format preserves the source data exactly as received from the IRS for auditability and is useful for year-month grouping operations. The Date column enables date arithmetic, filtering by date ranges, and integration with time-series analysis tools that expect proper date types.
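A minimal sketch of the YYYYMM parsing with the sentinel and missing flag described above. `parse_yyyymm()` is a hypothetical helper, and the validity rule (four-digit year, month 01 to 12) is an assumption about what counts as invalid.

```r
# Hypothetical helper: parse YYYYMM strings, using a sentinel for missing values.
parse_yyyymm <- function(x, sentinel = as.Date("1900-01-01")) {
  ym <- trimws(as.character(x))

  # Valid means six digits with a month between 01 and 12
  valid <- !is.na(ym) & grepl("^[0-9]{4}(0[1-9]|1[0-2])$", ym)

  # Start from the sentinel, then overwrite the valid entries
  parsed <- rep(sentinel, length(ym))
  parsed[valid] <- as.Date(paste0(ym[valid], "01"), format = "%Y%m%d")

  data.frame(
    ruling_date_ym_str = ym,
    ruling_date = parsed,
    ruling_date_is_missing = !valid,
    stringsAsFactors = FALSE
  )
}

ruling <- parse_yyyymm(c("197105", "000000", NA, "202313"))
```

Keeping the flag column next to the sentinel lets downstream code filter out placeholder dates explicitly instead of testing for 1900-01-01.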
6.6 transform_dba_name()
File: R/dba_name.R
A “doing business as” (DBA) name, also known as a trade name or fictitious business name, is an alternate name under which an organization operates that differs from its legal name. The IRS SORT_NAME field captures this secondary name when an organization conducts activities or is publicly known by a name other than its registered legal name.
Transforms the SORT_NAME column (doing-business-as name) into cleaned DBA name fields.
transform_dba_name(dt, input_col = "SORT_NAME")
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with SORT_NAME column |
| input_col | character | "SORT_NAME" | Source column name |
Output Columns:
| Column | Type | Description |
|---|---|---|
| dba_name_raw | character | Original SORT_NAME value (empty strings converted to NA) |
| dba_name | character | Title-cased, whitespace-squished DBA name |
Processing Notes:
- Most organizations (~95%) do not have a secondary/DBA name
- Empty strings are converted to NA so that a blank field is treated as missing rather than as a name
- Title case applied for display consistency
6.7 transform_address()
File: R/address.R
The address fields in the BMF represent the organization’s mailing address as reported to the IRS, typically on Form 990 or the exemption application (Form 1023/1024). This is the address where official IRS correspondence is sent and may differ from the organization’s physical location or principal place of business.
Standardizes address components using USPS conventions and generates geocoding-ready output with quality flags.
transform_address(dt, street_col = "STREET", city_col = "CITY",
                  state_col = "STATE", zip_col = "ZIP")
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with address columns |
| street_col | character | "STREET" | Street address column |
| city_col | character | "CITY" | City column |
| state_col | character | "STATE" | State column |
| zip_col | character | "ZIP" | ZIP code column |
Output Columns (17 total):
| Column | Type | Description |
|---|---|---|
| org_addr_street_raw | character | Original street value |
| org_addr_street | character | USPS-standardized street |
| org_addr_city_raw | character | Original city value |
| org_addr_city | character | Cleaned city name |
| org_addr_state_raw | character | Original state value |
| org_addr_state | character | Validated 2-letter abbreviation |
| org_addr_zip_raw | character | Original ZIP value |
| org_addr_zip5 | character | 5-digit ZIP code |
| org_addr_zip4 | character | ZIP+4 extension (if present) |
| org_addr_zip | character | Full ZIP (XXXXX or XXXXX-XXXX) |
| org_addr_full | character | Full formatted address string |
| org_addr_is_missing | logical | TRUE if all address fields empty |
| org_addr_is_po_box | logical | TRUE if P.O. Box address |
| org_addr_is_rural_route | logical | TRUE if rural route address |
| org_addr_has_special_chars | logical | TRUE if unusual characters |
| org_addr_missing_number | logical | TRUE if no street number |
| org_addr_state_invalid | logical | TRUE if not valid US state/territory |
USPS Standardizations Applied:
- 165+ street type abbreviations (STREET → ST, AVENUE → AVE)
- Directional standardization (NORTH → N, SOUTHWEST → SW)
- Unit type abbreviations (APARTMENT → APT, SUITE → STE)
- State name to abbreviation conversion
- ZIP code parsing and validation
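The ZIP parsing step, which produces the zip5, zip4, and full ZIP columns, might be sketched as below. `parse_zip()` is a hypothetical helper; leaving values with fewer than five digits as NA, rather than restoring dropped leading zeros, is an assumption of this example.

```r
# Hypothetical helper: split a raw ZIP value into 5-digit and +4 parts.
parse_zip <- function(x) {
  digits <- gsub("[^0-9]", "", as.character(x))

  # First five digits form the base ZIP; a +4 part exists only for nine digits
  zip5 <- as.character(ifelse(nchar(digits) >= 5, substr(digits, 1, 5), NA_character_))
  zip4 <- as.character(ifelse(nchar(digits) == 9, substr(digits, 6, 9), NA_character_))

  # Full ZIP is XXXXX or XXXXX-XXXX depending on whether the +4 part is present
  zip <- ifelse(is.na(zip4), zip5, paste0(zip5, "-", zip4))

  data.frame(
    org_addr_zip5 = zip5,
    org_addr_zip4 = zip4,
    org_addr_zip = zip,
    stringsAsFactors = FALSE
  )
}

zips <- parse_zip(c("02139-4307", "90210", "123", NA))
```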
7 Classification Transforms
7.1 transform_bmf_subsection_classification_codes()
File: R/subsection_classification_codes.R
The subsection code identifies the paragraph of IRC Section 501(c) under which an organization is exempt (e.g., 501(c)(3), 501(c)(4), 501(c)(6)). The classification code is a secondary descriptor indicating the type of organization within that subsection—for example, under 501(c)(3), classification codes distinguish between charitable organizations, educational institutions, religious organizations, and scientific entities. An organization may have multiple classification codes.
Transforms SUBSECTION and CLASSIFICATION columns with dimension table creation. The dimension table is created internally.
transform_bmf_subsection_classification_codes(dt, orgtype_lookup = subsection_orgtype_lookup)
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with SUBSECTION and CLASSIFICATION columns |
| orgtype_lookup | data.table | subsection_orgtype_lookup | Lookup mapping subsection codes to organization types |
Helper Function: create_cl_code_dim_table(dt, lookup, year) creates the SCD Type 2 dimension table internally.
Output Columns:
| Column | Type | Description |
|---|---|---|
| subsection_code | character | IRC subsection code |
| classification_code | character | Raw classification codes |
| exempt_organization_type | character | Human-readable org type |
| all_classifications_string | character | Semicolon-separated descriptions |
Code Definitions: See the Lookup Tables Reference for complete subsection and classification code combinations.
7.2 transform_bmf_affiliation_code()
File: R/affiliation_code.R
The affiliation code describes an organization’s relationship within a group exemption structure. It distinguishes between central/parent organizations that hold the group exemption, intermediate organizations that coordinate subordinates, independent organizations with no group affiliation, and subordinate organizations covered under a parent’s group ruling.
transform_bmf_affiliation_code(dt, input_col = "AFFILIATION", lookup = affiliation_code_lookup)
Output Columns:
| Column | Type | Description |
|---|---|---|
| affiliation_code | integer | Affiliation type code |
| affiliation_code_definition | character | Affiliation description |
Affiliation Codes:
- 1 = Central organization
- 2 = Intermediate organization
- 3 = Independent organization
- 6 = Central organization (group ruling)
- 7 = Intermediate organization (group ruling)
- 8 = Central organization (no group ruling)
- 9 = Subordinate organization (group ruling)
7.3 transform_bmf_deductibility_code()
File: R/deductibility_code.R
The deductibility code indicates whether contributions to the organization are tax-deductible under IRC Section 170. Not all tax-exempt organizations qualify for deductible contributions—for example, 501(c)(3) organizations generally do, while 501(c)(4) social welfare organizations generally do not. The code also indicates any limitations on deductibility.
transform_bmf_deductibility_code(dt, input_col = "DEDUCTIBILITY", lookup = deductibility_code_lookup)
Output Columns:
| Column | Type | Description |
|---|---|---|
| deductibility_code | integer | Deductibility status |
| deductibility_code_definition | character | Deductibility description |
Code Definitions: See the Lookup Tables Reference for complete deductibility code definitions.
7.4 transform_bmf_foundation_code()
File: R/foundation_code.R
The foundation code classifies 501(c)(3) organizations based on their public charity or private foundation status under IRC Section 509(a). Public charities receive broad public support and face fewer restrictions, while private foundations (typically funded by a single source) are subject to additional rules on self-dealing, minimum distributions, and excess business holdings. The code identifies which subsection of 509(a) applies or whether the organization is a private operating foundation.
transform_bmf_foundation_code(dt, input_col = "FOUNDATION", lookup = foundation_code_lookup)
Output Columns:
| Column | Type | Description |
|---|---|---|
| foundation_code | integer | Foundation type code |
| foundation_code_definition | character | Foundation type description |
Foundation Code Definitions:
| Code | Definition |
|---|---|
| 0 | 4947(a)(1) |
| 2 | Private operating foundation exempt from payment of section 4940 taxes on investment income |
| 3 | Private operating foundation |
| 4 | Private non-operating foundation |
| 9 | Suspense (a specific type not identified) |
| 10 | Church—IRC Section 170(b)(1)(A)(i) |
| 11 | School—IRC Section 170(b)(1)(A)(ii) |
| 12 | Hospital—IRC Section 170(b)(1)(A)(iii) |
| 13 | Organizations operated for the benefit of a college or university—IRC Section 170(b)(1)(A)(iv) |
| 14 | Federal, State or local government unit—IRC Section 170(b)(1)(A)(v) |
| 15 | Organization receiving support from governmental unit or general public—IRC Section 170(b)(1)(A)(vi) |
| 16 | General, public charity—IRC Section 509(a)(2) |
| 17 | Public charity supporting (FC 09–15)—IRC Section 509(a)(3) |
| 18 | Public safety—IRC Section 509(a)(4) |
| 21 | Supporting organization - IRC 509(a)(3) - Type I |
| 22 | Supporting organization - IRC 509(a)(3) - Type II |
| 23 | Supporting organization - IRC 509(a)(3) - Type III functionally integrated |
| 24 | Supporting organization - IRC 509(a)(3) - Type III not functionally integrated |
7.5 transform_bmf_organization_code()
File: R/organization_code.R
The organization code indicates the legal form or structure of the exempt organization as recognized by the IRS. Common forms include corporation, trust, association, and cooperative. This reflects how the organization was established under state law and affects governance requirements and liability structures.
transform_bmf_organization_code(dt, input_col = "ORGANIZATION", lookup = organization_code_lookup)
Output Columns:
| Column | Type | Description |
|---|---|---|
| organization_code | integer | Organization structure code |
| organization_code_definition | character | Organization type description |
Organization Code Definitions
| Code | Definition |
|---|---|
| 0 | Unassigned |
| 1 | Corporation |
| 2 | Trust |
| 3 | Co-operative |
| 4 | Partnership |
| 5 | Association |
| 6 | Non-exempt Charitable Trust |
7.6 transform_bmf_status_code()
File: R/status_code.R
The status code indicates the current standing of an organization’s tax-exempt recognition with the IRS. Values distinguish between unconditional exemption (fully recognized), conditional exemption (provisional status), and various termination states including voluntary termination, merger, and automatic revocation for failure to file required returns for three consecutive years.
transform_bmf_status_code(dt, input_col = "STATUS", lookup = status_code_lookup)
Output Columns:
| Column | Type | Description |
|---|---|---|
| status_code | integer | IRS exempt status code |
| status_code_definition | character | Status description |
Status Code Definitions
| Code | Definition |
|---|---|
| 1 | Unconditional Exemption |
| 2 | Conditional Exemption |
| 12 | Trust described in section 4947(a)(2) of the Internal Revenue Code |
| 25 | Organization terminating its private foundation status under section 507(b)(1)(B) of the Code |
8 Activity Transforms
8.1 transform_bmf_activity_code()
File: R/activity_code.R
Activity codes are 3-digit codes from the IRS activity code list that describe the primary purposes or activities of an exempt organization. Each organization may report up to three activity codes, which were self-selected on the original exemption application (Form 1023 or 1024). These codes predate the NTEE system and provide a complementary classification based on organizational activities rather than mission area.
Transforms activity codes by creating a dimension table internally and aggregating back to the main table.
transform_bmf_activity_code(dt)
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with ein and ACTIVITY columns |
Helper Function: create_activity_code_dim_table(dt, lookup, input_col) creates the SCD Type 2 dimension table internally.
Output Columns:
| Column | Type | Description |
|---|---|---|
| activity_code_definitions | character | Semicolon-separated activity descriptions |
| activity_code_categories | character | Semicolon-separated activity categories |
Dimension Table Schema:
| Column | Type | Description |
|---|---|---|
ein |
character | Employer ID |
activity_code |
character | 3-character activity code |
activity_code_definition |
character | Activity description |
activity_code_category |
character | Activity category |
Code Definitions: There are 306 activity codes. See the Lookup Tables Reference for complete definitions.
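The split-then-aggregate idea can be sketched as follows. Several things here are assumptions: that ACTIVITY is a 9-character string holding three 3-digit codes with "000" marking an unused slot, the helper names, and the lookup column names. The lookup rows are placeholders, not real IRS activity codes.

```r
# Hypothetical helper: split a 9-character ACTIVITY value into 3-digit codes.
split_activity_codes <- function(activity) {
  a <- as.character(activity)
  codes <- c(substr(a, 1, 3), substr(a, 4, 6), substr(a, 7, 9))
  # Keep complete codes only and drop the assumed "000" filler
  codes[which(nchar(codes) == 3 & codes != "000")]
}

# Hypothetical helper: join the matched definitions back into one string per org.
aggregate_definitions <- function(activity, lookup) {
  vapply(activity, function(a) {
    codes <- split_activity_codes(a)
    defs <- lookup$definition[match(codes, lookup$code)]
    defs <- defs[!is.na(defs)]
    if (length(defs) == 0) NA_character_ else paste(defs, collapse = "; ")
  }, character(1), USE.NAMES = FALSE)
}

# Placeholder lookup (codes and definitions are made up for illustration)
placeholder_lookup <- data.frame(
  code = c("111", "222"),
  definition = c("Placeholder A", "Placeholder B"),
  stringsAsFactors = FALSE
)

defs <- aggregate_definitions(c("111222000", "000000000", "222000000"),
                              placeholder_lookup)
```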
8.2 transform_ntee_code()
File: R/transform_ntee_code.R
The National Taxonomy of Exempt Entities (NTEE) is a classification system developed by the National Center for Charitable Statistics (NCCS) to categorize nonprofit organizations by their primary purpose. NTEE codes consist of a letter indicating the major group (e.g., A=Arts, B=Education, E=Health) followed by digits for more specific categorization. The IRS adopted NTEE for the BMF, and NCCS has since developed NTEE Version 2 (NTEEv2) with improved granularity.
Complex NTEE code transformation with multiple lookups.
transform_ntee_code(
  dt,
  ntee_code_lookup,
  ntee_major_group_lookup,
  activity_code_lookup,
  input_ntee_col = "NTEE_CD",
  year,
  write_scd = FALSE
)
Output Columns:
| Column | Type | Description |
|---|---|---|
| ntee_code | character | Cleaned NTEE code |
| ntee_code_definition | character | NTEE category description |
| ntee_code_major_group | character | Major group (first character) |
| naics_code | character | Mapped NAICS industry code |
| nteev2_code | character | NTEE Version 2 code |
| nteev2_subsector | character | NTEEv2 subsector |
| nteev2_org_type | character | NTEEv2 organization type |
Code Definitions: See the Lookup Tables Reference for complete NTEE code definitions, major groups, and common codes.
9 Temporal Transforms
9.1 transform_tax_period()
File: R/transform_tax_period.R
The tax period represents the ending date of the organization’s most recent tax year for which the IRS has processed a return. This reflects when the organization last filed Form 990, 990-EZ, 990-N, or 990-PF, and is stored in YYYYMM format. The tax period helps identify organizations that may be delinquent in their filing obligations.
transform_tax_period(dt, input_col = "TAX_PERIOD")
Output Columns:
| Column | Type | Description |
|---|---|---|
| tax_period_ymd | Date | Tax period end date |
| tax_period_is_missing | logical | TRUE if missing/invalid |
| tax_period_ym_str | character | Original YYYYMM string |
Sentinel Value: Missing dates set to 1900-01-01
Design Note: Both the original YYYYMM string and a parsed Date object are retained to serve different use cases. The string format preserves the source data exactly as received from the IRS for auditability and is useful for year-month grouping operations. The Date column enables date arithmetic, filtering by date ranges, and integration with time-series analysis tools that expect proper date types.
9.2 transform_accounting_period()
File: R/accounting_period.R
The accounting period indicates the month in which the organization’s fiscal year ends (1-12, where 12 = December). Most nonprofits use a calendar year (December year-end), but organizations may choose a different fiscal year-end that aligns with their program cycles or funding patterns. This determines when annual Form 990 filings are due (the 15th day of the 5th month after fiscal year-end).
transform_accounting_period(dt, input_col = "ACCT_PD", lookup = lookup_ls$accounting_period)
Output Columns:
| Column | Type | Description |
|---|---|---|
| accounting_period_code | integer | Fiscal year end month (1-12) |
| accounting_period_definition | character | Month name |
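To make the "15th day of the 5th month after fiscal year-end" rule concrete, here is a small arithmetic sketch. `form990_due_date()` is a hypothetical helper that is not part of the pipeline, and it ignores extensions and weekend or holiday adjustments.

```r
# Hypothetical helper: nominal Form 990 due date from a fiscal year-end month.
form990_due_date <- function(fy_end_year, fy_end_month) {
  # Count months from year 0, then move forward five months
  idx <- as.integer(fy_end_year) * 12L + (as.integer(fy_end_month) - 1L) + 5L
  due_year <- idx %/% 12L
  due_month <- idx %% 12L + 1L
  as.Date(sprintf("%04d-%02d-15", due_year, due_month))
}

calendar_year_due <- form990_due_date(2023, 12)  # December year-end
june_year_due <- form990_due_date(2023, 6)       # June year-end
```

A December 2023 year-end gives a nominal due date of 2024-05-15, and a June 2023 year-end gives 2023-11-15.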
10 Financial Transforms
10.1 transform_bmf_asset_code() / transform_bmf_income_code()
File: R/financial_codes.R
Asset and income codes are single-digit codes (0-9) that place organizations into size categories based on their total assets and total income as reported on Form 990. These codes provide a quick indicator of organizational size without requiring access to the full financial data. The IRS assigns these codes based on the most recently processed return.
transform_bmf_asset_code(dt, lookup = lookup_ls$asset_code)
transform_bmf_income_code(dt, lookup = lookup_ls$income_code)
Output Columns:
| Transform | Code Column | Definition Column |
|---|---|---|
| Asset | asset_code | asset_code_definition |
| Income | income_code | income_code_definition |
Code Ranges (Asset/Income):
- 0 = $0
- 1 = $1 - $9,999
- 2 = $10,000 - $24,999
- 3 = $25,000 - $99,999
- 4 = $100,000 - $499,999
- 5 = $500,000 - $999,999
- 6 = $1,000,000 - $4,999,999
- 7 = $5,000,000 - $9,999,999
- 8 = $10,000,000 - $49,999,999
- 9 = $50,000,000 or more
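The ranges above amount to a step function over dollar amounts. The sketch below reproduces that mapping with `findInterval()`, purely to illustrate the bucket boundaries: the pipeline itself reads the IRS-assigned codes and joins them to a lookup table, and `size_code()` is a hypothetical name.

```r
# Hypothetical helper: map a dollar amount to its 0-9 size code.
size_code <- function(amount) {
  # Lower bound of each bucket for codes 1 through 9
  lower_bounds <- c(1, 1e4, 2.5e4, 1e5, 5e5, 1e6, 5e6, 1e7, 5e7)
  # findInterval() counts how many lower bounds are <= amount
  findInterval(amount, lower_bounds)
}

codes <- size_code(c(0, 9999, 10000, 750000, 5e7))
```

Negative amounts also fall into bucket 0 under this sketch, which is a simplification rather than documented IRS behavior.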
10.2 transform_asset_amount() / transform_income_amount() / transform_revenue_amount()
File: R/asset_amount.R
These fields contain the actual dollar amounts from the organization’s most recently processed Form 990: total assets (end-of-year book value from Part X), total income (gross receipts), and total revenue (Part VIII total). Unlike the coded fields, these amounts provide precise financial figures. Note that income and revenue may be negative due to investment losses or prior-period adjustments.
transform_asset_amount(dt)   # Warns on negative values
transform_income_amount(dt)  # Allows negative (losses)
transform_revenue_amount(dt) # Allows negative (adjustments)
Output Columns:
| Transform | Output Column | Allows Negative |
|---|---|---|
| Asset | asset_amount | No (warns) |
| Income | income_amount | Yes |
| Revenue | revenue_amount | Yes |
11 Filing Transforms
11.1 transform_bmf_filing_requirement_code()
File: R/filing_requirement_code.R
The filing requirement code indicates which annual information return the organization is required to file with the IRS. Options include Form 990 (for larger organizations), Form 990-EZ (for mid-sized organizations), Form 990-N (e-Postcard for small organizations with gross receipts ≤$50,000), or no return required (such as churches and certain religious organizations that are exempt from filing under IRC Section 6033).
Transforms the FILING_REQ_CD column into a standardized filing requirement code with definition.
transform_bmf_filing_requirement_code(dt, input_col = "FILING_REQ_CD", lookup = filing_requirement_code_lookup)
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with FILING_REQ_CD column |
| input_col | character | "FILING_REQ_CD" | Source column name |
| lookup | data.table | filing_requirement_code_lookup | Lookup table with codes and definitions |
Output Columns:
| Column | Type | Description |
|---|---|---|
| filing_requirement_code | integer | Standardized filing requirement code |
| filing_requirement_code_definition | character | Human-readable definition |
Filing Requirement Code Definitions
| Code | Definition |
|---|---|
| 0 | 990 - Not required to file (all other) |
| 1 | 990 (all other) or 990EZ return |
| 2 | 990 - Required to file Form 990-N - Income less than $50,000 per year |
| 3 | 990 - Group return |
| 4 | 990 - Required to file Form 990-BL, Black Lung Trusts |
| 6 | 990 - Not required to file (church) |
| 7 | 990 - Government 501(c)(1) |
| 13 | 990 - Not required to file (religious organization) |
| 14 | 990 - Not required to file (instrumentalities of states or political subdivisions) |
11.2 transform_bmf_pf_filing_requirement_code()
File: R/filing_requirement_code.R
The private foundation filing requirement code applies specifically to organizations classified as private foundations under IRC Section 509(a). Private foundations must file Form 990-PF annually regardless of their size, and this code indicates their specific filing obligations including any requirements related to excise taxes on investment income (IRC Section 4940).
Transforms the PF_FILING_REQ_CD column into a standardized private foundation filing requirement code with definition.
PF Filing Requirement Code Definitions
| Code | Definition |
|---|---|
| 0 | No 990-PF return |
| 1 | 990-PF return |
| 2 | 990-PF return |
| 3 | 990-PF return |
transform_bmf_pf_filing_requirement_code(dt, input_col = "PF_FILING_REQ_CD", lookup = pf_filing_requirement_code_lookup)
| Parameter | Type | Default | Description |
|---|---|---|---|
| dt | data.table | required | BMF data with PF_FILING_REQ_CD column |
| input_col | character | "PF_FILING_REQ_CD" | Source column name |
| lookup | data.table | pf_filing_requirement_code_lookup | Lookup table with codes and definitions |
Output Columns:
| Column | Type | Description |
|---|---|---|
| pf_filing_requirement_code | integer | Standardized PF filing requirement code |
| pf_filing_requirement_code_definition | character | Human-readable definition |
11.3 transform_code() (Generic Helper)
File: R/transform_code.R
Generic helper function used internally by the filing requirement transforms.
transform_code(
  dt,
  input_col,
  lookup_key,
  definition_col,
  type_conversion_func = as.integer,
  lookup
)
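The signature suggests a convert-then-join helper. The sketch below shows one way such a helper could behave, working on a plain vector and data.frame so that it runs without dependencies. `lookup_code_definitions()` is a hypothetical stand-in, not the body of `transform_code()`, and the lookup column names are assumptions. The two definitions are taken from the PF filing requirement table above.

```r
# Hypothetical stand-in for a generic code-to-definition join (illustration only).
lookup_code_definitions <- function(values, lookup, lookup_key, definition_col,
                                    type_conversion_func = as.integer) {
  # Convert both sides to the same type so "01" and 1 match
  codes <- type_conversion_func(values)
  keys <- type_conversion_func(lookup[[lookup_key]])

  data.frame(
    code = codes,
    definition = lookup[[definition_col]][match(codes, keys)],  # NA when unmatched
    stringsAsFactors = FALSE
  )
}

pf_lookup <- data.frame(
  pf_code = c(0L, 1L),
  pf_definition = c("No 990-PF return", "990-PF return"),
  stringsAsFactors = FALSE
)

res <- lookup_code_definitions(c("00", "01", "99"), pf_lookup,
                               lookup_key = "pf_code",
                               definition_col = "pf_definition")
```

Unmatched codes come back with an NA definition instead of being dropped, so the row count of the input is preserved.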