- Generated: 2026-05-06T20:46:11+0000
- Quality gate: PASSED
- Total rows (unique EINs): 3,672,933
- Source files stacked: 117
- Vintages contributing rows: 113
Summary
The Master BMF consolidates both pipelines into one row per EIN. Each EIN's row is drawn from the most recent vintage in which that EIN appears, whether in the current monthly BMF pipeline or the legacy 501CX-NONPROFIT-PX pipeline. See chapter 11 for the full design.
| Metric | Value |
|---|---|
| Total rows | 3,672,933 |
| Distinct EINs | 3,672,933 |
| Rows == distinct EINs (uniqueness gate) | ✅ yes |
| Input CSV files | 117 |
| Vintages with at least one surviving row | 113 |
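The consolidation rule and the uniqueness gate can be sketched in a few lines of plain Python. This is only an illustration, not the build's actual code (the notes below indicate the real build runs in DuckDB). The function names, the toy EINs, and the tie-breaking behaviour (first row seen wins when two rows share a vintage) are assumptions. It also assumes that `bmf_vintage_ym` holds a `YYYY-MM` string.

```python
from typing import Dict, Iterable, List


def consolidate_latest(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per EIN: the row from the latest vintage seen for it."""
    latest: Dict[str, dict] = {}
    for row in rows:
        kept = latest.get(row["ein"])
        # "YYYY-MM" strings sort chronologically, so plain comparison works.
        # Assumption: on a tie, the first row seen is kept.
        if kept is None or row["bmf_vintage_ym"] > kept["bmf_vintage_ym"]:
            latest[row["ein"]] = row
    return list(latest.values())


def uniqueness_gate(master: List[dict]) -> bool:
    """Pass only if the row count equals the number of distinct EINs."""
    return len(master) == len({row["ein"] for row in master})


# Toy stacked input (hypothetical EINs): the first EIN appears in both pipelines.
stacked = [
    {"ein": "000000001", "bmf_vintage_ym": "2010-11", "bmf_source": "legacy"},
    {"ein": "000000001", "bmf_vintage_ym": "2026-05", "bmf_source": "current"},
    {"ein": "000000002", "bmf_vintage_ym": "1989-06", "bmf_source": "legacy"},
]
master = consolidate_latest(stacked)
assert uniqueness_gate(master)
```

In this toy input the first EIN survives with its `current` row because 2026-05 is later than 2010-11.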
Rows by source pipeline
| Source pipeline | Rows | Share (%) |
|---|---|---|
| current | 2,193,075 | 59.71 |
| legacy | 1,479,858 | 40.29 |
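As a quick consistency check on the figures above, the two source counts should sum to the total row count, and the percentages should follow from it. The variable names in this small sketch are illustrative only.

```python
# Figures copied from this report.
rows_by_source = {"current": 2_193_075, "legacy": 1_479_858}
total_rows = 3_672_933

# The two pipelines partition the master, so the counts must add up.
assert sum(rows_by_source.values()) == total_rows

# Percentage share of each pipeline, rounded to two decimals.
shares = {src: round(100 * n / total_rows, 2) for src, n in rows_by_source.items()}
print(shares)  # {'current': 59.71, 'legacy': 40.29}
```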
First / last year coverage
| Metric | Year |
|---|---|
| Earliest first_year_in_bmf | 1989 |
| Latest first_year_in_bmf | 2026 |
| Earliest last_year_in_bmf | 1989 |
| Latest last_year_in_bmf | 2026 |
Vintages observed per EIN
How many distinct BMF vintages did each EIN appear in?
| Statistic | Value |
|---|---|
| Minimum | 1 |
| Average | 49.8 |
| Maximum | 213 |
| EINs that appear in only one vintage | 263,399 |
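The statistics above could be derived along the following lines. This is a minimal sketch that assumes a mapping from each EIN to the list of vintages it was seen in. The function name and the toy data are hypothetical. In the master file itself, the `bmf_vintages_observed` column presumably carries the per-EIN count.

```python
from typing import Dict, List


def vintage_stats(vintages_by_ein: Dict[str, List[str]]) -> dict:
    """Summarise how many distinct vintages each EIN appears in."""
    # De-duplicate per EIN so a repeated vintage is counted once.
    counts = [len(set(vintages)) for vintages in vintages_by_ein.values()]
    return {
        "minimum": min(counts),
        "average": round(sum(counts) / len(counts), 1),
        "maximum": max(counts),
        "single_vintage_eins": sum(1 for c in counts if c == 1),
    }


# Toy data with hypothetical EINs.
toy = {
    "000000001": ["2010-11", "2011-06", "2026-05"],
    "000000002": ["1989-06"],
    "000000003": ["2024-08", "2024-08", "2025-08"],
}
stats = vintage_stats(toy)
```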
Vintage histogram
Number of surviving master rows contributed by each vintage. A vintage with many rows contributed many EINs whose latest sighting was in that month.
| Vintage | Source pipeline | Rows |
|---|---|---|
| 2026-05 | current | 1,952,238 |
| 1989-06 | legacy | 193,984 |
| 2010-11 | legacy | 135,506 |
| 2016-08 | legacy | 97,834 |
| 2020-04 | legacy | 94,227 |
| 2022-01 | legacy | 54,788 |
| 2024-08 | current | 46,513 |
| 2019-08 | legacy | 42,746 |
| 1996-06 | legacy | 42,536 |
| 2011-06 | legacy | 42,426 |
| 2025-08 | current | 37,546 |
| 1998-09 | legacy | 34,539 |
| 2000-05 | legacy | 32,319 |
| 2023-08 | current | 28,365 |
| 2011-12 | legacy | 26,594 |
| 2022-08 | legacy | 25,109 |
| 1995-08 | legacy | 23,316 |
| 1997-10 | legacy | 22,987 |
| 2007-09 | legacy | 22,466 |
| 2014-06 | legacy | 20,218 |
| 2007-04 | legacy | 20,011 |
| 2012-12 | legacy | 19,608 |
| 2016-04 | legacy | 19,385 |
| 2013-07 | legacy | 18,737 |
| 2004-12 | legacy | 18,217 |
| 2003-11 | legacy | 18,166 |
| 2003-01 | legacy | 18,142 |
| 2004-04 | legacy | 16,696 |
| 2013-02 | legacy | 16,395 |
| 2018-12 | legacy | 16,307 |
| 1999-12 | legacy | 15,078 |
| 2015-07 | legacy | 13,439 |
| 2006-05 | legacy | 12,673 |
| 2023-07 | current | 12,497 |
| 2026-01 | current | 12,400 |
| 2010-08 | legacy | 11,730 |
| 2008-01 | legacy | 10,791 |
| 2002-07 | legacy | 10,687 |
| 2008-10 | legacy | 10,672 |
| 2009-10 | legacy | 10,533 |
| 2001-07 | legacy | 9,981 |
| 2026-03 | current | 9,820 |
| 2024-03 | current | 9,077 |
| 2010-07 | legacy | 8,990 |
| 2006-01 | legacy | 8,963 |
| 2010-05 | legacy | 8,946 |
| 2012-08 | legacy | 8,895 |
| 2008-04 | legacy | 8,791 |
| 2007-01 | legacy | 8,770 |
| 2002-01 | legacy | 8,570 |
| 2024-04 | current | 8,457 |
| 2024-02 | current | 8,300 |
| 2015-02 | legacy | 8,240 |
| 2025-03 | current | 8,048 |
| 2009-04 | legacy | 8,046 |
| 2012-06 | legacy | 7,988 |
| 2003-07 | legacy | 7,964 |
| 2014-02 | legacy | 7,694 |
| 2014-12 | legacy | 7,541 |
| 2023-12 | current | 6,984 |
| 2010-04 | legacy | 6,934 |
| 2015-09 | legacy | 6,894 |
| 2009-01 | legacy | 6,615 |
| 2008-06 | legacy | 6,585 |
| 2011-09 | legacy | 6,505 |
| 2025-02 | current | 6,419 |
| 2012-04 | legacy | 6,302 |
| 2015-05 | legacy | 6,224 |
| 2014-04 | legacy | 6,138 |
| 2015-12 | legacy | 6,060 |
| 2006-11 | legacy | 6,009 |
| 2009-07 | legacy | 5,812 |
| 2011-08 | legacy | 5,755 |
| 2005-07 | legacy | 5,634 |
| 2024-12 | current | 5,572 |
| 2005-11 | legacy | 5,423 |
| 2013-12 | legacy | 5,396 |
| 2016-03 | legacy | 5,026 |
| 2014-09 | legacy | 4,656 |
| 2011-07 | legacy | 4,608 |
| 2023-11 | current | 4,576 |
| 2013-05 | legacy | 4,529 |
| 2008-12 | legacy | 4,225 |
| 2025-06 | current | 4,131 |
| 2012-07 | legacy | 4,071 |
| 2013-09 | legacy | 3,745 |
| 2013-10 | legacy | 3,740 |
| 2025-12 | current | 3,656 |
| 2012-02 | legacy | 3,628 |
| 2010-01 | legacy | 3,578 |
| 2015-11 | legacy | 3,542 |
| 2024-06 | current | 3,537 |
| 2025-11 | current | 3,511 |
| 2013-06 | legacy | 3,408 |
| 2012-10 | legacy | 3,378 |
| 2025-04 | current | 3,370 |
| 2015-04 | legacy | 3,261 |
| 2025-07 | current | 3,172 |
| 2013-08 | legacy | 3,110 |
| 2025-05 | current | 3,108 |
| 2013-03 | legacy | 3,079 |
| 2011-10 | legacy | 3,052 |
| 2011-11 | legacy | 2,966 |
| 2012-03 | legacy | 2,874 |
| 2012-11 | legacy | 2,768 |
| 2023-10 | current | 2,679 |
| 2024-11 | current | 2,451 |
| 2023-09 | current | 2,436 |
| 2013-04 | legacy | 2,434 |
| 2025-09 | current | 2,143 |
| 2024-07 | current | 2,069 |
| 2014-11 | legacy | 2,058 |
| 2016-02 | legacy | 1,595 |
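A histogram of this shape can be rebuilt from the master rows by counting on the vintage and source columns. The sketch below assumes that `bmf_vintage_ym` records the vintage of each surviving row and that `bmf_source` records its pipeline. Both column names appear in this report, but their exact semantics are an assumption here, and the function name and toy rows are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def vintage_histogram(master: List[dict]) -> List[Tuple[str, str, int]]:
    """Count surviving rows per (vintage, source), largest first."""
    counts = Counter((row["bmf_vintage_ym"], row["bmf_source"]) for row in master)
    return [(ym, source, n) for (ym, source), n in counts.most_common()]


# Toy master with hypothetical EINs.
toy_master = [
    {"ein": "000000001", "bmf_vintage_ym": "2026-05", "bmf_source": "current"},
    {"ein": "000000002", "bmf_vintage_ym": "1989-06", "bmf_source": "legacy"},
    {"ein": "000000003", "bmf_vintage_ym": "2026-05", "bmf_source": "current"},
]
histogram = vintage_histogram(toy_master)
```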
Column completeness
Top 30 most-complete columns (highest non-null fraction):
| Column | Non-null (%) |
|---|---|
| ein_raw | 100.00 |
| ein | 100.00 |
| org_name_raw | 100.00 |
| org_name_join | 100.00 |
| org_name_display | 100.00 |
| org_addr_zip_raw | 100.00 |
| org_addr_state_invalid | 100.00 |
| ntee_code_clean | 100.00 |
| ntee_common_code | 100.00 |
| ntee_common_code_definition | 100.00 |
| ntee_code_major_group | 100.00 |
| naics_code | 100.00 |
| ntee_code_definition | 100.00 |
| nteev2_code | 100.00 |
| nteev2_subsector | 100.00 |
| nteev2_org_type | 100.00 |
| nteev2 | 100.00 |
| bmf_source | 100.00 |
| combined_first_vintage_ym | 100.00 |
| combined_last_vintage_ym | 100.00 |
| bmf_vintages_observed | 100.00 |
| first_vintage_ym | 100.00 |
| last_vintage_ym | 100.00 |
| bmf_vintage_ym | 100.00 |
| first_year_in_bmf | 100.00 |
| last_year_in_bmf | 100.00 |
| org_addr_city_raw | 99.92 |
| org_addr_city | 99.92 |
| org_addr_state_raw | 99.88 |
| org_addr_state | 99.81 |
Bottom 30 least-complete columns:
| Column | Non-null (%) |
|---|---|
| org_parent_name | 3.08 |
| activity_code_definitions | 20.94 |
| activity_code_categories | 20.94 |
| dba_name_raw | 25.09 |
| dba_name | 25.09 |
| org_legal_suffix | 36.41 |
| in_care_of_name_clean | 38.45 |
| in_care_of_name_raw | 38.53 |
| revenue_amount | 52.28 |
| all_classifications_string | 59.68 |
| in_care_of_name_provided | 59.71 |
| org_addr_street_raw | 59.71 |
| org_addr_has_special_chars | 59.71 |
| org_addr_is_po_box | 59.71 |
| org_addr_is_rural_route | 59.71 |
| org_addr_street | 59.71 |
| org_addr_zip4 | 59.71 |
| org_addr_full | 59.71 |
| org_addr_is_missing | 59.71 |
| org_addr_missing_number | 59.71 |
| classification_code | 59.71 |
| affiliation_code | 59.71 |
| affiliation_code_definition | 59.71 |
| deductibility_code | 59.71 |
| deductibility_code_definition | 59.71 |
| organization_code | 59.71 |
| organization_code_definition | 59.71 |
| status_code | 59.71 |
| status_code_definition | 59.71 |
| activity_code | 59.71 |
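Completeness figures like these can be recomputed as the percentage of rows in which a column is non-null. The sketch below treats both `None` and the empty string as null. That is an assumption, because this report does not say how NULLs are represented once the master is exported. The function name and toy rows are hypothetical.

```python
from typing import Dict, List, Sequence


def completeness(rows: List[dict], columns: Sequence[str]) -> Dict[str, float]:
    """Percentage of rows with a non-null value, per column."""
    total = len(rows)
    result: Dict[str, float] = {}
    for col in columns:
        # Assumption: None and "" both count as NULL.
        non_null = sum(1 for row in rows if row.get(col) not in (None, ""))
        result[col] = round(100 * non_null / total, 2) if total else 0.0
    return result


# Toy rows with hypothetical values.
toy_rows = [
    {"ein": "000000001", "dba_name": "EXAMPLE DBA", "org_parent_name": None},
    {"ein": "000000002", "dba_name": "", "org_parent_name": None},
    {"ein": "000000003", "dba_name": None, "org_parent_name": ""},
    {"ein": "000000004", "dba_name": None, "org_parent_name": None},
]
pct = completeness(toy_rows, ["ein", "dba_name", "org_parent_name"])
```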
Notes
- All output columns are VARCHAR. The master build forces every column to string type so that DuckDB’s `UNION ALL BY NAME` can combine the legacy slim per-vintage schema with the current full schema without per-column type-mismatch errors. Cast in your downstream tool: `as.numeric(asset_amount)` in R, `pd.to_numeric(df['asset_amount'], errors='coerce')` in pandas, etc. Lat/lon in the geocoded master output are explicitly cast to DOUBLE.
- Legacy-only EINs have sparse columns. EINs whose latest appearance is in a pre-2014 legacy vintage carry only the legacy slim schema; current-pipeline-only columns (e.g. NTEEv2 fields, modern address quality flags) are NULL. The `bmf_source` column flags the source of each surviving row.
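For readers without pandas or R, the same downstream cast can be sketched with the Python standard library. The `to_number` helper is hypothetical and only roughly mirrors `errors='coerce'`: NULLs and unparseable strings become `None`. The toy rows and the literal `bmf_source` values `current` and `legacy` are assumptions based on the labels used in this report.

```python
from typing import Optional


def to_number(text: Optional[str]) -> Optional[float]:
    """Parse a VARCHAR value; return None for NULLs and unparseable text."""
    if text is None or text.strip() == "":
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Toy master rows (hypothetical EINs and values); every column is a string.
rows = [
    {"ein": "000000001", "bmf_source": "current", "asset_amount": "125000"},
    {"ein": "000000002", "bmf_source": "legacy", "asset_amount": None},
    {"ein": "000000003", "bmf_source": "current", "asset_amount": "n/a"},
]

# Cast on read, leaving the stored strings untouched.
assets = [to_number(row["asset_amount"]) for row in rows]

# Restrict to rows that carry the current full schema.
current_only = [row for row in rows if row["bmf_source"] == "current"]
```

Filtering on `bmf_source` before analysing current-pipeline-only columns avoids treating structurally missing legacy values as genuine blanks.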