01_data_import_pre_geo.Rmd
+25 -37 (25 additions & 37 deletions)
@@ -15,73 +15,65 @@ library(janitor)
filing req code - don't have to file any forms or not 01 = 990, 02 = 990 but less than 25000 income, 03 = group return (don't know what that means), 06 = church, 07 = gov, 00 = not required to file
-updated date is 2025-04-13 00:10:44
-
-download date 5/5/2025:
+download date 9/4/2025:
https://www.irs.gov/downloads/irs-soi -->then search for eo_md.csv
-more about it: https://www.irs.gov/statistics/soi-tax-stats-about-soi
+more about soi files: https://www.irs.gov/statistics/soi-tax-stats-about-soi
EIN, Tax Year , Organization Name, Gross receipts not greater than, Organization has terminated, Tax Period Begin Date , Tax Period End Date , Website URL , Principal Officer’s Name , Principal Officer’s Address Line 1, Principal Officer’s Address Line 2, Principal Officer’s Address City , Principal Officer’s Address Province, Principal Officer’s Address State , Principal Officer’s Address Zip Code , Principal Officer’s Address Country ,Organization Mailing Address Line 1 , Organization Mailing Address Line 2 , Organization Mailing Address City , Organization Mailing Address Province , Organization Mailing Address State , Organization Mailing Address Zip Code, Organization Mailing Address Country ,Organization Doing Business as Name 1, Organization Doing Business as Name 2 , Organization Doing Business as Name 3 ,
EIN, Tax Year , Organization Name, Gross receipts not greater than, Organization has terminated, Tax Period Begin Date , Tax Period End Date , Website URL , Principal Officer’s Name , Principal Officer’s Address Line 1, Principal Officer’s Address Line 2, Principal Officer’s Address City , Principal Officer’s Address Province, Principal Officer’s Address State , Principal Officer’s Address Zip Code , Principal Officer’s Address Country ,Organization Mailing Address Line 1 , Organization Mailing Address Line 2 , Organization Mailing Address City , Organization Mailing Address Province , Organization Mailing Address State , Organization Mailing Address Zip Code, Organization Mailing Address Country ,Organization Doing Business as Name 1, Organization Doing Business as Name 2 , Organization Doing Business as Name 3
-#trying a different way to read in the data with different column lengths
<!-- # checking the original version of the file for one of the problematic rows -->
-<!-- <!-- awk 'FNR>=9294 && FNR<=9294' New_version_data/data-download-epostcard_2024.txt # first problematic row is actually originally on line 9294 --> -->
<!-- <!-- awk 'FNR>=1 && FNR<=3' New_version_data/data-download-epostcard_2024.txt # checking line 1-3 to ensure that there were some empty lines for the first two lines - yup looks like it! --> -->
-<!-- ``` -->
### Get row names:
```{r}
-colnames(epost) <- c("EIN", "Tax Year" , "Organization Name", "Gross receipts not greater than", "Organization has terminated", "Tax Period Begin Date" , "Tax Period End Date" , "Website URL" , "Principal Officer’s Name" , "Principal Officer’s Address Line 1", "Principal Officer’s Address Line 2", "Principal Officer’s Address City" , "Principal Officer’s Address Province", "Principal Officer’s Address State" , "Principal Officer’s Address Zip Code" , "Principal Officer’s Address Country" ,"Organization Mailing Address Line 1" , "Organization Mailing Address Line 2" , "Organization Mailing Address City" , "Organization Mailing Address Province" , "Organization Mailing Address State" , "Organization Mailing Address Zip Code", "Organization Mailing Address Country" ,"Organization Doing Business as Name 1", "Organization Doing Business as Name 2", "Organization Doing Business as Name 3")
+colnames(epost) <- c("EIN", "Tax Year" , "Organization Name", "Gross receipts not greater than", "Organization has terminated", "Tax Period Begin Date" , "Tax Period End Date" , "Website URL" , "Principal Officer’s Name" , "Principal Officer’s Address Line 1", "Principal Officer’s Address Line 2", "Principal Officer’s Address City" , "Principal Officer’s Address Province", "Principal Officer’s Address State" , "Principal Officer’s Address Zip Code" , "Principal Officer’s Address Country" ,"Organization Mailing Address Line 1" , "Organization Mailing Address Line 2" , "Organization Mailing Address City" , "Organization Mailing Address Province" , "Organization Mailing Address State" , "Organization Mailing Address Zip Code", "Organization Mailing Address Country" ,"Organization Doing Business as Name 1", "Organization Doing Business as Name 2", "Organization Doing Business as Name 3", "extra_1", "extra_2")
epost <-clean_names(epost)# removing spaces from names
-# checking the rows that have extra columns - only one for MD but will keep it in the dataset
epost <- epost %>% filter(principal_officer_s_address_state == "MD") #filter for only MD
+epost <-epost %>% mutate(ein = as.character(ein))
epost_only <-anti_join(epost,irs_new, by ="ein")
nrow(epost_only) # not in irs_new
nrow(irs_new)
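# Illustrative sketch only, not part of the original analysis: why `ein` is
# coerced to character before the anti_join above. It assumes `irs_new$ein` is
# stored as character (not confirmed in this diff). `epost_toy` and `irs_toy`
# are hypothetical stand-ins for `epost` and `irs_new`.
library(dplyr)
epost_toy <- tibble(ein = c(520000001, 520000002, 520000003)) # EIN parsed as numeric
irs_toy <- tibble(ein = c("520000001", "520000003"))          # EIN stored as character
# dplyr will not join a numeric key to a character key, so align the types first.
epost_toy <- epost_toy %>% mutate(ein = as.character(ein))
# Keep only the rows of epost_toy whose ein has no match in irs_toy.
epost_only_toy <- anti_join(epost_toy, irs_toy, by = "ein")
nrow(epost_only_toy)
# Caveat: if EINs were parsed as numbers on import, any leading zeros are already
# lost at this point; reading the column as character from the start avoids that.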
@@ -107,9 +99,8 @@ nrow(IRS) == nrow(irs_new)# test if dimensions are still the same
https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-bulk-data-downloads --> click on automatic revocation or exemption list link
-Direct link downloaded 5_8_25 (this dataset will change overtime): https://apps.irs.gov/pub/epostcard/data-download-revocation.zip
+Direct link downloaded 9/4/2025 (this dataset will change overtime): https://apps.irs.gov/pub/epostcard/data-download-revocation.zip
-last updated Dec,9, 2024
Under Internal Revenue Code Section 6033(j)(1)(A), the IRS will revoke the tax-exempt status of nonprofit organizations automatically when organizations with a filing requirement do not file their required Forms 990 for three consecutive years. When such a revocation occurs, it is effective as of the filing deadline, which is typically May 15 for organizations filing on a calendar year basis. This year, however, the Form 990 filing deadline was delayed to July 15 due to the COVID-19 pandemic
@@ -120,10 +111,7 @@ more info: https://www.irs.gov/pub/irs-pdf/p4991.pdf