Difference between revisions of "Federal Grant Data"
(→NIH) |
|||
(30 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
− | == | + | {{Project |
+ | |Has project output=Data | ||
+ | |Has sponsor=McNair Center | ||
+ | |Has title=Federal Grant Data | ||
+ | |Has owner=Catherine Kirby, Jeemin Sim, Avesh Krishna, | ||
+ | |Has keywords=Data | ||
+ | |Has project status=Active | ||
+ | |Does subsume=NSF Data, NIH Data, | ||
+ | }} | ||
+ | <onlyinclude>The [[Federal Grant Data]] project collects and processes [[NIH Data]], [[NSF Data]], and other federal grant information from structured government sources and imports it into a relational database for use. See also: The [[Trial Data Project]] and the [[FDA Trials Data]] project.</onlyinclude> | ||
− | + | ==Latest Update== | |
+ | |||
+ | In April 2019, Ed updated the NSF and NIH data. | ||
+ | |||
+ | ===NSF=== | ||
+ | |||
+ | The update process was: | ||
+ | *Visit https://www.nsf.gov/awardsearch/download.jsp | ||
+ | *Retrieve 2017, 2018, and 2019 zips to E:\projects\grants\NSF | ||
+ | *Extract them to E:\projects\grants\NSF\XML | ||
+ | *Remove 3 bad XML files that cause the parser to crash | ||
+ | *Move the whole thing back to E:\mcnair\Projects\Federal Grant Data\NSF\NSF Extracted Data | ||
+ | *Fix up and run Jeemin_NSF_XML_Parser.py, so that it also takes ProgramElement (see below) and make: | ||
+ | ** | ||
+ | *Run LoadNSF.sql to produce tables NSFGeneral, NSFInvestigator and NSFInstitution in dbase '''grants''' | ||
+ | |||
+ | Note that '''program_code'''s are available from | ||
+ | *STTR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*STTR* | ||
+ | *SBIR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*SBIR* | ||
+ | |||
+ | Combined SBIR/STTR codes | ||
+ | '168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266','5370','168E','1591','1505','Z408' | ||
+ | |||
+ | SBIR Codes: | ||
+ | '168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266' | ||
+ | STTR Codes: | ||
+ | '5370','168E','1591','1505','Z408' | ||
+ | |||
+ | Unfortunately, we need the Program Element code, shown below, but our extractor doesn't currently pull it. | ||
+ | |||
+ | <xsd:element maxOccurs="unbounded" name="ProgramElement"> | ||
+ | <xsd:complexType> | ||
+ | <xsd:sequence> | ||
+ | <xsd:element name="Code" type="xsd:int"/> | ||
+ | <xsd:element name="Text" type="xsd:string"/> | ||
+ | </xsd:sequence> | ||
+ | </xsd:complexType> | ||
+ | </xsd:element> | ||
+ | |||
+ | Correction: This appears fixed. There is a programelementcode in nsfgeneral. However, the code is XXXX in 42.6% of cases. | ||
+ | |||
+ | ===NIH=== | ||
+ | |||
+ | The 2018 update process was: | ||
+ | *Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx | ||
+ | *Retrieve 2016, 2017, and 2018 csvs to E:\projects\grants\NIH\Source Files | ||
+ | *Combine them into RePORTER_PRJ_C_FY2016-17-18.csv, open in excel and save as tdt | ||
+ | *Add old data files to same directory | ||
+ | *Run LoadNIH.sql to product table nih in dbase '''grants''' | ||
+ | |||
+ | Supporting info for NIH grants: | ||
+ | *A data dictionary for the NIH grants is here: https://exporter.nih.gov/about.aspx | ||
+ | *'''Activity''' Codes are here: https://grants.nih.gov/grants/funding/ac_search_results.htm | ||
+ | *STTR codes are: R41,R42,UT1,UT2 | ||
+ | *SBIR codes are: R43,R44,U43,U44 | ||
+ | *SBIR/STTR Joint codes: RC3,SB1,UB1,UC3 | ||
+ | *'''Total_Cost''' is only available in certain instances but is the best cost variable. | ||
+ | |||
+ | The 2022 update process was: | ||
+ | *Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx | ||
+ | *Retrieve 2019, 2020, and 2021 csvs to E:\projects\grants\NIH\Source Files | ||
+ | *Open each in excel, copy and paste to a tab-delimited text file. | ||
+ | *Remove quotes from the file using regex (in textpad) and save as UTF8 | ||
+ | *Run clean.py to fix the bad records | ||
+ | *Run LoadNIH.sql to load to the grants database (put input files in z:/bulk/grants) | ||
+ | |||
+ | Example processing code (for selected stats for Vermont): | ||
+ | CREATE TABLE nih_vt_stats AS | ||
+ | SELECT extract(year from award_notice_date) as year, direct_cost_amt+indirect_cost_amt as sum_amounts, total_cost, full_project_num, | ||
+ | CASE WHEN direct_cost_amt+indirect_cost_amt>total_cost THEN direct_cost_amt+indirect_cost_amt ELSE total_cost END AS amount, | ||
+ | CASE WHEN org_name LIKE 'UNIVERSITY OF VERMONT%' THEN 1 ELSE 0 END AS uvm | ||
+ | FROM nih WHERE org_state='VT'; | ||
+ | --Updated Rows 9493 | ||
+ | |||
+ | SELECT year, uvm, sum(amount), count(DISTINCT full_project_num) | ||
+ | FROM nih_vt_stats GROUP BY year, uvm ORDER BY uvm,year; | ||
==grants.gov== | ==grants.gov== | ||
Lists agency, estimated funding, opportunity title. opportunity number, and the date. Can be used to look up who was awarded the grants. I downloaded all grants offered by the NIH since 2005 (when the data was available) | Lists agency, estimated funding, opportunity title. opportunity number, and the date. Can be used to look up who was awarded the grants. I downloaded all grants offered by the NIH since 2005 (when the data was available) | ||
− | + | E:\McNair\Projects\Federal Grant Data\NIH | |
==NSF Grant Data== | ==NSF Grant Data== | ||
Line 15: | Line 99: | ||
Data is located in: | Data is located in: | ||
− | + | E:\McNair\Projects\Federal Grant Data\NSF | |
Sample XML schema | Sample XML schema | ||
Line 93: | Line 177: | ||
This does not specify what the grant was for, but it does list the amount and organization who received the grant. Additionally, the city and state of the organization are included. The number of awards received by each organization in a given year is listed as well. Excel Tables can be downloaded for each each year at this site. I downloaded the table for 2016 as a sample. | This does not specify what the grant was for, but it does list the amount and organization who received the grant. Additionally, the city and state of the organization are included. The number of awards received by each organization in a given year is listed as well. Excel Tables can be downloaded for each each year at this site. I downloaded the table for 2016 as a sample. | ||
+ | |||
+ | CK: It is possible to get data. https://exporter.nih.gov/ExPORTER_Catalog.aspx | ||
+ | E:\McNair\Projects\Federal Grant Data\NIH | ||
+ | |||
+ | |||
+ | ==DOD== | ||
+ | |||
+ | DOD grant data folder (compiled for Augusta Ecosystem project) was copied from E:\McNair\Projects\Innovation Districts\Augusta Startup Ecosystem to E:\McNair\Projects\Federal Grant Data. It contains yearly data that can be filtered, and a few files on Augusta-related grants. |
Latest revision as of 16:58, 10 April 2022
Federal Grant Data | |
---|---|
Project Information | |
Has title | Federal Grant Data |
Has owner | Catherine Kirby, Jeemin Sim, Avesh Krishna |
Has start date | |
Has deadline date | |
Has keywords | Data |
Has project status | Active |
Does subsume | NSF Data, NIH Data |
Has sponsor | McNair Center |
Has project output | Data |
Copyright © 2019 edegan.com. All Rights Reserved. |
The Federal Grant Data project collects and processes NIH Data, NSF Data, and other federal grant information from structured government sources and imports it into a relational database for use. See also: The Trial Data Project and the FDA Trials Data project.
Contents
Latest Update
In April 2019, Ed updated the NSF and NIH data.
NSF
The update process was:
- Visit https://www.nsf.gov/awardsearch/download.jsp
- Retrieve 2017, 2018, and 2019 zips to E:\projects\grants\NSF
- Extract them to E:\projects\grants\NSF\XML
- Remove 3 bad XML files that cause the parser to crash
- Move the whole thing back to E:\mcnair\Projects\Federal Grant Data\NSF\NSF Extracted Data
- Fix up and run Jeemin_NSF_XML_Parser.py, so that it also takes ProgramElement (see below) and make:
- Run LoadNSF.sql to produce tables NSFGeneral, NSFInvestigator and NSFInstitution in dbase grants
Note that program_codes are available from
- STTR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*STTR*
- SBIR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*SBIR*
Combined SBIR/STTR codes '168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266','5370','168E','1591','1505','Z408'
SBIR Codes: '168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266' STTR Codes: '5370','168E','1591','1505','Z408'
Unfortunately, we need the Program Element code, shown below, but our extractor doesn't currently pull it.
<xsd:element maxOccurs="unbounded" name="ProgramElement"> <xsd:complexType> <xsd:sequence> <xsd:element name="Code" type="xsd:int"/> <xsd:element name="Text" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element>
Correction: This appears fixed. There is a programelementcode in nsfgeneral. However, the code is XXXX in 42.6% of cases.
NIH
The 2018 update process was:
- Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
- Retrieve 2016, 2017, and 2018 csvs to E:\projects\grants\NIH\Source Files
- Combine them into RePORTER_PRJ_C_FY2016-17-18.csv, open in excel and save as tdt
- Add old data files to same directory
- Run LoadNIH.sql to product table nih in dbase grants
Supporting info for NIH grants:
- A data dictionary for the NIH grants is here: https://exporter.nih.gov/about.aspx
- Activity Codes are here: https://grants.nih.gov/grants/funding/ac_search_results.htm
- STTR codes are: R41,R42,UT1,UT2
- SBIR codes are: R43,R44,U43,U44
- SBIR/STTR Joint codes: RC3,SB1,UB1,UC3
- Total_Cost is only available in certain instances but is the best cost variable.
The 2022 update process was:
- Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
- Retrieve 2019, 2020, and 2021 csvs to E:\projects\grants\NIH\Source Files
- Open each in excel, copy and paste to a tab-delimited text file.
- Remove quotes from the file using regex (in textpad) and save as UTF8
- Run clean.py to fix the bad records
- Run LoadNIH.sql to load to the grants database (put input files in z:/bulk/grants)
Example processing code (for selected stats for Vermont):
CREATE TABLE nih_vt_stats AS SELECT extract(year from award_notice_date) as year, direct_cost_amt+indirect_cost_amt as sum_amounts, total_cost, full_project_num, CASE WHEN direct_cost_amt+indirect_cost_amt>total_cost THEN direct_cost_amt+indirect_cost_amt ELSE total_cost END AS amount, CASE WHEN org_name LIKE 'UNIVERSITY OF VERMONT%' THEN 1 ELSE 0 END AS uvm FROM nih WHERE org_state='VT'; --Updated Rows 9493 SELECT year, uvm, sum(amount), count(DISTINCT full_project_num) FROM nih_vt_stats GROUP BY year, uvm ORDER BY uvm,year;
grants.gov
Lists agency, estimated funding, opportunity title. opportunity number, and the date. Can be used to look up who was awarded the grants. I downloaded all grants offered by the NIH since 2005 (when the data was available)
E:\McNair\Projects\Federal Grant Data\NIH
NSF Grant Data
The NSF grant data are in folders by year, each grant is an xml file. The format of a typical file is shown below. There is data available from 1959 onwards, and full records after 1976. In 2016 there were ~10,000 awards.
Each year can be downloaded from: https://www.nsf.gov/awardsearch/download.jsp
Data is located in:
E:\McNair\Projects\Federal Grant Data\NSF
Sample XML schema
<rootTag> <Award> <AwardTitle> Collaborative Research: Hybrid Organic-Inorganic Thermoelectric Materials </AwardTitle> <AwardEffectiveDate>08/31/2016</AwardEffectiveDate> <AwardExpirationDate>06/30/2017</AwardExpirationDate> <AwardAmount>95461</AwardAmount> <AwardInstrument> <Value>Standard Grant</Value> </AwardInstrument> <Organization>07030000
<Directorate> <LongName>Directorate For Engineering</LongName> </Directorate> <Division> <LongName>Div Of Civil, Mechanical, & Manufact Inn</LongName> </Division> </Organization> <ProgramOfficer> <SignBlockName>Thomas F. Kuech</SignBlockName> </ProgramOfficer> <AbstractNarration> Thermoelectric materials are materials which can be used to convert thermal energy directly to electricity. The performance of a thermoelectric material is measured by the "figure of merit", termed ZT. There has been much research into increasing thermoelectric materials, figure of merit, however, progress in this area has been slow and most of the researched thermoelectric materials up to now are suffering from either high fabrication cost, usage of rare earth or toxic elements, or poor mechanical properties. Organic thermoelectric materials (OTEs) have recently attracted attention for low temperature applications (< 300K), especially cooling purposes, as they are flexible, low-cost and abundant, and low-cost fabrication methods for synthesizing them exist... A large class of semiconducting nanostructures (e.g. Si, CdTe, Bi, and PbTe nanowires and holely structures) combined with conjugated polymers (e.g., chemically-modified PEDOT and low bandgap polymers) and organic molecules (specifically charged chemical species attached to molecules such as CF3- substituted styrene molecules) will be simulated, synthesized and optimized to identify new hybrid materials with a potentially high ZT. </AbstractNarration> <MinAmdLetterDate>12/22/2016</MinAmdLetterDate> <MaxAmdLetterDate>12/22/2016</MaxAmdLetterDate> <ARRAAmount/> <AwardID>1723353</AwardID> <Investigator> <FirstName>Mona</FirstName> <LastName>Zebarjadi</LastName> <EmailAddress>mz6g@virginia.edu</EmailAddress> <StartDate>12/22/2016</StartDate> <EndDate/> <RoleCode>Principal Investigator</RoleCode> </Investigator> <Institution> <Name>University of Virginia Main Campus</Name> <CityName>CHARLOTTESVILLE</CityName> <ZipCode>229044195</ZipCode> <PhoneNumber>4349244270</PhoneNumber> <StreetAddress>P.O. BOX 400195</StreetAddress> <CountryName>United States</CountryName> <StateName>Virginia</StateName> <StateCode>VA</StateCode> </Institution> <ProgramElement>8092
<Text>Materials Eng. & Processing</Text> </ProgramElement> <ProgramReference>024E
<Text>MATERIALS DESIGN</Text> </ProgramReference> <ProgramReference>036E
<Text>CIVIL INFRASTRUCTURE</Text> </ProgramReference> </Award> </rootTag>
Survey of Federal Funds for Research and Development Fiscal Years 2014–16
Excel spreadsheets with data regarding Federal Agency budgets and spending on R&D [1]
E:\McNair\Projects\Federal Grant Data\NSF\NSF Data Tables
NIH
NIH grants for a given year can be downloaded from https://report.nih.gov/award/index.cfm
This does not specify what the grant was for, but it does list the amount and organization who received the grant. Additionally, the city and state of the organization are included. The number of awards received by each organization in a given year is listed as well. Excel Tables can be downloaded for each each year at this site. I downloaded the table for 2016 as a sample.
CK: It is possible to get data. https://exporter.nih.gov/ExPORTER_Catalog.aspx
E:\McNair\Projects\Federal Grant Data\NIH
DOD
DOD grant data folder (compiled for Augusta Ecosystem project) was copied from E:\McNair\Projects\Innovation Districts\Augusta Startup Ecosystem to E:\McNair\Projects\Federal Grant Data. It contains yearly data that can be filtered, and a few files on Augusta-related grants.