Difference between revisions of "LBO Data Description"

From edegan.com
Jump to navigation Jump to search
 
(19 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
{{Project
 +
|Has project output=Content
 +
|Has sponsor=McNair Center
 +
|Has title=LBO Data Description
 +
|Has owner=James Chen,Brian Ayash,Ed Egan
 +
|Has project status=Complete
 +
}}
 +
 +
 
This page is included in the project [[Leveraged Buyout Innovation (Academic Paper)]]
 
This page is included in the project [[Leveraged Buyout Innovation (Academic Paper)]]
 
==Work Scripts==
 
==Work Scripts==
 
===SQL===
 
===SQL===
*<code>E:\McNair\Projects\LBO\LBO academic paper script pre compustat</code>
+
*<code>E:\McNair\Projects\LBO\patent data script</code>
 
*<code>E:\McNair\Projects\LBO\LBO academic paper script post compustat</code>
 
*<code>E:\McNair\Projects\LBO\LBO academic paper script post compustat</code>
 
===Stata===
 
===Stata===
Line 10: Line 19:
 
==Work Description==
 
==Work Description==
 
===LBO Data and Patent Data===
 
===LBO Data and Patent Data===
See script for exact work.
+
See script for exact work: <code>E:\McNair\Projects\LBO\patent data script</code>
  
 
*Get the source file for the LBOs
 
*Get the source file for the LBOs
Line 30: Line 39:
 
*Load the LBO data into a dbase
 
*Load the LBO data into a dbase
  
*match the LBO data to the patent data
+
*Match the LBO data to the patent data
 
:<code>Z:\allpatentsprocessed\LBO Patent matched</code>
 
:<code>Z:\allpatentsprocessed\LBO Patent matched</code>
  
*Note - all data in allpatentsprocessed database
+
*Note - all data in '''allpatentsprocessed''' database
 
:access it by logging on to <code>researcher@McNair DBServ:/bulk/allpatentsprocessed</code>
 
:access it by logging on to <code>researcher@McNair DBServ:/bulk/allpatentsprocessed</code>
  
*Join patent data to assignee data, creating firstjoin_cleaned. firstjoin_cleaned shows organization names of assignees from the patent data and their patent numbers
+
*Create Table firstjoin_cleaned: Join patent data to assignee data, creating firstjoin_cleaned. firstjoin_cleaned shows organization names of assignees from the patent data and their patent numbers
:again, just look inside database
 
  
*join firstjoin_cleaned to matchassignees data, creating secondjoin_cleaned. secondjoin_cleaned shows standardized organization names, organization names, and patent numbers
+
*Create Table secondjoin_cleaned: join firstjoin_cleaned to matchassignees data, creating secondjoin_cleaned. secondjoin_cleaned shows standardized organization names, organization names, and patent numbers
  
*join secondjoin_cleaned to lbopatentmatch data, creating thirdjoin_cleaned. thirdjoin_cleaned shows LBO's names, standardized organization names, and their patent numbers in order to show lbo's that had patents
+
OLD PROCESS (NOT USED ANYMORE)
:a text file of the final copy can be found here
+
join secondjoin_cleaned to lbopatentmatch data, creating thirdjoin_cleaned. thirdjoin_cleaned shows LBO's names, standardized organization names, and their patent numbers in order to show lbo's that had patents
::<code>Z:\allpatentsprocessed\LBO Patent Data Joined</code>
+
  a text file of the final copy can be found here
 +
  <code>Z:\allpatentsprocessed\LBO Patent Data Joined</code>
 +
 +
create table lbopatentinfo showing lbo names, patent numbers, and grant dates of the patents
 +
 +
use lbopatentinfo to create table lbopatentsummary to show lbo names, amount of patents they have, minimum grant date years, maximum grant date years, and average grant date years
 +
 +
import lbo data matched against itself into table(lbodistinct), then use that table and lbopatentsummary to create to create lbofinal. lbofinal then shows distinct lbonames, number of patents, minimum grant year, maximum grant year, and average grant year
 +
 +
use lbofinal to create lboreallyfinal. lboreallyfinal is the same as lbofinal but with null space in number of patents replaced with 0
 +
 +
FINAL DATA IN THE FOLLOWING FOLDER
 +
  <code>E:\McNair\Projects\LBO</code>
 +
 +
#After getting lboreallyfinal, create a table with all of its columns in addition to the lbo deal numbers, and whether it was public to private or private to private in the following steps
 +
#import LBOcleaneddata.txt into table dealnumbxcleannames. it will have the deal numbers and cleaned original lbo names (duplicates not removed)
 +
#import Originallbomatch.txt into table lbonamesxcleannames. it will have the matched standardized lbo names and the cleaned original lbo names
 +
#join those two tables together to reach the standardized lbo names with the deal numbers in table lbonamesxdealnumb, and create a column that says whether the deal was public to private or private to private in the table lboxpriv
 +
#join lbo_list_and_data and lbopatentmatch together using lboxpriv as a go between. this provides a reference in case we want to join using stdorgname to the patent database
 +
#join lbo_list_and_data and lboreallyfinal together using lboxpriv as a go between. this is the final table it can be found in the following folder under the name lbopatentsreallyfinal
 +
  :<code>E:\McNair\Projects\LBO\Data Sets</code>
  
*create table lbopatentinfo showing lbo names, patent numbers, and grant dates of the patents
+
===Introduction of Compustat and Preparation for Stata===
 +
See script for exact work: <code>E:\McNair\Projects\LBO\LBO academic paper script post compustat</code>
  
*use lbopatentinfo to create table lbopatentsummary to show lbo names, amount of patents they have, minimum grant date years, maximum grant date years, and average grant date years
+
Create tables as follows:
  
*import lbo data matched against itself into table(lbodistinct), then use that table and lbopatentsummary to create to create lbofinal. lbofinal then shows distinct lbonames, number of patents, minimum grant year, maximum grant year, and average grant year
+
*'''SDC''' with SDC data imported from the initial LBO data.
  
*use lbofinal to create lboreallyfinal. lboreallyfinal is the same as lbofinal but with null space in number of patents replaced with 0
+
*'''LBOCOMPUSTAT4''' with Compustat data imported.
  
*FINAL DATA IN THE FOLLOWING FOLDER
+
*'''dealnumbxgvkey''' selecting deal numbers and gvkeys from table sdc.
:<code>E:\McNair\Projects\LBO</code>
 
  
*After getting lboreallyfinal, create a table with all of its columns in addition to the lbo deal numbers, and whether it was public to private or private to private in the following steps
+
*'''compustatnamematch''' with compustat names matched against themselves imported
#import LBOcleaneddata.txt into table dealnumbxcleannames. it will have the deal numbers and cleaned original lbo names (duplicates not removed)
 
#import Originallbomatch.txt into table lbonamesxcleannames. it will have the matched standardized lbo names and the cleaned original lbo names
 
#join those two tables together to reach the standardized lbo names with the deal numbers in table lbonamesxdealnumb, and create a column that says whether the deal was public to private or private to private in the table lboxpriv
 
#join lbo_list_and_data and lbopatentmatch together using lboxpriv as a go between. this provides a reference in case we want to join using stdorgname to the patent database
 
#join lbo_list_and_data and lboreallyfinal together using lboxpriv as a go between. this is the final table it can be found in the following folder under the name lbopatentsreallyfinal
 
:<code>E:\McNair\Projects\LBO\Data Sets</code>
 
  
===Introduction of Compustat and Preparation for Stata===
+
*'''stdcompnamestdorgnamematched''' with standardized compustat names matched against stdorgname
See script for exact work.
 
  
*Create table sdc with SDC data imported from the initial LBO data.
+
*'''matchedlbocompnames''' with standardized original lbo names matched to standardized compustat names
  
*Create table LBOCOMPUSTAT4 with Compustat data imported.
+
*'''compnamematchlbocompnames''' with standardized compustat names, compustatnames, and standardized original lbo names
**Note: There were 3 prior Compustat pulls. The 1st is used to create several cross reference tables that are not made obsolete by the subsequent pulls.
 
  
*Create table dealnumbxgvkey selecting deal numbers and gvkeys from table sdc.
+
*'''stdcompnamepatentjoin''' with variables standardized compustat names and patent numbers
  
*Create table compustatnamematch with compustat names matched against themselves imported
+
*'''compustatpatentinfo''' with variables standardized compustat names, patent numbers, and grant dates
  
*Create table stdcompnamestdorgnamematched with standardized compustat names matched against stdorgname
+
*'''stdcompnamexgvkeycleaned''' with variables standardized compustat names and gvkeys
  
*Create table stdcompnamepatentjoin with variables standardized compustat names and patent numbers
+
*'''ALLGVKey''' with all distinct gvkeys from stdcompnamexgvkeycleaned
  
*Create table compustatpatentinfo with variables standardized compustat names, patent numbers, and grant dates
+
*'''AllYears''' with all years from 1970 to 2016
  
*Create table stdcompnamexgvkeycleaned with variables standardized compustat names and gvkeys
+
*'''KeysXYears''' with variables gvkey and year. Each gvkey corresponds to 47 years.
  
*Create table ALLGVKey with all distinct gvkeys from stdcompnamexgvkeycleaned
+
*'''keysxyearsxstdcompname''' with variables gvkeys, years, and standardized compustat names
  
*Create table AllYears with all years from 1970 to 2016
+
*'''compupatentyearsummary''' with variables standardized compustat names, gvkeys, years, and number of patents per year
  
*Create table KeysXYears with variables gvkey and year. Each gvkey corresponds to 47 years.
+
*'''dealnumbxgvkeyxlboyeartemp''' with variables deal numbers, gvkeys, LBO years, and exit years
  
*Create table keysxyearsxstdcompname with variables gvkeys, years, and standardized compustat names
+
*'''dealnumbxgvkeyxlboyear''' with variables deal numbers, gvkeys, lbo years, exit years, entry year 1, entry year 2, exit year 1, and exit year 2
  
*Create table compupatentyearsummary with variables standardized compustat names, gvkeys, years, and number of patents per year
+
*'''compupatentlboyearsummary''' with variables standard compustat names, gvkeys, years, number of patents per year, and lbo entries/exits in each year
  
*Create table dealnumbxgvkeyxlboyeartemp with variables deal numbers, gvkeys, LBO years, and exit years
+
*'''statastaging''' with all variables from compupatentlboyearsummary in addition to all the pulled compustat variables
  
*Create table dealnumbxgvkeyxlboyear with variables deal numbers, gvkeys, lbo years, exit years, entry year 1, entry year 2, exit year 1, and exit year 2
+
*'''sicnaicsconversion''' with sic codes and naics codes as a conversion table for adding naics that correspond to the sic codes in blank rows. Table from https://www.census.gov/eos/www/naics/concordances/concordances.html
  
*Create table compupatentlboyearsummary with variables standard compustat names, gvkeys, years, number of patents per year, and lbo entries/exits in each year
+
*'''statastagingnew''' with two extra columns, concordsic and concordnaics, then export to stata
  
*Create table statastaging with all variables from compupatentlboyearsummary in addition to all the pulled compustat variables
+
===Stata Work===
 
+
See script for exact work: <code>E:\McNair\Projects\LBO\STATAdatasetup2</code>
*Create table sicnaicsconversion with sic codes and naics codes as a conversion table for adding naics that correspond to the sic codes in blank rows
 
  
*Create table statastagingnew with two extra columns, concordsic and concordnaics, then export to stata
 
 
===Stata Work===
 
See script for exact work.
 
 
====Setup====
 
====Setup====
 
*Label Compustat/patent variables imported from SQL  
 
*Label Compustat/patent variables imported from SQL  
Line 118: Line 133:
 
*Find first fiscal year in which data appears for each company
 
*Find first fiscal year in which data appears for each company
 
*Find last fiscal year in which data appears for each company, or set it to 2016
 
*Find last fiscal year in which data appears for each company, or set it to 2016
 +
*Recode the industry (based on NAICS2) to Manufacturing, Retail, Prof. Services, and other
 
*Generate additional lagged variables
 
*Generate additional lagged variables
  
 
====Analysis====
 
====Analysis====
 +
See script for exact work: <code>E:\McNair\Projects\LBO\STATAanalysis2</code>
 
*Summary of variables intended for use in hazard model (mean, count)
 
*Summary of variables intended for use in hazard model (mean, count)
 
*Run Cox Proportional Hazard Model multiple times to determine statistically significant variables
 
*Run Cox Proportional Hazard Model multiple times to determine statistically significant variables
 
*Regress on variables that are questionable to determine which may be correlated
 
*Regress on variables that are questionable to determine which may be correlated
 
*Run Cox Hazard again with final list of variables
 
*Run Cox Hazard again with final list of variables
 +
*Run diff-and-diff
  
 
==Regressions & Tables==  
 
==Regressions & Tables==  
Line 223: Line 241:
 
RE -- Retained Earnings
 
RE -- Retained Earnings
  
==Resources==
+
==SQL Fixes==
WRDS USER: mcnair
+
(Moved here from deleted page: LBO Fix List)
WRDS PASS: 9Mil2015
+
 
 +
*Improve SIC to NAICS fill in further
 +
**Manually add in conversions from three digit SIC (e.g., 1110) to three digit NAICS (e.g., 134)
 +
**Re-import into STATA
 +
 
  
http://www.axial.net/forum/how-private-equity-screens-lbo-candidates/ - Variables that private equity firms examine to decide whether or not to lbo
+
[[category:Internal]]

Latest revision as of 16:44, 5 October 2020


Project
LBO Data Description
Project logo 02.png
Project Information
Has title LBO Data Description
Has owner James Chen, Brian Ayash, Ed Egan
Has start date
Has deadline date
Has project status Complete
Has sponsor McNair Center
Has project output Content
Copyright © 2019 edegan.com. All Rights Reserved.


This page is included in the project Leveraged Buyout Innovation (Academic Paper)

Work Scripts

SQL

  • E:\McNair\Projects\LBO\patent data script
  • E:\McNair\Projects\LBO\LBO academic paper script post compustat

Stata

  • E:\McNair\Projects\LBO\STATAdatasetup2
  • E:\McNair\Projects\LBO\STATAanalysis2

Work Description

LBO Data and Patent Data

See script for exact work: E:\McNair\Projects\LBO\patent data script

  • Get the source file for the LBOs
E:\McNair\Projects\LBO\LBO_List_and_data.xlsx Original data source
  • Clean it up
E:\McNair\Projects\LBO\LBO_cleaned_data.xlsx Names of LBO's/deal numbers with extraneous symbols and words removed
  • Match it against itself
E:\McNair\Software\Scripts\Matcher\Output\LBO_cleaned_data.txt-LBO_cleaned_data.txt.matched Original matched data
E:\McNair\Projects\LBO\LBO_cleaned_data_duplicates_removed
  • Get the patent data
Draw the distinct assignees
Z:\allpatentsprocessed\DistinctAssignees2
Match them against themselves
Z:\allpatentsprocessed\DistinctAssignees2matched
  • Load the LBO data into a dbase
  • Match the LBO data to the patent data
Z:\allpatentsprocessed\LBO Patent matched
  • Note - all data in allpatentsprocessed database
access it by logging on to researcher@McNair DBServ:/bulk/allpatentsprocessed
  • Create Table firstjoin_cleaned: Join patent data to assignee data, creating firstjoin_cleaned. firstjoin_cleaned shows organization names of assignees from the patent data and their patent numbers
  • Create Table secondjoin_cleaned: join firstjoin_cleaned to matchassignees data, creating secondjoin_cleaned. secondjoin_cleaned shows standardized organization names, organization names, and patent numbers
OLD PROCESS (NOT USED ANYMORE)
join secondjoin_cleaned to lbopatentmatch data, creating thirdjoin_cleaned. thirdjoin_cleaned shows LBO's names, standardized organization names, and their patent numbers in order to show lbo's that had patents
  a text file of the final copy can be found here
  Z:\allpatentsprocessed\LBO Patent Data Joined

create table lbopatentinfo showing lbo names, patent numbers, and grant dates of the patents

use lbopatentinfo to create table lbopatentsummary to show lbo names, amount of patents they have, minimum grant date years, maximum grant date years, and average grant date years

import lbo data matched against itself into table(lbodistinct), then use that table and lbopatentsummary to create to create lbofinal. lbofinal then shows distinct lbonames, number of patents, minimum grant year, maximum grant year, and average grant year

use lbofinal to create lboreallyfinal. lboreallyfinal is the same as lbofinal but with null space in number of patents replaced with 0

FINAL DATA IN THE FOLLOWING FOLDER
  E:\McNair\Projects\LBO

#After getting lboreallyfinal, create a table with all of its columns in addition to the lbo deal numbers, and whether it was public to private or private to private in the following steps
#import LBOcleaneddata.txt into table dealnumbxcleannames. it will have the deal numbers and cleaned original lbo names (duplicates not removed)
#import Originallbomatch.txt into table lbonamesxcleannames. it will have the matched standardized lbo names and the cleaned original lbo names
#join those two tables together to reach the standardized lbo names with the deal numbers in table lbonamesxdealnumb, and create a column that says whether the deal was public to private or private to private in the table lboxpriv
#join lbo_list_and_data and lbopatentmatch together using lboxpriv as a go between. this provides a reference in case we want to join using stdorgname to the patent database
#join lbo_list_and_data and lboreallyfinal together using lboxpriv as a go between. this is the final table it can be found in the following folder under the name lbopatentsreallyfinal
  :E:\McNair\Projects\LBO\Data Sets

Introduction of Compustat and Preparation for Stata

See script for exact work: E:\McNair\Projects\LBO\LBO academic paper script post compustat

Create tables as follows:

  • SDC with SDC data imported from the initial LBO data.
  • LBOCOMPUSTAT4 with Compustat data imported.
  • dealnumbxgvkey selecting deal numbers and gvkeys from table sdc.
  • compustatnamematch with compustat names matched against themselves imported
  • stdcompnamestdorgnamematched with standardized compustat names matched against stdorgname
  • matchedlbocompnames with standardized original lbo names matched to standardized compustat names
  • compnamematchlbocompnames with standardized compustat names, compustatnames, and standardized original lbo names
  • stdcompnamepatentjoin with variables standardized compustat names and patent numbers
  • compustatpatentinfo with variables standardized compustat names, patent numbers, and grant dates
  • stdcompnamexgvkeycleaned with variables standardized compustat names and gvkeys
  • ALLGVKey with all distinct gvkeys from stdcompnamexgvkeycleaned
  • AllYears with all years from 1970 to 2016
  • KeysXYears with variables gvkey and year. Each gvkey corresponds to 47 years.
  • keysxyearsxstdcompname with variables gvkeys, years, and standardized compustat names
  • compupatentyearsummary with variables standardized compustat names, gvkeys, years, and number of patents per year
  • dealnumbxgvkeyxlboyeartemp with variables deal numbers, gvkeys, LBO years, and exit years
  • dealnumbxgvkeyxlboyear with variables deal numbers, gvkeys, lbo years, exit years, entry year 1, entry year 2, exit year 1, and exit year 2
  • compupatentlboyearsummary with variables standard compustat names, gvkeys, years, number of patents per year, and lbo entries/exits in each year
  • statastaging with all variables from compupatentlboyearsummary in addition to all the pulled compustat variables
  • statastagingnew with two extra columns, concordsic and concordnaics, then export to stata

Stata Work

See script for exact work: E:\McNair\Projects\LBO\STATAdatasetup2

Setup

  • Label Compustat/patent variables imported from SQL
  • Generate compound variables and label them
  • Set the data as panel data and a time series
  • Set the data for survival analysis
  • Generate variable of first two numbers in naics code/lagged variable of first two numbers
  • Generate log of non-ratio variables
  • Generate lagged variables
  • Find first fiscal year in which data appears for each company
  • Find last fiscal year in which data appears for each company, or set it to 2016
  • Recode the industry (based on NAICS2) to Manufacturing, Retail, Prof. Services, and other
  • Generate additional lagged variables

Analysis

See script for exact work: E:\McNair\Projects\LBO\STATAanalysis2

  • Summary of variables intended for use in hazard model (mean, count)
  • Run Cox Proportional Hazard Model multiple times to determine statistically significant variables
  • Regress on variables that are questionable to determine which may be correlated
  • Run Cox Hazard again with final list of variables
  • Run diff-and-diff

Regressions & Tables

The following is a summary of our variables intended for use in a Cox Proportional Hazards Regression as well as one regression.

  • E:\McNair\Projects\LBO\Saved STATA Tables.txt

Compustat Variables

Company Name

Ticker Symbol

CUSIP

Stock Exchange Code

Fiscal Year-End

ADDZIP -- Postal Code

CITY -- City

IPODATE -- Company Initial Public Offering Date

LOC -- Current ISO Country Code - Headquarters

NAICS -- North American Industry Classification Code

SIC -- Standard Industry Classification Code

STATE -- State/Province

FYEAR -- Data Year - Fiscal

ACT -- Current Assets - Total

ARTFS -- Accounts Receivable/Debtors - Total

AT -- Assets - Total

CEQ -- Common/Ordinary Equity - Total

CHE -- Cash and Short-Term Investments

DLC -- Debt in Current Liabilities - Total

DLTT -- Long-Term Debt - Total

LCT -- Current Liabilities - Total

LT -- Liabilities - Total

TXDITC -- Deferred Taxes and Investment Tax Credit

EBITDA -- Earnings Before Interest

NI -- Net Income (Loss)

OIBDP -- Operating Income Before Depreciation

OPITI -- Operating Income - Total

REVT -- Revenue - Total

SALE -- Sales/Turnover (Net)

TXT -- Income Taxes - Total

XINT -- Interest and Related Expense - Total

XRD -- Research and Development Expense

CDVC -- Cash Dividends on Common Stock (Cash Flow)

ESUBC -- Equity in Net Loss Earnings

IBC -- Income Before Extraordinary Items (Cash Flow)

PDVC -- Cash Dividends on Preferred/Preference Stock (Cash Flow)

XIDOC -- Extraordinary Items and Discontinued Operations (Cash Flow)

DT -- Total Debt Including Current

PRCC_F -- Price Close - Annual - Fiscal

XSTFWS -- Staff Expense - Wages and Salaries

EMP -- Employees

CSHO -- Common Shares Outstanding

XSGA -- Selling, General and Administrative Expense

INTAN -- Intangible Assets - Total

RE -- Retained Earnings

SQL Fixes

(Moved here from deleted page: LBO Fix List)

  • Improve SIC to NAICS fill in further
    • Manually add in conversions from three digit SIC (e.g., 1110) to three digit NAICS (e.g., 134)
    • Re-import into STATA