Changes

Jump to navigation Jump to search
no edit summary
|Has paper status=Working paper
}}
 =Current WorkVersion 2=
==How to do a run==
6 Edit the excel files (some are xml files that have to be opened from inside of Excel), optionally moving columns up to create alignment and using regexes to get rid of unwanted rows. Create and apply a chart template to simplify the standardized production of figures. Use the excel2latex plugin to export tables to LaTeX.
==Review of Old Build== This section was written as review of the old build to assist with the current build. Readers can skip it unless they are interested in legacy construction. ===First things=== Restore '''allpatentsprocessed''' (as researcher): cd /bulk/mcnair/backups createdb allpatentsprocessed pg_restore -Fc -d allpatentsprocessed allpatentsprocessed_Fc_230818.dump It looks like DataExtension.sql in E:\mcnair\Projects\LBO\Clean\Ed Run used '''usptopad''' on the old rdp to make tables for all of COMPUSTAT and import them into '''allpatentsprocessed'''. usptopad was not archived in the Big Move -- we might want to back it (and other rdp dbases) up! It also looks like some development notes are missing. I can't find them in either E:\mcnair\Projects\LBO\Clean\Ed Run or the LBO dropbox, or Dropbox\coauthoredprojects\NonMcNair. However, Analysis.do loads/merges the following:*PreMergeRelevant.dta, which is produced by Datasetup.do, after a ''keep if savefilter''. The sole input to Datasetup.do is '''masterv1-0.txt'''*matchresultsV7, which comes from '''matchresults - V7 nofe lagps 92 R0 indu3 20pc 5yr.txt''', which is generated by running LBOmatchscript.jl on MatchInputV7.txt, which in turn is made by Datasetup.do*PatentStatistics.dta, which needs to be produced using PatentMeasures.sql on '''allpatentsprocessed''' once all matches have been identified*CPI.dta  export delimited gvkey year lboentry lboexit patentflow patentstock lagpatentstock indu3 indu5 indu6 decade logitpreg logitpregwindu3fe logitpregwfe hadlbo matchfilter somepatenting using "E:\McNair\Projects\LBO\Clean\Ed Run\MatchInputV7.txt", delimiter(tab) nolabel quote replace ===PatentMeasures.sql=== PatentMeasures.sql takes MatchedGVKeys.txt (184) and builds the patent measures (see below) in '''allpatentsprocessed''', producing MatchedMeasuresPortfolio.txt (8468), which is merged in Analysis.do after being resaved as PatentStatistics.dta.  DROP TABLE MatchedMeasuresPortfolio; CREATE TABLE MatchedMeasuresPortfolio AS SELECT gvkeyint as gvkey, year, avg(exploitive) as avgexploitive, avg(Explorative70) as avgexplorative70, avg(Explorative80) as avgexplorative80, avg(Explorative90) as avgexplorative90, avg(gen) as avggen, avg(genunbiased) as avggenub, avg(org) as avgorg, avg(orgunbiased) as avgorgub, avg(citesmade) as avgcitesmade, avg(claims) as avgclaims FROM MatchedMeasuresBase GROUP BY gvkeyint, year ORDER BY gvkeyint, year; --8648 Required input tables are:*HistMasterFile, which is the NBER historical masterfile FROM 'historical_masterfile.csv'*citations*patents All three are in '''allpatentsprocessed''' ===PatentExpiry=== The '''patentexpiry''' table was built in '''allpatentsprocessed''' by running PatentExpiry.sql, which is in E:\mcnair\Projects\LBO\Clean\Ed Run and was last updated in March 2018. Most of the script is comment. The key elements are: The stack is as follows:*'''patents''' is an input*Maintfeeevents2 FROM 'MaintFeeEvents_20170410b.txt' (COUNT 14708262)*PatentExpirationRules2 FROM 'patentExpirationRules2.txt' (COUNT 27)*Maintfeeevents2,patentexpirationrules -> patentfeebaseevents (4402215)*FULL OUTER (patents, patentfeebaseevents) -> patentfeebasefull (5463947) -> PatentExpiryBase (5463947) -> PatentExpiry (5463947) Note that patentfeebaseevents is made from PatentExpirationRules and not PatentExpirationRules2. Both list the same fee codes and description, but PatentExpirationRules2 has a termlength column whereas PatentExpirationRules has daysincrement and daysduration columns. daysduration is the same as termlength, except when daysduration=7300 then termlength=6205. It seems from the code that PatentExpirationRules is correct, as there is a manual fix when max(daysduration)=7300.  DROP TABLE PatentExpiry; CREATE TABLE PatentExpiry AS SELECT patentnumber, FilingDate, grantdate, maxdaysduration, nomaintenance, maxtermfix1, maxtermfix2, CASE --Old patents get: grant plus 17yrs WHEN nomaintenance = 1 THEN grantdate + 6205 -- Patents grandfathered in during transition in 1995 to new file date + 20 year term limit get: Later of issue date + 17 years or file date + 20 years used WHEN (maxtermfix1 = 1 AND maxdaysduration = 7300) THEN GREATEST(FilingDate + 7300, grantdate + 6205) --Patents after June 8 1995 but before May 28 2000 get: file date + 20 years WHEN (maxtermfix2 = 1 AND maxdaysduration = 7300) THEN FilingDate + 7300 --Patents after May 28 2000 (this might not be automatic and require an appeal though) get: later of issue date + 17 years or file date + 20 years WHEN (maxtermfix2 = 2 AND maxdaysduration = 7300) THEN GREATEST(FilingDate + 7300, grantdate + 6205) ELSE grantdate + maxdaysduration END AS expdate, CASE WHEN nomaintenance = 1 OR maxdaysduration = 7300 THEN 1::int ELSE 0::int END AS HeldToTerm FROM PatentExpiryBase; --5463947 Note: It also looks like Ed did the same fixes in:E:\mcnair\Projects\LBO\Clean\Ed Run\Building the Patent Start End Table Revised.sql ===Analyzing DataExtension.sql=== DataExtension.sql was last saved in July 2018. The output of this script is masterv1-0.txt (last produced in March 2018), which is the sole input to Datasetup.do. This section therefore traces the construction of masterv1-0.txt. Key stacks:*'''StataStaging4''' is an input*gvkeyyearpatentdata (395213)*statastaging4, (statastaging4->) LBOSeriesProcessed (556) -> LBODetailsWDurr (1691834)*StataStaging4 -> NaicsSICBase (36779) & ExchangeOK (16236)*(StataStaging4,LBODetailsWDurr,gvkeyyearpatentdata,NaicsSICBase | ExchangeOK) -> Master (746856) gvkeyyearpatentdata appears to have been built for all of COMPUSTAT:  --######################### --# Build yearslist, GvkeyYear --######################### DROP TABLE yearslist; CREATE TABLE yearslist AS SELECT generate_series(1975,2017) as year; --43 DROP TABLE Allgvkeys; CREATE TABLE Allgvkeys AS SELECT DISTINCT gvkey FROM patenthistorybase ORDER BY gvkey; --9191 DROP TABLE GvkeyYear; CREATE TABLE GvkeyYear AS SELECT gvkey, year FROM yearslist, Allgvkeys ORDER BY gvkey, year; --395213 Key stacks:*'''patentexpiry''' is an input*cspatentbuys FROM 'cspatentbuys.txt' (304864), which came from usptopad*cspatentsales FROM 'cspatentsales.txt' (451730), which came from usptopad*'''stdcompnames''' is an input*'''assignees''' (is an input) -> Patentsassigneessimple 5426556*(Patentsassigneessimple,MatchOrgnameSelf (loaded from orgnamedctmatched.txt 501232), MatchstdcompnamedctOrgname (loaded from Matchstdcompnamedctorgnamestd.txt 8361,stdcompnames) -> patentgrantgvkey*patentgrantgvkey (1329006), cspatentbuysutlgvkey (272620) ... -> patentgrantbuy*(patentgrantbuy,patentsale,patentexpired)-> patenthistorybase (1513045)*patenthistorybase ...,GvkeyYear -> gvkeyyearpatentdata (395213)  Therefore needed inputs to DataExtension.sql are:*patentexpiry*stdcompnames*assignees*StataStaging4 Note that Patentsassigneessimple is not in '''allpatentsprocessed''' but assignee is, and it can easy be rebuilt: SELECT COUNT(*) FROM assignees WHERE orgname IS NOT NULL AND patentnumber IS NOT NULL; --5426556 There are two obvious master tables that we could export and reuse. The first is patenthistwoprebuys. This is patenthistorybase (1513045) but ''WHERE startdate < enddate''. It has 1,495,486 records and the unit of observation is a patent associated with a firm. The keys are gvkey and patentnumber.  Table "public.patenthistwoprebuys" Column | Type | Collation | Nullable | Default --------------+------------------+-----------+----------+--------- gvkey | text | | | patentnumber | integer | | | startdate | date | | | startyear | double precision | | | grants | integer | | | buy | integer | | | enddate | date | | | endyear | double precision | | | sell | integer | | | term | integer | | | notrenewed | integer | | | unknownexp | integer | | | The second master table is gvkeyyearpatentdata. This is an aggregated table, where the unit of observation is gvkey-year, where year goes from 1975-2017 (see GvkeyYear above) and contains zeros. There are 395,213 observations (i.e., one for each year for 9,191 unique GVKEYs).  Table "public.gvkeyyearpatentdata" Column | Type | Collation | Nullable | Default --------------------+---------+-----------+----------+--------- gvkey | text | | | year | integer | | | nogrants | bigint | | | nobuy | bigint | | | nosell | bigint | | | noterm | bigint | | | nonotnenewed | bigint | | | nounknownexp | bigint | | | portfoliochg | bigint | | | portfolio | numeric | | | portfoliocitations | bigint | | | citesperpatent | numeric | | | ===StataStaging4=== So the dataset is built from StataStaging4, as well as raw tables.  Variables used include:*gvkey, year*lboentry,lboexit*private,ipodate*naics, sic, exchg*datadate, fyear, tic, cusip, name_compustat, fyr, act, artfs, at, ceq, che, csho, dlc, dltt, dt, dvc, dvp, ebitda, emp, esubc, ibc, intan, lct, lt, ni, oibdp, opiti, re, revt, sale, txditc, txt, xidoc, xint, xrd, xsga, xstfws, *exchg, prcc_f, addzip, city, loc It therefore seems that statastaging4 is all of COMPUSTAT with some LBO data included... However, I can't find the code that actually built it. There is code to build StataStaging5 in E:\mcnair\Projects\LBO\Clean\Ed Run\LBO Innovation SQL Script.sql And the best instructions seem to be to run parts 1, 2, and 3A of that script. ==Moving ForwardNotes==
===The Plan===
The propensity score regression is:
. logit lbomatchmarker lvg lvgsq tobinql revgrowthl revgrowth taxtoassets ebitdatoassets revtl if matchfilter==1, robust
Iteration 0: log pseudolikelihood = -11571027.9027 1539 Iteration 1: log pseudolikelihood = -11131008.7267 441 Iteration 2: log pseudolikelihood = -1100988.3734 56148 Iteration 3: log pseudolikelihood = -1097987.5593 85551 Iteration 4: log pseudolikelihood = -1097986.1202 7324 Iteration 5: log pseudolikelihood = -1097986.0625 63042 Iteration 6: log pseudolikelihood = -1096986.365 62504 Iteration 7: log pseudolikelihood = -1094.6987 Iteration 8: log pseudolikelihood = -1094.5913 Iteration 9: log pseudolikelihood = -1094.591 Iteration 10: log pseudolikelihood = -1094986.591 62501
Logistic regression Number of obs = 5161,434090 Wald chi2(76) = 177129.3438
Prob > chi2 = 0.0000
Log pseudolikelihood = -1094986.591 62501 Pseudo R2 = 0.05470395
--------------------------------------------------------------------------------
| Robust
lboregmarker lbomatchmarker | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
lvg | 1.999583 12638132 .541985 039153 16.30 74 0.195 -1000 .022652 1870748 5.0218173405516 lvgsq tobinql | -1.459515 1.197452 -1.22 41471 0.223 1566152 -3.806478 .8874482 tobinql | -.5714983 .0526354 -109.86 03 0.000 -1.6746618 72167 -1.4683347107749 revgrowthl revgrowth | -.8786764 0303849 .8096242 0741461 -10.09 41 0.278 682 -2.465511 1757085 .7081581149387 taxtoassets | 3 2.543691 43517 1.064069 126206 32.33 16 0.001 031 1.458154 2278463 54.629227642494 ebitdatoassets | 23.053875 148241 .4988919 7061438 4.12 46 0.000 1.076065 764224 34.031685532257 revtl | .5511462 063724 .0611253 0270392 92.02 36 0.000 018 .4313429 0107282 .67094951167198 _cons | -56.975003 485486 .765584 2214917 -729.80 28 0.000 -76.47552 919602 -46.47448605137
--------------------------------------------------------------------------------
Note: 52 46 failures and 0 successes completely determined.
With the increased sample, we now find that LBOs are associated with reduced likelihood of patenting overall. However, this results is driven by the LBOs in period 1 from 1980 to 1995. In period 2, from 1995 to 2015, there is no statistically significant association. We also tried using decades but this wasn't instructive.
UPDATE CS SET csho=42.266 WHERE gvkey='118321' AND fyear=2005;
==Review of Old Build== This section was written as review of the old build to assist with the current build. Readers can skip it unless they are interested in legacy construction. ===First things=== Restore '''allpatentsprocessed''' (as researcher): cd /bulk/mcnair/backups createdb allpatentsprocessed pg_restore -Fc -d allpatentsprocessed allpatentsprocessed_Fc_230818.dump It looks like DataExtension.sql in E:\mcnair\Projects\LBO\Clean\EdRun used '''s version usptopad''' on the old rdp to make tables for all of COMPUSTAT and import them into '''allpatentsprocessed'''. usptopad was not archived in the Big Move -- we might want to back it (and other rdp dbases) up! It also looks like some development notes are missing. I can't find them in either E:\mcnair\Projects\LBO\Clean\Ed Run or the LBO dropbox, or Dropbox\coauthoredprojects\NonMcNair. However, Analysis.do loads/merges the following:*PreMergeRelevant.dta, which is produced by Datasetup.do, after a ''keep if savefilter''. The sole input to Datasetup.do is '''masterv1-0.txt'''*matchresultsV7, which comes from '''matchresults - V7 nofe lagps 92 R0 indu3 20pc 5yr.txt''', which is generated by running LBOmatchscript.jl on MatchInputV7.txt, which in turn is made by Datasetup.do*PatentStatistics.dta, which needs to be produced using PatentMeasures.sql on '''allpatentsprocessed''' once all matches have been identified*CPI.dta  export delimited gvkey year lboentry lboexit patentflow patentstock lagpatentstock indu3 indu5 indu6 decade logitpreg logitpregwindu3fe logitpregwfe hadlbo matchfilter somepatenting using "E:\McNair\Projects\LBO\Clean\Ed Run\MatchInputV7.txt", delimiter(tab) nolabel quote replace ===PatentMeasures.sql=== PatentMeasures.sql takes MatchedGVKeys.txt (184) and builds the patent measures (see below) in '''allpatentsprocessed''', producing MatchedMeasuresPortfolio.txt (8468), which is merged in Analysis.do after being resaved as PatentStatistics.dta.  DROP TABLE MatchedMeasuresPortfolio; CREATE TABLE MatchedMeasuresPortfolio AS SELECT gvkeyint as gvkey, year, avg(exploitive) as avgexploitive, avg(Explorative70) as avgexplorative70, avg(Explorative80) as avgexplorative80, avg(Explorative90) as avgexplorative90, avg(gen) as avggen, avg(genunbiased) as avggenub, avg(org) as avgorg, avg(orgunbiased) as avgorgub, avg(citesmade) as avgcitesmade, avg(claims) as avgclaims FROM MatchedMeasuresBase GROUP BY gvkeyint, year ORDER BY gvkeyint, year; --8648 Required input tables are:*HistMasterFile, which is the NBER historical masterfile FROM 'historical_masterfile.csv'*citations*patents All three are in '''allpatentsprocessed''' ===PatentExpiry=== The '''patentexpiry''' table was built in '''allpatentsprocessed''' by running PatentExpiry.sql, which is in E:\mcnair\Projects\LBO\Clean\Ed Run and was last updated in March 2018. Most of the script is comment. The key elements are: The stack is as follows:*'''patents''' is an input*Maintfeeevents2 FROM 'MaintFeeEvents_20170410b.txt' (COUNT 14708262)*PatentExpirationRules2 FROM 'patentExpirationRules2.txt' (COUNT 27)*Maintfeeevents2,patentexpirationrules -> patentfeebaseevents (4402215)*FULL OUTER (patents, patentfeebaseevents) -> patentfeebasefull (5463947) -> PatentExpiryBase (5463947) -> PatentExpiry (5463947) Note that patentfeebaseevents is made from PatentExpirationRules and not PatentExpirationRules2. Both list the same fee codes and description, but PatentExpirationRules2 has a termlength column whereas PatentExpirationRules has daysincrement and daysduration columns. daysduration is the same as termlength, except when daysduration=7300 then termlength=6205. It seems from the code that PatentExpirationRules is correct, as there is a manual fix when max(daysduration)=7300.  DROP TABLE PatentExpiry; CREATE TABLE PatentExpiry AS SELECT patentnumber, FilingDate, grantdate, maxdaysduration, nomaintenance, maxtermfix1, maxtermfix2, CASE --Old patents get: grant plus 17yrs WHEN nomaintenance = 1 THEN grantdate + 6205 -- Patents grandfathered in during transition in 1995 to new file date + 20 year term limit get: Later of issue date + 17 years or file date + 20 years used WHEN (maxtermfix1 = 1 AND maxdaysduration = 7300) THEN GREATEST(FilingDate + 7300, grantdate + 6205) --Patents after June 8 1995 but before May 28 2000 get: file date + 20 years WHEN (maxtermfix2 = 1 AND maxdaysduration = 7300) THEN FilingDate + 7300 --Patents after May 28 2000 (this might not be automatic and require an appeal though) get: later of issue date + 17 years or file date + 20 years WHEN (maxtermfix2 = 2 AND maxdaysduration = 7300) THEN GREATEST(FilingDate + 7300, grantdate + 6205) ELSE grantdate + maxdaysduration END AS expdate, CASE WHEN nomaintenance = 1 OR maxdaysduration = 7300 THEN 1::int ELSE 0::int END AS HeldToTerm FROM PatentExpiryBase; --5463947 Note: It also looks like Ed did the same fixes in:E:\mcnair\Projects\LBO\Clean\Ed Run\Building the Patent Start End Table Revised.sql ===Analyzing DataExtension.sql=== DataExtension.sql was last saved in July 2018. The output of this script is masterv1-0.txt (last produced in March 2018), which is the sole input to Datasetup.do. This section therefore traces the construction of masterv1-0.txt. Key stacks:*'''StataStaging4''' is an input*gvkeyyearpatentdata (395213)*statastaging4, (statastaging4->) LBOSeriesProcessed (556) -> LBODetailsWDurr (1691834)*StataStaging4 -> NaicsSICBase (36779) & ExchangeOK (16236)*(StataStaging4,LBODetailsWDurr,gvkeyyearpatentdata,NaicsSICBase | ExchangeOK) -> Master (746856) gvkeyyearpatentdata appears to have been built for all of COMPUSTAT:  --######################### --# Build yearslist, GvkeyYear --######################### DROP TABLE yearslist; CREATE TABLE yearslist AS SELECT generate_series(1975,2017) as year; --43 DROP TABLE Allgvkeys; CREATE TABLE Allgvkeys AS SELECT DISTINCT gvkey FROM patenthistorybase ORDER BY gvkey; --9191 DROP TABLE GvkeyYear; CREATE TABLE GvkeyYear AS SELECT gvkey, year FROM yearslist, Allgvkeys ORDER BY gvkey, year; --395213 Key stacks:*'''patentexpiry''' is an input*cspatentbuys FROM 'cspatentbuys.txt' (304864), which came from usptopad*cspatentsales FROM 'cspatentsales.txt' (451730), which came from usptopad*'''stdcompnames''' is an input*'''assignees''' (is an input) -> Patentsassigneessimple 5426556*(Patentsassigneessimple,MatchOrgnameSelf (loaded from orgnamedctmatched.txt 501232), MatchstdcompnamedctOrgname (loaded from Matchstdcompnamedctorgnamestd.txt 8361,stdcompnames) -> patentgrantgvkey*patentgrantgvkey (1329006), cspatentbuysutlgvkey (272620) ... -> patentgrantbuy*(patentgrantbuy,patentsale,patentexpired)-> patenthistorybase (1513045)*patenthistorybase ...,GvkeyYear -> gvkeyyearpatentdata (395213)  Therefore needed inputs to DataExtension.sql are:*patentexpiry*stdcompnames*assignees*StataStaging4 Note that Patentsassigneessimple is not in '''allpatentsprocessed''' but assignee is, and it can easy be rebuilt: SELECT COUNT(*) FROM assignees WHERE orgname IS NOT NULL AND patentnumber IS NOT NULL; --5426556 There are two obvious master tables that we could export and reuse. The first is patenthistwoprebuys. This is patenthistorybase (1513045) but ''WHERE startdate < enddate''. It has 1,495,486 records and the unit of observation is a patent associated with a firm. The keys are gvkey and patentnumber.  Table "public.patenthistwoprebuys" Column | Type | Collation | Nullable | Default --------------+------------------+-----------+----------+--------- gvkey | text | | | patentnumber | integer | | | startdate | date | | | startyear | double precision | | | grants | integer | | | buy | integer | | | enddate | date | | | endyear | double precision | | | sell | integer | | | term | integer | | | notrenewed | integer | | | unknownexp | integer | | | The second master table is gvkeyyearpatentdata. This is an aggregated table, where the unit of observation is gvkey-year, where year goes from 1975-2017 (see GvkeyYear above) and contains zeros. There are 395,213 observations (i.e., one for each year for 9,191 unique GVKEYs).  Table "public.gvkeyyearpatentdata" Column | Type | Collation | Nullable | Default --------------------+---------+-----------+----------+--------- gvkey | text | | | year | integer | | | nogrants | bigint | | | nobuy | bigint | | | nosell | bigint | | | noterm | bigint | | | nonotnenewed | bigint | | | nounknownexp | bigint | | | portfoliochg | bigint | | | portfolio | numeric | | | portfoliocitations | bigint | | | citesperpatent | numeric | | | ===StataStaging4=== So the dataset is built from StataStaging4, as well as raw tables.  Variables used include:*gvkey, year*lboentry,lboexit*private,ipodate*naics, sic, exchg*datadate, fyear, tic, cusip, name_compustat, fyr, act, artfs, at, ceq, che, csho, dlc, dltt, dt, dvc, dvp, ebitda, emp, esubc, ibc, intan, lct, lt, ni, oibdp, opiti, re, revt, sale, txditc, txt, xidoc, xint, xrd, xsga, xstfws, *exchg, prcc_f, addzip, city, loc It therefore seems that statastaging4 is all of COMPUSTAT with some LBO data included... However, I can't find the code that actually built it. There is code to build StataStaging5 in E:\mcnair\Projects\LBO\Clean\Ed Run\LBO Innovation SQL Script.sql And the best instructions seem to be to run parts 1, 2, and 3A of that script. =Version 1=
Files are in E:\McNair\Projects\LBO\Clean\Ed Run

Navigation menu