Women in Entrepreneurship (Issue Brief)

From edegan.com
Revision as of 12:40, 21 September 2020 by Ed (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Timeline and plan for summer

  • note: I am using "hopethisworks8.txt" to make my graphs
  • 4-5 artifacts, 4-5 specifications, last artifact a regression table (tables, charts)
    • something about data itself. pie chart (how many companies do we have, how many do we have ceo information for, how many do we have founder) and then relatedly many can we classify as men or women
      • this will influence all remaining graphs - use this info to guide the rest of your graphs
      • classified drs using their first name matched to common first name list, that's why we were able to classify some drs as women, some as men, and some drs but gender unknown
    • graph with time on x axis, % on y axis, women founders, women ceos, women management positions (management position being anything that is vp and above) over time
      • will write up these results-ie most of the women in "womens management positions"
    • conditional on having (ceo identified) how many women, how many men, drs, dr men, dr women, dr unknown.
    • bar graph with women in various industries, y axis will have percentage
      • should also make note of total percentage of VC dollars in each industry
    • regression table
      • variable names, with n and r^2, industry/year fixed effects Y/N
      • each one of these will correspond to a regression
      • variables: IPO, acquisition, exit all correspond to 0,1
      • rounds, dollars invested

Spring 2017

Current plan to complete project

  1. Lit Review (which will be compiled here Women in Entrepreneurship Lit Review). Keep track of the things that have already been done in other research reports, so that we can do something different/original or so we can prove their data wrong. Done 3/28.
  2. Brainstorm ways to fix Dr. problem in data (assigning gender to those who are doctors), recode them. reran new tables with new coded doctors. done 4/4.
  3. Think of new variables to add depending on the guidance from the lit review (might include adding different variables than solely womenceo/womenfounder but will depend on litreview).
    1. add control variable for the people in data who have no gender
  4. I think from there I can do analysis and start to write report!

For analysis

Note that:

  • finaldatasetcode.sql (in 181/Women) has been updated
  • hopethisworks5.txt (in both 181/Women and E/Women) has been created
    • Use this as your dataset now
    • It contains IPO, IPOyear, IPOAmount, MA, MAyear, MAAmount, and Exit
    • IPO, MA, and Exit are 0/1 variables. Year is an int, and Amount is a real.
    • Many MAs will not have amounts.

Build graphs of:

  • Over time (last ten yrs)
    • percent of co's with woman ceo (binary)
    • percent of co's with woman founder (binary)
    • percent of co's with woman clevel (binary)
    • percent of co's with women in top management (vp and above)
    • average fraction of women founders
    • average fraction of women clevel
    • average fraction of women in top management (vp and above)
  • Some of the above by industry
  • Some of the above by state

Regresions:

  • rounds womanvar w/controls
  • inv womanvar w/controls
  • exit womanvar w/controls
  • ipo womanvar w/controls
  • ma womanvar w/controls
  • ipoamount womanvar w/controls
  • maamount womanvar w/controls

Call with SDC Platinum to determine information about the data

  1. How does Thompson get their data?
    1. Why is coverage better for some firms than others?
    2. Is the data self-reported by the companies?
  2. How does Thompson upload executive data?
    1. How often – after last round of funding?
    2. After it’s uploaded, is the data updated continuously?
  3. Better coverage for companies that get more rounds of financing/IPO/M&A?


  1. VC data sourced from government filings, public press releases, and quarterly surveys of private equity firms. If company does not participate in survey, then SDC does not have the data, which is why coverage for some firms is better than others.
  2. Deals team is the one that uploads all the data. As soon as SDC gets an update, or has a source that is updated, they automatically upload that to system as well. It typically takes 24-48 hours for new info to be reflected in database.
  3. Yes, there is better coverage for companies that have IPO/M&A/get more rounds of financing.

TO DO after 2/28

  1. Check cleancos and verify that it is actually clean (yes)
  2. Make Dr. a control variable (yes)
  3. Sum all c-level, etc.
  4. aggregate to company level

We need to build:

  • IPO 1/0
  • M&A 1/0
  • Number of rounds
  • Total invested
  • Is the CEO a woman 1/0
  • Are any of the founders women 1/0
  • Is CEO doctor 1/0
  • Number of founders
  • Number of founders that are doctors
  • Number of women founders
  • State - coded
  • Industry -code them!
  • Year of First investment -extracted

Ed is going to add IPO and Acq to rc1andcp3

Here's the code to add zeros to cleanpeople:

DROP TABLE cleanpeople2;
CREATE TABLE cleanpeople2 AS
SELECT 
prefix,
firstname,
lastname,
jobtitle,
fulltitle,
cname,
CASE WHEN genval IS NOT NULL THEN genval ELSE 0::int END AS women,
CASE WHEN doctor IS NOT NULL THEN doctor ELSE 0::int END AS doctor,
CASE WHEN charman IS NOT NULL THEN charman ELSE 0::int END AS charman,
CASE WHEN ceo IS NOT NULL THEN ceo ELSE 0::int END AS ceo,
CASE WHEN cfo IS NOT NULL THEN cfo ELSE 0::int END AS cfo,
CASE WHEN coo IS NOT NULL THEN coo ELSE 0::int END AS coo,
CASE WHEN cio IS NOT NULL THEN cio ELSE 0::int END AS cio,
CASE WHEN cto IS NOT NULL THEN cto ELSE 0::int END AS cto,
CASE WHEN otherclvl IS NOT NULL THEN otherclvl ELSE 0::int END AS otherclvl,
CASE WHEN boardmember IS NOT NULL THEN boardmember ELSE 0::int END AS boardmember,
CASE WHEN president IS NOT NULL THEN president ELSE 0::int END AS president,
CASE WHEN vp IS NOT NULL THEN vp ELSE 0::int END AS vp,
CASE WHEN founder IS NOT NULL THEN founder ELSE 0::int END AS founder,
CASE WHEN director IS NOT NULL THEN director ELSE 0::int END AS director
FROM cleanpeople WHERE jobtitle IS NOT NULL;

TO DO up to 2/28

  1. Load
    1. company
    2. people
    3. title lookup
    4. state lookup
  2. Refine people
    1. gender 0/1
    2. Join title lookup
    3. Dr.'s?
  3. Aggregate to company level (left join)
    1. Agg. People
    2. Join state lookup
  4. Export!

Done! Here is the code for this part of project:

DROP TABLE titlelookup; CREATE TABLE titlelookup( fulltitle varchar(150), charman int, ceo int, cfo int, coo int, cio int, cto int, otherclvl int, boardmember int, president int, vp int, founder int, director int ); \COPY titlelookup FROM 'Important Titles in Women2017 dataset.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --628


DROP TABLE statelookup; CREATE TABLE statelookup( statename varchar(100), uniquecode int ); \COPY statelookup FROM 'uniquestateval.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --50


DROP TABLE people; CREATE TABLE people( prefix varchar(5), firstname varchar(50), lastname varchar(50), jobtitle varchar(150), cname varchar(150) ); \COPY people FROM 'pull5-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --186181


DROP TABLE cleancos; CREATE TABLE cleancos AS SELECT pull3info.* FROM pull3info JOIN Cleancosbase ON pull3info.companyname=cleancosbase.companyname AND minfirstdate=firstinvestdate; --43534


DROP TABLE genvalpeople; CREATE TABLE genvalpeople AS SELECT *, CASE WHEN prefix='Ms' THEN 1::int WHEN prefix='Mrs' THEN 1::int WHEN prefix='Mr' THEN 0::int ELSE Null::int END AS genval FROM people; --186181

DROP TABLE cleanpeople; CREATE TABLE cleanpeople AS SELECT genvalpeople.*, titlelookup.* FROM genvalpeople LEFT JOIN titlelookup ON genvalpeople.jobtitle=fulltitle; --186181

DROP TABLE uniquevalstate; CREATE TABLE uniquevalstate AS SELECT cleancos.*, statelookup.* FROM cleancos LEFT JOIN statelookup ON cleancos.companystate=statelookup.statename; --43534


DROP TABLE dataset; CREATE TABLE dataset AS SELECT cleanpeople.*, uniquevalstate.* FROM cleanpeople LEFT JOIN uniquevalstate ON cleanpeople.cname=uniquevalstate.companyname; --186181

\COPY people TO 'finaldataset.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --186181

Paper

https://docs.google.com/document/d/1RHC0Namt9VpmUxINmRvHw4U34dDmgYPcUMbwhhthcCA/edit

Timeline

  • Prior
    • review/update lit review
      • should come from both industry and academic sources
  • Intro
    • Update stats to reflect 2017
    • citations
    • add something specifically about the importance of women in leadership in vc companies
  • Thesis/Issue Brief focus
    • This issue brief has two broad goals. First, it will examine the role of women in entrepreneurship fields today by researching their representation in leadership positions of VC-backed companies. Second, it will examine the effectiveness of current policy related to women in entrepreneurship.
  • Research
    • search in SDC to reflect the current date rather than 2016
      • add industry/other variables
    • build SQL tables with the following parameters: group by state, role, city, msa, industry
    • relationship between women in leadership roles and performance of the firm, where performance measured by no rounds invested/ipo
    • state fix-effects
  • Policy
    • Trump's proposed policy changes that will impact vc-backed companies, entrepreneurship/innovation in general, women in the workplace: research can be found at Trump and Women in Entrepreneurship
      • crowd-funding impact

Timeline

  • 1/24 - finish outline, reread tables/data collection process, update lit review
  • 1/25 - SDC pull
  • 1/30 - rerun through tables w new added variables (i.e. industry, etc)
  • 1/31 - read over new data and refine into something presentable, find appropriate parameters for issue brief, etc. maybe start making charts and graphs if time. @ this point check in w ed
  • 2/1 - finish making graphs and charts, research trump policy and write it up
  • 2/3 - start finalizing paper, putting all the pieces of the puzzle together, any additions or time to fix things that went off schedule, last edits, submit paper to anne for review

SDC Search

pull 1

  • Database Selection
    • VentureXpert
    • Portfolio Companies
  • Search Items:
    • Date that received company received first investment: 01/01/1980 to 01/01/2017
    • venture-related deals
    • Company Nation: U.S.
  • Custom Report
    • Company Founding Date
    • Company IPO Date
    • Date Received Last Investment
    • Date Received First Investment
    • Company Name
    • Company State
    • Company City
    • Company VE Primary Industry Class
    • Company VE Primary Industry Major Group
    • Company 6-digit CUSIP
    • Total amt invested in company
    • No. Financing Rounds Company Received
    • Executive's First Name
    • Executive's Last Name
    • Executive's Job Title
    • Executive's Prefix
    • VE Primary Industry Sub-Group 3
    • VE Primary Industry Minor Group
    • VE Primary Industry Sub-Group 1
    • Company IPO
    • Company Website


  • Saved in 182 in mcnair/projects/women2017 as "pull1"

pull 3

  • Database Selection
    • VentureXpert
    • Portfolio Companies
  • Search Items:
    • Date that received company received first investment: 01/01/1980 to 01/01/2017
    • Venture related deals - select all venture related deals
    • Company Nation: U.S.
  • custom report
  • company founding date
  • company ipo date
  • date company received last investment
  • date company received first investment
  • company staet
  • company name
  • company industry class
  • company industry major group
  • company industry subgroup 1
  • company industry subgroup 3
  • Company industry minor group
  • company cusip
  • total known amt invested in company
  • no rouds company received investment
  • company ipo status
  • company website
  • saved in 182 in mcnair/projects/women2017 as "pull3"

pull 4

  • Database Selection
    • VentureXpert
    • Portfolio Companies
  • Search Items:
    • Date that received company received first investment: 01/01/1980 to 01/01/2017
    • Venture-related deals
    • Company Nation: U.S.
  • custom report
  • company name
  • executive's name prefix
  • executive's first name
  • executive's last name
  • executive's job title
  • saved in 182 in mcnair/projects/women2017 as "pull4"

pull 5

  • Database Selection
    • VentureXpert
    • Portfolio Companies
  • Search Items:
    • Date that received company received first investment: 01/01/1980 to 01/01/2017
    • Venture-related deals
    • Company Nation: U.S.
  • custom report
  • executive's name prefix
  • executive's first name
  • executive's last name
  • executive's job title
  • company name
  • saved in 182 in mcnair/projects/women2017 as "pull5"

pull 6

  • Database Selection
    • VentureXpert
    • Portfolio Companies
  • Search Items:
    • Date that received company received first investment: 01/01/1980 to 01/01/2017
    • venture-related deals
    • Company Nation: U.S.
  • Custom Report
  • executive's name prefix
  • executive's first name
  • executive's last name
  • executive's job title
  • company name
  • company city
  • company state

pull 7

  • Database Selection
    • VentureXpert
    • Portfolio Companies
  • Search Items:
    • Date that received company received first investment: 01/01/1980 to 01/01/2017
    • venture-related deals
    • Company Nation: U.S.
  • Custom Report
  • executive's name prefix
  • executive's first name
  • executive's last name
  • executive's job title
  • company name
  • Company ve primary industry major group
  • Company ve primary industry minor group
  • Company ve primary industry sub group 1
  • Company ve primary industry sub group 2
  • Company ve primary industry sub group 3


What I have done up to 2/7/17 (in english, not sql)

I have done many SDC pulls with many different variables. The most important tables are nowomenstate, nowomencity, nowomenminorind, and nowomensub2. nowomenstate shows the number of women in "power positions" in each state as a percentage of all power position workers. nowomencity does the same thing at a city level. nowomenminorind shows the number of women in power positions grouped by their company's minor industry group. nowomensub2 does the same except grouped by the company's industry sub group 2.

Tables built up to 02/07/17

SDC pull #1 company founding date company ipo date date company received last investment date company received first investment company staet company name company industry class company industry major group company industry subgroup 1 company industry subgroup 3 Company industry minor group company cusip ttoal known amt invested in company no rouds company received investment company ipo status company website

SDC pull #2 company name executive's name prefix executive's first name executive's last name executive's job title


logging on: researcher@128.42.44.181 password: 9million

cd /bulk cd \Women2017 psql Women2017

DROP TABLE pull3info; CREATE TABLE pull3info( /*taken from pull3.txt*/ ipodate date, lastinvestdate date, firstinvestdate date, companyfoundingdate date, ipostatus varchar(10), companystate varchar(50), companyname varchar(200), industryclass varchar(200), majorgroup varchar(200), minorgroup varchar(200), subgroup1 varchar(200), subgroup3 varchar(200), cusip varchar(6), totalinvested float, numrounds int, website varchar(200) );

\COPY pull3info FROM 'pull3-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV

--COPY 43606

DROP TABLE cleancosbase; CREATE TABLE cleancosbase AS SELECT companyname, min(firstinvestdate) as minfirstdate FROM pull3info WHERE companyname !='Undisclosed Company' OR companyname !='New York Digital Health LLC' GROUP BY companyname; --43534

DROP TABLE cleancos; CREATE TABLE cleancos AS SELECT pull3info.* FROM pull3info JOIN Cleancosbase ON pull3info.companyname=cleancosbase.companyname AND minfirstdate=firstinvestdate; --43534


DROP TABLE pull4info; CREATE TABLE pull4info( /*taken from pull4.txt*/ execfirstname varchar(50), jobtitle varchar(100), execlastname varchar(50), execprefix varchar(5), companyname varchar(200) );

\COPY pull4info FROM 'pull4-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV

/*the job title column of the exec is for some reason combined with the first name even after normalizing the data. I will repull the data from SDC to see if it was simply an error in the pull or something. If that doesn't work, will go to ed.

  • /

SDC pull #5 executive's name prefix executive's first name executive's last name executive's job title company name

DROP TABLE pull5info; CREATE TABLE pull5info( /*taken from pull5.txt*/ execprefix varchar(5), execfirstname varchar(50), execlastname varchar(50), jobtitle varchar(100), companyname varchar(200) );

\COPY pull5info FROM 'pull5-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV

--COPY 186181

DROP TABLE distinctnocompanies; CREATE TABLE distinctcompanies AS SELECT DISTINCT companyname FROM pull3info; /*created this per ed's advice to see how many companies are distinct from each other so we dont get duplicates in the dataset - only really creating a table to get the specific number*/

--COPY 43536 /*this means there are 70 duplicates in the data. could mean that some companies in diff states have the same name, or it could be a for real duplicate. can go back and check on this later but rn seems like a relatively nominal amt.*/


DROP TABLE totaltable; CREATE TABLE totaltable AS SELECT * FROM pull5info LEFT JOIN pull3info ON pull5info.CompanyName = pull3info.companyname; /*represents the table combining pull3info and pull5info, aka all of the info about the companies and the executives' info matched on company name*/

/*WHERE IM AT:

START GROUPING BY WOMEN IN pull3info ed says to rename pull3info but idc like i know what it is

  • /

DROP TABLE commonnames; CREATE TABLE commonnames( /*common male and female names with corresponding 0 or 1*/ name varchar(100), mf int );

DROP TABLE commonnamesclean; CREATE TABLE commonnamesclean AS SELECT name, sum(mf) AS mf FROM commonnames GROUP BY name HAVING count(mf) = 1;

\COPY commonnames FROM 'realcommonnames.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --4654

DROP TABLE impoppl; CREATE TABLE impoppl AS /*selecting the people with important job titles*/ SELECT * FROM pull5info WHERE jobtitle IN ('Board Member', 'CEO', 'Founder', 'CFO', 'COO', 'Chairman', 'Chief Executive Officer', 'Chief Financial Officer', 'President', 'Treasurer', 'Vice President'); --67733

DROP TABLE binarymanwoman; CREATE TABLE binarymanwoman AS /*creates table with a column populated with 1 if prefix is Ms or Mrs and 0 if mr*/ SELECT*, CASE WHEN execprefix='Ms' THEN 1::int WHEN execprefix='Mrs' THEN 1::int WHEN execprefix='Mr' THEN 0::int ELSE Null::int END AS binaryval FROM pull5info; --186181

DROP TABLE binarycommonmatch; CREATE TABLE binarycommonmatch AS /*matching common names to list of execs who have dr prefix/are gender unidentified*/ SELECT execprefix, execfirstname, execlastname, jobtitle, companyname, CASE WHEN binaryval IS NULL AND mf IS NOT NULL THEN mf WHEN binaryval IS NOT NULL THEN binaryval ELSE NULL::int END AS binaryval FROM binarymanwoman LEFT JOIN commonnamesclean ON execfirstname = name WHERE execfirstname IS NOT NULL; --171307

DROP TABLE doctors; CREATE TABLE doctors AS /*represents the leaders from companies who haven't been assigned a gender*/ SELECT * FROM binarycommonmatch WHERE binaryval IS NULL; --7016


DROP TABLE women; CREATE TABLE women AS SELECT * FROM binarycommonmatch WHERE binaryval=1; --18101 /*this means that 18101/171307 of the people in "important" positions are women which is equal to 0.105. Could be +/- 0.04 bc of the doctors/unidentified genders in the data*/

/*next steps: redo SDC pull, group by

state city company exec prefix, first name, last name, job title

also check in w ed about the no of doctos and whether getting within a 4% margin is ok*/

SDC pull #6

company state company city company name exec first name exec last name exec prefix exec job title

DROP TABLE execbylocation; CREATE TABLE execbylocation( state varchar(75), city varchar(75), companyname varchar(100), execjob varchar(100), execprefix varchar(5), firstname varchar(50), lastname varchar(50) );

\COPY execbylocation FROM 'pull6-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --186242


DROP TABLE impopplbylocation; CREATE TABLE impopplbylocation AS /*selecting the people with important job titles*/ SELECT * FROM execbylocation WHERE execjob IN ('Board Member', 'CEO', 'Founder', 'CFO', 'COO', 'Chairman', 'Chief Executive Officer', 'Chief Financial Officer', 'President', 'Treasurer', 'Vice President'); --67750

DROP TABLE impoppllocname; CREATE TABLE impoppllocname AS SELECT * FROM impopplbylocation WHERE execprefix IN ('Ms', 'Mrs'); --5646

/*here's what I want to do: I have a table with all the (identifiable) women, and a separate table with both men and women. I want to group by state in both, then divide the no of females in state (impoppllocname)/total in state (impopplbylocation) where state=state (i.e. females in california in impoppllocname divided by total no of ppl from california in impopplbylocation) to get the percentage of females in each state with "important" positions. however, sql is a turd and won't let me group by state for some odd reason in either of these tables. is there a more efficient way to do this than with two tables? perhaps. But idk! help!*/

DROP TABLE binpplbylocation; CREATE TABLE binpplbylocation AS /*creates table with a column populated with 1 if prefix is Ms or Mrs and 0 if mr*/ SELECT*, CASE WHEN execprefix='Ms' THEN 1::float WHEN execprefix='Mrs' THEN 1::float WHEN execprefix='Mr' THEN 0::float ELSE Null::int END AS binaryval FROM impopplbylocation; --67750

DROP TABLE nowomenstate; CREATE TABLE nowomenstate AS SELECT binpplbylocation.state, sum(binaryval)/count(binaryval) AS percentage FROM binpplbylocation GROUP BY binpplbylocation.state ORDER BY percentage ASC; /*this gives the no of women in each state/all the ppl in a state to yield the percentage of women in each state who are females in power*/ --54 (includes Guam, D.C., etc.) /*Wyoming the lowest, at 0, and Hawaii the highest, at 16.9%, Texas at 8.1%*/

DROP TABLE nowomencity; CREATE TABLE nowomencity AS SELECT binpplbylocation.city, sum(binaryval)/count(binaryval) AS percentage FROM binpplbylocation GROUP BY binpplbylocation.city ORDER BY percentage DESC; --2049

SELECT * FROM nowomencity WHERE city='San Antonio'; --Houston has 7.8% of women in leadership positions in all of its companies --Austin has 9.1% --Dallas has 9.9% --San Antonio has 7.4%

/*next do sdc pull to group by industry minor group 3, industry major group*/

SDC pull #7

Custom Report executive's name prefix executive's first name executive's last name executive's job title company name Company major group company minor group company sub group 1,2,3

perl NormalizeFixedWidth.pl -file="pull7.txt"

DROP TABLE execbyindustry; CREATE TABLE execbyindustry( execprefix varchar(5), firstname varchar(50), lastname varchar(50), jobtitle varchar(50), companyname varchar(100), majorgroup varchar(100), minorgroup varchar(100), subgroup1 varchar(100), subgroup2 varchar(100), subgroup3 varchar(100) );

\COPY execbyindustry FROM 'pull7-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --186272

DROP TABLE impopplbylocation2; CREATE TABLE impopplbylocation2 AS /*selecting the people with important job titles*/ SELECT * FROM execbyindustry WHERE jobtitle IN ('Board Member', 'CEO', 'Founder', 'CFO', 'COO', 'Chairman', 'Chief Executive Officer', 'Chief Financial Officer', 'President', 'Treasurer', 'Vice President'); --67764

DROP TABLE binpplbyindustry; CREATE TABLE binpplbyindustry AS /*creates table with a column populated with 1 if prefix is Ms or Mrs and 0 if mr*/ SELECT*, CASE WHEN execprefix='Ms' THEN 1::float WHEN execprefix='Mrs' THEN 1::float WHEN execprefix='Mr' THEN 0::float ELSE Null::int END AS binaryval FROM impopplbylocation2; --67764

DROP TABLE nowomenmajorind; CREATE TABLE nowomenmajorind AS SELECT binpplbyindustry.majorgroup, sum(binaryval)/count(binaryval) AS percentage FROM binpplbyindustry GROUP BY binpplbyindustry.majorgroup ORDER BY percentage DESC; --6 /*highest is biotechnology, with 11% of company founders being women. lowest is semiconductors/other electric, with 4%*/

DROP TABLE nowomenminorind; CREATE TABLE nowomenminorind AS SELECT binpplbyindustry.minorgroup, sum(binaryval)/count(binaryval) AS percentage FROM binpplbyindustry GROUP BY binpplbyindustry.minorgroup ORDER BY percentage DESC; --10 /*consumer related higher than biotechnology at 11.9% vs 11.5%, with lowest still semiconductors/other electric at 4%)*/

DROP TABLE nowomensub1; CREATE TABLE nowomensub1 AS SELECT binpplbyindustry.subgroup1, sum(binaryval)/count(binaryval) AS percentage FROM binpplbyindustry GROUP BY binpplbyindustry.subgroup1 ORDER BY percentage DESC; --18 /*pretty similar to minorind; not much difference/nothing revolutionary to add*/

DROP TABLE nowomensub2; CREATE TABLE nowomensub2 AS SELECT binpplbyindustry.subgroup2, sum(binaryval)/count(binaryval) AS percentage FROM binpplbyindustry GROUP BY binpplbyindustry.subgroup2 ORDER BY percentage DESC; --69 /*consumer products = 15%, consumer services = 14%, energy enhanced recovery = 14%, med/health services 13%, then biotech research at 13%. Lowest: industrial services at 2.5%, idustrial equipment at 2.9%, power supplies at 3.1%, laser related at 3.3%, optoelectronics at 3.6%*/

DROP TABLE nowomensub3; CREATE TABLE nowomensub3 AS SELECT binpplbyindustry.subgroup3, sum(binaryval)/count(binaryval) AS percentage FROM binpplbyindustry GROUP BY binpplbyindustry.subgroup3 ORDER BY percentage ASC; --560 /*industrial products=50%!, agriculture=33%, ecommerce selling products = 29%, biotech food enzymes and cultures = 29%, computers and hardware=28%. lowest: there are straight up 0 in about 75 different things.*/


/*up next: build table with coname, women ceo&founder (binary 0/1), women in power (0 or 1), #power positions, #women in power, state, year co received first investment, no rounds, total$invested*/

STATA regression info

name text file "big.do" press play - will run in stata


commands in stata: summarize numrounds -> gives mean, skewness, kurtosis summarize numrounds, det -> same as above except w higher level of detail tab numrounds -> frequency count gen -> generate a variable gen invested = totalinvested*1000 = new variable gen abovemeaninv =0 if totalinvested <. (where . is equal to positive infinity, it's what stata puts in when there's a missing value) replace abovemeaninv=1 if totalivnested > mean & totalinvested <.

reg some measure of performance on some measure of women

reg numrounds womeninpower


if 2(std. err.) < coef, you're good The relationship between numrounds and womeninpower is highly statistically significant

- R-squared - says of all the variation in the number of rounds, women in power only explains less than one percent of that variation 0.66% 


if you use .log you can automatically replace it whenever you run the script


rebuild data set to contain men need year variable - extract year from date



multivariate regression y should be perfromance measure x should be the explanatory variables if you run multiple variables with , robust it will be fine

i.year = indicator that takes value 1 if it's the year it's invested

Spring 2016

The Paper

Introduction

Over the past century, the United States has witnessed two large-scale social trends involving women in entrepreneurship that have had a significant impact on the country's economic growth. First, huge numbers of women have made their way into the official labor force. Around this time 50 years ago (February 1966), women's labor force participation rate was 39.6%, but in February 2016, 56.8% of women participated in the labor force [1]. This demonstrates substantial growth in women's entry to the labor force. Second, women are achieving a higher degree of education than in years past. As labor market barriers to women have been lowered, the benefits of a college education have grown more for women than men, and females now outpace males in college enrollment. [2] This has provided women with similar access to the same jobs as men.

However, these trends have slowed from their early twenty-first century spikes; room for growth is smaller now than it was before. Economists predict that economic growth from these two trends is unlikely to be repeated to the same magnitude. With this, everyone is looking for the modern-day economy-boosting equivalent to women's entry to the labor force. Given the rising share of women among educated workers, it seems clear that the future of American economic growth is in the hands of women. Encouraging women to enter into fields of entrepreneurship, particularly high-growth entrepreneurship, might be the United States' silver bullet.


This issue brief aims to examine the role of women in entrepreneurship today by examining their jobs (or lack thereof) in entrepreneurship related fields, and examine the effectiveness of current policy related to women in entrepreneurship.

Status Quo

Our research

We examined all leaders in companies from 1980-2016 who have the titles of either chariman, CEO, CFO, COO, CIO, CTO, board member, President, Vice President, Founder, and Director. Of company members with those titles, only 29.8% were women. Of those women leaders, 22% were CEOs, 14% chairwomen, 27% CFOs, 26% COOs, 31% CIOs, 12% CTOs, 25% board members, 20% presidents, 34% vice presidents, 23% founders, and 39% directors. (from tables, PercentWomenXCompany, where X is job title)

Generally, the percentage of women in "leadership roles" in companies has trended upwards since 1980. (from companywomentot table) However, the number of women CEOs over the past two decades has not dramatically increased or decreased, aside from a dramatic upward spike in women CEOs during 2015. (from PercentWomenCEOCompanybyyear table)

Grouped by state, Michigan has highest percentage of women in leadership in its companies, coming in at 37.5%. Texas has 26.6%. Wyoming has a whopping 0%. (from womenstates)

These data show that of the women listed as "Company Executives", only a very small percentage have positions of power. However, this number has grown since 1980, which demonstrates a slow but positive growth of women in executive positions in general.

Info from external sources

Overall, women-owned businesses account for slightly less than 1/3 of all businesses in the United States. At first glance the statistics portray a positive picture for the field's growth: the number of women-owned firms has grown 68 percent since 2007, compared to only 47 percent for all businesses. [3]. However, these women owned businesses are typically only run by the woman herself; among employer firms, women-owned businesses account for only 16% of the total, and their shares of revenue and employees are in the single digits. [4]

The few women in entrepreneurial leadership roles also have to deal with negative societal perceptions of their jobs. According to data from a 2015 Pew Research Center survey, 43% of Americans believe that the U.S. is not prepared to hire women for top-tier executive positions. A relatively smaller, but still significant, number of Americans (23%) believe that women don’t have the time to hold an executive position, given their “family responsibilities”. [5]

The American public's generally negative perception of women in entrepreneurship presents itself institutionally as well. Even though female founders perform equal to or better than their male counterparts when raising money online, only 10 percent of startups which raised Series A last year had female founders. [6] Texas claims the worst record of supporting women seeking venture capital. Last year, 42 Texas startups got Series A rounds, yet not a single startup had female founders. In terms of venture capital structure, ninety-four percent of decision makers at venture capital funds are male, and 90% of today's venture capital funds never see a female founder. Among chief executives of S.&P. 1500 firms, for each woman, there are four men named John, Robert, William or James. [7] But venture capital firms with women partners are more than twice as likely to invest in companies with a woman on the executive team and three times as likely to invest in companies with women CEOs.

Though at first glance women-owned business growth seems striking, the numbers are deceptive in that their shares of revenue and employees are in single digits. Furthermore, women who do go into entrepreneurship face societal disapproval of their actions, with a plurality of Americans believing not only that the country is not prepared to hire women to executive positions, but also that women are incapable of holding these positions due to their family responsibilities. Further, women get start-up capital for their companies 2-3 times as often from women-led VC funds, but only 6% of decision makers at VC funds are women. It's no wonder women are vastly underrepresented in this field.

What women need

Almost half-of female founders (48 percent) cite a lack of available mentors or advisers as holding them back. A third say lack of capital is a constraint. [8]

Relevant policy

The Obama administration recognized the importance of expanding the role that women-owned businesses play in the national economy. Some of his initiatives include increasing access to capital, incentivizing small business growth and hiring, encouraging women owned business competition for government contracts, and enhancing long term survival of women owned businesses. Two primary pieces of legislation have bolstered the limited growth that the United States has seen in the field of women entrepreneurs, The Small Business Jobs Act of 2010 and the Women’s Equity in Contracting Act. Both of these seek to help businesses headed up by women win more government contracts.

The president has worked on increasing small business access to capital through the Recovery Act and the Small Business Jobs Act. Through this act, the president has reformed SBA lending practices and built on existing infrastructure. SBA loans are 3-5x more likely to go to women and minorities than traditional business loans. Over 12,000 SBA Recovery loans have gone to women-owned small businesses, totaling more than $3 billion. [9] The president has also increased tax cuts with the aim of helping small businesses invest in their firms and create jobs. Specific to women, the SBA opened several new Women's Business Centers in 2009 and 2010, bringing their total to 114.

Jessica Milli, senior research associate at the Institute of Women’s Policy Research, said things like the Jobs Act and SBA's programs for women "really aim to put women-owned businesses...at a competitive advantage. (They) restrict competition, which has really helped to boost revenues and help more businesses get into the industry.” But such policies can’t, and haven't, solved all the challenges inherent to being a female entrepreneur. Even with targeted legislation, women business owners still face a significant wage gap and continually have smaller amounts of start-up capital than their male peers.

Conclusion As women's prevalence in the United States workforce has increased, the United States has experienced an economic boom. However, women's numbers are still small in entrepreneurial fields, and even more microscopic in leadership positions in those fields. Current policy fails because it doesn't properly address women's lack of access to capital and mentorship. Moving forward, the United States must seek strategies to incentivize more women to enter the entrepreneurial workforce.

The Research

  • SQL info can also be found on researcher@128.42.44.181 in the Women folder. Called "women.sql".

creating basic tables

DROP TABLE companyreal; CREATE TABLE companyreal( datefounded date, companyname varchar(100), prefix varchar(5), firstname varchar(100), lastname varchar(100), jobtitle varchar(100), companystate varchar(100), companycity varchar(100), industrygroup varchar(100), amtinvestedthous float, rounds int );

\COPY companyreal FROM 'womencompanieslast-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --129536

DROP TABLE fundreal; CREATE TABLE fundreal( datefounded date, fundname varchar(100), prefix varchar(5), firstname varchar(100), lastname varchar(100), jobtitle varchar(100), fundstate varchar(100), fundcity varchar(100), fundtargetsizemil float, firmname varchar(100) );

\COPY fundreal FROM 'womenfundslast-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV --156005


adding variable where if person is woman, adds value of 1 to their name

DROP TABLE manwomanfund; CREATE TABLE manwomanfund AS SELECT prefix, lastname, datefounded, jobtitle, fundname, fundcity, fundstate, removeinitial(firstname) as firstname, CASE WHEN prefix='Ms' THEN 1::int WHEN prefix='Mr' THEN 0::int ELSE Null::int END AS womanman FROM fundreal; --156005

DROP TABLE manwomancompany; CREATE TABLE manwomancompany AS SELECT prefix, lastname, datefounded, jobtitle, companyname, companycity, companystate, industrygroup, removeinitial(firstname) as firstname, amtinvestedthous, rounds, CASE WHEN prefix='Ms' THEN 1::int WHEN prefix='Mr' THEN 0::int ELSE Null::int END AS womanman FROM companyreal; --129536

for people without a prefix, trying to assign gender variable to them based on list of common names

DROP TABLE manwomanfund2; CREATE TABLE manwomanfund2 AS SELECT datefounded, prefix, firstname, lastname, fundname, jobtitle, womanman AS origwomanman, mf, fundcity, fundstate, CASE WHEN womanman IS NULL AND mf IS NOT NULL THEN mf WHEN womanman IS NOT NULL THEN womanman ELSE NULL::int END AS womanman FROM manwomanfund LEFT JOIN commonnamesclean ON firstname = name WHERE firstname IS NOT NULL; --152819

DROP TABLE manwomancompany2; CREATE TABLE manwomancompany2 AS SELECT datefounded, prefix, firstname, lastname, companyname, jobtitle, womanman AS origwomanman, mf, companycity, companystate, industrygroup, rounds, amtinvestedthous, CASE WHEN womanman IS NULL AND mf IS NOT NULL THEN mf WHEN womanman IS NOT NULL THEN womanman ELSE NULL::int END AS womanman FROM manwomancompany LEFT JOIN commonnamesclean ON lower(firstname) = lower(name) WHERE firstname IS NOT NULL; --123278

--%, state, city, round no

me playing around, these tables not relevant

DROP TABLE fundWomenCEOs; CREATE TABLE fundwomentot AS SELECT EXTRACT(YEAR FROM datefounded), AVG(womanman) AS PercentWomen FROM manwomanfund2 GROUP BY EXTRACT(YEAR FROM datefounded); --37

DROP TABLE companyWomen; CREATE TABLE companywomentot AS SELECT EXTRACT(YEAR FROM datefounded), AVG(womanman) AS PercentWomen FROM manwomancompany2 GROUP BY EXTRACT(YEAR FROM datefounded); --36

DROP TABLE womencompanystate; CREATE TABLE womencompanystate AS SELECT companystate, AVG(womanman) AS WomenPerState FROM manwomancompany2 GROUP BY companystate; --51

DROP TABLE womenfundstate; CREATE TABLE womencfundstate AS SELECT fundstate, AVG(womanman) AS WomenPerState FROM manwomanfund2 GROUP BY fundstate; --51

DROP TABLE womencompanycity; CREATE TABLE womencompanycity AS SELECT companycity, AVG(womanman) AS WomenPerCity FROM manwomancompany2 GROUP BY companycity;

assigning variable to titles

DROP TABLE titles01; CREATE TABLE titles01( title varchar(100), chairman int, CEO int, CFO int, COO int, CIO int, CTO int, boardmember int, president int, vicepresident int, founder int, director int );

\COPY titles01 FROM 'impotitleslast.txt' WITH DELIMITER AS E'\t' HEADER NULL AS CSV

DROP TABLE title02;

massive conglomeration table

DROP TABLE ManWomanCompany3; CREATE TABLE ManWomanCompany3 AS SELECT Manwomancompany2.*, CASE WHEN chairman IS NOT NULL THEN 1::int ELSE 0::int END AS chairman, CASE WHEN CEO IS NOT NULL THEN 1::int ELSE 0::int END AS CEO, CASE WHEN CFO IS NOT NULL THEN 1::int ELSE 0::int END AS CFO, CASE WHEN COO IS NOT NULL THEN 1::int ELSE 0::int END AS COO, CASE WHEN CIO IS NOT NULL THEN 1::int ELSE 0::int END AS CIO, CASE WHEN CTO IS NOT NULL THEN 1::int ELSE 0::int END AS CTO, CASE WHEN boardmember IS NOT NULL THEN 1::int ELSE 0::int END AS boardmember, CASE WHEN president IS NOT NULL THEN 1::int ELSE 0::int END AS president, CASE WHEN vicepresident IS NOT NULL THEN 1::int ELSE 0::int END AS vicepresident, CASE WHEN founder IS NOT NULL THEN 1::int ELSE 0::int END AS founder, CASE WHEN director IS NOT NULL THEN 1::int ELSE 0::int END AS director FROM ManwomanCompany2 LEFT JOIN titles01 ON manwomancompany2.jobtitle=titles01.title;

tables for percent of women in a given job title

DROP TABLE PercentWomenCompany; CREATE TABLE PercentWomenCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3; --.29812

DROP TABLE PercentWomenCEOsCompany; CREATE TABLE PercentWomenCEOsCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE ceo = 1; --.22597

DROP TABLE PercentWomenChairmanCompany; CREATE TABLE PercentWomenChairmanCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE chairman = 1; --.142417

DROP TABLE PercentWomenCFOCompany; CREATE TABLE PercentWomenCFOCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE cfo = 1; --.27087

DROP TABLE PercentWomenCOOCompany; CREATE TABLE PercentWomenCOOCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE coo = 1; --.26353

DROP TABLE PercentWomenCIOCompany; CREATE TABLE PercentWomenCIOCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE cio = 1; --.31707

DROP TABLE PercentWomenCTOCompany; CREATE TABLE PercentWomenCTOCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE cto = 1; --.12588

DROP TABLE PercentWomenboardmemberCompany; CREATE TABLE PercentWomenboardmemberCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE boardmember = 1; --.2545

DROP TABLE PercentWomenPresidentCompany; CREATE TABLE PercentWomenPresidentCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE president = 1; --.19646

DROP TABLE PercentWomenVPCompany; CREATE TABLE PercentWomenVPCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE vicepresident = 1; --.3468

DROP TABLE PercentWomenfounderCompany; CREATE TABLE PercentWomenfounderCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE founder = 1; --.2394

DROP TABLE PercentWomendirectorCompany; CREATE TABLE PercentWomendirectorCompany AS SELECT sum(mf)/count(mf) FROM ManWomanCompany3 WHERE director = 1; --.38625

table for finding percentage of women in companies by state

DROP TABLE womenstates; CREATE TABLE womenstates AS SELECT companystate, sum(mf)/count(mf) FROM ManWomanCompany3 GROUP BY companystate;


Example output data

Year CEOs    Founders   Start-ups   Partners   FundExecs
-------------------------------------------------------------
1980  2%      2%          4%        1%          2%
1981  ...
  • Table on money raised?
  • Data on likelihood of follow-on round

still to do:

  1. this data still discounts a (small) percentage of data (doctors who didn't have a common name)
  2. have data organized, now we need to figure out how to extract it in meaningful terms for the report.