Hubs: Hubs Data

From edegan.com

Revision as of 10:41, 26 July 2016

List of Variables

For a more in-depth description of the variables and procedure, please see: Hubs: Mechanical Turk. This page lists the variables being collected, separated into three categories. Each variable includes a breakdown of the levels being collected (where the definition is not trivial) and an approximate approach.

As of the week of 7/25

Group 1

Variables Difficult to Obtain

  1. Founding Date
    • Difficulty: Finding the founding date is difficult with our current strategies
    • New Approach:
      1. Whois.net date (domain registration date)
      2. Factiva/other press release searches
  2. Multiple locations within city + Franchise (as of now just addresses)
    • Difficulty: Whether we collect at the company or the establishment level will affect measurements
    • New Approach: We will record all addresses at the company level
  3. Onsite Venture Capital v. Angel Investors (e.g. # and Assets Under Management)
    • Levels: Binary, list of investors
    • Difficulty: Hub website usually does not include investors
    • New Approach:
      1. Google key terms together with the Hub's address
      2. Start with partners and search Google/Crunchbase
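The Whois.net step for the founding date could be partly automated. The sketch below is a hypothetical helper (`parse_whois_creation_date` is not existing project code) that assumes the raw WHOIS record has already been copied from Whois.net as text; the field labels it checks are assumptions, since registries format their output differently. Keep in mind that a domain's creation date is only a proxy for the founding date: a domain may be registered before or after the hub actually opens.

```python
import re
from datetime import date
from typing import Optional

# Labels that often carry the creation date in WHOIS records.
# Registries format output differently, so this tuple is an assumption.
CREATION_LABELS = ("Creation Date", "Created On", "Registered On", "created")


def parse_whois_creation_date(whois_text: str) -> Optional[date]:
    """Return the earliest creation date found in raw WHOIS text, or None."""
    found = []
    for line in whois_text.splitlines():
        # Split "Label: value" at the first colon only.
        label, sep, value = line.partition(":")
        if not sep or label.strip() not in CREATION_LABELS:
            continue
        # Keep only a leading YYYY-MM-DD, e.g. from "2012-03-14T18:22:05Z".
        match = re.match(r"\s*(\d{4})-(\d{2})-(\d{2})", value)
        if match:
            found.append(date(*map(int, match.groups())))
    return min(found) if found else None


record = """Domain Name: EXAMPLEHUB.COM
   Creation Date: 2012-03-14T18:22:05Z
   Updated Date: 2016-01-02T09:00:00Z"""
print(parse_whois_creation_date(record))  # 2012-03-14
```

The earliest date is kept because some records list more than one creation-type field.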

Group 2

Variables Comfortable, Not Complete (rough order of most difficult to least difficult)

  1. Onsite accelerator
    • Levels: Binary, count, list
    • Difficulty: Accelerators are usually not presented as a list, so they require more scrubbing, whereas many other variables only require finding one page on a website.
    • Approach:
      1. Google searches combined with a procedure for searching the website yield decent results
      2. Similar procedure to the one for onsite investors
  1. Size (# members)
    • Levels: Count of member companies (we currently do not plan to include a list of companies, given that some potential hubs have 200+ members)
    • Difficulty: Some companies list only selected members, others do not separate current members from alumni, and some just write "we have served more than 120 startups..."
    • Approach: For companies that have a list, we will count the members. For those that list only selected members, we will count those listed and look for a statement of how many members they have. For those that just have a statement "with over," we will record the number followed by + (e.g. "120+").
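The recording rule for member counts can be sketched as a small function. This is a hypothetical illustration, not existing project code: it assumes the member list or the "with over" statement has already been copied from the hub's website, and the phrases it matches ("more than", "over") are assumptions. Partial lists of selected members still need the manual check described in the approach.

```python
import re
from typing import List, Optional


def record_member_count(members: Optional[List[str]] = None,
                        statement: Optional[str] = None) -> Optional[str]:
    """Return the member count in the form it would be recorded.

    - If a member list was found, record the number of distinct names.
    - Otherwise, if the site only says something like "more than 120 startups",
      record the number followed by "+" (e.g. "120+").
    - Return None if neither is available.
    """
    if members:
        names = {name.strip() for name in members if name.strip()}
        return str(len(names))
    if statement:
        # Hypothetical phrasing; extend the alternatives as new wordings appear.
        match = re.search(r"\b(?:more than|over)\s+([\d,]+)", statement, re.IGNORECASE)
        if match:
            return match.group(1).replace(",", "") + "+"
    return None


print(record_member_count(["Acme", "Beta", "Acme"]))  # 2
print(record_member_count(statement="We have served more than 120 startups..."))  # 120+
```

Returning a string keeps exact counts and "120+" style lower bounds in the same column.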


  1. Office hours investors and Office hours mentor/advisors
    • Levels: Binary for office hours (OH), separate binaries for the two types of OH (investors and mentors/advisors), list of names/descriptions of OH
    • Difficulty: Some companies do not list who holds OH; it is not always obvious whether the person is an investor, mentor, or advisor; and it is sometimes unclear whether a mentor is also an investor or future investor
    • Approach: Use Google to get to the OH pages, then look up key words in each description to separate investors from mentors/advisors
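The key-word step for office hours could look like the sketch below. The keyword patterns and function names are hypothetical (the page does not specify which key words are used). Descriptions matching both patterns are labeled "both" so that mentors who may also be investors can be reviewed by hand.

```python
import re
from typing import Dict, List

# Hypothetical key-word patterns; \b word boundaries avoid false hits
# such as "angel" inside "Los Angeles".
INVESTOR_PATTERN = re.compile(r"\b(?:investors?|venture capital|vcs?|angels?)\b", re.IGNORECASE)
MENTOR_PATTERN = re.compile(r"\b(?:mentors?|advisors?|advisers?|coach(?:es)?)\b", re.IGNORECASE)


def classify_office_hours(description: str) -> str:
    """Label one OH description as 'investor', 'mentor/advisor', 'both', or 'unclear'."""
    is_investor = bool(INVESTOR_PATTERN.search(description))
    is_mentor = bool(MENTOR_PATTERN.search(description))
    if is_investor and is_mentor:
        return "both"  # e.g. a mentor who may also invest: review by hand
    if is_investor:
        return "investor"
    if is_mentor:
        return "mentor/advisor"
    return "unclear"


def office_hours_flags(descriptions: List[str]) -> Dict[str, bool]:
    """Binary variables: any OH, investor OH, and mentor/advisor OH."""
    labels = [classify_office_hours(d) for d in descriptions]
    return {
        "has_oh": bool(descriptions),
        "has_investor_oh": any(label in ("investor", "both") for label in labels),
        "has_mentor_oh": any(label in ("mentor/advisor", "both") for label in labels),
    }


print(classify_office_hours("Angel investor who mentors founders"))  # both
```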


  1. Sponsors/Partners (University, Corporate)
    • Levels: Count, list of sponsors/partners (if any), separate columns for university and corporate
    • Difficulty: Not all companies list sponsors or partners; some list neither
    • Approach: Use two different levels and Google search; then, if a list exists, separate entries containing "college"/"university" from the rest
  1. Size (sqft)
    • Levels: Number in sqft
    • Difficulty: Not all companies list square footage online
    • Approach:
      1. Google search with key words
      2. If no results appear, press releases can be used
  1. Onsite Mentors
    • Levels: Count and list of mentors (if any)
    • Difficulty: Not all companies list mentors
    • Approach: Use two different levels and Google search
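The split between university and corporate sponsors/partners can be sketched as follows, assuming the sponsor/partner names have already been collected into a list. The function and column names are hypothetical. Because it only checks for "college" and "university", names such as an institute of technology would land in the corporate column and need a manual check.

```python
from typing import Dict, List, Union

UNIVERSITY_KEYWORDS = ("college", "university")


def split_sponsors(sponsors: List[str]) -> Dict[str, Union[int, str]]:
    """Separate sponsor/partner names into university and corporate columns."""
    university: List[str] = []
    corporate: List[str] = []
    for name in sponsors:
        # Case-insensitive substring check for "college" or "university".
        if any(keyword in name.lower() for keyword in UNIVERSITY_KEYWORDS):
            university.append(name)
        else:
            corporate.append(name)
    return {
        "university_count": len(university),
        "university_list": "; ".join(university),
        "corporate_count": len(corporate),
        "corporate_list": "; ".join(corporate),
    }


row = split_sponsors(["Example State University", "Acme Corp", "Sample Community College"])
print(row["university_count"], row["corporate_list"])  # 2 Acme Corp
```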




  1. Onsite temporary workshops and Networking Meetups (Count)
    • Levels:
    • Difficulty:
    • Approach:


  1. Onsite code school and Curriculum
    • Levels:
    • Difficulty:
    • Approach:


  1. Alumni Network
    • Levels:
    • Difficulty:
    • Approach:


  1. Community membership?
    • Levels:
    • Difficulty:
    • Approach:

Group 3

Variables Easy to Obtain

  1. Twitter activity
    • Levels: Twitter Handle, # Tweets in a Month, # Followers, # Retweets
    • Approach: The Twitter handle is easy to get from Mechanical Turk or from Veeral's code, which runs a series of Google searches; Gunny's Twitter crawler then collects the other levels from the handle
  2. Site URL
    • Levels: URL
    • Approach: Google search using Veeral's search code
  3. Address
    • Levels: Will include all addresses
    • Approach: Google key terms (e.g. "Contact Us") together with the URL, using Veeral's code
  4. Nonprofit status
    • Levels: Binary variable indicating if the potential Hub is a nonprofit organization
    • Approach: Search http://www.guidestar.org/ to check whether a company is a nonprofit
  5. Mission statement
    • Levels: Official mission statement, or a description of the company if no mission statement exists
    • Approach: If there is no explicit mission statement, we will use the "About" text or statements on the main page
  6. Specific Industry
    • Levels: Industry included in statement (no aggregation)
    • Approach: Based on the mission statement, not aggregated
  7. Price for a space/office
    • Levels: Two prices, one for shared space and the other for a private office
    • Approach: Google methodology with key terms and URL
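Several Group 3 variables (Site URL, Address, Price) rely on Google searches that combine key terms with a hub's URL. Veeral's code is not shown on this page, so the helper below is a hypothetical stand-in that only builds the search URL. It restricts results to the hub's domain with Google's `site:` operator and quotes the key terms so they are searched as an exact phrase.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse


def google_search_url(key_terms: str, site_url: Optional[str] = None) -> str:
    """Build a Google search URL for an exact phrase, optionally limited to one hub's site."""
    query = f'"{key_terms}"'  # quotes ask Google for the exact phrase
    if site_url:
        # Reduce "https://examplehub.com/about" to "examplehub.com" for the site: operator;
        # fall back to the raw value if no scheme was given.
        domain = urlparse(site_url).netloc or site_url
        query = f"site:{domain} {query}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url("Contact Us", "https://examplehub.com/about"))
# https://www.google.com/search?q=site%3Aexamplehub.com+%22Contact+Us%22
```

The same helper covers the address search ("Contact Us") and the price search (e.g. membership or pricing terms) by changing only the key terms.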