Difference between revisions of "Hubs: Hubs Data"
Revision as of 10:41, 26 July 2016
List of Variables
For a more in-depth description of the variables and procedure, see: Hubs: Mechanical Turk. This page lists the variables being collected, separated into three groups. Each variable includes a breakdown of the levels being collected (where the definition is not trivial) and an approximate approach.
As of the week of 7/25
Group 1
Variables Difficult to Obtain
- Founding Date
- Difficulty: Finding the date with our current strategies
- New Approach:
- Whois.net Date
- Factiva/other press release searches
- Multiple locations within city + Franchise (as of now just addresses)
- Difficulty: Whether we measure at the company or establishment level will affect measurements
- New Approach: Will record all addresses at company level
- Onsite Venture Capital v. Angel Investors (e.g. # and Assets Under Management)
- Levels: Binary, list of investors
- Difficulty: Hub website usually does not include investors
- New Approach:
- Google key terms with the address of the Hub
- Start with partners and use Google/Crunchbase
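As a rough illustration of the Whois approach to founding date, the sketch below pulls a creation date out of raw WHOIS text. It assumes the record uses a "Creation Date" or "Created On" label with an ISO-style date, which varies by registry. The function name and the sample record are hypothetical. A domain registration date is only a proxy for the founding date, so the result would still need to be checked against press releases.

```python
import re
from datetime import date, datetime
from typing import Optional

# Field labels differ between registries; these two are assumptions,
# not an exhaustive list.
_CREATION_PATTERN = re.compile(
    r"^\s*(?:Creation Date|Created On)\s*:\s*(\d{4}-\d{2}-\d{2})",
    re.IGNORECASE | re.MULTILINE,
)


def whois_creation_date(raw_whois: str) -> Optional[date]:
    """Return the earliest creation date found in raw WHOIS text, or None."""
    matches = _CREATION_PATTERN.findall(raw_whois)
    if not matches:
        return None
    return min(datetime.strptime(m, "%Y-%m-%d").date() for m in matches)


# Made-up record for a placeholder domain, for illustration only.
sample = """Domain Name: EXAMPLE-HUB.ORG
Creation Date: 2011-03-14T18:02:11Z
Registry Expiry Date: 2025-03-14T18:02:11Z
"""

print(whois_creation_date(sample))
```

Returning None when no date is found keeps the "difficult to obtain" cases visible instead of silently filling them in.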
Group 2
Variables Comfortable, Not Complete (rough order of most difficult to least difficult)
- Onsite accelerator
- Levels: Binary, count, list
- Difficulty: Usually not presented as a list, so it requires more scrubbing than most other variables, which only require us to find one page on a website.
- Approach:
- Google searches plus our website procedure yield decent results
- Similar procedure to onsite investors
- Size (# members)
- Levels: Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
- Difficulty: Some companies don’t list all members, only selected ones; others do not separate current members from alumni; and some just write "we have served more than 120 startups..."
- Approach: For companies that have a list, we will count the members. For those that list only selected members, we will count those listed and look for a statement of the total. For those that only give a statement such as "with over," we will record the number followed by + (e.g. "120+").
- Office hours investors and Office hours mentor/advisors
- Levels: Binary for OH, binary for two separate OH, list of names/descriptions of OH
- Difficulty: Some companies do not list who OH are with; it is not always obvious whether a person is an investor, mentor, or advisor; and it is sometimes unclear whether a mentor is also an investor or future investor
- Approach: Use the Google approach to reach OH pages, then look up key words in the descriptions to separate the types
- Sponsors/Partners (University, Corporate)
- Levels: Count, list of sponsors/partners (if exist), separate columns for university and corporate
- Difficulty: Not all companies list sponsors or partners; some list neither
- Approach: Use the two levels and a Google search; if a list exists, separate entries containing "college"/"university" from the rest
- Size (sqft)
- Levels: Number in sqft
- Difficulty: Not all companies list square feet online
- Approach:
- Google search with key words
- If no results appear, press releases are a possible source
- Onsite Mentors
- Levels: Count and list of mentors (if exist)
- Difficulty: Not all companies list mentors
- Approach: Use the two levels and a Google search
- Onsite temporary workshops and Networking Meetups (Count)
- Levels:
- Difficulty:
- Approach:
- Onsite code school and Curriculum
- Levels:
- Difficulty:
- Approach:
- Alumni Network
- Levels:
- Difficulty:
- Approach:
- Community membership?
- Levels:
- Difficulty:
- Approach:
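Two of the Group 2 rules are mechanical enough to sketch in code: the "120+" encoding for member counts and the "college"/"university" split for sponsors and partners. The sketch below is one possible reading of those rules, not the project's actual procedure. The function names and example names are hypothetical. It also assumes that a "more than N" or "over N" statement takes precedence over a partial list, which the rules above do not settle.

```python
import re
from typing import List, Tuple


def encode_member_count(listed_members: List[str], statement: str = "") -> str:
    """Encode hub size as a count, or as "N+" when only a lower bound is stated.

    Assumption: a "more than N" / "over N" statement wins over a partial list.
    """
    match = re.search(r"(?:more than|over)\s+(\d[\d,]*)", statement, re.IGNORECASE)
    if match:
        return match.group(1).replace(",", "") + "+"
    return str(len(listed_members))


def split_sponsors(names: List[str]) -> Tuple[List[str], List[str]]:
    """Split sponsor/partner names into (university, corporate) columns.

    A name goes to the university column if it contains "college" or
    "university"; everything else is treated as corporate.
    """
    university: List[str] = []
    corporate: List[str] = []
    for name in names:
        lowered = name.lower()
        if "college" in lowered or "university" in lowered:
            university.append(name)
        else:
            corporate.append(name)
    return university, corporate


# Placeholder inputs, for illustration only.
print(encode_member_count([], "We have served more than 120 startups"))
print(encode_member_count(["Startup A", "Startup B", "Startup C"]))
print(split_sponsors(["Example University", "Acme Corp", "Sample College"]))
```

Keeping the count as a string lets "120+" and exact counts share one column, at the cost of needing a parse step before any numeric analysis.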
Group 3
Variables Easy to Obtain
- Twitter activity
- Levels: Twitter Handle, # Tweets in a Month, # Followers, # Retweets
- Approach: The Twitter handle is easy to get from Turk or from Veeral's code, which lets us run a series of Google searches; Gunny's Twitter crawler then gets the other levels from the handle
- Site URL
- Levels: URL
- Approach: Google search using Veeral's code
- Address
- Levels: Will include all addresses
- Approach: Google key terms (e.g. Contact Us) and URL using Veeral's code
- Nonprofit status
- Levels: Binary variable indicating if the potential Hub is a nonprofit organization
- Approach: Search http://www.guidestar.org/ to check whether a company is a nonprofit
- Mission statement
- Levels: Official mission statement or description of company (if mission does not exist)
- Approach: If no mission statement is explicitly stated, we will include the "About" text or statements on the main page
- Specific Industry
- Levels: Industry included in statement (no aggregation)
- Approach: Based on Mission Statement, not aggregated
- Price for a space/office
- Levels: Two prices: one for shared space, the other for private
- Approach: Uses the Google methodology with key terms and URL
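Several Group 3 variables rely on searching Google with key terms plus the hub's URL. The sketch below shows one way such queries could be assembled. It is not Veeral's code, and the function name, the term table, and the example domain are hypothetical. Only "Contact Us" and "About" come from this page; the other key terms are placeholders. It uses Google's `site:` operator to restrict results to the hub's own website.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical key terms per variable; the project's actual terms are not listed here.
KEY_TERMS: Dict[str, List[str]] = {
    "address": ["Contact Us", "Location"],
    "mission_statement": ["Mission", "About"],
    "price": ["Pricing", "Membership"],
}


def build_queries(hub_url: str, variable: str) -> List[str]:
    """Build one site-restricted search query per key term for a variable."""
    # Fall back to the raw string when the URL has no scheme (netloc is empty).
    host = urlparse(hub_url).netloc or hub_url
    return [f'site:{host} "{term}"' for term in KEY_TERMS[variable]]


# Placeholder URL, for illustration only.
print(build_queries("https://www.example-hub.org/home", "address"))
```

Generating the queries from a table keeps the key terms for each variable in one place, so they can be revised without touching the search code.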