Difference between revisions of "Hubs: Hubs Data"
Latest revision as of 16:35, 2 September 2016
Hubs Pages
- The main page for Hubs can be found: Hubs (Academic Paper)
- For the current work in progress for building the Hubs datasheet for the scorecard go to: Hubs: Hubs Scorecard
- For a tracker of work in progress for the dataset building for the scorecard go to Hubs: Hubs Data Building
- For a high-level overview of the variables for the scorecard go to Hubs: Hubs Data
List of Variables
For a more in-depth description of the variables and procedure, please see: Hubs: Hubs Scorecard. This page lists the variables being collected, separated into three groups. Each variable includes a breakdown of the levels being collected (if the definition is not trivial) and an approximate approach.
07/29 Ariel: code Hubs variable for Hubs
E:/McNair/Projects/Hubs/Hubs Variable-Ariel
As of Week of 7/25
Group 1
Variables Difficult to Obtain
- Founding Date (date_founded)
- Difficulty: Finding date based on our strategies
- New Approach:
- Whois.net Date
- Factiva/other press release searches
- Multiple locations within city + Franchise (as of now just addresses) (multi_address)
- Difficulty: Company or establishment level will impact measurements
- New Approach: Will record all addresses at company level
- Onsite Venture Capital v. Angel Investors (e.g. # and Assets Under Management) (onsite_Vc_bin)/(onsite_vc_list) (onsite_angel_bin)/etc.
- Levels: Binary, list of investors
- Difficulty: Hub website usually does not include investors
- New Approach:
- Google key terms with address of Hub
- Start with partners and use Google/Crunchbase
Group 2
Variables Comfortable, Not Complete (rough order of most difficult to least difficult)
- Onsite accelerator (onsite_accel_bin)/(onsite_accel_cnt)/(onsite_accel_list)
- Levels: Binary, count, list
- Difficulty: Usually not a list, which requires more scrubbing as many other variables just require us to find one page on a website.
- Approach:
- Google searches and procedure to use on website yields decent results
- Similar procedure to onsite investors
- Size (# members) (num_members)
- Levels: Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
- Difficulty: Some companies don’t list all members (only selected ones), others do not separate current members from alumni, and some just write "we have served more than 120 startups..."
- Approach: For companies that have a list, we will count. For those with select members, we will count those they listed and try to see if there is a comment about how many they have. For those that just have a statement "with over," we will write the number and + (e.g. "120+").
- Office hours investors and Office hours mentor/advisors (OH_bin)/(OH_inv_bin)/(OH_inv_list)/etc.
- Levels: Binary for OH, binary for two separate OH, list of names/descriptions of OH
- Difficulty: Some companies do not list who OH are with, not always obvious if investor, mentor, or advisor, sometimes not clear if mentor is investor/future investor
- Approach: Use the Google approach to get to OH pages, then look up keywords in the description to separate investor OH from mentor/advisor OH
- Onsite temporary workshops and Networking Meetups (Count) (onsite_temp_events_bin)/(onsite_temp_workshop_bin)/(onsite_temp_workshop_cnt)/etc.
- Levels: Binary for do they exist, count for each
- Difficulty: Difficult for Turkers to differentiate between these two and also other potential events (e.g. symposiums)
- Approach: Uses key search terms (e.g. Java/etc.) to separate out workshops and key terms (e.g. lunch/happy hour) for networking meetings
- Onsite code school and Curriculum (onsite_long_term_courses)/(onsite_code_school_bin)
- Levels: Binary for do they exist, binary for each
- Difficulty: Difficult for Turkers to differentiate between long-term coding programs for individuals and curriculum for startups
- Approach: Uses key search terms (e.g. specific code schools) to separate out known code schools and also to look into key terms (e.g. leadership) for curriculum
- Sponsors/Partners (University, Corporate) (sponsors_cnt)/(sponsors_list)/etc.
- Levels: Count, list of sponsors/partners (if exist), separate columns for university and corporate
- Difficulty: Not all companies list sponsors or partners, and the difference among sponsors, partners, and investors is not always clear.
- Approach: Use two different levels and a Google search; if a list exists, separate the entries containing "college"/"university" from the rest
- Alumni Network (alumni_bin)/(alumni_list)
- Levels: Binary, list
- Difficulty: Not all companies list alumni, some only list "selected"
- Approach: Include all that have lists
- Size (sqft) (size_sqft)
- Levels: Number in sqft
- Difficulty: Not all companies list square feet online
- Approach:
- Google search with key words
- If results do not appear, use of press releases is possible
- Onsite Mentors (onsite_mentors_bin)/(onsite_mentors_cnt)/(onsite_mentors_list)
- Levels: Count and list of mentors (if exist)
- Difficulty: Not all companies list mentors - bigger issue is onsite investors
- Approach: Use two different levels and use of google search
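Several of the Group 2 approaches above are simple text rules, so they can be sketched in a few lines of Python. The following is only an illustration under stated assumptions, not the project's actual code: the function names and the keyword tuples are hypothetical, and the keyword lists contain only the examples given on this page ("Java", "lunch", "happy hour", "college", "university"), so they would need to be extended in practice. The member-count rule is also simplified: it counts a list if one exists and otherwise falls back to an "over N" style statement.

```python
import re

# Hypothetical keyword lists built only from the examples on this page.
WORKSHOP_TERMS = ("java",)
NETWORKING_TERMS = ("lunch", "happy hour")
UNIVERSITY_TERMS = ("college", "university")


def encode_num_members(listed_members=None, statement=None):
    """Return num_members as text: an exact count, or 'N+' for 'over N' claims."""
    if listed_members:
        # A member list exists, so count it.
        return str(len(listed_members))
    if statement:
        # Look for phrases such as "over 200" or "more than 120".
        match = re.search(r"(?:over|more than)\s+([\d,]+)", statement, re.IGNORECASE)
        if match:
            return match.group(1).replace(",", "") + "+"
    return None  # nothing usable was found


def classify_event(title):
    """Label an event title as 'workshop', 'networking', or 'other' by keyword."""
    text = title.lower()
    if any(term in text for term in WORKSHOP_TERMS):
        return "workshop"
    if any(term in text for term in NETWORKING_TERMS):
        return "networking"
    return "other"  # e.g. symposiums


def split_sponsors(names):
    """Split sponsor/partner names into (university, corporate) lists."""
    university = [n for n in names if any(t in n.lower() for t in UNIVERSITY_TERMS)]
    corporate = [n for n in names if n not in university]
    return university, corporate
```

For example, `encode_num_members(statement="we have served more than 120 startups...")` gives `"120+"`, matching the convention described for Size (# members). Substring matching is deliberately crude (for instance "java" also matches "JavaScript"), so results would still need a manual check.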
Group 3
Variables Easy to Obtain
- Twitter activity (twit_handle)/(twit_prev_mon_cnt_tweets)/(twit_cnt_followers)/(twit_cnt_retweets)
- Levels: Twitter Handle, # Tweets in a Month, # Followers, # Retweets
- Approach: The Twitter handle is easy to get from Turk or from Veeral's code, which runs a series of searches on Google. Gunny's Twitter crawler then gets the other levels from the handle
- Site URL (url)
- Levels: URL
- Approach: Google using Veeral's code that allows us to search
- Whois Date (date_whois)
- Levels: Date
- Approach: Date active website was registered
- Address (address)
- Levels: Will include all addresses
- Approach: Google key terms (e.g. Contact Us) and URL using Veeral's code
- Nonprofit status (nonprofit_binary)
- Levels: Binary variable indicating if the potential Hub is a nonprofit organization
- Approach: http://www.guidestar.org/ is a site that we can use to search if a company is nonprofit or not
- Mission statement (missions_stmt)
- Levels: Official mission statement or description of company (if mission does not exist)
- Approach: If there is no explicitly stated mission statement, will include "About" or statements on main page
- Specific Industry (spec_industry)
- Levels: Industry included in statement (no aggregation)
- Approach: Based on Mission Statement, not aggregated
- Price for a space/office (price_space)
- Levels: Two prices: one for shared space, the other for private
- Approach: Uses google methodology with key terms and URL
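As a rough illustration of how the Group 3 variables could be held in one record per potential hub, here is a minimal Python sketch. The field names are the variable names listed above; everything else is an assumption made for the example: the class name `HubEasyVariables`, the field types, the use of a dictionary with "shared" and "private" keys for `price_space`, and the `missing()` helper.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class HubEasyVariables:
    """One row of Group 3 variables for a potential hub (types are assumed)."""
    twit_handle: Optional[str] = None
    twit_prev_mon_cnt_tweets: Optional[int] = None
    twit_cnt_followers: Optional[int] = None
    twit_cnt_retweets: Optional[int] = None
    url: Optional[str] = None
    date_whois: Optional[str] = None          # date the active website was registered
    address: List[str] = field(default_factory=list)  # all addresses found
    nonprofit_binary: Optional[int] = None    # 1 = nonprofit, 0 = not
    missions_stmt: Optional[str] = None
    spec_industry: Optional[str] = None
    # Assumed layout: {"shared": price, "private": price}
    price_space: Dict[str, float] = field(default_factory=dict)

    def missing(self) -> List[str]:
        """Return the names of variables that have not been collected yet."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, [], {})]


# Usage: a value of 0 counts as collected; empty lists and None do not.
hub = HubEasyVariables(url="http://example.org", nonprofit_binary=0)
still_needed = hub.missing()
```

A helper like `missing()` could feed a progress tracker, since it reports which variables still need to be collected for each hub.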