Difference between revisions of "Hubs: Hubs Data"

From edegan.com
 
(174 intermediate revisions by 3 users not shown)
Line 1: Line 1:
This is a new page.
+
=Hubs Pages=
 +
*The main page for Hubs can be found: [[Hubs (Academic Paper)]]
 +
*For the current work in progress for building the Hubs datasheet for the scorecard  go to: [[Hubs: Hubs Scorecard]]
 +
*For a tracker of work in progress for the dataset building for the scorecard go to [[Hubs: Hubs Data Building]]
 +
*For a high-level overview of the variables for the scorecard go to [[Hubs: Hubs Data]]
  
=Test=
+
=List of Variables=
==Test2==
+
For a more in-depth description of the variables and procedure, please see: [[Hubs: Hubs Scorecard]].  This page lists the variables being collected, separated into three categories.  Each variable includes a breakdown of the levels being collected (if the definition is not trivial) and an approximate approach.
*'''Twitter activity''': ''
 
'''UPDATE (7/14)''': Updated turk to reflect our desired formats
 
'''UPDATE (7/12)''': '''AUDIT RESULTS''': We noticed
 
  
'''UPDATE (7/11)''': Uploaded and published on Amazon's Mechanical Turk site.  Given the time cost of either recording the number of tweets in a month or looking up more than 10 tweets, we decided to record the date of the 10th most recent tweet.  Using a sample of ~10 companies, we noticed minimal differences in the data when using 10, 20, or 30 tweets.''
 
#Copy the text in the Search Text into a search engine.
 
#Click on result from twitter.com with the company name. If the link does not appear on the first 3 pages, record DNE for both outputs
 
#Record the company's Twitter Handle into Twitter Handle
 
#Record the date (MM/DD/YY) of the 10th most recent tweet for Twitter Activity. If there are fewer than 10 tweets, record DNE.
 
  
  
*'''NUMBER OF EVENTS''': ''UPDATE: written, not published, on amazon's mechanical turk site''
+
'''07/29''' Ariel: code Hubs variable for Hubs
'''Considerations'''
+
:<code>E:/McNair/Projects/Hubs/Hubs Variable-Ariel</code>
*Difficulties Encountered:
 
*Expected Time to Complete:
 
*Expectation of Results (accuracy of turk, comprehensiveness):
 
*Other Comments:
 
  
'''Procedure'''
 
#Copy the text in the Search Text into a search engine.
 
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
 
#Look for links related to events, such as 'Events' or 'Calendar' on the homepage.
 
#If not found on the homepage, check 'About' and check 'Community'
 
#Count the number of events in July 2016 and record it. If there is no information of events on the website, record DNE.
 
  
Note***: ''Events include meetups, workshops, info sessions, etc. We do not count them separately because doing so is difficult. Most companies put all events in the same section and do not include event types in the event titles. Determining the type requires reading the details of each event, and even then some event descriptions do not make the type clear. Differentiating event types demands more time and effort and is therefore not suitable for a Mechanical Turk project.''
 
  
  
*'''Onsite Mentors''': ''UPDATE: written, not published, on amazon's mechanical turk site''
+
'''As of Week of 7/25'''
#Copy the text in the Search Text into a search engine.
+
===Group 1===
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
+
'''Variables Difficult to Obtain'''
#Look for links related to mentorship such as 'mentors', 'mentorship' or 'mentoring programs'
+
#'''Founding Date''' ''(date_founded)''
#If the key words can be identified, mark as 1
+
#*''' ''Difficulty:'' ''' Finding date based on our strategies
#If there is no explicit 'mentoring' section, look for links related to a description of the company, such as: 'About,' 'Our Team,' 'Our Mission,' etc., look for a subsection or mention of mentor/mentorship/mentoring
+
#*''' ''New Approach:'' '''  
#If these exist, mark as 1.
+
#*#Whois.net Date
#If not, go to links related to membership 'benefits,' 'perks,' or related.
+
#*#Factiva/other press release searches
#Do same process as end of 4 and 5
+
#'''Multiple locations within city + Franchise''' (as of now just addresses) ''(multi_address)''
#If there is no mention of mentorship in these sections, type the company, city, and 'mentoring' into a search engine.  If a link to a reliable website (such as Desktime) appears and mentorship can be found in the description, mark as 1.
+
#*''' ''Difficulty:'' ''' Company or establishment level will impact measurements
#If none of these steps result in a mark of 1, mark as 0
+
#*''' ''New Approach:'' ''' Will record all addresses at company level
 +
#'''Onsite Venture Capital v. Angel Investors''' (e.g. # and Assets Under Management) ''(onsite_Vc_bin)/(onsite_vc_list)'' ''(onsite_angel_bin)/etc.''
 +
#*''' ''Levels:'' ''' Binary, list of investors
 +
#*''' ''Difficulty:'' ''' Hub website usually does not include investors
 +
#*''' ''New Approach:'' '''  
 +
#*#Google key terms with address of Hub
 +
#*#Start with partners and use google/crunchbase
  
 +
===Group 2===
 +
'''Variables Comfortable, Not Complete''' (rough order of most difficult to least difficult)
 +
#'''Onsite accelerator''' ''(onsite_accel_bin)/(onsite_accel_cnt)/(onsite_accel_list)''
 +
#*''' ''Levels:'' ''' Binary, count, list
 +
#*''' ''Difficulty:'' ''' Usually not a list, which requires more scrubbing as many other variables just require us to find one page on a website.
 +
#*''' ''Approach:'' '''
 +
#*#Google searches and procedure to use on website yields decent results
 +
#*#Similar procedure to onsite investors
 +
#'''Size (# members)''' ''(num_members)''
 +
#*''' ''Levels:'' ''' Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
 +
#*''' ''Difficulty:'' ''' Some companies don’t list all members (only selected ones), others do not separate current members and alumni, and some just write "we have served more than 120 startups..."
 +
#*''' ''Approach:'' ''' For companies that have a list, we will count.  For those with select members, we will count those listed and check for a comment about how many they have.  For those that just have a statement such as "with over," we will write the number followed by + (e.g. "120+").
 +
#'''Office hours investors''' and '''Office hours mentor/advisors''' ''(OH_bin)/(OH_inv_bin)/(OH_inv_list)/etc.''
 +
#*''' ''Levels:'' ''' Binary for OH, binary for two separate OH, list of names/descriptions of OH
 +
#*''' ''Difficulty:'' ''' Some companies do not list who OH are with, not always obvious if investor, mentor, or advisor, sometimes not clear if mentor is investor/future investor
 +
#*''' ''Approach:'' ''' Google approach to get to OH pages and then lookup key words in description to separate out
 +
#'''Onsite temporary workshops and Networking Meetups''' (Count) ''(onsite_temp_events_bin)/(onsite_temp_workshop_bin)/(onsite_temp_workshop_cnt)/etc.''
 +
#*''' ''Levels:'' '''  Binary for do they exist, count for each
 +
#*''' ''Difficulty:'' ''' Difficult for Turkers to differentiate between these two and also other potential events (e.g. symposiums)
 +
#*''' ''Approach:'' ''' Uses key search terms (e.g. Java/etc.) to separate out workshops and key terms (e.g. lunch/happy hour) for networking meetings
 +
#'''Onsite code school''' and '''Curriculum''' ''(onsite_long_term_courses)/(onsite_code_school_bin)''
 +
#*''' ''Levels:'' '''  Binary for do they exist, binary for each
 +
#*''' ''Difficulty:'' ''' Difficult for Turkers to differentiate between long-term coding programs for individuals and curriculum for startups
 +
#*''' ''Approach:'' ''' Uses key search terms (e.g. specific code schools) to separate out known code schools and also to look into key terms (e.g. leadership) for curriculum
 +
#'''Sponsors/Partners''' (University, Corporate) ''(sponsors_cnt)/(sponsors_list)/etc.''
 +
#*''' ''Levels:'' ''' Count, list of sponsors/partners (if exist), separate columns for university and corporate
 +
#*''' ''Difficulty:'' ''' Not all companies list sponsors or partners.  The difference among sponsors, partners, and investors is not always clear.
 +
#*''' ''Approach:'' ''' Use two different levels and use of google search, then if list exists, separate by "college"/"university" and rest
 +
#'''Alumni Network''' ''(alumni_bin)/(alumni_list)''
 +
#*''' ''Levels:'' ''' Binary, list
 +
#*''' ''Difficulty:'' ''' Not all companies list alumni, some only list "selected"
 +
#*''' ''Approach:'' ''' Include all that have lists
 +
#'''Size (sqft)''' ''(size_sqft)''
 +
#*''' ''Levels:'' ''' Number in sqft
 +
#*''' ''Difficulty:'' ''' Not all companies list square feet online
 +
#*''' ''Approach:'' '''
 +
#*#Google search with key words
 +
#*#If results do not appear, use of press releases is possible
 +
#'''Onsite Mentors''' ''(onsite_mentors_bin)/(onsite_mentors_cnt)/(onsite_mentors_list)''
 +
#*''' ''Levels:'' ''' Count and list of mentors (if exist)
 +
#*''' ''Difficulty:'' ''' Not all companies list mentors - bigger issue is onsite investors
 +
#*''' ''Approach:'' ''' Use two different levels and use of google search
  
*'''Nonprofit''': ''UPDATE: written, not published, on amazon's mechanical turk site''
+
===Group 3===
#Copy the text in the Search Text into a search engine.
+
'''Variables Easy to Obtain'''
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
+
#'''Twitter activity''' ''(twit_handle)/(twit_prev_mon_cnt_tweets)/(twit_cnt_followers)/(twit_cnt_retweets)''
#Go to links that describe the company, usually they are labelled: 'About', 'Our Story,' 'Mission'
+
#*''' ''Levels:'' ''' Twitter Handle, # Tweets in a Month, # Followers, # Retweets
#Look for the key word 'nonprofit'/'non-profit'
+
#*''' ''Approach:'' ''' Easy to get twitter handle from Turk or Veeral's code that allows us to run a series of searches on google and then use Gunny's Twitter crawler to get other levels from handle
#If 'nonprofit' is identified, mark as 1, otherwise 0.
+
#'''Site URL''' ''(url)''
 
+
#*''' ''Levels:'' ''' URL
 
+
#*''' ''Approach:'' ''' Google using Veeral's code that allows us to search
 
+
#''' ''Whois Date'' ''' ''(date_whois)''
*'''Number of Members''': ''UPDATE: written, not published, on amazon's mechanical turk site''
+
#*''' ''Levels:'' ''' Date
#Copy the text in the Search Text into a search engine.
+
#*''' ''Approach:'' ''' Date active website was registered
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
+
#'''Address''' ''(address)''
#Look for the link 'Members' or 'Residents', usually they are under the links 'Community', 'Membership', 'Our Space' or 'The Space'.
+
#*''' ''Levels:'' ''' Will include all addresses
#Count the number of members
+
#*''' ''Approach:'' ''' Google key terms (e.g. Contact Us) and URL using Veeral's code
#If the link or section of 'Members' is not found, go to 'Community' and 'Coworking' and look for a description of the number of startups/founders/members in the community. Record the number.
+
#'''Nonprofit status''' ''(nonprofit_binary)''
#If number of members cannot be identified using above steps, record DNE.
+
#*''' ''Levels:'' ''' Binary variable indicating if the potential Hub is a nonprofit organization
 
+
#*''' ''Approach:'' ''' http://www.guidestar.org/ is a site that we can use to search if a company is nonprofit or not
 
+
#'''Mission statement''' ''(missions_stmt)''
*'''Sponsors and Partners''':''UPDATE: written, not published, on amazon's mechanical turk site''
+
#*''' ''Levels:'' ''' Official mission statement or description of company (if mission does not exist)
#Copy the text in the Search Text into a search engine.
+
#*''' ''Approach:'' ''' If not explicitly stated mission statement, will include "About" or statements on main page
#Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
+
#'''Specific Industry''' ''(spec_industry)''
#Look for a link to or mention of 'Sponsors' or 'Partners', which is often under 'About', 'Community', or related sections
+
#*''' ''Levels:'' ''' Industry included in statement (no aggregation)
#If sponsors or partners can be found mark as 1 and list them, otherwise mark as 0.
+
#*''' ''Approach:'' ''' Based on Mission Statement, not aggregated
 +
#'''Price for a space/office''' ''(price_space)''
 +
#*''' ''Levels:'' ''' Two prices one for shared, other for private
 +
#*''' ''Approach:'' ''' Uses google methodology with key terms and URL
 +
[[Category: Internal]]
 +
[[Internal Classification: Legacy| ]]

Latest revision as of 16:35, 2 September 2016

Hubs Pages

List of Variables

For a more in-depth description of the variables and procedure, please see: Hubs: Hubs Scorecard. This page lists the variables being collected, separated into three categories. Each variable includes a breakdown of the levels being collected (if the definition is not trivial) and an approximate approach.


07/29 Ariel: code Hubs variable for Hubs

E:/McNair/Projects/Hubs/Hubs Variable-Ariel



As of Week of 7/25

Group 1

Variables Difficult to Obtain

  1. Founding Date (date_founded)
    • Difficulty: Finding date based on our strategies
    • New Approach:
      1. Whois.net Date
      2. Factiva/other press release searches
  2. Multiple locations within city + Franchise (as of now just addresses) (multi_address)
    • Difficulty: Company or establishment level will impact measurements
    • New Approach: Will record all addresses at company level
  3. Onsite Venture Capital v. Angel Investors (e.g. # and Assets Under Management) (onsite_Vc_bin)/(onsite_vc_list) (onsite_angel_bin)/etc.
    • Levels: Binary, list of investors
    • Difficulty: Hub website usually does not include investors
    • New Approach:
      1. Google key terms with address of Hub
      2. Start with partners and use google/crunchbase

Group 2

Variables Comfortable, Not Complete (rough order of most difficult to least difficult)

  1. Onsite accelerator (onsite_accel_bin)/(onsite_accel_cnt)/(onsite_accel_list)
    • Levels: Binary, count, list
    • Difficulty: Usually not a list, which requires more scrubbing as many other variables just require us to find one page on a website.
    • Approach:
      1. Google searches and procedure to use on website yields decent results
      2. Similar procedure to onsite investors
  2. Size (# members) (num_members)
    • Levels: Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
    • Difficulty: Some companies don’t list all members (only selected ones), others do not separate current members and alumni, and some just write "we have served more than 120 startups..."
    • Approach: For companies that have a list, we will count. For those with select members, we will count those listed and check for a comment about how many they have. For those that just have a statement such as "with over," we will write the number followed by + (e.g. "120+").
  3. Office hours investors and Office hours mentor/advisors (OH_bin)/(OH_inv_bin)/(OH_inv_list)/etc.
    • Levels: Binary for OH, binary for two separate OH, list of names/descriptions of OH
    • Difficulty: Some companies do not list who OH are with, not always obvious if investor, mentor, or advisor, sometimes not clear if mentor is investor/future investor
    • Approach: Google approach to get to OH pages and then lookup key words in description to separate out
  4. Onsite temporary workshops and Networking Meetups (Count) (onsite_temp_events_bin)/(onsite_temp_workshop_bin)/(onsite_temp_workshop_cnt)/etc.
    • Levels: Binary for do they exist, count for each
    • Difficulty: Difficult for Turkers to differentiate between these two and also other potential events (e.g. symposiums)
    • Approach: Uses key search terms (e.g. Java/etc.) to separate out workshops and key terms (e.g. lunch/happy hour) for networking meetings
  5. Onsite code school and Curriculum (onsite_long_term_courses)/(onsite_code_school_bin)
    • Levels: Binary for do they exist, binary for each
    • Difficulty: Difficult for Turkers to differentiate between long-term coding programs for individuals and curriculum for startups
    • Approach: Uses key search terms (e.g. specific code schools) to separate out known code schools and also to look into key terms (e.g. leadership) for curriculum
  6. Sponsors/Partners (University, Corporate) (sponsors_cnt)/(sponsors_list)/etc.
    • Levels: Count, list of sponsors/partners (if exist), separate columns for university and corporate
    • Difficulty: Not all companies list sponsors or partners. The difference among sponsors, partners, and investors is not always clear.
    • Approach: Use two different levels and use of google search, then if list exists, separate by "college"/"university" and rest
  7. Alumni Network (alumni_bin)/(alumni_list)
    • Levels: Binary, list
    • Difficulty: Not all companies list alumni, some only list "selected"
    • Approach: Include all that have lists
  8. Size (sqft) (size_sqft)
    • Levels: Number in sqft
    • Difficulty: Not all companies list square feet online
    • Approach:
      1. Google search with key words
      2. If results do not appear, use of press releases is possible
  9. Onsite Mentors (onsite_mentors_bin)/(onsite_mentors_cnt)/(onsite_mentors_list)
    • Levels: Count and list of mentors (if exist)
    • Difficulty: Not all companies list mentors - bigger issue is onsite investors
    • Approach: Use two different levels and use of google search
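
Several of the Group 2 approaches above come down to simple text rules: recording an open-ended member count as "120+", separating university sponsors by the words "college"/"university", and separating workshops from networking meetups by key terms. The following is only a rough sketch of how those rules might look in Python. The function names and the exact keyword lists are assumptions made for illustration (the page gives only "Java", "lunch", "happy hour", "college", and "university" as examples), and this is not the project's actual code.

```python
import re

# Keyword lists are assumptions seeded with the examples given on this page;
# a real run would need longer lists.
WORKSHOP_TERMS = ("java",)
NETWORKING_TERMS = ("lunch", "happy hour")
UNIVERSITY_TERMS = ("college", "university")

# Optional "more than"/"over" prefix, a number, and an optional trailing "+".
_COUNT_RE = re.compile(r"(?:\b(more than|over)\s+)?(\d[\d,]*)(\s*\+)?", re.IGNORECASE)


def parse_member_count(statement):
    """Turn a website statement into a num_members value such as "37" or "120+".

    Returns None when the statement contains no number.
    """
    match = _COUNT_RE.search(statement)
    if match is None:
        return None
    number = match.group(2).replace(",", "")
    open_ended = match.group(1) is not None or match.group(3) is not None
    return number + "+" if open_ended else number


def split_sponsors(names):
    """Split sponsor/partner names into (university, other) lists by keyword."""
    university, other = [], []
    for name in names:
        is_university = any(term in name.lower() for term in UNIVERSITY_TERMS)
        (university if is_university else other).append(name)
    return university, other


def classify_event(description):
    """Label an event description as "workshop", "networking", or "other"."""
    text = description.lower()
    if any(term in text for term in WORKSHOP_TERMS):
        return "workshop"
    if any(term in text for term in NETWORKING_TERMS):
        return "networking"
    return "other"
```

With these rules, the statement quoted above, "we have served more than 120 startups...", would be recorded as "120+". Substring matching is deliberately loose, so any keyword list would need auditing against a sample of hubs before the output is trusted.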

Group 3

Variables Easy to Obtain

  1. Twitter activity (twit_handle)/(twit_prev_mon_cnt_tweets)/(twit_cnt_followers)/(twit_cnt_retweets)
    • Levels: Twitter Handle, # Tweets in a Month, # Followers, # Retweets
    • Approach: Easy to get twitter handle from Turk or Veeral's code that allows us to run a series of searches on google and then use Gunny's Twitter crawler to get other levels from handle
  2. Site URL (url)
    • Levels: URL
    • Approach: Google using Veeral's code that allows us to search
  3. Whois Date (date_whois)
    • Levels: Date
    • Approach: Date active website was registered
  4. Address (address)
    • Levels: Will include all addresses
    • Approach: Google key terms (e.g. Contact Us) and URL using Veeral's code
  5. Nonprofit status (nonprofit_binary)
    • Levels: Binary variable indicating if the potential Hub is a nonprofit organization
    • Approach: http://www.guidestar.org/ is a site that we can use to search if a company is nonprofit or not
  6. Mission statement (missions_stmt)
    • Levels: Official mission statement or description of company (if mission does not exist)
    • Approach: If not explicitly stated mission statement, will include "About" or statements on main page
  7. Specific Industry (spec_industry)
    • Levels: Industry included in statement (no aggregation)
    • Approach: Based on Mission Statement, not aggregated
  8. Price for a space/office (price_space)
    • Levels: Two prices one for shared, other for private
    • Approach: Uses google methodology with key terms and URL
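
As an illustration of how the Group 3 variables could be held together for one potential Hub, here is a minimal sketch of a record type using the variable names listed above. The class name, the field types, the split of price_space into two fields, and the missing() helper are all assumptions for illustration; the page does not specify a storage format.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class HubGroup3Record:
    """One row of Group 3 variables for a potential Hub (hypothetical layout)."""
    twit_handle: Optional[str] = None
    twit_prev_mon_cnt_tweets: Optional[int] = None
    twit_cnt_followers: Optional[int] = None
    twit_cnt_retweets: Optional[int] = None
    url: Optional[str] = None
    date_whois: Optional[str] = None           # date format is an assumption
    address: List[str] = field(default_factory=list)  # all addresses are kept
    nonprofit_binary: Optional[int] = None     # 1 = nonprofit, 0 = not
    missions_stmt: Optional[str] = None
    spec_industry: Optional[str] = None
    price_space_shared: Optional[float] = None   # assumed split of price_space
    price_space_private: Optional[float] = None  # assumed split of price_space

    def missing(self):
        """Return the names of variables not yet collected for this Hub."""
        return [name for name, value in asdict(self).items()
                if value is None or value == []]
```

A record created with only a URL and nonprofit_binary=0 would report the other ten fields as missing, which gives a quick way to track collection progress per Hub. A value of 0 counts as collected, so a confirmed non-nonprofit is not confused with an unknown status.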