Difference between revisions of "Hubs: Hubs Data"

=Background=
=Hubs Pages=
This page documents the Mechanical Turk work for the paper: [[Hubs (Academic Paper)]]. As of Spring 2016, a list of potential Hubs with a set of characteristics was created. Many of these will not meet the eventual definition of a Hub. We will be creating a scorecard to help subjectively define Hubs based on certain characteristics.
*The main page for Hubs can be found at: [[Hubs (Academic Paper)]]
*For the current work in progress on building the Hubs datasheet for the scorecard, go to: [[Hubs: Hubs Scorecard]]
*For a tracker of work in progress on the dataset building for the scorecard, go to: [[Hubs: Hubs Data Building]]
*For a high-level overview of the variables for the scorecard, go to: [[Hubs: Hubs Data]]
  
For more information on Mechanical Turks in general, see [[Mechanical Turk (Tool)]].
=List of Variables=
For a more in-depth description of the variables and procedure, please see: [[Hubs: Hubs Scorecard]].  This page lists the variables being collected, separated into three categories.  Each variable includes a breakdown of the levels being collected (if the definition is not trivial) and an approximate approach.
  
The main goal of the mechanical turk is to automate the collection of variables for potential hubs as much as possible.  The key steps for the project are:
 
#Creating a '''comprehensive''' list of potential hubs
 
#Determining the best variables for the scorecard
 
#Building '''"filters"''' for automating the collection
 
#'''Running''' and '''auditing''' of the automation
 
#Collecting the remaining manual data
 
 
  
=Variables to be Used=
 
==Current Complete List==
 
'''As of Week of 7/11'''
 
#Onsite Venture Capital
 
#*Assets Under Management
 
#*Number
 
#Onsite Angel Investors
 
#Onsite Mentors
 
#Founding Date
 
#Site URL
 
#Office hours investors
 
#Office hours mentor/advisors
 
#Onsite temporary workshops
 
#Networking Meetups
 
#Sponsors/Partners
 
#*University
 
#*Corporate
 
#Curriculum
 
#Onsite code school
 
#Alumni Network
 
#Nonprofit status
 
#Mission statement
 
#Specific Industry
 
#Price for a space
 
#Price for office
 
#Twitter activity
 
#Size (sqft)
 
#Size (# companies)
 
#Onsite accelerator
 
#Community membership??
 
#Franchise
 
#Multiple locations within city
 
  
==Grouping of Variables==
'''07/29''' Ariel: code Hubs variable for Hubs
:<code>E:/McNair/Projects/Hubs/Hubs Variable-Ariel</code>

There are a few categories that the majority of the variables fall under:
  
'''Group 1: Low Hanging Fruit'''
 
Variables in this group are very easy to find and automate.
 
#Twitter Activity
 
#URL
 
#Address
 
#Mission Statement
 
#Specific Industry
 
#Nonprofit
 
#Sponsors/Partners
 
#Price for a space + office
 
#Founding Date
 
  
  
'''Group 2: The Difficult to Find'''
 
There are certain variables where the information is not readily available online or difficult to find.
 
#Size (can try to find press releases)
 
  
 
'''As of Week of 7/25'''
'''Group 3: In Between 1 and 2'''
 
Variables that aren't too easy or difficult to find and automate.
 
#Onsite accelerator
 
#Alumni mentor---vs. other mentors???
 
 
 
 
 
'''Group 4: The Hard to Differentiate'''
 
The key property of this group is that it contains several similar variables, which would be difficult for a turk to differentiate.  To fix this, we will need to create filters akin to the DSM5 scorecard.  See the section below.
 
#Onsite VC v. Angel Investors
 
#Onsite OH Investors v. mentors
 
#Onsite temporary workshops v. networking events
 
#Curriculum v. code school
 
 
 
 
 
'''Group 5: The Need further Discussion Before Collection'''
 
Variables that need to be developed more prior to collection.
 
#Franchise and multiple locations within a city
 
#Community Membership
 
 
 
==Filters/Scorecard==
 
===General Approach===
 
The Scorecard will be broken down into three main parts: description, characteristics, and TBD parts. The procedure for creating these will be as follows: determine the description, develop the characteristics after looking over examples, create possible mechanical turks that have complete accuracy even if they are not comprehensive (e.g. a task that covers only 40% of firms, but that always correctly identifies an onsite mentor and never misspecifies the existence of mentors), and audit the results.
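The "complete accuracy even if not comprehensive" criterion can be made concrete by scoring a filter against a manually audited sample. The following is a minimal sketch, not the project's actual audit code: the function name <code>filter_quality</code>, the hub labels, and the convention that <code>None</code> means "the filter gave no answer" are all assumptions made for illustration.

```python
def filter_quality(predicted, truth):
    """Score a filter against a manually audited sample.

    predicted: dict mapping hub -> True/False, or None when the filter
               gave no answer for that hub (hypothetical convention).
    truth:     dict mapping hub -> True/False from the manual audit.
    Returns (coverage, accuracy): the share of hubs the filter answered,
    and the share of those answers that match the audit.
    """
    answered = {hub: value for hub, value in predicted.items() if value is not None}
    coverage = len(answered) / len(predicted) if predicted else 0.0
    correct = sum(1 for hub, value in answered.items() if truth[hub] == value)
    accuracy = correct / len(answered) if answered else 0.0
    return coverage, accuracy


# Hypothetical sample: the filter answers for 2 of 5 hubs and is right both times.
predicted = {"a": True, "b": None, "c": None, "d": True, "e": None}
truth = {"a": True, "b": False, "c": True, "d": True, "e": False}
coverage, accuracy = filter_quality(predicted, truth)
```

A filter of the kind described above would show low coverage (here 40%) but an accuracy of 100% on the hubs it does answer for.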
 
 
 
===Example===
 
'''Curriculum'''
 
*'''Desc''': The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
 
*'''Characteristics''':
 
**Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
 
***Code schools are for startup labor supply
 
**Active input into a current entrepreneurial endeavor
 
***e.g. " The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
 
**Not an ad hoc session, not a one time meeting but a full "course", evidence of this could be
 
**Has evidence of an integrated curriculum leading to a new competence

**Has evidence of fixed start and end dates, with the program lasting XXX long
 
**Is a session linked to others that regularly occurs
 
*'''TBD points'''
 
**Do we care about outsourcing?
 
*'''Potential Turk'''
 
 
 
'''Code School'''
 
*'''Desc''': training programs that teach coding, data processing, webpage building and other technical skills.
 
*'''Characteristics''':
 
**The target group is developers or people who want to join startups, not the founders themselves
 
**Scheduled classes, not a one time meeting (as opposed to workshops)
 
 
 
'''Temporary Workshops'''
 
*'''Desc''': a group discussion or learning session on a specific subject
 
*'''Characteristic''':
 
**One time
 
**Have a topic/subject/goal
 
***e.g. learn to code workshop: JavaScript 101
 
 
 
=Additional Resources=
 
#[[Mechanical Turk (Tool)]]
 
#Veeral has created a procedure for automating Google searches over different lists
 
 
 
 
 
=Work in Progress=
 
==Goals for WIP==
 
#For GROUP 1, creation of mechanical turk steps:
 
#*'''EXAMPLE:'''
 
#*'''Twitter Activity'''
 
#**'''STATUS''': Complete/In Progress/Not Started
 
#**'''Previously Collected''': Yes/No
 
#**'''Published on Mechanical Turk''': Yes/No
 
#**'''Audited''': Yes/No
 
#**'''Updates''':
 
#**'''Code''':
 
#For GROUP 4:
 
##Scorecard Example
 
##Potential Mechanical Turk Steps (e.g. if specific organization is on website)
 
##Mechanical Turk Example (GROUP 1)
 
##Add Comments on:
 
###How much manual work remains/What is missing
 
###Any remaining difficulties
 
#For GROUPS 2 and 3:
 
##Brainstorm potential ways to find data
 
##Follow Steps in Group1
 
 
 
==Steps Needed to Complete==
 
#Establish automation process for Groups 1-3
 
#*Status (7/21): G/Y: Founding date issues
 
#*Begin Date: Started
 
#*Reach Goal: Complete By Friday 7/22
 
#Differentiate variables in Group 4
 
#*Status (7/21): Green: much progress has been made
 
#*Begin Date: Started
 
#*Reach Goal: Complete by Wednesday 7/27
 
#Have a comprehensive list of potential hubs
 
#*Status (7/21): Hannah working on this
 
#Test processes and audit
 
#*Status (7/21): NS
 
#*Begin Date: TBD
 
#*Reach Goal: TBD
 
#Fill in Remaining Data Manually
 
#*Status (7/21): NS
 
#*Begin Date: TBD
 
#*Reach Goal: TBD
 
 
 
==Actual WIP==
 
 
===Group 1===
 
'''Variables Difficult to Obtain'''
#'''Founding Date''' ''(date_founded)''
#*''' ''Difficulty:'' ''' Finding date based on our strategies
#*''' ''New Approach:'' '''
#*#Whois.net Date
#*#Factiva/other press release searches
#'''Multiple locations within city + Franchise''' (as of now just addresses) ''(multi_address)''
#*''' ''Difficulty:'' ''' Company or establishment level will impact measurements
#*''' ''New Approach:'' ''' Will record all addresses at company level
#'''Onsite Venture Capital v. Angel Investors''' (e.g. # and Assets Under Management) ''(onsite_Vc_bin)/(onsite_vc_list)'' ''(onsite_angel_bin)/etc.''
#*''' ''Levels:'' ''' Binary, list of investors
#*''' ''Difficulty:'' ''' Hub website usually does not include investors
#*''' ''New Approach:'' '''
#*#Google key terms with address of Hub
#*#Start with partners and use google/crunchbase

#Twitter Activity
#*'''STATUS''': Complete
#*'''Previously Collected''': YES/NO - Recorded 2/1/0 to represent activity level, which is not the same measure we are collecting
#*'''Published on Mechanical Turk''': Yes
#*'''AUDITED''': Yes
#**'''Audit Results''': Compared with the 30 that were done manually, for '''twitter handle,''' all 3 turkers agreed with our results 81% of the time, and at least 2 turkers agreed with our results 98% of the time (the exception was a company that had several twitter handles based on location).  Results were 52% and 89% respectively.
#*'''UPDATES''':
#**'''UPDATE (7/20)''': Gunny has created a tool to do this process
#**'''UPDATE (7/14)''': Updated turk to reflect our desired formats
#**'''UPDATE (7/12)''': Audited
#**'''UPDATE (7/11)''': Uploaded and published on Amazon's Mechanical Turk site.  Given the time cost of either recording the number of tweets in a month or looking up more than 10 tweets, we decided to record the date of the 10th most recent tweet.  Using a sample of ~10 companies, we noticed minimal differences in the data between using 10, 20, and 30 tweets.
#*'''CODE'''
#*#Copy the text in the Search Text into a search engine.
#*#Click on the result from twitter.com with the company name. If the link does not appear on the first 3 pages, record DNE for both outputs
#*#Record the company's Twitter Handle into Twitter Handle
#*#Record the date (MM/DD/YY) of the 10th most recent tweet for Twitter Activity. If there are fewer than 10 tweets, record DNE.
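The Twitter Activity rule in the steps above (date of the 10th most recent tweet, DNE when there are fewer than 10) can be sketched as follows. This is an illustration only and not Gunny's tool: the function name <code>twitter_activity</code> is hypothetical, and it assumes the tweet dates have already been collected as Python <code>date</code> objects.

```python
from datetime import date


def twitter_activity(tweet_dates):
    """Return the date of the 10th most recent tweet as MM/DD/YY.

    tweet_dates: list of datetime.date objects, in any order.
    Returns "DNE" when the account has fewer than 10 tweets.
    """
    if len(tweet_dates) < 10:
        return "DNE"
    newest_first = sorted(tweet_dates, reverse=True)
    return newest_first[9].strftime("%m/%d/%y")


# Hypothetical account with one tweet per day from 7/1/16 to 7/12/16.
sample = [date(2016, 7, day) for day in range(1, 13)]
result = twitter_activity(sample)
```

An older date indicates a less active account, since it took longer to accumulate ten tweets.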
 
#URL
 
#*'''STATUS''': In Progress
 
#*'''Previously Collected''': YES
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/18)''': Code written, expected time for each assignment is <15 seconds - pay rate, therefore, recommended $.04
 
#*'''CODE'''
 
#*#Copy the text in the Search Text into a search engine.
 
#*#Record the URL of the first result in the following format www.___.__/ (e.g. if url is example.us/other, record www.example.us/)
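The URL format in the step above (www.___.__/) can be checked or applied automatically. A minimal sketch, assuming the raw result is a plain string; the function name <code>normalize_url</code> and the choice to return DNE for an empty input are assumptions, not part of the turk instructions.

```python
from urllib.parse import urlparse


def normalize_url(url):
    """Reduce a URL to the www.<host>/ form used for the Site URL variable."""
    if "://" not in url:
        # urlparse only fills in the host when a scheme is present
        url = "http://" + url
    host = urlparse(url).hostname or ""
    if not host:
        return "DNE"  # assumption: nothing usable to record
    if host.startswith("www."):
        host = host[4:]
    return "www." + host + "/"


example = normalize_url("example.us/other")
```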
 
#Address
 
#*'''STATUS''': In Progress
 
#*'''Previously Collected''': YES
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/22)''': Code written.  Difficulties occur with very large companies (e.g. Impact Hub).  Will require Veeral's program, expected time for each assignment is 10-20 seconds - pay rate, therefore, recommended $.05
 
#*'''CODE'''
 
#*#Using Veeral's code, crossproduct allintext: (Group A) and site: (Group B), where '''Group A'''=Contact (high coverage), About Us, Find Us, Locations, Address, '''Group B'''= Company URLs.
 
#*#Click on the first result.  If addresses exist, record in ADDRESS, STATE, and ZIP.
 
#*#If not, go to company's URL. If addresses exist, record in ADDRESS, STATE, and ZIP.
 
#*#If address exists, but ZIP does not, plug in address into search engine and record ZIP.
 
#*#Otherwise, record DNE.
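The cross product of Group A and Group B in the first step of the Address code can be sketched as below. This is not Veeral's code; it only illustrates how the search strings could be generated. The function name <code>build_queries</code> and the example domains are hypothetical.

```python
from itertools import product

# Group A search terms, as listed in the Address code above
GROUP_A = ["Contact", "About Us", "Find Us", "Locations", "Address"]


def build_queries(company_urls, terms=GROUP_A):
    """Build one 'allintext: <term> site:<url>' query per (term, URL) pair."""
    return [f"allintext: {term} site:{url}" for term, url in product(terms, company_urls)]


# Group B: hypothetical company URLs
queries = build_queries(["a.com", "b.org"])
```

With 5 terms and 2 URLs this yields 10 queries, which is why running the searches in bulk before publishing the task can save turk time.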
 
#Mission Statement
 
#*'''STATUS''': In Progress
 
#*'''Previously Collected''': YES
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/18)''': Code written, expected time for each assignment is 20-30 seconds - pay rate, therefore, recommended $.08
 
#*'''CODE'''
 
#*#Copy the text in the Search Text 1 into a search engine (will include site:__ from Company's URL).
 
#*#Click on first link that is a subsection (e.g. "Mission", "About") from company's website (see Company's URL)
 
#*#If this does not exist, repeat steps 1 and 2 with Search Text 2
 
#*#If this does not exist, go to Company's URL
 
#*#Record the main text on the page up to five paragraphs (some of these will be a single line).  Do NOT record subsections.
 
#*#If locating the main text in  the prior step is unclear, record "Unclear"
 
#*#If no text exists, record "DNE"
 
#Specific Industry
 
#*'''STATUS''': In Progress
 
#*'''Previously Collected''': YES/NO, based on LinkedIn identifier
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/21)''': Given that most companies include their specialty in their mission statement and that this variable is difficult to turk, we will manually check each mission statement and mark it accordingly.
 
#*'''CODE'''
 
#*#NONE
 
#Nonprofit
 
#*'''STATUS''': In Progress
 
#*'''Previously Collected''': NO
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#**'''REQUIRES ADDITIONAL STEPS''': YES (need to double check results)
 
#*'''UPDATES''':
 
#**'''UPDATE (7/19)''': Code written, code 2 of 2 is believed to be more accurate and efficient.  Expected time to complete is 15 seconds - pay rate, therefore, recommended $.04
 
#*'''CODE 1 of 2'''
 
#*#Go to Company's URL.
 
#*#Go to links (sometimes will be sections of the URL page) that describe the company, usually they are labelled: 'About', 'Our Story,' 'Mission'.
 
#*#If none of these exist, record DNE for PAGES
 
#*#Look for the word 'profit'/'nonprofit'/'non-profit'/'not-for-profit'  (with or without -)
 
#*#If any of the key words is found, record 1 for EXISTS (1/0); otherwise record 0.
 
#*#If it is marked as 1, record all sentences that the word is found in under SENTENCES.
 
#*#If the links do exist, record the name of the link under PAGES
 
#*#Repeat steps 4, 5, and 6 on the pages that were linked.
 
#*'''CODE 2 of 2'''
 
#*#Copy the text from Search Text into the search bar at http://www.guidestar.org/.
 
#*#Record all Organization Names that appear
 
#*#If no results appear, record DNE
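The keyword check in CODE 1 of 2 above (look for the key words, record EXISTS, and record the matching sentences) can be sketched as follows. This is an illustration under assumptions: the function name <code>nonprofit_check</code> is hypothetical, the page text is assumed to be already extracted as a string, and the sentence splitting is deliberately naive.

```python
import re

# Key words from CODE 1: 'profit', 'nonprofit', 'non-profit', 'not-for-profit'
KEYWORD = re.compile(
    r"\b(?:non-?profit|not-for-profit|not for profit|profit)\b", re.IGNORECASE
)


def nonprofit_check(page_text):
    """Return (EXISTS, SENTENCES) for one page of text."""
    # Naive split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    hits = [sentence for sentence in sentences if KEYWORD.search(sentence)]
    return (1 if hits else 0), hits


text = "We are a co-working space. Founded in 2012 as a non-profit organization. Join us!"
exists, matched = nonprofit_check(text)
```

A value of 1 only means a key word was found (a "for-profit" mention would also match), which is consistent with the note above that these results need to be double checked.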
 
#Sponsors/Partners
 
#*'''STATUS''': In Progress
 
#*'''Previously Collected''': NO
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/21)''': Code written, but may require additional manual work. Expected time to complete is 45 seconds due to a potential list of a lot of sponsors/partners - pay rate, therefore, recommended $.12.
 
#*'''CODE'''
 
#*#Choose first result from Search Text 1 and Search Text 2 (allintext: Sponsors/Partners site:URL)
 
#*#Record all Sponsors from Search Text 1 into SPONSORS.  If there does not exist a list or the link was for only 1 sponsor, record DNE.
 
#*#If any Sponsors from Search Text 1 include a University or College (will be listed in name), record them into UNIVERSITY SPONSORS
 
#*#Record all Partners from Search Text 2 into PARTNERS. If there does not exist a list or the link was for only 1 partner, record DNE.
 
#*#If any Partners from Search Text 2 include a University or College (will be listed in name), record them into UNIVERSITY PARTNERS
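The recording rules in the Sponsors/Partners code above (DNE when there is no list or only one name, and a separate column for names containing "University" or "College") can be sketched as below. The function names and the example organizations are hypothetical and only illustrate the rules.

```python
def record_list(names):
    """Return the names to record, or "DNE" when there is no list or only one entry."""
    return list(names) if len(names) > 1 else "DNE"


def split_university(names):
    """Separate names that mention a university or college from the rest."""
    keywords = ("university", "college")
    university = [n for n in names if any(k in n.lower() for k in keywords)]
    other = [n for n in names if n not in university]
    return university, other


# Hypothetical sponsor list copied from a hub's sponsors page
sponsors = ["Rice University", "Acme Corp", "Example College"]
university_sponsors, other_sponsors = split_university(sponsors)
```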
 
#Price for a space + office
 
#*'''STATUS''': Not Started
 
#*'''Previously Collected''': YES
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/22)''':  Code 1 written, code 2 needs more work
 
#*'''CODE 1 of 2'''
 
#*#Go to company’s URL
 
#*#On the homepage, look for the section related to pricing. If pricing is not found in the homepage, look for the links ‘coworking’, ‘work space’, ‘membership’, ‘pricing’ ,‘join’ or ‘Apply for membership’ , and look for the pricing information under those links. If there is no price related section, record DNE for both ‘Flexible Desk’ and ‘Dedicated Desk’.
 
#*#If there is pricing information, look for the price of sharing space per month, often denoted as Shared/Flexible desk/non-dedicated desk, record the price at ‘Flexible Desk’. If the price is not found, record DNE.
 
#*#Look for the price of a dedicated desk per month, often denoted as Reserved/dedicated desk/private desk record the price at ‘Dedicated Desk’.  If the price is not found, record DNE.
 
#*#If price information is not found and there is a ‘locations’ link, click on it and choose the first location in the list. Repeat steps 3-4.
 
#*'''CODE 2 of 2'''
 
#*#Keywords: 24/7 access, dedicated desk, pricing
 
#*#Google allintext:"keywords" site:URL
 
#*#TBD
 
#Founding Date
 
#*'''STATUS''': To Be Discussed Further
 
#*'''Previously Collected''': YES, but only year
 
#*'''Published on Mechanical Turk''': NO
 
#*'''AUDITED''': NO
 
#**'''Audit Results''': TBD
 
#*'''UPDATES''':
 
#**'''UPDATE (7/21)''': Difficulties observed when figuring out how to Turk this
 
#*'''CODE'''
 
#*#Copy the text in the Search Text into a search engine.
 
#*#TBD
 
  
 
===Group 2===
 
#Size
#*'''BRAINSTORM''': (7/19)
#**1), 2), 3): search allintext: sqft/square foot/square feet site: company URL
#**4) Company Name, city, square feet and then choose first result
#**Process might be easier (and cheaper) if Veeral runs code first to eliminate the searches that return 0 results.
#*'''STATUS''': In Progress
#*'''Previously Collected''': YES/NO, many missing
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/19)''': Brainstorm and code updated
#*'''CODE'''
#*#Copy the text in the Search Text 1 into a search engine.
#*#Record DNE if 0 results returned in SEARCH 1
#*#If there is a result, click the first link in which the search text appears and record the sentence containing it in SEARCH 1
#*#Repeat Steps 1-3 for Search Text 2 and 3 and record in SEARCH 2 and SEARCH 3 respectively

'''Variables Comfortable, Not Complete''' (rough order of most difficult to least difficult)
#'''Onsite accelerator''' ''(onsite_accel_bin)/(onsite_accel_cnt)/(onsite_accel_list)''
#*''' ''Levels:'' ''' Binary, count, list
#*''' ''Difficulty:'' ''' Usually not a list, which requires more scrubbing as many other variables just require us to find one page on a website.
#*''' ''Approach:'' '''
#*#Google searches and procedure to use on website yields decent results
#*#Similar procedure to onsite investors
#'''Size (# members)''' ''(num_members)''
#*''' ''Levels:'' ''' Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
#*''' ''Difficulty:'' ''' Some companies don’t list all members, only selected ones; others do not separate current members and alumni; and some just write "we have served more than 120 startups..."
#*''' ''Approach:'' ''' For companies that have a list, we will count. For those with select members, we will count those they listed and try to see if there is a comment about how many they have. For those that just have a statement "with over," we will write the number and + (e.g. "120+").
#'''Office hours investors''' and '''Office hours mentor/advisors''' ''(OH_bin)/(OH_inv_bin)/(OH_inv_list)/etc.''
#*''' ''Levels:'' ''' Binary for OH, binary for two separate OH, list of names/descriptions of OH
#*''' ''Difficulty:'' ''' Some companies do not list who OH are with, not always obvious if investor, mentor, or advisor, sometimes not clear if mentor is investor/future investor
#*''' ''Approach:'' ''' Google approach to get to OH pages and then look up key words in description to separate out
#'''Onsite temporary workshops and Networking Meetups''' (Count) ''(onsite_temp_events_bin)/(onsite_temp_workshop_bin)/(onsite_temp_workshop_cnt)/etc.''
#*''' ''Levels:'' '''  Binary for do they exist, count for each
#*''' ''Difficulty:'' ''' Difficult for Turkers to differentiate between these two and also other potential events (e.g. symposiums)
#*''' ''Approach:'' ''' Uses key search terms (e.g. Java/etc.) to separate out workshops and key terms (e.g. lunch/happy hour) for networking meetings
#'''Onsite code school''' and '''Curriculum''' ''(onsite_long_term_courses)/(onsite_code_school_bin)''
#*''' ''Levels:'' '''  Binary for do they exist, binary for each
#*''' ''Difficulty:'' ''' Difficult for Turkers to differentiate between long-term coding programs for individuals and curriculum for startups
#*''' ''Approach:'' ''' Uses key search terms (e.g. specific code schools) to separate out known code schools and also to look into key terms (e.g. leadership) for curriculum
#'''Sponsors/Partners''' (University, Corporate) ''(sponsors_cnt)/(sponsors_list)/etc.''
#*''' ''Levels:'' ''' Count, list of sponsors/partners (if exist), separate columns for university and corporate
#*''' ''Difficulty:'' ''' Not all companies will list sponsors, partners, or either.  The difference among sponsors, partners, and investors is not always clear.
#*''' ''Approach:'' ''' Use two different levels and use of google search, then if list exists, separate by "college"/"university" and rest
#'''Alumni Network''' ''(alumni_bin)/(alumni_list)''
#*''' ''Levels:'' ''' Binary, list
#*''' ''Difficulty:'' ''' Not all companies list alumni, some only list "selected"
#*''' ''Approach:'' ''' Include all that have lists
#'''Size (sqft)''' ''(size_sqft)''
#*''' ''Levels:'' ''' Number in sqft
#*''' ''Difficulty:'' ''' Not all companies list square feet online
#*''' ''Approach:'' '''
#*#Google search with key words
#*#If results do not appear, use of press releases is possible
#'''Onsite Mentors''' ''(onsite_mentors_bin)/(onsite_mentors_cnt)/(onsite_mentors_list)''
#*''' ''Levels:'' ''' Count and list of mentors (if exist)
#*''' ''Difficulty:'' ''' Not all companies list mentors - bigger issue is onsite investors
#*''' ''Approach:'' ''' Use two different levels and use of google search
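For the Size (sqft) variable above, the sentence returned by a key word search could be reduced to a number automatically. A minimal sketch, assuming the sentence is already available as text; the function name <code>extract_sqft</code>, the pattern, and the example sentences are illustrative assumptions and will not cover every way a hub might state its size.

```python
import re

# Matches e.g. "12,000 square feet", "8500 sqft", "30,000-square-foot", "2,500 sq. ft."
SQFT = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)[\s-]*(?:sq\.?\s*ft\.?|sqft|square[\s-]+f(?:oo|ee)t)",
    re.IGNORECASE,
)


def extract_sqft(text):
    """Return the first square-footage figure in text as an int, or "DNE"."""
    match = SQFT.search(text)
    if not match:
        return "DNE"
    return int(float(match.group(1).replace(",", "")))


size = extract_sqft("Our 12,000 square foot facility opened downtown.")
```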
  
 
===Group 3===
 
'''Variables Easy to Obtain'''
#'''Twitter activity''' ''(twit_handle)/(twit_prev_mon_cnt_tweets)/(twit_cnt_followers)/(twit_cnt_retweets)''
#*''' ''Levels:'' ''' Twitter Handle, # Tweets in a Month, # Followers, # Retweets
#*''' ''Approach:'' ''' Easy to get twitter handle from Turk or Veeral's code that allows us to run a series of searches on google and then use Gunny's Twitter crawler to get other levels from handle
#'''Site URL''' ''(url)''
#*''' ''Levels:'' ''' URL
#*''' ''Approach:'' ''' Google using Veeral's code that allows us to search
#''' ''Whois Date'' ''' ''(date_whois)''
#*''' ''Levels:'' ''' Date
#*''' ''Approach:'' ''' Date active website was registered
#'''Address''' ''(address)''
#*''' ''Levels:'' ''' Will include all addresses
#*''' ''Approach:'' ''' Google key terms (e.g. Contact Us) and URL using Veeral's code
#'''Nonprofit status''' ''(nonprofit_binary)''
#*''' ''Levels:'' ''' Binary variable indicating if the potential Hub is a nonprofit organization
#*''' ''Approach:'' ''' http://www.guidestar.org/ is a site that we can use to search if a company is nonprofit or not
#'''Mission statement''' ''(missions_stmt)''
#*''' ''Levels:'' ''' Official mission statement or description of company (if mission does not exist)
#*''' ''Approach:'' ''' If there is no explicitly stated mission statement, will include "About" or statements on main page
#'''Specific Industry''' ''(spec_industry)''
#*''' ''Levels:'' ''' Industry included in statement (no aggregation)
#*''' ''Approach:'' ''' Based on Mission Statement, not aggregated
#'''Price for a space/office''' ''(price_space)''
#*''' ''Levels:'' ''' Two prices: one for shared, the other for private
#*''' ''Approach:'' ''' Uses google methodology with key terms and URL
[[Category: Internal]]
[[Internal Classification: Legacy| ]]

#Mentors
#*'''BRAINSTORM''': Current form of this variable seems to be too general.
#*'''STATUS''': In Progress
#*'''Previously Collected''': NO
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
#**'''UPDATE (7/19)''': Two possible codes written. First one requires more manual work
#*'''CODE 1 of 2'''
#*#Go to Company URL
#*#Look for links related to mentorship such as 'mentors', 'mentorship' or 'mentoring programs'.
#*#If the key words can be identified, record 1 in BINARY, copy the sentence containing them into SENTENCE, and record urlhome in PAGE.
#*#If there is no explicit 'mentoring' section, look for links related to a description of the company, such as: 'About,' 'Our Team,' 'Our Mission,' etc., and look for a subsection or mention of mentor/mentorship/mentoring.
#*#If these exist, record 1 in BINARY, copy the sentence containing them into SENTENCE, and record the link name clicked in PAGE.
#*#If not, go to links related to membership 'benefits,' 'perks,' or related and repeat Step 5.
#*#If none of these steps result in a mark of 1, mark as 0.
#*'''CODE 2 of 2'''
#*#Copy Search Text into search engine
#*#Mark as 1 if reliable site is populated, 0 otherwise
#Onsite Accelerator
#*'''BRAINSTORM''': Need a count.
#*'''STATUS''': Not Started
#*'''Previously Collected''': YES/NO, only a binary variable
#*'''Published on Mechanical Turk''': NO
#*'''AUDITED''': NO
#**'''Audit Results''': TBD
#*'''UPDATES''':
 
#**'''UPDATE (7/21)''': Code written - 2nd part, while more manual, appears to have greater range.  2nd code would only require Veeral's code.  1st code expected completion time is 30 seconds.
 
#*'''CODE 1 of 2'''
 
#*#Go to company's URL
 
#*#Look for the link 'Accelerators' or 'Accelerating/Accelerator/Acceleration/Accelerate Programs'
 
#*#If accelerators are found, count the number of accelerators/accelerating programs and record the number. **or also copy the names of the accelerators?
 
#*#If accelerators are not found in step 1, go to the links 'Services' , 'Benefit', 'Resources', 'For Entrepreneurs', 'Startups' and look for the section of 'Accelerator/Accelerating Programs'
 
#*#If accelerators are found, count the number of accelerators/accelerating programs and record the number.
 
#*#If accelerators are not found, record 0.
 
#*'''CODE 2 of 2'''
 
#*#Search [allintitle:"accelerator"/"accelerate" site:URL] in Google
 
#*#Copy the titles of the results. **We have to scrutinize the titles ourselves to determine whether they are distinct onsite accelerators and record the number manually.
 
#*#If no result appears, record 0.
 
 
 
===Group 4===
 
====Curriculum and Code School====
 
'''Curriculum'''
 
*'''Desc''': The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
 
*'''Characteristics''':
 
**Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
 
***Code schools are for startup labor supply
 
**Active input into a current entrepreneurial endeavor
 
***e.g. " The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
 
**Not an ad hoc session, not a one time meeting but a full "course", evidence of this could be
 
**Has evidence of an integrated curriculum leading to a new competence

**Has evidence of fixed start and end dates, with the program lasting XXX long
 
**Cultivate leadership for entrepreneurs
 
**Tagged "Business" as opposed to 'Tech' or 'Design'
 
**Is a session linked to others that regularly occurs
 
*'''TBD points'''
 
**Do we care about outsourcing?
 
*'''Potential Turks'''
 
**'''Search Text''': Fullbridge, leadership program, business academy, business course, aspiring entrepreneurs
 
#Google: "Search Text" site:URL
 
#Record 0 if no result returns.
 
#If there is a result, click first link in which result search text appears and record the sentence in which the text appears.
 
 
 
'''Code School'''
 
*'''Desc''': training programs that teach coding, data processing, webpage building and other technical skills.
 
*'''Characteristics''':
 
**Bootcamps
 
**The target group is developers or people who want to join startups, not the founders themselves
 
**Scheduled classes, not a one time meeting (as opposed to workshops)
 
*'''TBD points'''
 
*'''Potential Turks'''
 
**'''Search Text 1''': website design, coding, web development, software, bootcamp
 
**'''Search Text 2''': General Assembly, Anyone Can Learn to Code, Umbraco, Designation, Boise CodeWorks, Grand Circus, DevMountain, Silicon Valley Data Academy, Academy Pittsburgh
 
#Google: "Search Text" site:URL
 
#Record 0 if no result returns.
 
#If there is a result, click first link in which result search text appears and record the sentence in which the text appears.
 
 
 
 
 
====Onsite OH Investors v. mentors====
 
Thoughts (Ariel, 07/20):
 
The names listed on 'mentor' pages/sections must all be mentors, and the same applies for investors/OH investors, although few companies list their investors. So the only thing we are trying to differentiate here is whether the mentor is an investor, maybe by checking whether they are from a VC firm? But even if they are from VC firms, that doesn't mean they are going to invest in the startups of the Hubs they are mentoring at.  Another way to think about it is differentiating between mentors and OH mentors: mentors tend to give particular startups long-term support and are available when needed, while OH mentors only give advice on the spot.
 
 
 
'''Mentors'''
 
 
 
*'''Desc''':
 
 
 
*'''Characteristics''':
 
**Focus on improving entrepreneurial community through ongoing, recurring support
 
**Help and guide the startups on: business plans and models, management, development, execution, technology innovation, marketing, sales
 
**Common fields/occupations: founder/CEO of another company, business development, serial entrepreneur, marketing, sales, management consulting, technology and innovation, research professor etc.
 
**Some companies offer mentor office hours
 
 
 
*'''TBD Points''':
 
 
 
 
 
'''Investors'''
 
 
 
*'''Desc''':
 
 
 
*'''Characteristics''':
 
**Focus on investing on early stage or growth stage startups
 
**Usually from VC firms
 
**Common fields/ occupations: VC firm manager, VC firm partner, fund manager
 
 
 
*'''TBD Points''':
 
 
 
*'''Potential Turks''':
 
#Search allintext:"office hours" site:URL
 
#Mark ''office hours'' as 1 if there is a result, otherwise mark as 0.
 
#Click on the first five results
 
#On each of the five pages, search for two items:
 
##search for 'mentor'. (Ctrl + F) If 'mentor' appears in the description paragraph of office hours on any of the five pages, mark ''mentor OH'' as 1. Otherwise mark as DNE and copy the description paragraph of office hours of all five pages.
 
##search for 'fund'. (Ctrl + F) If 'fund' appears in the description paragraph of office hours on any of the five pages, mark ''investor OH'' as 1. Otherwise mark as DNE and copy the description paragraph of office hours of all five pages.
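The Ctrl + F checks in the steps above can be sketched as a simple substring test over the office hours descriptions. This is an illustration only: the function name <code>classify_office_hours</code> is hypothetical, the descriptions are assumed to be already copied from the first five result pages, and recording DNE for the mentor and investor columns when the search returns nothing is an assumption, since the steps do not specify that case.

```python
def classify_office_hours(descriptions):
    """Apply the 'mentor' / 'fund' key word checks to office hours descriptions.

    descriptions: list of description paragraphs, one per result page.
    Only the first five pages are considered, as in the steps above.
    """
    pages = descriptions[:5]
    if not pages:
        # No search result: office hours marked 0 (other columns are an assumption)
        return {"office_hours": 0, "mentor_oh": "DNE", "investor_oh": "DNE"}
    text = " ".join(pages).lower()
    return {
        "office_hours": 1,
        "mentor_oh": 1 if "mentor" in text else "DNE",
        "investor_oh": 1 if "fund" in text else "DNE",
    }


# Hypothetical descriptions
result = classify_office_hours(["Meet a mentor every Tuesday.", "Pitch practice."])
```

Because 'fund' is matched as a substring, words such as "funding" or "fundraising" also count, which matches what a Ctrl + F search would find.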
 
 
 
 
 
====Onsite temporary workshops v. networking events====
 
'''Temporary Workshops'''
 
 
 
*'''Desc''':
 
 
 
*'''Characteristics''':
 
**The purpose is learning and discussing
 
**Often have a specific topic: business issue (e.g. online marketing) or technique to learn (e.g. intro to JavaScript)
 
**In the forms of: workshop, class, panel, project, XX session, seminar, series, intro to XX
 
**Exception: a tech meetup is usually a workshop (e.g. C++ programmer meetup, http://techranchaustin.com/events/)
 
 
 
*'''TBD Points''':
 
**Do we care about what particular workshops (e.g. coding, leadership, etc.)?
 
**Summits/major events
 
 
 
*'''Potential Turks''':
 
**See Turk for Both Below
 
 
 
'''Networking Events'''
 
 
 
*'''Desc''':
 
 
 
*'''Characteristics''':
 
**The purpose is to meet fellow entrepreneurs and experts and network with them
 
**Focus on experience sharing or communication as opposed to discussing a specific topic or technical subject
 
**In the forms of: meetup, networking, happy hour, info session?, luncheon, XX night, socials, talks??, community XX
 
 
 
*'''TBD Points''':
 
 
 
*'''Potential Turks''':
 
**See Turk for Both Below
 
 
 
*'''Turk for Both 1 of 2''':
 
*#Search using Search Text 1 (allintext: events site: URL) and choose the link to "Events", "Calendar", or a related page.  Record 'url' in SOURCE.  If no such link exists, go to Step 7.

*#For all events that have dates, copy the events from today's date through the following month into ALL EVENTS

*#For all events that have Office Hours in the name, record the events in OFFICE HOURS.  For all events that have summit in the name, record the events in SUMMITS.

*#For all events that are related to teaching or learning (e.g. contain "Training," "Seminar," "Class," "Learn," "Bootcamp," "Workshop," "Pitch Event"), copy the names of the events into WORKSHOPS

*#For all events that are related to social activities and networking (e.g. "Social," "Meet Up," "Breakfast"/"Lunch"/"Happy Hour", "Movie Night"/"Bowling"), copy the names of the events into NETWORKING.  For all events that are unclear or did not fit into these descriptions

*#If a message explicitly says there are no events, mark as 0 for ALL EVENTS, OFFICE HOURS, SUMMITS, WORKSHOPS, and NETWORKING

*#If no events page exists on the company site, search using Search Text 2 (allintext: Company Name site: meetup.com) and click on the company's meetup.com page if it exists.  If it does exist, record meetup in SOURCE.  If not, go to Step 9.

*#Repeat Steps 2-6.

*#If no meetup.com page exists, search using Search Text 3 (allintext: Company Name site: eventbrite.com) and click on the company's eventbrite.com page if it exists.  If it does exist, record eventbrite in SOURCE.  If not, mark DNE for all variables.

*#Repeat Steps 2-6.
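
The fallback order across the three search texts could be sketched like this. The search strings are copied as written in the steps; the names <code>build_event_searches</code>, <code>pick_source</code>, and <code>SOURCE_ORDER</code> are hypothetical, and the sketch only builds the query text and picks the SOURCE value. It does not run any search.

```python
from typing import Dict

# Order in which sources are tried, matching Steps 1, 7, and 9.
SOURCE_ORDER = ("url", "meetup", "eventbrite")


def build_event_searches(company_name: str, url: str) -> Dict[str, str]:
    """Return Search Text 1-3 keyed by the value recorded in SOURCE."""
    return {
        "url": f"allintext: events site: {url}",
        "meetup": f"allintext: {company_name} site: meetup.com",
        "eventbrite": f"allintext: {company_name} site: eventbrite.com",
    }


def pick_source(found: Dict[str, bool]) -> str:
    """Return the first source with an events page, or DNE if none has one."""
    for source in SOURCE_ORDER:
        if found.get(source, False):
            return source
    return "DNE"
```

For example, <code>pick_source({"url": False, "meetup": True})</code> returns <code>"meetup"</code>, which corresponds to reaching Step 7 and finding a meetup.com page.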
 
 
 
*'''Turk for Both 2 of 2''':
 
#Go to Company URL
 
#Look for links related to events, such as 'Events' or 'Calendar' on the homepage.
 
#If not found on the homepage, check 'About' and check 'Community'
 
#Count the number of events from today's date through the following month and record it in ALL EVENTS. If the website has no information on events or event dates, record DNE for all variables.

#For all events that have Office Hours in the name, record the count in OFFICE HOURS.  For all events that have summit in the name, record the count in SUMMITS.

#For all events that are related to teaching or learning (e.g. contain "Training," "Seminar," "Class," "Learn," "Bootcamp," "Workshop," "Pitch Event"), record the count in WORKSHOPS

#For all events that are related to social activities and networking (e.g. "Social," "Meet Up," "Breakfast"/"Lunch"/"Happy Hour", "Movie Night"/"Bowling"), record the count in NETWORKING
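
A minimal sketch of the keyword-based counting described in the steps above. The keyword lists come from the steps; the names <code>CATEGORY_KEYWORDS</code> and <code>count_events</code> are hypothetical. It assumes that an event may count toward more than one column if it matches several keyword lists, which the steps do not specify, and that <code>None</code> stands for a site with no event information.

```python
from typing import Dict, List, Optional, Union

# Lowercased keywords per output column, taken from the Turk steps.
CATEGORY_KEYWORDS = {
    "OFFICE HOURS": ("office hours",),
    "SUMMITS": ("summit",),
    "WORKSHOPS": ("training", "seminar", "class", "learn",
                  "bootcamp", "workshop", "pitch event"),
    "NETWORKING": ("social", "meet up", "breakfast", "lunch",
                   "happy hour", "movie night", "bowling"),
}


def count_events(event_names: Optional[List[str]]) -> Dict[str, Union[int, str]]:
    """Count events per column; record DNE everywhere if there is no data."""
    columns = ("ALL EVENTS", *CATEGORY_KEYWORDS)
    if event_names is None:
        return {column: "DNE" for column in columns}

    counts: Dict[str, Union[int, str]] = {"ALL EVENTS": len(event_names)}
    for column, keywords in CATEGORY_KEYWORDS.items():
        # An event counts once per column if any keyword is a substring.
        counts[column] = sum(
            any(keyword in name.lower() for keyword in keywords)
            for name in event_names
        )
    return counts
```

Substring matching is deliberately naive here: a name such as "Social Media Workshop" would count toward both WORKSHOPS and NETWORKING, which is the kind of case that still needs a manual audit.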
 
 
 
 
 
====Onsite VC v. Angel Investors====
 
 
 
 
 
 
 
===Group 5===
 
#Multiple Locations
 
#*Addresses are included in Group 1, but still need to be discussed
 
#*Getting
 
 
 
==Companies Used for Auditing/etc.==
 
Capital Factory, Austin
 
1871, Chicago
 
Rocket Space, San Francisco
 
1776, Washington D.C.
 
Betamore, Baltimore
 
Packard Place, Charlotte
 
The Venture Center, Little Rock
 
GSV Labs, San Francisco
 
The Hive, Palo Alto
 
Innovation Pavilion, Denver
 
OSC Tech Lab, Akron
 
Speakeasy, Indianapolis
 
Riverside.io, Riverside
 
The Salt Mines, Columbus
 
InNEVation, Las Vegas
 
804 RVA
 
Impact Hub, Salt Lake
 
Awesome Inc, Louisville
 
Geekdom, San Antonio
 
Alloy26, Pittsburgh
 
ReSET, Hartford
 
Ansir Innovation Center, San Diego
 
Domistation, Tallahassee
 
Atlanta Tech Village, Atlanta
 
Spark Labs, New York
 
 
 
=Completed Work=
 
See Section 3 of [[Hubs (Academic Paper)]]
 

Latest revision as of 16:35, 2 September 2016

Hubs Pages

List of Variables

For a more in-depth description of the variables and procedure, please see: Hubs: Hubs Scorecard. This page lists the variables being collected, separated into three categories. Each variable includes a breakdown of the levels being collected, if the definition is not trivial, and an approximate approach.


07/29 Ariel: code Hubs variable for Hubs

E:/McNair/Projects/Hubs/Hubs Variable-Ariel



As of Week of 7/25

Group 1

Variables Difficult to Obtain

  1. Founding Date (date_founded)
    • Difficulty: Finding date based on our strategies
    • New Approach:
      1. Whois.net Date
      2. Factiva/other press release searches
  2. Multiple locations within city + Franchise (as of now just addresses) (multi_address)
    • Difficulty: Company or establishment level will impact measurements
    • New Approach: Will record all addresses at company level
  3. Onsite Venture Capital v. Angel Investors (e.g. # and Assets Under Management) (onsite_Vc_bin)/(onsite_vc_list) (onsite_angel_bin)/etc.
    • Levels: Binary, list of investors
    • Difficulty: Hub website usually does not include investors
    • New Approach:
      1. Google key terms with address of Hub
      2. Start with partners and use google/crunchbase

Group 2

Variables Comfortable, Not Complete (rough order of most difficult to least difficult)

  1. Onsite accelerator (onsite_accel_bin)/(onsite_accel_cnt)/(onsite_accel_list)
    • Levels: Binary, count, list
    • Difficulty: Accelerators are usually not presented as a list, so more scrubbing is required; many other variables only require us to find one page on a website.
    • Approach:
      1. Google searches and procedure to use on website yields decent results
      2. Similar procedure to onsite investors
  2. Size (# members) (num_members)
    • Levels: Count for companies (currently not planning to include list of companies given that some potential hubs have 200+ members)
    • Difficulty: Some companies don’t list all members (only selected ones), others do not separate current members from alumni, and some just write "we have served more than 120 startups..."
    • Approach: For companies that have a list, we will count. For those with select members, we will count those listed and check for a comment about how many they have in total. For those that just have a "with over" statement, we will write the number followed by + (e.g. "120+").
  3. Office hours investors and Office hours mentor/advisors (OH_bin)/(OH_inv_bin)/(OH_inv_list)/etc.
    • Levels: Binary for OH, binary for two separate OH, list of names/descriptions of OH
    • Difficulty: Some companies do not list who OH are with; it is not always obvious whether the person is an investor, mentor, or advisor, and sometimes it is not clear whether a mentor is also an investor or future investor
    • Approach: Google approach to get to OH pages and then lookup key words in description to separate out
  4. Onsite temporary workshops and Networking Meetups (Count) (onsite_temp_events_bin)/(onsite_temp_workshop_bin)/(onsite_temp_workshop_cnt)/etc.
    • Levels: Binary for do they exist, count for each
    • Difficulty: Difficult for Turkers to differentiate between these two and also other potential events (e.g. symposiums)
    • Approach: Uses key search terms (e.g. Java/etc.) to separate out workshops and key terms (e.g. lunch/happy hour) for networking meetings
  5. Onsite code school and Curriculum (onsite_long_term_courses)/(onsite_code_school_bin)
    • Levels: Binary for do they exist, binary for each
    • Difficulty: Difficult for Turkers to differentiate between long-term coding programs for individuals and curriculum for startups
    • Approach: Uses key search terms (e.g. specific code schools) to separate out known code schools and also to look into key terms (e.g. leadership) for curriculum
  6. Sponsors/Partners (University, Corporate) (sponsors_cnt)/(sponsors_list)/etc.
    • Levels: Count, list of sponsors/partners (if exist), separate columns for university and corporate
    • Difficulty: Not all companies list sponsors, partners, or either. The difference among sponsors, partners, and investors is not always clear.
    • Approach: Use two different levels and a Google search; then, if a list exists, separate entries containing "college"/"university" from the rest
  7. Alumni Network (alumni_bin)/(alumni_list)
    • Levels: Binary, list
    • Difficulty: Not all companies list alumni, some only list "selected"
    • Approach: Include all that have lists
  8. Size (sqft) (size_sqft)
    • Levels: Number in sqft
    • Difficulty: Not all companies list square feet online
    • Approach:
      1. Google search with key words
      2. If results do not appear, use of press releases is possible
  9. Onsite Mentors (onsite_mentors_bin)/(onsite_mentors_cnt)/(onsite_mentors_list)
    • Levels: Count and list of mentors (if exist)
    • Difficulty: Not all companies list mentors - bigger issue is onsite investors
    • Approach: Use two different levels and use of google search
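
Two of the Group 2 approaches above lend themselves to a short sketch: turning a "more than N" or "with over N" statement into the "N+" form for num_members, and splitting a sponsor list by "college"/"university". The function names and the regular expression are hypothetical, and treating everything that is not a college or university as corporate is an assumption based on the "separate columns for university and corporate" level.

```python
import re
from typing import Dict, List, Optional


def member_count_from_statement(statement: str) -> Optional[str]:
    """Return e.g. '120+' for 'we have served more than 120 startups'."""
    match = re.search(r"(?:more than|over)\s+(\d[\d,]*)", statement,
                      flags=re.IGNORECASE)
    if match is None:
        return None
    # Drop thousands separators so '1,500' is recorded as '1500+'.
    return match.group(1).replace(",", "") + "+"


def split_sponsors(sponsors: List[str]) -> Dict[str, List[str]]:
    """Separate sponsors naming a college/university from the rest."""
    university: List[str] = []
    corporate: List[str] = []
    for name in sponsors:
        lowered = name.lower()
        if "college" in lowered or "university" in lowered:
            university.append(name)
        else:
            # Assumption: anything else goes in the corporate column.
            corporate.append(name)
    return {"university": university, "corporate": corporate}
```

Statements that give no number return None, so those hubs would still need a manual count.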

Group 3

Variables Easy to Obtain

  1. Twitter activity (twit_handle)/(twit_prev_mon_cnt_tweets)/(twit_cnt_followers)/(twit_cnt_retweets)
    • Levels: Twitter Handle, # Tweets in a Month, # Followers, # Retweets
    • Approach: The Twitter handle is easy to get from a Turk or from Veeral's code, which runs a series of searches on Google; Gunny's Twitter crawler then gets the other levels from the handle
  2. Site URL (url)
    • Levels: URL
    • Approach: Google using Veeral's code that allows us to search
  3. Whois Date (date_whois)
    • Levels: Date
    • Approach: Date active website was registered
  4. Address (address)
    • Levels: Will include all addresses
    • Approach: Google key terms (e.g. Contact Us) and URL using Veeral's code
  5. Nonprofit status (nonprofit_binary)
    • Levels: Binary variable indicating if the potential Hub is a nonprofit organization
    • Approach: http://www.guidestar.org/ is a site that we can use to search if a company is nonprofit or not
  6. Mission statement (missions_stmt)
    • Levels: Official mission statement or description of company (if mission does not exist)
    • Approach: If a mission statement is not explicitly stated, will include the "About" text or statements on the main page
  7. Specific Industry (spec_industry)
    • Levels: Industry included in statement (no aggregation)
    • Approach: Based on Mission Statement, not aggregated
  8. Price for a space/office (price_space)
    • Levels: Two prices: one for shared space, the other for private
    • Approach: Uses google methodology with key terms and URL
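
One way to hold the Group 3 variables for a single potential Hub is a small record type. This is only a sketch: the field names follow the variable names listed above, but the class name, the field types, and the split of price_space into shared and private fields are assumptions, not something the page specifies.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Group3Record:
    """Group 3 variables for one potential Hub (types are assumed)."""
    url: str
    twit_handle: Optional[str] = None
    twit_prev_mon_cnt_tweets: Optional[int] = None
    twit_cnt_followers: Optional[int] = None
    twit_cnt_retweets: Optional[int] = None
    date_whois: Optional[date] = None
    # "Will include all addresses", so this is a list.
    address: List[str] = field(default_factory=list)
    nonprofit_binary: Optional[int] = None
    missions_stmt: Optional[str] = None
    spec_industry: Optional[str] = None
    # Hypothetical split of price_space into its two levels.
    price_space_shared: Optional[float] = None
    price_space_private: Optional[float] = None
```

Fields default to None (or an empty list) so that a record can be created as soon as the URL is known and filled in as each variable is collected.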