Hubs: Hubs Scorecard

From edegan.com

Hubs Pages

Background

This page represents the work used for creating the hubs data for the paper: Hubs (Academic Paper). As of Spring 2016, a list of potential Hubs with a set of characteristics was created. Many of these are not what will be defined as Hubs. We will be creating a scorecard to help subjectively define Hubs based on certain characteristics.

For more information on Mechanical Turks in general, see Mechanical Turk (Tool).

Our goal is to automate the collection of variables for potential hubs as much as possible. The key steps for the project are:

  1. Creating a comprehensive list of potential hubs (Complete)
    1. E:\McNair\Projects\Hubs\Raw Program List - Contains 600 entities - the vast majority are firmly not hubs (file pedigree unknown).
    2. E:\McNair\Projects\Hubs\Hubs Data - Contains 125 entities - many are not hubs (overlap with the above file unknown; this file's pedigree is from the old Hubs project).
  2. Determining the best variables for the scorecard (Complete)
  3. Building "filters" for automating the collection (Complete)
  4. Running and auditing of the automation (In Progress)
    • See section 4.2
  5. Collecting the remaining manual data (next step)

Variables to be Used

Old variable list (see Hubs Data.xls) contains 18+3 variables. Overlap with new variable list is ~50%

Current Complete List

As of Week of 7/11

  1. Onsite Venture Capital
    • Assets Under Management
    • Number
  2. Onsite Angel Investors
  3. Onsite Mentors
  4. Founding Date
  5. Site URL
  6. Office hours investors
  7. Office hours mentor/advisors
  8. Onsite temporary workshops
  9. Networking Meetups
  10. Sponsors/Partners
    • University
    • Corporate
  11. Curriculum
  12. Onsite code school
  13. Alumni Network
  14. Nonprofit status
  15. Mission statement
  16. Specific Industry
  17. Price for a space
  18. Price for office
  19. Twitter activity
  20. Size (sqft)
  21. Size (# companies)
  22. Onsite accelerator
  23. Community membership??
  24. Franchise
  25. Multiple locations within city

Grouping of Variables

Most of the variables fall into one of a few categories.

Group 1: Low Hanging Fruit. Variables in this group are very easy to find and automate.

  1. Twitter Activity
  2. URL
  3. Address
  4. Mission Statement
  5. Specific Industry
  6. Nonprofit
  7. Sponsors/Partners
  8. Price for a space + office
  9. Founding Date


Group 2: The Difficult to Find. Variables for which the information is not readily available online or is difficult to find.

  1. Size (can try to find press releases)


Group 3: In Between 1 and 2. Variables that are neither especially easy nor especially difficult to find and automate.

  1. Onsite accelerator
  2. Alumni mentor


Group 4: The Hard to Differentiate. The key property of this group is that there are several similar variables that would be difficult for a Turker to differentiate. To fix this, we will need to create filters akin to the DSM5 scorecard. See the section below.

  1. Onsite VC v. Angel Investors
  2. Onsite OH Investors v. mentors
  3. Onsite temporary workshops v. networking events
  4. Curriculum v. code school
General Approach Group 4

The Scorecard will be broken down into three main parts: description, characteristics, and TBD parts. The procedure for creating these is as follows: determine the description, develop the characteristics after looking over examples, create candidate Mechanical Turk tasks that are completely accurate even if not comprehensive (e.g. a task that identifies an onsite mentor for only 40% of firms, but never misspecifies the existence of mentors), and audit the results.


Group 5: The Need Further Discussion Before Collection. Variables that need to be developed more prior to collection.

  1. Franchise and multiple locations within a city
  2. Community Membership

Steps Needed to Complete

  1. Create Processes for Collecting Data
    • Status (7/27): Complete
  2. Have a comprehensive list of potential hubs
    • Status (7/27): Complete
  3. Test processes and audit
    • Status (7/27): In Progress (see section:)
  4. Fill in Remaining Data Manually
    • Status (7/27): NS

How to Code the Variables

Group 1

  1. Twitter Activity
    • UPDATE (7/20): Gunny has created a tool to do this process
  2. URL
    • UPDATE (7/22): Veeral has code to do this procedure (search company, city in google)
  3. Address
    • UPDATE (7/22): Code written. Difficulties occur with very large companies (e.g. Impact Hub). Will require Veeral's program. Expected time for each assignment is 10-20 seconds, so the recommended pay rate is $.05.
    • CODE
      1. Using Veeral's code, take the cross product of allintext: (Group A) and site: (Group B), where Group A = Contact (high coverage), About Us, Find Us, Locations, Address, and Group B = Company URLs.
  4. Mission Statement
    • UPDATE (7/18): Code written. Expected time for each assignment is 20-30 seconds, so the recommended pay rate is $.08.
    • CODE
      1. Copy the text in the Search Text 1 into a search engine (allintext: About/Mission site: from Company's URL).
      2. Click on first link that is a subsection (e.g. "Mission", "About") from company's website (see Company's URL)
      3. If this does not exist, repeat steps 1 and 2 with Search Text 2
      4. If this does not exist, go to Company's URL
      5. Record the main text on the page up to five paragraphs (some of these will be a single line). Do NOT record subsections.
      6. If locating the main text in the prior step is unclear, record "Unclear"
      7. If no text exists, record "DNE"
  5. Specific Industry
    • UPDATE (7/21): Because most companies include their specialty in their mission statement and this variable is difficult to Turk, we will manually check each mission statement and mark it accordingly.
  6. Nonprofit
  7. Sponsors/Partners
      • UPDATE (7/21): Code written, but may require additional manual work. Expected time to complete is 45 seconds because the list of sponsors/partners can be long, so the recommended pay rate is $.12.
    • CODE
      1. Choose first result from Search Text 1 and Search Text 2 (allintext: Sponsors/Partners site:URL)
      2. Record all Sponsors from Search Text 1 into SPONSORS. If there does not exist a list or the link was for only 1 sponsor, record DNE.
      3. If any Sponsors from Search Text 1 include a University or College (will be listed in name), record them into UNIVERSITY SPONSORS
  8. Price for a space + office
    • CODE 1 of 2
      1. Go to company’s URL
      2. On the homepage, look for the section related to pricing. If pricing is not found on the homepage, look for the links ‘coworking’, ‘work space’, ‘membership’, ‘pricing’, ‘join’ or ‘Apply for membership’, and look for the pricing information under those links. If there is no price-related section, record DNE for both ‘Flexible Desk’ and ‘Dedicated Desk’.
      3. If there is pricing information, look for the monthly price of a shared space, often denoted as Shared/Flexible desk/non-dedicated desk, and record the price in ‘Flexible Desk’. If the price is not found, record DNE.
      4. Look for the monthly price of a dedicated desk, often denoted as Reserved/dedicated desk/private desk, and record the price in ‘Dedicated Desk’. If the price is not found, record DNE.
      5. If price information is not found and there is a ‘locations’ link, click on it and choose the first location in the list. Repeat steps 3-4.
    • CODE 2 of 2
      1. Keywords: 24/7 access, dedicated desk, pricing
      2. Google allintext:"keywords" site:URL
      3. TBD
  9. Founding Date
      • UPDATE (7/21): Difficulties observed when figuring out how to Turk this, have solution (whois.net)
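
The cross product described in the Address procedure above (Group A keywords against Group B company URLs) can be sketched as follows. This is a minimal illustration, not Veeral's code: the function name build_address_queries and the example URLs are hypothetical, and the exact query form (quoted keyword, no space after site:) is an assumption modeled on the allintext:"keywords" site:URL pattern used elsewhere on this page.

```python
from itertools import product

# Group A: page keywords listed in the Address procedure.
GROUP_A = ["Contact", "About Us", "Find Us", "Locations", "Address"]


def build_address_queries(company_urls):
    """Return (url, query) pairs, one per keyword/URL combination."""
    queries = []
    # product() varies its last argument fastest, so every keyword for one
    # URL is emitted before moving on to the next URL.
    for url, keyword in product(company_urls, GROUP_A):
        queries.append((url, f'allintext:"{keyword}" site:{url}'))
    return queries


# Hypothetical URLs, for illustration only.
for url, query in build_address_queries(["example-hub.com", "example-space.org"]):
    print(query)
```

Iterating over URLs in the outer position keeps the output sorted by company, which matches the ordering recommended for search files under Generating the Data.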

Group 2

  1. Size (SQFT)
    • BRAINSTORM: (7/19) 1), 2), 3): search allintext: sqft/square foot/square feet site: company URL. 4) Company Name, city, square feet and then choose first result. Process might be easier (and cheaper) if Veeral runs code first to eliminate the searches that return 0 results.
    • STATUS: In Progress
    • CODE
      1. Copy the text in the Search Text 1 into a search engine.
      2. Record DNE if 0 results returned in SEARCH 1
      3. If there is a result, click first link in which result search text appears and record the sentence in which the text appears in SEARCH 1
      4. Repeat Steps 1-3 for Search Text 2 and 3 and record in SEARCH 2 and SEARCH 3 respectively
  2. Size (# Companies)
    • BRAINSTORM: (7/22) Some companies don’t list all members but only selected ones. Some companies do not separate current members from alumni and instead say something like: "we have served more than 120 startups..."
    • CODE 1 of 2
      1. Go to Company URL
      2. Look for the link 'Members' or 'Residents'; these are usually under the links 'Community', 'Membership', 'Our Space' or 'The Space'.
      3. Count the number of members
      4. If the link or section of 'Members' is not found, go to 'Community' and 'Coworking' and look for a description of the number of startups/founders/members in the community. Record the number.
      5. If number of members cannot be identified using above steps, record DNE.
    • CODE 2 of 2
      1. Search allintitle:"Members/Startups/Residents/Villagers/Ventures" site:URL in Google.
      2. If no result found, record DNE.
      3. If there are results, go to the first result, which is usually in a form like "Members - Company Name".
      4. If the result directs you to a page that lists the members of the company, count the number of companies and record the number.
      5. If the result directs you to a page that does not give information on the number of members, record DNE.
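
The sentences recorded in SEARCH 1-3 for Size (SQFT) still need to be reduced to a number. One possible post-processing step is sketched below; it is an assumption about how the recorded sentences might be cleaned, not part of the procedure above. The function name extract_sqft and the pattern are hypothetical, and the pattern only covers common spellings (sqft, sq ft, sq. ft., square foot, square feet).

```python
import re

# Matches figures such as "8500 sqft", "12,000 sq. ft." or
# "30,000-square-foot". The number is captured in group 1.
SQFT_PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s*-?\s*"
    r"(?:sq\.?\s*ft\.?|sqft|square[\s-]?(?:foot|feet))",
    re.IGNORECASE,
)


def extract_sqft(sentence):
    """Return the first square-footage figure in a sentence, else None."""
    if sentence is None or sentence.strip() == "DNE":
        return None
    match = SQFT_PATTERN.search(sentence)
    if match is None:
        return None
    # Drop thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))
```

Sentences that return None would still need a manual look, since a figure may be written out in words or refer to something other than the space itself.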

Group 3

  1. Mentors
    • CODE 1 of 2
      1. Go to Company URL
      2. Look for links related to mentorship such as 'mentors', 'mentorship' or 'mentoring programs'.
      3. If the key words can be identified, record 1 in BINARY, copy the sentence that contains them into SENTENCE, and record urlhome in PAGE.
      4. If there is no explicit 'mentoring' section, look for links related to a description of the company, such as: 'About,' 'Our Team,' 'Our Mission,' etc., and look for a subsection or mention of mentor/mentorship/mentoring.
      5. If these exist, record 1 in BINARY, copy the sentence that contains them into SENTENCE, and record the name of the link clicked in PAGE.
      6. If not, go to links related to membership 'benefits,' 'perks,' or related and repeat Step 5.
      7. If none of these steps result in a mark of 1, mark as 0.
    • CODE 2 of 2
      1. Copy Search Text (Mentor/Mentorship) into search engine
      2. Mark as 1 if reliable site is populated, 0 otherwise
  2. Onsite Accelerator
      • UPDATE (7/21): Code written - 2nd part, while more manual, appears to have greater range. 2nd code would only require Veeral's code. 1st code expected completion time is 30 seconds.
    • CODE 1 of 2
      1. Go to company's URL
      2. Look for the link 'Accelerators' or 'Accelerating/Accelerator/Acceleration/Accelerate Programs'
      3. If accelerators are found, count the number of accelerators/accelerating programs and record the number. **or also copy the names of the accelerators?
      4. If accelerators are not found in step 1, go to the links 'Services' , 'Benefit', 'Resources', 'For Entrepreneurs', 'Startups' and look for the section of 'Accelerator/Accelerating Programs'
      5. If accelerators are found, count the number of accelerators/accelerating programs and record the number.
      6. If accelerators are not found, record 0.
    • CODE 2 of 2
      1. Search [allintitle:"accelerator"/"accelerate" site:URL] in Google
      2. Copy the titles of the results. **We have to scrutinize the titles ourselves to determine whether they are distinct onsite accelerators and record the number manually.
      3. If no result appears, record 0.
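
The Mentors coding rule above (record 1 in BINARY, the matching sentence in SENTENCE, and the page in PAGE, otherwise 0) can be illustrated on page text that has already been copied or downloaded. This is a minimal sketch under stated assumptions: the function name code_mentors is hypothetical, the sentence split is naive, and only the mentor/mentors/mentorship/mentoring forms are matched.

```python
import re

# Whole-word match for mentor, mentors, mentorship and mentoring.
MENTOR_PATTERN = re.compile(r"\bmentor(?:s|ship|ing)?\b", re.IGNORECASE)


def code_mentors(page_text, page_name):
    """Return BINARY, SENTENCE and PAGE values for one page of text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    for sentence in sentences:
        if MENTOR_PATTERN.search(sentence):
            return {"BINARY": 1, "SENTENCE": sentence, "PAGE": page_name}
    # No mention found on this page.
    return {"BINARY": 0, "SENTENCE": "", "PAGE": ""}
```

Following the steps above, this would be applied first to the homepage, then to the description pages, then to the membership benefits pages, stopping at the first page that returns 1.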

Group 4

Curriculum and Code School

Curriculum

  • Desc: The potential hub provides training programs for startup founders who might have human capital deficits that would prevent them from adequately implementing their ideas.
  • Characteristics:
    • Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
      • Code schools are for startup labor supply
    • Active input into a current entrepreneurial endeavor
      • e.g. " The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
    • Not an ad hoc session, not a one time meeting but a full "course"; evidence of this could be:
    • Has evidence of an integrated curriculum leading to a new competence
    • Has evidence of a set fixed start and end dates that last XXX long
    • Cultivate leadership for entrepreneurs
    • Tagged "Business" as opposed to 'Tech' or 'Design'
    • Is a session linked to others that regularly occurs
  • TBD points
    • Do we care about outsourcing?
  • Potential Turks
    • Search Text: Fullbridge, leadership program, business academy, business course, aspiring entrepreneurs
  1. Google: "Search Text" site:URL
  2. Record 0 if no result returns.
  3. If there is a result, click first link in which result search text appears and record the sentence in which the text appears.

Code School

  • Desc: training programs that teach coding, data processing, webpage building and other technical skills.
  • Characteristics:
    • Bootcamps
    • Target group is developers or people who want to join startups, not the founders themselves
    • Scheduled classes, not a one time meeting (as opposed to workshops)
  • TBD points
  • Potential Turks
    • Search Text 1: website design, coding, web development, software, bootcamp
    • Search Text 2: General Assembly, Anyone Can Learn to Code, Umbraco, Designation, Boise CodeWorks, Grand Circus, DevMountain, Silicon Valley Data Academy, Academy Pittsburgh
  1. Google: "Search Text" site:URL
  2. Record 0 if no result returns.
  3. If there is a result, click first link in which result search text appears and record the sentence in which the text appears.


Onsite OH Investors v. Mentors

  • Potential Turks:
  1. Search allintext:"office hours" site:URL
  2. Mark office hours as 1 if there is a result, otherwise mark as 0.
  3. Click on the first five results
  4. On each of the five pages, search for two items:
    1. search for 'mentor'. (Ctrl + F) If 'mentor' appears in the description paragraph of office hours on any of the five pages, mark mentor OH as 1. Otherwise mark as DNE and copy the description paragraph of office hours of all five pages.
    2. search for 'fund'. (Ctrl + F) If 'fund' appears in the description paragraph of office hours on any of the five pages, mark investor OH as 1. Otherwise mark as DNE and copy the description paragraph of office hours of all five pages.
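
The office hours rule above can be written out as a small function over the description paragraphs copied from the result pages. This is a sketch only: the function name code_office_hours and the output keys are hypothetical, and the handling of a search with no results (office hours 0, both sub-variables DNE) is an assumption, since the steps above do not say what to record for mentor and investor OH in that case.

```python
def code_office_hours(descriptions):
    """Code office-hours variables from the result-page descriptions.

    `descriptions` holds the office-hours description paragraph from each
    result page; an empty list means the search returned no results.
    """
    if not descriptions:
        # Assumption: with no search results, nothing can be confirmed.
        return {"OFFICE HOURS": 0, "MENTOR OH": "DNE",
                "INVESTOR OH": "DNE", "DESCRIPTIONS": []}
    pages = descriptions[:5]  # only the first five results are checked
    lowered = [text.lower() for text in pages]
    # Substring checks mirror a Ctrl + F search for 'mentor' and 'fund'.
    mentor = 1 if any("mentor" in text for text in lowered) else "DNE"
    investor = 1 if any("fund" in text for text in lowered) else "DNE"
    # Keep the paragraphs for manual review whenever either check fails.
    copied = pages if "DNE" in (mentor, investor) else []
    return {"OFFICE HOURS": 1, "MENTOR OH": mentor,
            "INVESTOR OH": investor, "DESCRIPTIONS": copied}
```

Because the check is a plain substring match, 'fund' also matches words such as 'funding' and 'fundraising', exactly as Ctrl + F would.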

Onsite temporary workshops v. networking events

  • Turk for Both 1 of 2:
    1. Search the Search Text 1 (allintext: events site: URL) and choose link to "Events", "Calendar", or related. Record 'url' on SOURCE. If this does not exist, go to Step 7
    2. For all events that have dates, copy the events from today's date to the following month into ALL EVENTS
    3. For all events that have Office Hours in the name, record the events in OFFICE HOURS. For all events that have summit, record the events in SUMMITS.
    4. For all events that are related to teaching or learning (e.g. contain "Training," "Seminar," "Class," "Learn," "Bootcamp," "Workshop," "Pitch Event"), copy the name of the events into WORKSHOPS
    5. For all events that are related to social activities and networking (e.g. "Social," "Meet Up," "Breakfast"/"Lunch"/"Happy Hour", "Movie Night"/"Bowling"), copy the name of the events into NETWORKING. For all events that are unclear or did not fit into these descriptions
    6. If a message explicitly says there are no events, mark as 0 for ALL EVENTS, OFFICE HOURS, SUMMITS, WORKSHOPS, and NETWORKING
    7. If this does not exist, search Search Text 2 (allintext: Company Name site: meetup.com) and click on the meetup.com for the company if it exists. If it does exist, record meetup on SOURCE. If not, go to step 9.
    8. Repeat Steps 2-6.
    9. If this does not exist, search Search Text 3 (allintext: Company Name site: eventbrite.com) and click on the eventbrite.com for the company if it exists. If it does exist, record eventbrite on SOURCE. If not, mark DNE for all variables.
    10. Repeat Steps 2-6.
  • Turk for Both 2 of 2:
  1. Go to Company URL
  2. Look for links related to events, such as 'Events' or 'Calendar' on the homepage.
  3. If not found on the homepage, check 'About' and check 'Community'
  4. Count the number of events from today's date through the following month and record it in ALL EVENTS. If there is no information on events or their dates on the website, record DNE for all variables.
  5. For all events that have Office Hours in the name, count the number of events in OFFICE HOURS. For all events that have summit, count the number of the events in SUMMITS.
  6. For all events that are related to teaching or learning (e.g. contain "Training," "Seminar," "Class," "Learn," "Bootcamp," "Workshop," "Pitch Event"), count the number of the events into WORKSHOPS
  7. For all events that are related to social activities and networking (e.g. "Social," "Meet Up," "Breakfast"/"Lunch"/"Happy Hour", "Movie Night"/"Bowling"), count the number of the events into NETWORKING
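
The keyword rules in the two Turks above can be sketched as a classifier over event names. The keyword lists are taken from the steps above; the function names classify_events and count_events are hypothetical, and matching is by case-insensitive substring, which is an assumption about how a Turker would read the names. An event can land in more than one column.

```python
WORKSHOP_TERMS = ["training", "seminar", "class", "learn",
                  "bootcamp", "workshop", "pitch event"]
NETWORKING_TERMS = ["social", "meet up", "breakfast", "lunch",
                    "happy hour", "movie night", "bowling"]


def classify_events(event_names):
    """Sort event names into the columns used by Turk 1 of 2."""
    buckets = {"ALL EVENTS": [], "OFFICE HOURS": [], "SUMMITS": [],
               "WORKSHOPS": [], "NETWORKING": []}
    for name in event_names:
        lowered = name.lower()
        buckets["ALL EVENTS"].append(name)
        if "office hours" in lowered:
            buckets["OFFICE HOURS"].append(name)
        if "summit" in lowered:
            buckets["SUMMITS"].append(name)
        if any(term in lowered for term in WORKSHOP_TERMS):
            buckets["WORKSHOPS"].append(name)
        if any(term in lowered for term in NETWORKING_TERMS):
            buckets["NETWORKING"].append(name)
    return buckets


def count_events(buckets):
    """Reduce the name lists to counts, as recorded by Turk 2 of 2."""
    return {column: len(names) for column, names in buckets.items()}
```

Substring matching is deliberately loose ("class" also matches "classic", for example), so the output is better treated as a first pass to audit than as a final coding.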

Onsite VC v. Angel Investors

  • Notes: Few companies have a section for their onsite VCs or angel investors. Even a company that has angel and VC programs (Innovation Pavilion) does not conduct the programs itself, but cooperates with external angel investors or VCs. Some companies have mentors or board members who come from VCs, but this does not mean they will invest in those companies' member startups.

Group 5

  1. Multiple Locations
    • Addresses are included in Group 1, but still needs to be discussed
    • Getting

Generating the Data

All files can be found in E:/Mcnair/Projects/Hubs/Searching
It is recommended to select the CSV and Excel worksheets because there are many JSON files

There are generally 6 steps we need to do for each variable when creating the data table:

  • A good reference for this procedure is in the folder Address
  1. Run Veeral's Code on your search terms
    • A list of companies can be found in the file 'List of Companies'
    • Recommended to have the search file sorted by company (e.g. if searching 3 companies (A,B,C) with 2 search terms (S,T), recommend having your list as: A-S,A-T,B-S,etc.)
    • Procedure
      1. (Ariel or Veeral To Write)
  2. Check to see if output results are working properly
    • Recommend to do alt-d-f-f and choose only 1 and 2
    • Check at least 10 different companies and ensure desired result is in the results
  3. Clean table and format for Mechanical Turk
    • Ensure that mechanical turks are not getting error terms
    • We will likely use 1 row for each company and have specific headers that will allow for the inputs to be automatically populated (see Mechanical Turk (Tool))
    • You should also check to see if we need to find results manually
  4. Write the Turk on Amazon
  5. Run and audit the Turk
    • Randomly choose ~30 companies (can use the above) and compare results with the Turkers
    • Check for AT LEAST the following:
      • % similar to manual
      • DNEs
  6. Post Results in Hubs: Hubs Data Building
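
The audit in step 5 (sample roughly 30 companies, then check the share of Turk answers that match manual coding and the number of DNEs) can be sketched as below. The function names, the dictionary layout (company name mapped to recorded value), and the case- and whitespace-insensitive comparison are all assumptions made for illustration.

```python
import random


def sample_companies(companies, k=30, seed=None):
    """Randomly pick up to k companies to code manually for the audit."""
    rng = random.Random(seed)
    return rng.sample(list(companies), min(k, len(companies)))


def normalize(value):
    """Compare answers without regard to case or surrounding spaces."""
    return str(value).strip().lower()


def audit(manual, turk):
    """Compare Turk answers with manual answers for the same companies."""
    shared = [company for company in manual if company in turk]
    if not shared:
        raise ValueError("no companies appear in both result sets")
    matches = sum(
        1 for c in shared if normalize(manual[c]) == normalize(turk[c])
    )
    turk_dne = sum(1 for c in shared if normalize(turk[c]) == "dne")
    return {
        "n": len(shared),
        "pct_match": 100.0 * matches / len(shared),
        "turk_dne": turk_dne,
    }
```

Exact matching suits binary and date variables; free-text variables such as Mission Statement would need a looser comparison or a manual read.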

Completed Work

See Section 3 of Hubs (Academic Paper)