Hubs: Hubs Data
Background
This page documents the Mechanical Turk work for the paper Hubs (Academic Paper). As of Spring 2016, a list of potential Hubs with a set of characteristics had been created. Many of the listed spaces will not meet the eventual definition of a Hub. We will create a scorecard to help define Hubs, subjectively, based on certain characteristics.
For more information on Mechanical Turks in general, see Mechanical Turk (Tool).
The main goal of the mechanical turk is to automate the collection of variables for potential hubs as much as possible. The key steps for the project are:
- Creating a comprehensive list of potential hubs
- Determining the best variables for the scorecard
- Building "filters" for automating the collection
- Running and auditing of the automation
- Collecting the remaining manual data
Variables to be Used
Current Complete List
As of Week of 7/11
- Onsite Venture Capital
- Assets Under Management
- Number
- Onsite Angel Investors
- Onsite Mentors
- Founding Date
- Site URL
- Office hours investors
- Office hours mentor/advisors
- Onsite temporary workshops
- Onsite mentors
- Networking Meetups
- Sponsors/Partners
- University
- Corporate
- Curriculum
- Onsite code school
- Alumni Network
- Nonprofit status
- Mission statement
- Specific Industry
- Price for a space
- Price for office
- Twitter activity
- Size (sqft)
- Size (# companies)
- Onsite accelerator
- Community membership??
- Franchise
- Multiple locations within city
Grouping of Variables
There are a few categories that the majority of the variables fall under.
Group 1: Low Hanging Fruit
Variables in this group are very easy to find and automate.
- Price for a space + office
- Twitter Activity
- Founding Date
- URL
- Mission Statement
- Nonprofit
- Sponsors/Partners
- Specific Industry
Group 2: The Difficult to Find
There are certain variables where the information is not readily available online or difficult to find.
- Size (can try to find press releases)
Group 3: In Between 1 and 2
Variables that aren't too easy or difficult to find and automate.
- Onsite accelerator
- Alumni mentor---vs. other mentors???
Group 4: The Hard to Differentiate
The key property of this group is that there are several similar variables, which would be difficult for a turk to differentiate. To fix this, we will need to create filters akin to the DSM-5 scorecard. See the Filters/Scorecard section below.
- Onsite VC v. Angel Investors
- Onsite OH Investors v. mentors
- Onsite temporary workshops v. networking events
- Curriculum v. code school
Group 5: The Need further Discussion Before Collection
Variables that need to be developed more prior to collection.
- Franchise and multiple locations within a city
- Community Membership
Filters/Scorecard
General Approach
The Scorecard will be broken down into three main parts: description, characteristics, and TBD points. The procedure for creating these will be as follows: determine the description; develop the characteristics after looking over examples; create possible mechanical turks that are completely accurate even if not comprehensive (e.g. a task that is always right when it reports an onsite mentor, even if it covers only 40% of firms, and so never misspecifies the existence of mentors); and audit the results.
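The accuracy-versus-coverage idea in the procedure above can be made concrete with a small sketch. It assumes a hypothetical audit sample in which each firm has a manually verified value and the filter either flags a firm or makes no claim. The function name, firm labels, and data are illustrative and not part of the project.

```python
def filter_quality(truth, flagged):
    """Return (precision, coverage) for a one-sided filter.

    truth   -- dict firm -> bool, the manually audited value (hypothetical data)
    flagged -- set of firms the filter marked as having the characteristic
    """
    positives = {firm for firm, has_it in truth.items() if has_it}
    correct = flagged & positives
    # Precision: share of flagged firms that truly have the characteristic.
    precision = len(correct) / len(flagged) if flagged else None
    # Coverage: share of true positives that the filter managed to find.
    coverage = len(correct) / len(positives) if positives else None
    return precision, coverage


# Made-up sample: five firms truly have onsite mentors, the filter finds two.
truth = {"A": True, "B": True, "C": True, "D": True, "E": True, "F": False}
flagged = {"A", "B"}
precision, coverage = filter_quality(truth, flagged)
```

In this made-up sample the filter is never wrong when it flags a firm, but it finds only two of the five firms with mentors, which is the kind of filter the procedure asks for. The remaining firms would go to manual collection.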
Example
Curriculum
- Desc: The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
- Characteristics:
- Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
- Code schools are for startup labor supply
- Active input into a current entrepreneurial endeavor
- e.g. " The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
- Not an ad hoc session, not a one time meeting but a full "course", evidence of this could be
- Has evidence of an integrated curriculum leading to a new competence
- Has evidence of a set fixed start and end dates that last XXX long
- Is a session linked to others that regularly occurs
- TBD points
- Do we care about outsourcing?
- Potential Turk
Code School
- Desc: training programs that teach coding, data processing, webpage building and other technical skills.
- Characteristics:
- Target group are the developers or people who want to join the startups but not the founders themselves
- Scheduled classes, not a one time meeting (as opposed to workshops)
Temporary Workshops
- Desc: a discussion or learning session for a group of people on a specific subject
- Characteristics:
- One time
- Have a topic/subject/goal
- e.g. learn to code workshop: JavaScript 101
Additional Resources
- Mechanical Turk (Tool)
- Veeral has created a Google automation procedure for different lists
Work in Progress
Goals for WIP
- For GROUP 1, creation of mechanical turk steps:
- EXAMPLE:
- Twitter Activity
- STATUS: Complete/In Progress/Not Started
- Previously Collected: Yes/No
- Published on Mechanical Turk: Yes/No
- Audited: Yes/No
- Updates:
- Code:
- For GROUP 4:
- Scorecard Example
- Potential Mechanical Turk Steps (e.g. if specific organization is on website)
- Mechanical Turk Example (GROUP 1)
- Add Comments on:
- How much manual work remains/What is missing
- Any remaining difficulties
- For GROUPS 2 and 3:
- Brainstorm potential ways to find data
- Follow Steps in Group1
Steps Needed to Complete
- Establish automation process for Groups 1-3
- Status (7/19):
- Begin Date: Started
- Reach Goal: Complete By Friday 7/22
- Differentiate variables in Group 4
- Status (7/19):
- Begin Date: Started
- Reach Goal: Complete by Wednesday 7/27
- Test processes and audit
- Status (7/19): NS
- Begin Date: Thursday 7/21
- Reach Goal: Complete by Friday 7/28
- Have a comprehensive list of potential hubs
- Status (7/19): NS
- Begin Date: Thursday 7/21
- Reach Goal: Complete by Tuesday 7/26
- Fill in Remaining Data Manually
- Status: NS
- Begin Date: Monday 7/25
- Reach Goal: Complete by Friday 7/29
Actual WIP
Group 1
- Twitter Activity
- STATUS: Complete
- Previously Collected: YES/NO - Recorded 2/1/0 to represent activity level, but not in the same way as we are recording it
- Published on Mechanical Turk: Yes
- AUDITED: Yes
- Audit Results: Compared against 30 records collected manually, for the Twitter handle all 3 turkers agreed with our results 81% of the time, and at least 2 turkers agreed with our results 98% of the time (the exception was a company that had several Twitter handles based on location). Results were 52% and 89% respectively.
- UPDATES:
- UPDATE (7/14): Updated turk to reflect our desired formats
- UPDATE (7/12): Audited
- UPDATE (7/11): Uploaded and published on Amazon's Mechanical Turk site. Given the time cost of either recording the number of tweets in a month or looking up more than 10 tweets, we decided to record the date of the 10th most recent tweet. Using a sample of ~10 companies, we noticed minimal differences in the data when using 10, 20, and 30 tweets.
- CODE
- Copy the text in the Search Text into a search engine.
- Click on result from twitter.com with the company name. If the link does not appear on the first 3 pages, record DNE for both outputs
- Record the company's Twitter Handle into Twitter Handle
- Record the date (MM/DD/YY) of the 10th most recent tweet for Twitter Activity. If there are fewer than 10 tweets, record DNE.
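The agreement figures in the audit above (all 3 turkers agree with the manual record, or at least 2 do) could be computed along the following lines. This is a minimal sketch. The function name, the case-insensitive comparison, and the sample handles are assumptions for illustration, not the audit that was actually run.

```python
def agreement_rates(manual, turk_answers):
    """Share of records where all workers, and where at least two workers,
    match the manually collected value.

    manual       -- dict company -> manually recorded value
    turk_answers -- dict company -> list of worker answers (three per company)
    """
    all_agree = 0
    two_agree = 0
    for company, expected in manual.items():
        answers = turk_answers[company]
        # Assumption: handles are compared ignoring case and outer whitespace.
        matches = sum(1 for answer in answers
                      if answer.strip().lower() == expected.strip().lower())
        if matches == len(answers):
            all_agree += 1
        if matches >= 2:
            two_agree += 1
    n = len(manual)
    return all_agree / n, two_agree / n


# Made-up handles, for illustration only.
manual = {"Hub A": "@huba", "Hub B": "@hubb", "Hub C": "@hubc", "Hub D": "@hubd"}
turk_answers = {
    "Hub A": ["@huba", "@HubA", "@huba"],
    "Hub B": ["@hubb", "@hubb", "@hubb_nyc"],
    "Hub C": ["@hubc", "@hubc", "@hubc"],
    "Hub D": ["@hubd_sf", "@hubd_la", "@hubd"],
}
all_rate, two_rate = agreement_rates(manual, turk_answers)
```

Hub D in the sample mimics the exception noted in the audit: a company with several location-based handles, where the workers disagree with each other.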
- URL
- STATUS: In Progress
- Previously Collected: YES
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/18): Code written. Expected time for each assignment is <15 seconds, so the recommended pay rate is $.04.
- CODE
- Copy the text in the Search Text into a search engine.
- Record the URL of the first result in the following format www.___.__/ (e.g. if url is example.us/other, record www.example.us/)
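The URL format requested in the last step could also be applied automatically to whatever a turker pastes in. The sketch below uses Python's standard urllib.parse. The function name and the "DNE" return value for empty input are assumptions, not part of the published task.

```python
from urllib.parse import urlparse


def normalize_url(raw):
    """Reduce a URL to the www.<domain>/ form requested in the task."""
    raw = raw.strip()
    # urlparse only fills in the host when a scheme is present.
    if "://" not in raw:
        raw = "http://" + raw
    host = urlparse(raw).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    # Assumption: empty or unparseable input is recorded as DNE.
    return ("www." + host + "/") if host else "DNE"


example = normalize_url("example.us/other")
```

Note that subdomains are kept as-is in this sketch (blog.example.us would become www.blog.example.us/), which may or may not be what the scorecard needs.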
- Mission Statement
- STATUS: In Progress
- Previously Collected: YES
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/18): Code written. Expected time for each assignment is 20-30 seconds, so the recommended pay rate is $.08.
- CODE
- Copy the text in the Search Text 1 into a search engine (will include site:__ from Company's URL).
- Click on first link that is a subsection (e.g. "Mission", "About") from company's website (see Company's URL)
- If this does not exist, repeat steps 1 and 2 with Search Text 2
- If this does not exist, go to Company's URL
- Record the main text on the page up to five paragraphs (some of these will be a single line). Do NOT record subsections.
- If locating the main text in the prior step is unclear, record "Unclear"
- If no text exists, record "DNE"
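As a sanity check on the pay recommendations in the 7/18 updates for URL and Mission Statement, the implied hourly rate can be worked out from the pay per assignment and the expected time. The helper name is hypothetical. The figures are the ones quoted in those updates, with the URL task taken at its 15-second upper bound.

```python
def implied_hourly_rate(pay_per_assignment, seconds_per_assignment):
    """Hourly earnings implied by a per-assignment payment and expected time."""
    return pay_per_assignment * 3600 / seconds_per_assignment


url_rate = implied_hourly_rate(0.04, 15)      # URL: $.04 at 15 seconds
mission_fast = implied_hourly_rate(0.08, 20)  # Mission Statement: $.08 at 20 seconds
mission_slow = implied_hourly_rate(0.08, 30)  # Mission Statement: $.08 at 30 seconds
```

Under these assumptions both tasks work out to roughly $9.60 per hour at the slow end, and the Mission Statement task to roughly $14.40 per hour if it takes only 20 seconds.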
- Nonprofit
- STATUS: In Progress
- Previously Collected: NO
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- REQUIRES ADDITIONAL STEPS: YES (need to double check results)
- UPDATES:
- UPDATE (7/19): Code written. Code 2 of 2 is believed to be more accurate and efficient. Expected time to complete is 15 seconds, so the recommended pay rate is $.04.
- CODE 1 of 2
- Go to Company's URL.
- Go to links (sometimes will be sections of the URL page) that describe the company, usually they are labelled: 'About', 'Our Story,' 'Mission'.
- If none of these exist, record DNE for PAGES
- Look for the word 'profit'/'nonprofit'/'non-profit'/'not-for-profit' (with or without -)
- If any of the key words is identified, record 1, otherwise 0, for EXISTS (1/0).
- If it is marked as 1, record all sentences that the word is found in under SENTENCES.
- If the links do exist, record the name of the link under PAGES
- Repeat steps 4, 5, and 6 on the pages that were linked.
- CODE 2 of 2
- Copy the text from Search Text into the search bar at http://www.guidestar.org/.
- Record all Organization Names that appear
- If no results appear, record DNE
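The keyword check in CODE 1 of 2 lends itself to a simple text filter once a page's text is available. The following is a sketch only: it assumes the page text has already been fetched and converted to plain text, uses a naive sentence splitter, and the function name is hypothetical. EXISTS and SENTENCES mirror the output fields named in the steps.

```python
import re

# Matches profit, nonprofit, non-profit, non profit, and not-for-profit.
KEYWORD = re.compile(r"\b(?:non[- ]?|not[- ]?for[- ]?)?profit\b", re.IGNORECASE)


def nonprofit_filter(page_text):
    """Return (EXISTS, SENTENCES) for one page of plain text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    hits = [s for s in sentences if KEYWORD.search(s)]
    return (1 if hits else 0), hits


sample = "We are a coworking space. Our non-profit arm funds founders. Join us!"
exists, sentences = nonprofit_filter(sample)
```

A match on the bare word "profit" (for example in "for-profit") still needs a human read of SENTENCES, which is consistent with the note above that results need to be double checked.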
- Price for a space + office
- STATUS: Not Started
- Previously Collected: YES
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/_): TBD
- CODE
- Copy the text in the Search Text into a search engine.
- TBD
- Founding Date
- STATUS: Not Started
- Previously Collected: YES, but only year
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/_): TBD
- CODE
- Copy the text in the Search Text into a search engine.
- TBD
- Sponsors/Partners
- STATUS: Not Started
- Previously Collected: NO
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/_): TBD
- CODE
- Copy the text in the Search Text into a search engine.
- TBD
- Specific Industry
- STATUS: Not Started
- Previously Collected: YES/NO, based on LinkedIn identifier
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/_): TBD
- CODE
- Copy the text in the Search Text into a search engine.
- TBD
Group 2
- Size
- BRAINSTORM: (7/19) 1), 2), 3): search allintext: sqft/square foot/square feet site: company URL. 4) Company Name, city, square feet and then choose first result. Process might be easier (and cheaper) if Veeral runs code first to eliminate the many searches that return 0 results.
- STATUS: In Progress
- Previously Collected: YES/NO, many missing
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/19): Brainstorm updated
- CODE
- Copy the text in the Search Text 1 into a search engine.
- Record DNE if 0 results returned in SEARCH 1
- If there is a result, click the first link in which the search text appears and record the sentence in which the text appears in SEARCH 1
- Repeat Steps 1-3 for Search Text 2 and 3 and record in SEARCH 2 and SEARCH 3 respectively
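The search texts described in the brainstorm could be generated per company before uploading the task. This is a rough sketch: the function name is hypothetical, and the quoting of the two-word phrases is an assumption about how the searches would be entered.

```python
def size_search_texts(company, city, url):
    """Build the four candidate search strings from the Size brainstorm."""
    # Searches 1-3: restrict to the company's own site, one term each.
    site_queries = [f"allintext: {term} site:{url}"
                    for term in ("sqft", '"square foot"', '"square feet"')]
    # Search 4: open web search on company name, city, and "square feet".
    fallback = f"{company}, {city}, square feet"
    return site_queries + [fallback]


texts = size_search_texts("Example Hub", "Austin", "example.us")
```

Generating the strings up front would also make it straightforward to pre-screen for searches that return 0 results, as the brainstorm suggests.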
Group 3
- Mentors
- BRAINSTORM: Current form of this variable seems to be too general.
- STATUS: In Progress
- Previously Collected: NO
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/19): Two possible codes written. First one requires more manual work
- CODE 1 of 2
- Go to Company URL
- Look for links related to mentorship such as 'mentors', 'mentorship' or 'mentoring programs'.
- If the key words can be identified, record 1 in BINARY, copy the sentence that includes them into SENTENCE, and record urlhome in PAGE.
- If there is no explicit 'mentoring' section, look for links related to a description of the company, such as: 'About,' 'Our Team,' 'Our Mission,' etc., and look for a subsection or mention of mentor/mentorship/mentoring.
- If these exist, record 1 in BINARY, copy the sentence that includes them into SENTENCE, and record the name of the link clicked in PAGE.
- If not, go to links related to membership 'benefits,' 'perks,' or related and repeat Step 5.
- If none of these steps result in a mark of 1, mark as 0.
- CODE 2 of 2
- Copy Search Text into search engine
- Mark as 1 if reliable site is populated, 0 otherwise
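CODE 1 of 2 for Mentors is a cascade: check the most specific page first and stop at the first mention. A minimal sketch of that ordering follows, assuming the page texts have already been collected in the order the steps visit them. The function name and sample pages are hypothetical. BINARY, SENTENCE, and PAGE mirror the output fields in the steps.

```python
import re

# Matches mentor, mentors, mentorship, mentoring.
MENTOR = re.compile(r"\bmentor(?:s|ship|ing)?\b", re.IGNORECASE)


def mentor_cascade(pages):
    """pages: list of (page_name, text) in the order the steps visit them.

    Returns (BINARY, SENTENCE, PAGE) from the first page with a match,
    or (0, "", "") when no page mentions mentoring.
    """
    for page_name, text in pages:
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if MENTOR.search(sentence):
                return 1, sentence, page_name
    return 0, "", ""


pages = [
    ("urlhome", "Welcome to our space."),
    ("About", "We host events. Members get mentoring from alumni."),
    ("Perks", "Free mentorship."),
]
result = mentor_cascade(pages)
```

Because the cascade stops at the first hit, PAGE records where the evidence was found, which may help later when judging whether the variable is too general.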
- Onsite Accelerator
- BRAINSTORM:
- STATUS: Not Started
- Previously Collected: YES/NO, only a binary variable
- Published on Mechanical Turk: NO
- AUDITED: NO
- Audit Results: TBD
- UPDATES:
- UPDATE (7/_): TBD
- CODE
- Copy the text in the Search Text into a search engine.
- TBD
Group 4
Curriculum and Code School
Curriculum
- Desc: The potential hub provides training programs for the founders of startups that might have human capital deficits that will lead to them not being able to adequately implement their ideas.
- Characteristics:
- Education that is for a founder (as opposed to code schools which can be for people who just want to join a startup)
- Code schools are for startup labor supply
- Active input into a current entrepreneurial endeavor
- e.g. " The program is designed to augment and support the real-life business experiences that the students are facing every day in their entrepreneurial endeavors"
- Not an ad hoc session, not a one time meeting but a full "course", evidence of this could be
- Has evidence of an integrated curriculum leading to a new competence
- Has evidence of a set fixed start and end dates that last XXX long
- Cultivate leadership for entrepreneurs
- Tagged "Business" as opposed to 'Tech' or 'Design'
- Is a session linked to others that regularly occurs
- TBD points
- Do we care about outsourcing?
- "Potential Turks"
- Google "Fullbridge" site:URL
Code School
- Desc: training programs that teach coding, data processing, webpage building and other technical skills.
- Characteristics:
- Bootcamps
- Target group are the developers or people who want to join the startups but not the founders themselves
- Scheduled classes, not a one time meeting (as opposed to workshops)
- TBD points
- "Potential Turks"
- Google "General Assembly" site:URL
- Anyone Can Learn to Code
- Umbraco
- Designation
- Boise CodeWorks
- Grand Circus
- DevMountain
- Silicon Valley Data Academy
- Academy Pittsburgh
Onsite VC v. Angel Investors
Onsite OH Investors v. mentors
Onsite temporary workshops v. networking events
Companies Used for Auditing/etc.
- Capital Factory, Austin
- 1871, Chicago
- Rocket Space, San Francisco
- 1776, Washington D.C.
- Betamore, Baltimore
- Packard Place, Charlotte
- The Venture Center, Little Rock
- GSV Labs, San Francisco
- The Hive, Palo Alto
- Innovation Pavilion, Denver
- OSC Tech Lab, Akron
- Speakeasy, Indianapolis
- Riverside.io, Riverside
- The Salt Mines, Columbus
- InNEVation, Las Vegas
- 804 RVA
- Impact Hub, Salt Lake
- Awesome Inc, Louisville
- Geekdom, San Antonio
- Alloy26, Pittsburgh
- ReSET, Hartford
- Ansir Innovation Center, San Diego
- Domistation, Tallahassee
- Atlanta Tech Village, Atlanta
- Spark Labs, New York
Completed Work
OLD1
We will be creating a "Hubs scorecard" to determine how hub-like potential spaces are. In order to do so, we will evaluate the places based on certain variables. Previous variables for potential hubs were collected. Below, we list those as well as other variables we think might be helpful to build out the scorecard.
Ideally, we would have the following variables (not collected previously):
- Onsite VC/Angel/Investors (Count or binary)
- Comments:
- Mechanical Turk Comments:
- Onsite Mentors (binary) --- Are these the same as advisers?
- Comments:
- Mechanical Turk Comments:
- "Office hours" with investors or mentors (binary)
- Comments: Previously collected included number of events, but did not separate them into categories (e.g. networking events, workshops, etc.). We view this separation as important, BUT very difficult to collect
- Mechanical Turk Comments:
- Onsite temporary workshops (binary or count) *** see mechanical turk
- Comments:
- Mechanical Turk Comments:
- Networking Meetups (Binary or count) *** see mechanical turk
- Comments:
- Mechanical Turk Comments:
- Sponsors and Partners (binary and list) --- are these the same?
- Comments:
- Mechanical Turk Comments:
- Alumni Network (binary) --- not all potential hubs list this and the fact that some do might indicate its importance
- Comments:
- Mechanical Turk Comments:
- Num of Companies --- to help determine size, as getting physical square footage is difficult
- Comments:
- Mechanical Turk Comments:
- Nonprofit (binary) --- helpful in determining goals of potential hubs
- Comments:
- Mechanical Turk Comments:
- Mission Includes Key Buzzwords (e.g. "ecosystem", "community") --- help separate simple coworking spaces from hubs
Example of Prior Variables Collected:
- Specific Industry -- defined as LinkedIn Self Identifier, no categories just plain text. We think what we really want is to see if they have a specialty (e.g. healthcare)
- Num of Events --- relatively complete inputs, but from March 2016 (see above as well)
- Price for Single Space --- defined as price for flexible desk, relatively complete inputs
- Price for Office --- no inputs
- Twitter Activity (Multinomial or Count) --- High=2/Moderate=1/No=0, no explanations on how to categorize the activity. Also no handles
- Size (sqft) --- no records for majority of the companies
- Num Conference Rooms --- no records for majority of the companies
- Onsite accelerator (binary) --- relatively complete inputs
- Onsite code school (binary) --- relatively complete inputs
- Community Membership (binary) --- relatively complete inputs
OLD2
- Twitter activity:
UPDATE (7/14): Updated turk to reflect our desired formats
UPDATE (7/12): AUDIT RESULTS: We noticed
UPDATE (7/11): Uploaded and published on Amazon's Mechanical Turk site. Given the time cost of either recording the number of tweets in a month or looking up more than 10 tweets, we decided to record the date of the 10th most recent tweet. Using a sample of ~10 companies, we noticed minimal differences in the data when using 10, 20, and 30 tweets.
- Copy the text in the Search Text into a search engine.
- Click on result from twitter.com with the company name. If the link does not appear on the first 3 pages, record DNE for both outputs
- Record the company's Twitter Handle into Twitter Handle
- Record the date (MM/DD/YY) of the 10th most recent tweet for Twitter Activity. If there are fewer than 10 tweets, record DNE.
- NUMBER OF EVENTS: UPDATE: written, not published, on amazon's mechanical turk site
Considerations
- Difficulties Encountered:
- Expected Time to Complete:
- Expectation of Results (accuracy of turk, comprehensiveness):
- Other Comments:
Procedure
- Copy the text in the Search Text into a search engine.
- Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
- Look for links related to events, such as 'Events' or 'Calendar' on the homepage.
- If not found on the homepage, check 'About' and check 'Community'
- Count the number of events in July 2016 and record it. If there is no information of events on the website, record DNE.
Note***: Events include meetups, workshops, info sessions, etc. We do not count them separately because it is difficult to do so. Most companies put all events in the same section and do not put event types in the event titles. We have to look into the details of each event to find its type, and even then some event descriptions do not allow us to determine the type easily. Differentiating event types demands more time and effort and is therefore not suitable for a mechanical turk project.
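The counting rule in the last step of the procedure (all event types together, July 2016 only, DNE when the site has no event information) can be sketched as below. It assumes the event dates have already been read off the calendar page. The function name and the use of None to mean "no event information" are assumptions.

```python
from datetime import date


def count_events(event_dates, year=2016, month=7):
    """Count events in the target month, regardless of event type.

    event_dates -- list of datetime.date objects, or None when the website
                   has no information on events (recorded as DNE).
    """
    if event_dates is None:
        return "DNE"
    return sum(1 for d in event_dates if d.year == year and d.month == month)


# Made-up dates, for illustration only.
july_count = count_events([date(2016, 7, 5), date(2016, 7, 19), date(2016, 8, 2)])
```

An events page that exists but lists nothing for July would return 0 rather than DNE in this sketch, which keeps "no events" distinct from "no information".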
- Onsite Mentors: UPDATE: written, not published, on amazon's mechanical turk site
- Copy the text in the Search Text into a search engine.
- Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
- Look for links related to mentorship such as 'mentors', 'mentorship' or 'mentoring programs'
- If the key words can be identified, mark as 1
- If there is no explicit 'mentoring' section, look for links related to a description of the company, such as: 'About,' 'Our Team,' 'Our Mission,' etc., look for a subsection or mention of mentor/mentorship/mentoring
- If these exist, mark as 1.
- If not, go to links related to membership 'benefits,' 'perks,' or related.
- Do same process as end of 4 and 5
- If there is no mention of mentorship in these sections, type the company, city, and 'mentoring' into a search engine. If a link to a reliable website (such as Desktime) appears and mentorship can be found in the description, mark as 1.
- If none of these steps result in a mark of 1, mark as 0
- Nonprofit: UPDATE: written, not published, on amazon's mechanical turk site
- Copy the text in the Search Text into a search engine.
- Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
- Go to links that describe the company, usually they are labelled: 'About', 'Our Story,' 'Mission'
- Look for the key word 'nonprofit'/'non-profit'
- If 'nonprofit' is identified, mark as 1, otherwise 0.
- Number of Members: UPDATE: written, not published, on amazon's mechanical turk site
- Copy the text in the Search Text into a search engine.
- Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
- Look for the link 'Members' or 'Residents', usually they are under the links 'Community', 'Membership', 'Our Space' or 'The Space'.
- Count the number of members
- If the link or section of 'Members' is not found, go to 'Community' and 'Coworking' and look for the description of the number of startups/founders/members in the community. Record the number.
- If number of members cannot be identified using above steps, record DNE.
- Sponsors and Partners: UPDATE: written, not published, on amazon's mechanical turk site
- Copy the text in the Search Text into a search engine.
- Click on the result that is the website of the company. If there does not exist a listing on the first three pages, mark as DNE.
- Look for the link or mention of 'Sponsors' or 'Partners', which is often under the 'About', 'Community', or related sections
- If sponsors or partners can be found mark as 1 and list them, otherwise mark as 0.