Mechanical Turk (Tool)
Copyright © 2016 edegan.com. All Rights Reserved.
Description
The purpose of this page is to introduce people to the use of Mechanical Turk in data processing. The document is structured as follows: 1. It begins by describing Mechanical Turk and the ways in which it can be used. 2. It provides simple getting-started instructions that allow a new user to access the Mechanical Turk system and begin a new project. 3. It gives an example of a project with sample code.
What is Mechanical Turk
Mechanical Turk (Mturk) is a system that lets you outsource work to many different people efficiently. For the purposes of the McNair Center, we focus on using Mturk to acquire and clean data. It works well when the job consists of a small number of easily understood steps that need to be repeated many times. If your data task fits this definition, it is worth thinking about turning it into an Mturk task.
In the example below, we think about how to find the Twitter handles for a set of companies in a spreadsheet. If you were to do this by hand as an RA, you would start with the spreadsheet and go through each row, searching on either Google or Twitter for each company. In Mturk, you would instead create a project. In that project, you would create a task template that provides a set of overall instructions as well as hooks to fill in specific information about one row from your spreadsheet. When a turker receives their assignment, or HIT, they see both the overall instructions and the specific information for that row of data in your spreadsheet.
The Mturk system allows many people to work on your spreadsheet in parallel, so the work is completed much more quickly. If this is confusing, we will provide a concrete example below. For now, just bear in mind some essential vocabulary.
- Mechanical Turk Vocabulary
- Requester: a person or group posting work on the system
- HIT (Human Intelligence Task): one task completed by a worker
- Project: a collection of HITs
- Turker: a worker on Mturk
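The template-and-hooks idea described above can be sketched in a few lines of Python. This is only an illustration, not the Mturk system itself: the template wording, the `company` and `query` column names, and the example rows are all made up. As far as we know, Mturk project templates mark their hooks with `${column_name}` placeholders that match the column headers of the uploaded spreadsheet, and Python's `string.Template` happens to use the same syntax, which makes it a convenient stand-in.

```python
from string import Template

# Hypothetical task template; ${company} and ${query} are assumed column names.
task_template = Template(
    "Find the official Twitter handle for the company: ${company}. "
    "Search Google for: ${query}"
)

# Each dict stands in for one row of the spreadsheet (made-up example rows).
rows = [
    {"company": "Example Corp", "query": "Example Corp twitter"},
    {"company": "Sample LLC", "query": "Sample LLC twitter"},
]


def render_hits(template, rows):
    """Return one rendered instruction string per spreadsheet row."""
    return [template.substitute(row) for row in rows]


hits = render_hits(task_template, rows)
for text in hits:
    print(text)
```

Each rendered string corresponds to what one turker would see for one HIT: the shared instructions plus the values from a single row.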
Accessing the Mechanical Turk Platform
- Go to the Mechanical Turk Requester page
- Log in to the system using the following credentials
- email: esi@rice.edu
- pass: 9Million!
- To create a new project, click on the Create link and follow the directions in the Creating a New Project Example section below
- To modify an existing project, follow the directions in the Modify an Existing Project section below
Creating a New Project Example
In the steps below, we describe the creation of a Turk project that asks Turk workers to find the Twitter handles of companies. The project takes as input a series of Google search queries in CSV form. It asks the workers to enter each search string into Google and check whether any Twitter handles are returned on the first page of the search results.
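One way to prepare that CSV input is sketched below. It is a minimal example under stated assumptions: the `company` and `query` column names and the "<name> twitter" query pattern are hypothetical choices, and the company names are made up. The header row matters because the column names are what the task template refers to.

```python
import csv
import io


def build_query_rows(company_names):
    """Return one dict per company with a Google search string for its Twitter handle."""
    rows = []
    for name in company_names:
        name = name.strip()
        if not name:
            continue  # skip blank spreadsheet cells
        rows.append({"company": name, "query": f"{name} twitter"})
    return rows


def write_queries_csv(rows, handle):
    """Write rows to an open file handle, with a header row naming each column."""
    writer = csv.DictWriter(handle, fieldnames=["company", "query"])
    writer.writeheader()
    writer.writerows(rows)


# Made-up company names, including messy whitespace and an empty cell.
companies = ["Example Corp", "  Sample LLC ", ""]
rows = build_query_rows(companies)

# An in-memory buffer keeps the sketch self-contained; pass a real file
# opened with open(path, "w", newline="") to write to disk instead.
buffer = io.StringIO()
write_queries_csv(rows, buffer)
print(buffer.getvalue())
```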
Step 1, Project Info: Once you click on the Create link, you will be brought to an interface with a number of text entry boxes. Summarize your project in a way that is informative both for the team and for potential Turk workers choosing between projects. In the figure below, we describe a HIT Project FINISH.
Figure 1: Twitter Project Info
How to choose your data
Step 2, Choosing Pay Level: Once you have named the project, you have to decide on the pay scale (Reward per assignment) and the number of people working on each HIT (number of assignments per HIT). The higher the pay per HIT, the quicker your work will be completed by turkers, but you do not want to waste money. A good rule of thumb is to do the task yourself for 30-60 minutes and then count how many rows you completed. We want the per-HIT pay rate to work out to roughly $6.00 - $10.00 in hourly wage to get things done efficiently on the system. If you decide to have more than 1 worker per HIT, it should be because you believe that the data task requires a certain amount of human judgement and you want to accept only results that have been "verified" by multiple people.
The last three parameters in this box determine how long each worker has to complete a HIT and how long the HIT stays in the system. You generally want "Time Allotted" to be 1 day. Expiration of the HIT doesn't matter that much. The last important choice on this screen is the "Auto-approve" option. The quicker the auto-approve, the more likely it is that turkers will take your task. For now, set it to 24 hours, but remember that you are responsible for regularly auditing results when you have a project up on the Turk system.
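The rule of thumb for choosing the reward can be turned into a small calculation. The sketch below is illustrative only: the function names and the example numbers are hypothetical, and it ignores any fees that Mturk charges on top of worker pay.

```python
def reward_per_hit(rows_completed, minutes_worked, target_hourly_wage):
    """Per-HIT reward in dollars that gives roughly the target hourly wage."""
    if rows_completed <= 0 or minutes_worked <= 0:
        raise ValueError("rows_completed and minutes_worked must be positive")
    rows_per_hour = rows_completed * 60 / minutes_worked
    return round(target_hourly_wage / rows_per_hour, 2)


def project_cost(num_rows, reward, assignments_per_hit=1):
    """Total worker pay in dollars; platform fees are not modeled here."""
    return round(num_rows * reward * assignments_per_hit, 2)


# Example: you completed 45 rows in 30 minutes and aim for $9.00 per hour.
reward = reward_per_hit(rows_completed=45, minutes_worked=30, target_hourly_wage=9.00)
total = project_cost(num_rows=1000, reward=reward, assignments_per_hit=2)
print(reward, total)
```

With these made-up numbers, 45 rows in 30 minutes is 90 rows per hour, so a $9.00 hourly target works out to $0.10 per HIT, and 1,000 rows with 2 assignments each would cost $200.00 in worker pay.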
Figure 2: Cost Parameters in Mturk
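When a HIT is assigned to more than one worker, the "verified by multiple people" idea from Step 2 needs some rule for deciding when answers agree. The sketch below shows one possible rule, a simple majority with a minimum number of matching answers. The normalization (ignoring case, whitespace, and a leading "@") and the threshold of 2 are assumptions for illustration, not part of Mturk, and the answers are made up.

```python
from collections import Counter


def normalize_handle(answer):
    """Lowercase a submitted handle and strip whitespace and a leading '@'."""
    return answer.strip().lstrip("@").lower()


def consensus(answers, min_agreement=2):
    """Return the most common normalized answer if enough workers agree, else None."""
    counts = Counter(normalize_handle(a) for a in answers if a.strip())
    if not counts:
        return None
    handle, votes = counts.most_common(1)[0]
    return handle if votes >= min_agreement else None


# Made-up answers from three workers on two different HITs.
agreed = consensus(["@ExampleCorp", "examplecorp ", "example_corp"])
disputed = consensus(["@a", "@b", "@c"])
print(agreed, disputed)
```

Rows that come back as `None` are the ones worth auditing by hand.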
Existing HIT Library
Create a list of existing HITs and what they do
TDL with HITS
- Data validation using JavaScript
Hash
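import os
import requests

# Read the Eventbrite API token from the environment instead of hardcoding it.
# EVENTBRITE_TOKEN is an assumed variable name; set it to your own token.
token = os.environ["EVENTBRITE_TOKEN"]

response = requests.get(
    "https://www.eventbriteapi.com/v3/organizers/2300226659/events/",
    headers={"Authorization": f"Bearer {token}"},
    verify=True,  # verify the server's TLS certificate (the default)
)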