It’s not simple to define annotator compensation/incentive structures that optimize for

  1. High quality
  2. Quick iteration speeds
  3. Cost efficiency

Realistically, there will be trade-offs that come with optimizing for any of them.

We’ll jump into the reasoning, but we generally recommend an hourly pay model over a pay-per-task model, at least when starting out. While it can seem less cost-efficient, it’s much more effective at optimizing for data quality and iteration speeds. Many of the top labs we work with have adopted this model.

Hybrid incentive models do exist, and even with an hourly model it’s perfectly OK to establish clear expectations.

We consistently see researchers rank data quality, followed by time to data, as what matters most. Generative AI models trained with RLHF are particularly sensitive to bad data inputs, much more so than traditional computer vision or natural language models.

The bottom line is that a 10% reduction in cost per SFT pair is dwarfed by the cost of having to retrain your model due to low-quality data.
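
As a rough back-of-envelope sketch of that trade-off (every number below is an assumption for illustration, not a benchmark), compare the savings from a 10% cheaper per-pair rate against the cost of a single retraining run triggered by bad data:

```python
# Back-of-envelope comparison: per-pair savings vs. the cost of one retrain.
# All numbers below are illustrative assumptions, not real benchmarks.

num_sft_pairs = 50_000          # assumed dataset size
cost_per_pair = 8.00            # assumed baseline cost per prompt-response pair (USD)
savings_rate = 0.10             # the 10% per-pair discount discussed above

retrain_compute_cost = 150_000  # assumed cost of one retraining run (USD)
weeks_of_delay = 4              # assumed schedule slip while data is redone
team_cost_per_week = 30_000     # assumed fully loaded research-team cost per week (USD)

savings = num_sft_pairs * cost_per_pair * savings_rate
retrain_cost = retrain_compute_cost + weeks_of_delay * team_cost_per_week

print(f"Savings from 10% cheaper pairs: ${savings:,.0f}")
print(f"Cost of one bad-data retrain:   ${retrain_cost:,.0f}")
# Under these assumptions, a single retrain wipes out the savings several times over.
```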

Pay per task

In a “pay per task” model, vendors charge a fixed price per unit of work, for example, per prompt-response pair created.

  • ✅ Pro: Vendor efficiency - Vendors are incentivized to develop processes and tooling to reduce their costs, and in an ideal world, these savings are then passed on to you, the customer.
  • ✅ Pro: Easy math - You’ll know exactly what each data point costs, which seems like it would make life easier for a procurement team negotiating with and comparing vendors, at least until you spend weeks calibrating what the right price per task is (more on that below).
  • ❌ Con: Fast as possible - Annotators are incentivized to complete as many tasks as possible to maximize their take-home earnings. This often looks like taking shortcuts, speeding through instructions, and doing the minimum amount of work required to avoid disciplinary action.
  • ❌ Con: Skewed distribution - Annotators will always be incentivized to choose easier and shorter tasks, so you will end up with a skewed distribution of task complexity. Even if you set different rates based on complexity, those rates are not easy to maintain. Practically, this can look like annotators “skipping” harder tasks in favor of ones that seem easier.
  • ❌ Con: Micromanagement - Instructions and the work to be done will constantly be in flux, directly impacting the time per task. Per-task pay rates will then need to be recalibrated to match the changed work. Without that recalibration, annotators can feel disincentivized to take on the new work. Left unbalanced for long enough, you end up with the data annotation pay horror stories that make the rounds in the news.
  • ❌ Con: Slow pricing negotiations - Vendors that do per-task pay need to identify the right price per task, which involves estimating how long the task will take, how many reviews or touches the task will get, and what their margin will look like under different scenarios (a rough sketch of this arithmetic follows this list). It can take multiple weeks, a pilot, or more to get the quote right, greatly reducing your research team’s iteration speed. The worst part is that your instructions are all but guaranteed to change within a month, and when they do, the cost per task will need to be re-priced all over again.
  • ❌ Con: Black box transparency - Because vendors are incentivized to produce the data in the most cost-effective way, there’s no incentive to be transparent about how the data is produced. Vendors are incentivized to find the cheapest labor to produce the output, do the bare minimum review, and keep secret how the annotators are being managed and what the communication to them looks like.
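
To make the calibration burden concrete, here is a minimal sketch of the arithmetic a vendor might run to quote a per-task price; the variable names and numbers are assumptions for illustration, not any vendor’s actual pricing model:

```python
# Minimal sketch of per-task price calibration. All inputs are assumptions.

def price_per_task(annotator_hourly_rate, est_hours_per_task,
                   review_touches, reviewer_hourly_rate,
                   est_review_hours_per_touch, margin):
    """Estimate a per-task quote from labor time, review coverage, and margin."""
    labor_cost = annotator_hourly_rate * est_hours_per_task
    review_cost = review_touches * reviewer_hourly_rate * est_review_hours_per_touch
    return (labor_cost + review_cost) * (1 + margin)

# Initial instructions: ~30-minute tasks with one review pass.
quote_v1 = price_per_task(40, 0.5, 1, 60, 0.15, 0.35)

# A month later the instructions change and tasks now take ~45 minutes
# with two review passes, so the whole quote has to be renegotiated.
quote_v2 = price_per_task(40, 0.75, 2, 60, 0.15, 0.35)

print(f"Quote under v1 instructions: ${quote_v1:.2f} per task")
print(f"Quote under v2 instructions: ${quote_v2:.2f} per task")
```

Every change to instructions, review coverage, or expected task length forces this calculation, and the negotiation around it, to be redone.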

In summary, the per-task model incentivizes cost efficiency at the expense of data quality and time to data.

Hourly pay

In an hourly pay model, you define hourly rates for annotators (generally based on their domain and experience) and pay them for each hour worked. Tools such as Hubstaff are often used to track time spent, as well as provide accountability for how the work was produced.

  • ✅ Pro: Flexible work and quick iteration - The work to be done can change without needing to re-calculate the price per task. Running multiple small experiments, or breaking annotators into groups that do the work slightly differently can all happen seamlessly.
  • ✅ Pro: “Open”-box transparency - Vendors are incentivized to be transparent about designing the review process and coverage percentage that best works for you, the customer.
  • ✅ Pro: Data representation - The incentive to pick only easy and fast tasks is largely gone; annotators can specialize in specific types of tasks without worrying about being paid less if that type takes longer. Some slight pressure may remain depending on how you measure annotator performance.
  • ✅ Pro: Talent attraction - In a “cost-plus” model, the model Mercor uses, the vendor and client together decide what the right level of compensation is for the talent, and the vendor takes a fixed percentage of that. If highly skilled, well-compensated annotators are required, the vendor has no incentive to cut corners in a way that compromises the skill bar.
  • ❌ Con: Annotator inefficiency - The most obvious downside is that annotators will not be incentivized to be efficient with their time. We think there are several practical ways of managing this, however.
    • Work with a high-quality team of annotators, with visibility into backgrounds. Experts in their field are less likely to be dishonest to optimize for a little more money.
    • Ensure annotators are excited and fulfilled by their work. If people are generally happy and believe their work is meaningful, they are far more likely to be efficient. Culture and team energy produce outsized returns.
    • Monitor time efficiency for each annotator, and manage annotators accordingly if there are obvious outliers (see the sketch after this list for one way to flag them).
  • ❌ Con: Over-qualified talent - Vendors can be incentivized to recommend talent that is over-qualified, or at a higher pay band than the work strictly requires.
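
As one way to make the “monitor time efficiency” point above concrete, here is a minimal sketch that flags annotators whose average time per task is an outlier relative to the group; the data shape and the z-score threshold are assumptions, not a prescribed method:

```python
# Minimal sketch of outlier monitoring for hourly-paid annotators.
# The data shape and the z-score threshold are illustrative assumptions.
from statistics import mean, stdev

# Assumed input: average minutes per completed task for each annotator this week.
avg_minutes_per_task = {
    "annotator_a": 28, "annotator_b": 31, "annotator_c": 27,
    "annotator_d": 64, "annotator_e": 30, "annotator_f": 33,
}

values = list(avg_minutes_per_task.values())
mu, sigma = mean(values), stdev(values)

# Flag anyone more than 2 standard deviations above the group mean for review,
# rather than automatically penalizing them (their tasks may simply be harder).
flagged = {
    name: minutes
    for name, minutes in avg_minutes_per_task.items()
    if sigma > 0 and (minutes - mu) / sigma > 2
}
print("Flag for manual review:", flagged)
```

The point is to surface outliers for a conversation, not to turn hourly pay back into a piece-rate system through the back door.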

Hybrid models

You can also adopt a hybrid approach to mitigate the downsides of your selected model, although this will come with some additional overhead.

Example modifications could include the following (a rough payout sketch follows the list):

  • Per-task bonuses: incentivize higher volume. Combine this with hourly pay to indirectly mitigate the incentive to sit on tasks. Can incentivize speeding through tasks, so could be combined with a quality requirement.
  • Set max paid time per task: disincentivize excess time per task. Combine this with hourly pay to directly reduce the incentive to sit on tasks. Disincentivizes taking on complex tasks which would take longer than the cap.
  • Modifying per-task pay based on quality: incentivize higher quality within a per-task pay model. Requires defining an objective quality rating, which will likely lead to disputes from annotators and more overhead.
  • Modifying per-task pay based on complexity: improve cost efficiency per task if done correctly. Requires determining complexity, which is non-trivial. It will also be less transparent to annotators, which can lead to disputes.
  • Bonuses based on hours targets: incentivize doing a minimum amount of work each week. Can be less cost-efficient depending on how the targets are set.
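
As a rough sketch of how a few of these modifications could combine in practice, here is one possible payout calculation: an hourly base with a cap on paid time per task, plus a per-task bonus gated on a quality score. The rates, cap, and threshold are assumptions for illustration only:

```python
# Rough sketch of a hybrid payout: hourly base with a capped paid time per task,
# plus a per-task bonus gated on a quality score. All parameters are assumptions.

HOURLY_RATE = 45.0               # assumed base hourly rate (USD)
MAX_PAID_HOURS_PER_TASK = 1.5    # cap on paid time per task
TASK_BONUS = 5.0                 # assumed bonus per completed task (USD)
QUALITY_THRESHOLD = 0.85         # bonus only pays out at or above this quality score

def task_payout(hours_spent: float, quality_score: float) -> float:
    """Compute pay for a single task under the hybrid scheme sketched above."""
    paid_hours = min(hours_spent, MAX_PAID_HOURS_PER_TASK)
    base_pay = paid_hours * HOURLY_RATE
    bonus = TASK_BONUS if quality_score >= QUALITY_THRESHOLD else 0.0
    return base_pay + bonus

# A task that took 2 hours only pays out 1.5 hours of base time,
# and the bonus only lands if the quality bar is met.
print(task_payout(hours_spent=2.0, quality_score=0.9))   # 1.5 * 45 + 5 = 72.5
print(task_payout(hours_spent=0.8, quality_score=0.6))   # 0.8 * 45 + 0 = 36.0
```

Each added rule narrows an incentive gap but also adds measurement and dispute-handling overhead, which is the trade-off noted above.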

Worker classification

For most human data projects, we recommend hiring annotators as contractors. The major differences between employees and contractors in the United States (details can vary by country) are:

  • Employees: Subject to more control over how their work is performed. Often receive benefits such as health insurance, retirement plans, and paid time off. Entitled to certain protections under labor laws, including minimum wage and overtime pay.
  • Independent Contractors: Have more autonomy over how they complete their tasks. They bring their own tools, set their own schedules, and may work for multiple clients. However, they do not receive the same benefits and protections as employees.

Since projects can be unpredictable in length, it usually makes sense to classify annotators as independent contractors. This means complying with requirements on autonomy over when and how contractors do their work, what they do with the rest of their time, and more.

Based on the incentive structure you choose, you can face different levels of worker misclassification risk. Mercor handles classification on your behalf, so you can avoid dealing with the risks of misclassification, including financial penalties and legal action.