Have you ever wondered how we measure Crowd performance? The first measure you probably think of is a researcher’s Rank, which is based on Kudos points.
Kudos points are intended to recognize researchers for their valid vulnerability reports, independent of monetary or swag prizes associated with the bounty program. The more severe the vulnerability impact, the greater the points awarded (from 5 to 20). We also award 1-2 kudos points for duplicate submissions of valid vulnerabilities, since the research work is still appreciated even if it wasn’t first to arrive.
Kudos points serve a good purpose, but the quantity of a researcher’s valid submissions is not the only valuable measure of performance. Kudos points reflect vulnerability impact and cumulatively represent a researcher’s submission volume, but they don’t reflect a researcher’s signal-to-noise ratio.
Accuracy: In October 2014 Bugcrowd introduced a new quality metric for researchers called Accuracy. This measures the ability of a researcher to identify and report vulnerabilities that are marked valid and eventually fixed. It is weighted towards more recent submissions to ensure that researchers’ performance is evaluated on current activity and not on submissions they may have made months prior. Researchers who learn and improve over time will see that reflected in their Accuracy score fairly quickly. When Accuracy was first introduced, the average accuracy across all researchers was 26.47%. This indicates that roughly 26% of non-duplicate submissions were marked valid and prioritized for development.
As the Director of Researcher Operations, my job is to evaluate the Crowd and determine how to achieve desired results for researchers, customers, and Bugcrowd. I spent some time looking at the 26% Accuracy rate and exploring how to bring that up, while also looking at our ranking system and thinking about its effectiveness. I went to the data, and learned a lot about the Crowd.
What follows is a long blog post detailing changes we are making to improve our Crowd reputation measures. The summary is that we are changing Kudos points allocations, replacing Accuracy with Acceptance Rate, and adding Average Submission Priority to researcher profiles. While we are announcing multiple changes, we are confident, based on our testing over the last few months, that these new measures more accurately reflect the performance of our Crowd members. Read on if you are interested in the nitty-gritty details of each change...
How to better measure Crowd performance?
As detailed in the Bugcrowd State of Bug Bounty Report, public bounty programs on the Bugcrowd platform currently have an 18% signal-to-noise ratio, with 39.5% of submissions marked duplicate and 34.5% marked invalid. Contrast that with a whopping 36.1% of submissions marked valid in invitation-only programs, double the rate of public programs, with a further 32.1% marked duplicate. Only 26.3% of submissions in invitation-only programs are marked invalid as out-of-scope or unreproducible. Our job is to shift the balance by rewarding valid submissions and the researchers who report them, incentivizing researchers to do their best possible work.
To that end, we have prototyped and tested new measures that more accurately reflect Crowd performance, with respect to both vulnerability impact and signal-to-noise ratio. I'm pleased to announce that these changes take effect today and will be applied to all vulnerability submissions received to date.
New Kudos calculations:
Adjusted rewards for valid findings, based on priority:
P1 - Critical - increasing from 20 points to 40 points
P2 - High - increasing from 15 points to 20 points
P3 - Moderate - no change, 10 points
P4 - Low - no change, 5 points
P5 - Won't Fix - decreasing from 2 points to 0 points
We are also adjusting the points awarded for duplicate findings. It is pretty rare that a P1 or P2 is duplicated because customers fix those extremely quickly. But when it does happen, awarding 2 points for the report is inadequate appreciation for the researcher’s effort. Therefore we are scaling duplicate points based on issue severity.
P1 - Critical - 10 points
P2 - High - 5 points
P3 - Moderate - 2 points
P4 - Low - 1 point
P5 - Won't Fix - 0 points
Those changes are great for encouraging signal, but what about decreasing noise? Out-of-scope submissions and submissions that do not successfully reproduce used to have a neutral impact on a researcher’s reputation. That is changing: each such submission now results in a minor penalty of -1 point. As always, researchers are encouraged to read the bounty brief carefully, contact firstname.lastname@example.org if they have scope questions, and validate their findings to eliminate false positives before submitting vulnerabilities.
Update Aug 19th, 2015:
There is a delay in the allocation of duplicate points, as they are tied to the priority of the first submission, which may not yet be accepted by the customer. Once that bug is confirmed with a priority score, your duplicate points will be awarded. We are working on making that clearer in the platform.
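For readers who like to see the arithmetic, the new point tables, the -1 penalty, and the delayed duplicate points can be sketched roughly as follows. This is only an illustration of the rules described above, not the platform's actual code; the status labels ("valid", "duplicate", "invalid") and the function names are made up for this example.

```python
# Points per priority level (P1..P5), taken from the lists in this post.
VALID_POINTS = {1: 40, 2: 20, 3: 10, 4: 5, 5: 0}
DUPLICATE_POINTS = {1: 10, 2: 5, 3: 2, 4: 1, 5: 0}
INVALID_PENALTY = -1  # out-of-scope or unreproducible submissions


def kudos_for(status, priority=None):
    """Return the kudos points for one submission.

    `status` uses hypothetical labels: "valid", "duplicate", "invalid".
    `priority` is 1-5, or None when no priority has been confirmed yet.
    """
    if status == "valid":
        return VALID_POINTS[priority]
    if status == "duplicate":
        # Duplicate points depend on the priority of the first submission,
        # so nothing is awarded until that priority is confirmed.
        if priority is None:
            return 0
        return DUPLICATE_POINTS[priority]
    if status == "invalid":
        return INVALID_PENALTY
    raise ValueError(f"unknown status: {status!r}")


def total_kudos(submissions):
    """Sum kudos over an iterable of (status, priority) pairs."""
    return sum(kudos_for(status, priority) for status, priority in submissions)


example = [
    ("valid", 1),        # 40 points
    ("duplicate", 2),    # 5 points
    ("invalid", None),   # -1 point
    ("valid", 4),        # 5 points
    ("duplicate", None), # 0 points until the original is confirmed
]
print(total_kudos(example))  # 49
```

A single P1 now outweighs the combined penalty of dozens of out-of-scope reports, which fits the stated goal of rewarding signal while only mildly discouraging noise.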
Acceptance Rate (replaces Accuracy):
Acceptance Rate is best explained as a comparison of valid to invalid reports. For those that are interested in the details:
Let X = The count of all your valid and duplicate submissions, including P5 won’t-fix
Let Y = The total count of all your submissions, excluding any that are marked ‘not applicable’, have not yet been reviewed, or have only been triaged but not confirmed.
Acceptance Rate = (X / Y) * 100
It’s a simple ratio of all of your accepted submissions to date, versus all submissions you’ve ever made. We exclude ‘not applicable’ submissions, which are those that have been marked by us or a customer as having been made in genuine and well-intentioned error. (And obviously we don’t include submissions that haven’t been finalized yet!)
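The formula above can be sketched in a few lines. This is a rough illustration, assuming each submission carries one of a handful of status labels; the labels and the function name are hypothetical and are not the platform's real field values.

```python
# Hypothetical status labels, chosen for this example only.
ACCEPTED = {"valid", "duplicate"}  # "valid" includes P5 won't-fix
EXCLUDED = {"not_applicable", "unreviewed", "triaged"}  # left out of Y


def acceptance_rate(statuses):
    """Return (X / Y) * 100 rounded to two decimals, or None if Y is zero."""
    counted = [s for s in statuses if s not in EXCLUDED]  # Y
    if not counted:
        return None
    accepted = sum(1 for s in counted if s in ACCEPTED)   # X
    return round(accepted / len(counted) * 100, 2)


statuses = (
    ["valid"] * 8
    + ["duplicate"] * 2
    + ["invalid"]            # counts against the researcher
    + ["not_applicable"]     # excluded: genuine, well-intentioned error
    + ["unreviewed"]         # excluded: not finalized yet
)
print(acceptance_rate(statuses))  # 90.91
```

Here X is 10 and Y is 11, so the two excluded submissions neither help nor hurt the rate.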
Average Submission Priority:
We are also publishing a new performance measure on a researcher's profile: their Average Submission Priority rate. Taken in context with a researcher’s rank and Acceptance Rate, this can help us better recognize outstanding researchers who consistently submit high impact vulnerabilities, but may be lower volume in their submissions. Average Submission Priority is also one of the factors we look at when determining who to invite to private bounty programs, so giving Crowd members feedback on the average priority of their submissions is important.
previously: Kudos points 318, Accuracy 76%
now: Kudos points 459, Acceptance Rate 90.91%, Average Submission Priority 2.86
These metrics better reflect the fact that Frans hits the target almost every time, and his submissions are typically high impact.
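This post does not spell out exactly how Average Submission Priority is calculated. As a rough sketch only, assuming it is a plain arithmetic mean of the priority numbers (1 for P1 through 5 for P5) of a researcher's prioritized submissions, it might look like the following. The function name and the sample priorities are made up for illustration.

```python
def average_submission_priority(priorities):
    """Arithmetic mean of priority numbers (1 = P1 ... 5 = P5).

    Assumption: a plain mean over prioritized submissions. Returns None
    when there are no prioritized submissions.
    """
    if not priorities:
        return None
    return round(sum(priorities) / len(priorities), 2)


# Hypothetical researcher: one P1, one P2, three P3s, two P4s.
sample = [1, 2, 3, 3, 3, 4, 4]
print(average_submission_priority(sample))  # 2.86
```

Under this assumption a lower number means higher-impact findings on average, since P1 is the most severe priority.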
What is the impact to researchers? The Crowd and customers will see improved performance measures as a result of more appropriate kudos reward scaling, more representative Acceptance Rates, and visibility of Average Priorities.