Reliability of CAPTCHA

Q) Can a CAPTCHA (which is generated by a computer) be cracked by malicious spammers using bots (which are also computer programs)?

Resp:- Let us assume that you are a spammer who wants to target a particular web application that is protected by a CAPTCHA. You want your bot to get past the CAPTCHA check by fooling the application into believing that the bot is a human. A bot can attempt this in only two ways.

1) By guessing the pattern followed by the CAPTCHA generator used in the web application. The bot cannot do this without your help, because a computer cannot think on its own. You need to work out the pattern yourself, program that logic, and give it to your bot so that the bot can crack the next CAPTCHA with the logic you provided.

2) By scanning the CAPTCHA image displayed on the screen, extracting the text from the scanned image, and entering that text into the provided box. The bot can do this on its own.

Case 1:- Suppose you decide to accomplish your task through the first method. You need to sit in front of the web application and observe the CAPTCHA patterns it displays. For instance, if the pattern given is “b2FkLo”, you hit “generate new pattern” to see the next one, say “Jk1lPu”, and then try to guess the relation between the two patterns. It is not possible to guess a relation from just two patterns, so you need to look at more of them, say four, as below.

b2FkLo : Jk1lPu : SgHkL9 : ERj8kL

Now from the above 4 patterns you need to arrive at a relation between all of them. This is similar to a reasoning puzzle in which you are asked to guess the fourth number when three are given, such as 2 : 4 : 6 : ?. That puzzle has simple logic: adding 2 to the previous number gives the next one. You have to find the equivalent logic for the CAPTCHA patterns yourself. There may also be an internal relation between the 6 characters within each pattern, which you must guess as well. Your computer cannot do this part of the work for you. Only after guessing the logic can you program it and transfer it to the bot, which can then crack the CAPTCHA. But guessing such relations between patterns drawn from an enormous number of permutations and combinations is absolutely impractical. So it is not possible to crack the CAPTCHA this way, unless you luckily guess the right pattern or somebody leaks the logic to you, and neither is likely to happen.
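To get a feel for the scale involved in Case 1, here is a small Python sketch that counts how many distinct patterns exist. It assumes each pattern is 6 characters long and drawn from upper-case letters, lower-case letters and digits, which matches the sample patterns above but is only a guess about the generator. The names ALPHABET, LENGTH and keyspace_size are made up for this illustration.

```python
import string

# Assumption: 6 characters, each from a-z, A-Z or 0-9.
# This matches the sample patterns but is not confirmed for any real generator.
ALPHABET = string.ascii_letters + string.digits
LENGTH = 6


def keyspace_size(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


total = keyspace_size(len(ALPHABET), LENGTH)
print(f"{len(ALPHABET)} symbols, length {LENGTH}: {total:,} possible patterns")
print(f"Chance of one blind guess being right: 1 in {total:,}")
```

Under these assumptions there are 62 symbols and 56,800,235,584 possible patterns, so a handful of observed samples says almost nothing about the rest.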

Case 2:- Since the first option is not a clever one, we go for the second. This method can serve your purpose if the web application is running a poor CAPTCHA generator. The bot can scan the image (a screenshot of the pattern), separate the noise from the image, extract the text in the pattern, and compare it with its dictionary. After arriving at the right pattern, the bot can enter it into the box. But if you want to crack a CAPTCHA that has warped, intersecting, or overlapping text, the bot will struggle to extract the pattern even with the high-end image processing techniques in use today (this is one reason humans are still superior to machines at such tasks). Humans can predict a pattern even from partial data, but bots need complete data to do a task as intended. So, with these sorts of techniques, it is not practical to crack such CAPTCHAs at this point in time. With advanced image processing techniques and artificial intelligence in the future, however, bots may crack even complex CAPTCHAs.
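The "separate the noise from the image" step in Case 2 can be sketched on a toy example. The Python below is only an illustration under strong assumptions: the image is a tiny hand-written grid where '#' is ink and '.' is background, and noise means single isolated ink pixels. The names NOISY and remove_specks are hypothetical, and a real bot would work on actual pixel data with an image library and an OCR step, neither of which is shown here.

```python
# Hypothetical toy image: '#' is an ink pixel, '.' is background.
# The two 2x2 blocks stand in for characters; the lone '#' marks are noise.
NOISY = [
    "#.........",
    "..##...#..",
    "..##......",
    "......##..",
    ".#....##..",
]


def remove_specks(rows):
    """Return a copy of the image without ink pixels that have no ink neighbour."""
    height, width = len(rows), len(rows[0])

    def ink(y, x):
        # Pixels outside the image count as background.
        return 0 <= y < height and 0 <= x < width and rows[y][x] == "#"

    cleaned = []
    for y in range(height):
        line = ""
        for x in range(width):
            # Count ink pixels among the 8 surrounding positions.
            neighbours = sum(
                ink(y + dy, x + dx)
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            line += "#" if ink(y, x) and neighbours > 0 else "."
        cleaned.append(line)
    return cleaned


for row in remove_specks(NOISY):
    print(row)
```

This simple filter works only because the noise is isolated from the characters. When the text is warped, or when lines are drawn through or across the characters, the noise touches the ink and a neighbour count like this can no longer tell them apart, which is the difficulty described above.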

You may think that all the CAPTCHA patterns are generated in advance, stored in a database, and retrieved whenever there is a need. This is not true. If it were done that way, cracking the CAPTCHA would be as simple as hacking the web server: after hacking it, an attacker could get the database of all the stored CAPTCHA patterns, and a computer could then easily brute-force them. So CAPTCHAs are not stored in a database in advance; they are generated randomly whenever there is a need.
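Generating a pattern randomly at the moment it is needed can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CAPTCHA product works: the function name new_captcha_text, the 6-character length and the letters-plus-digits alphabet are all assumptions taken from the sample patterns above, and rendering the text as a distorted image is left out entirely.

```python
import secrets
import string

# Assumption: patterns use a-z, A-Z and 0-9, as in the samples above.
ALPHABET = string.ascii_letters + string.digits


def new_captcha_text(length: int = 6) -> str:
    """Build a fresh random pattern using the OS's secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Each request gets its own freshly generated pattern.
print(new_captcha_text())
print(new_captcha_text())
```

The secrets module is used instead of random because it is meant for values an attacker should not be able to predict, which is what defeats the pattern guessing described in Case 1.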

So, the conclusion is that if a web application is running a decent CAPTCHA, it can distinguish between machine and human with an accuracy of more than 99%; the remaining 1% allows for some lucky guesses and sophisticated image processing techniques.

This question was asked on studentdynasty.blogspot.in, a blog for queries for which your search engines could not pull up a satisfactory answer. If you have such queries, direct them to the above link.