Last updated: May 5, 2023
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a security test that verifies whether the user is a human.
In this tutorial, we'll explain how it works and what kind of protection it provides.
CAPTCHA is an algorithm that protects applications from bots and various spam attacks. The most common type of CAPTCHA is a distorted image of text that users must correctly rewrite into a text field. The image is distorted in such a way that humans can read it correctly while computer software should fail to. Below we can see an example of such a CAPTCHA:
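To make the mechanism more concrete, here is a minimal Python sketch of the server side of a text CAPTCHA. It assumes a hypothetical `TextCaptchaStore` class that keeps the expected answers in memory. The class generates random text, stores the answer under a random challenge id, and checks the user's input exactly once. Rendering the distorted image is left out because it needs an imaging library. The answer would be handed only to that renderer and never sent to the browser as text.

```python
import hmac
import secrets
import time

# Characters that are easy to confuse once distorted (0/O, 1/I) are left out.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class TextCaptchaStore:
    """Hypothetical in-memory store: challenge id -> (expected answer, creation time)."""

    def __init__(self, ttl_seconds=120):
        self.ttl_seconds = ttl_seconds
        self._pending = {}

    def new_challenge(self, length=6):
        answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
        challenge_id = secrets.token_urlsafe(16)
        self._pending[challenge_id] = (answer, time.monotonic())
        # Only challenge_id goes to the browser; answer goes to the image renderer.
        return challenge_id, answer

    def verify(self, challenge_id, user_input):
        # pop() makes every challenge single-use, whether the attempt succeeds or not.
        entry = self._pending.pop(challenge_id, None)
        if entry is None:
            return False
        answer, created_at = entry
        if time.monotonic() - created_at > self.ttl_seconds:
            return False  # challenge expired
        normalized = user_input.strip().upper()  # case-insensitive comparison
        # Constant-time comparison; encoding avoids errors on non-ASCII input.
        return hmac.compare_digest(normalized.encode(), answer.encode())
```

Removing the challenge on the first attempt prevents a bot from guessing repeatedly against the same image. A production system would typically use a shared store with built-in expiry instead of a dictionary.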
Example applications of CAPTCHAs are:
We can see that CAPTCHA can be a useful tool. It improves the security of applications and data. Moreover, CAPTCHA isn't always used for security reasons. In some cases, CAPTCHA serves as a tool for training AI models. Such models can later be used for various purposes, like digitizing books, annotating images, and improving Google Maps.
Let’s see how CAPTCHA algorithms work.
CAPTCHA algorithms are fully automated and reliable. They are often public, although some are covered by patents. Publishing a CAPTCHA algorithm demonstrates its strength: knowing how it works isn't enough to bypass it through simple reverse engineering. Instead, breaking it would require a complex, AI-based solution.
CAPTCHA algorithms rely on three abilities: invariant recognition, segmentation, and context recognition. Humans use these abilities simultaneously to complete a CAPTCHA task correctly and efficiently. Let's analyze them:
As we can see, solving a CAPTCHA is a more complex task than it might seem. It feels easy only because our brains are powerful and handle it without much effort. Computers, however, don't possess such cognitive abilities. Therefore, CAPTCHAs are very difficult for software to solve.
Nowadays, there are different types of CAPTCHA available. In the beginning, there were only text-based CAPTCHAs. Text CAPTCHAs evolved into multiple subtypes, like:
The second type is the image CAPTCHA. It typically presents multiple images and asks users to choose those that match a specific theme or contain a given object:
Another type is the audio CAPTCHA. It's usually combined with a text CAPTCHA, so that the text can be played as audio. Audio CAPTCHAs were created especially for visually impaired people, who could have problems solving text-only ones:
There are also other variants of CAPTCHA, such as solving math problems or answering a question.
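The server side of an image CAPTCHA can be sketched in a similar way. The example below assumes a hypothetical pool of images that are already labeled with the objects they show, given as `(image_id, labels)` pairs. It builds a grid of nine tiles and accepts the answer only if the user selects exactly the matching tiles. The function names are our own and are used for illustration only.

```python
import secrets


def build_grid(labeled_images, target, size=9):
    """Pick `size` random images; return the grid and the indices showing `target`."""
    if not any(target in labels for _, labels in labeled_images):
        raise ValueError("no image in the pool shows the target")
    rng = secrets.SystemRandom()
    while True:
        grid = rng.sample(labeled_images, size)
        expected = {i for i, (_, labels) in enumerate(grid) if target in labels}
        if expected:  # make sure at least one tile matches
            # Only the image ids should be sent to the browser, never the labels.
            return grid, expected


def check_selection(selected, expected):
    # The user must pick exactly the matching tiles: no misses and no extras.
    return set(selected) == expected
```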
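A math CAPTCHA is the simplest variant to implement. Here's a minimal sketch with hypothetical helper names. Note that a plain-text arithmetic question is easy for a bot to parse, so on its own it offers only weak protection.

```python
import secrets


def new_math_challenge():
    """Return a question string and its integer answer."""
    rng = secrets.SystemRandom()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    op = rng.choice(["+", "-", "*"])
    if op == "-" and b > a:
        a, b = b, a  # keep the result non-negative
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"What is {a} {op} {b}?", answer


def check_math_answer(user_input, answer):
    try:
        return int(user_input.strip()) == answer
    except ValueError:
        return False  # not a number at all
```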
CAPTCHAs are criticized by a lot of Internet users. They have many disadvantages:
Therefore, Google introduced a new generation of reCAPTCHA. It's just a simple checkbox with the text "I'm not a robot" that the user needs to tick. How can it be so simple? The algorithm analyzes the visitor's behavior on the website. According to Google, it monitors "the user's entire engagement" with the website. When a user's behavior looks suspicious, a more difficult CAPTCHA challenge is presented. Below, we can see an example of reCAPTCHA:
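Google doesn't publish the behavioral analysis itself, so we can't reproduce it here. What a website does implement is the server-side check. The widget places a token in the `g-recaptcha-response` form field, and the server sends that token, together with the site's secret key, to Google's `siteverify` endpoint. Below is a minimal sketch that uses only the Python standard library. `verify_recaptcha` and `is_human` are illustrative names of our own, and the secret key is a placeholder you'd get from the reCAPTCHA admin console.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_recaptcha(secret, token, remote_ip=None, timeout=5.0):
    """POST the browser's token to Google and return the decoded JSON reply."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional parameter
    data = urllib.parse.urlencode(params).encode()
    # Passing data makes urlopen send a form-encoded POST request.
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=timeout) as resp:
        return json.load(resp)


def is_human(result, expected_hostname):
    # Accept only successful checks that were issued for our own site.
    return result.get("success") is True and result.get("hostname") == expected_hostname
```

Checking the hostname as well as `success` helps reject tokens that were solved on a different site.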
Nowadays, CAPTCHA is still a commonly used protection against spam and bots. There are some ways to bypass a CAPTCHA, e.g., outsourcing it to paid solving services, machine learning-based attacks, or exploiting an insecure implementation of the CAPTCHA algorithm. However, these methods aren't powerful or effective enough to make CAPTCHA worthless. Therefore, if we want to add an extra layer of security to a website, CAPTCHA is still a good choice.
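One common insecure implementation sends the expected answer to the browser, for example in a hidden form field. Another accepts the same solved CAPTCHA more than once. The sketch below shows one way to avoid both, assuming a text CAPTCHA; all names are hypothetical. The server issues an opaque token signed with HMAC-SHA256 that binds the answer, a nonce, and an expiry time. It then refuses any nonce it has already seen.

```python
import hashlib
import hmac
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # hypothetical: load from configuration in practice
_used_nonces = set()  # simplification: a real deployment needs a shared, expiring store


def _mac(answer, nonce, expires):
    msg = f"{answer}|{nonce}|{expires}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()


def issue_token(answer, ttl_seconds=120):
    """Return an opaque token for the form; it doesn't reveal the answer."""
    nonce = secrets.token_hex(8)
    expires = int(time.time()) + ttl_seconds
    return f"{nonce}.{expires}.{_mac(answer.upper(), nonce, expires)}"


def check_token(token, user_input):
    try:
        nonce, expires_str, mac = token.split(".")
        expires = int(expires_str)
    except ValueError:
        return False  # malformed token
    if time.time() > expires or nonce in _used_nonces:
        return False  # expired or already used
    _used_nonces.add(nonce)  # one attempt per token, successful or not
    expected = _mac(user_input.strip().upper(), nonce, expires)
    return hmac.compare_digest(expected.encode(), mac.encode())
```

Without the secret key, a bot can neither read the answer from the token nor forge a valid token for a different nonce. Because each nonce is consumed on its first use, the bot also gets only one guess per token.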