How Does Captcha Work?

Many IT security measures are not developed or deployed until a concrete problem arises. Captcha was no exception. This security check was developed to stop spam and the unwanted use of web services by automated bots, and to clearly distinguish humans from machines. But as is so often the case with new methods, there are unintended side effects and dead ends.

In today’s digital world, online security has become an increasingly important concern for individuals and organizations alike. One way of safeguarding online security is through the use of CAPTCHAs. CAPTCHAs, or Completely Automated Public Turing tests to tell Computers and Humans Apart, are widely used to protect websites from automated spam and abuse.

But how exactly do CAPTCHAs work? This article will explore the basics of CAPTCHA technology and how it operates.

What is Captcha?

Captcha stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. It is a type of challenge-response test used in computing to determine whether or not the user is human.

The most common type of Captcha requires the user to correctly identify and enter a sequence of distorted letters and/or numbers displayed on a computer screen. This task is easy for a human but difficult for a computer program or “bot” to perform accurately. Captchas are used to prevent automated software from performing actions that could be harmful, such as spamming websites, creating fake accounts, or stealing information.
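
The challenge-response idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular Captcha product works: the names `SERVER_KEY`, `new_challenge`, and `verify` are hypothetical, and rendering the text as a distorted image is left out entirely.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side key; a real deployment would load it from configuration.
SERVER_KEY = secrets.token_bytes(32)

# Characters that are easily confused once distorted (0/O, 1/I/L) are left out.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "0O1IL"
)


def new_challenge(length: int = 6) -> tuple[str, str]:
    """Return (text_to_render, token). Only the token travels with the image."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = hmac.new(SERVER_KEY, text.encode(), hashlib.sha256).hexdigest()
    return text, token


def verify(answer: str, token: str) -> bool:
    """Compare the user's answer with the token in constant time."""
    normalized = answer.strip().upper()
    expected = hmac.new(SERVER_KEY, normalized.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)
```

A production system would additionally expire each token and accept it only once, so that a solved challenge cannot be replayed.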

How Does Captcha Work?

Almost everyone knows them, and almost everyone has been annoyed by them, because deciphering letter combinations in convoluted patterns or counting tropical fruits on a table is not everyone’s cup of tea. We are talking about captchas. The acronym goes back to the definition given by researchers at Carnegie Mellon University in 2000.

This technique was intended to prevent the unwanted use of services, the flooding of discussion forums with comments or advertising, and the automatic creation of accounts. The assumption was that only a human user could correctly answer a variable question (challenge/response). But this assumption was quickly disproved: clever software, pattern collections, and advanced text and image recognition solved the challenges automatically and forced defenders to develop ever-better techniques.

This opened a new front on which a small war has been raging for more than fifteen years. The defenders of websites and services face cybercriminals who want to automate abusive use, for example to send spam or even to manipulate public opinion and states. The result of this small-scale war is a constant evolution of new captcha mechanisms and of agile tactics to bypass them automatically.

History of Captcha

Captcha was developed around 2000 by researchers at Carnegie Mellon University. The team, which included computer scientist Luis von Ahn, was looking for a way to prevent automated bots from creating fake accounts on Yahoo! and other online services.

The original Captcha consisted of a distorted image of letters and numbers that users had to type in manually. Such images can be generated automatically, but the effort that millions of users spent on solving them produced nothing of further use.

To address this, von Ahn’s team developed a new type of Captcha called reCAPTCHA. reCAPTCHA showed users distorted words scanned from old books and newspapers that text recognition software had failed to read. In this way, reCAPTCHA not only prevented bots from creating fake accounts but also helped to digitize old books and newspapers.

In 2009, Google acquired reCAPTCHA and began using it to improve its own services, such as Google Books and Google Maps. Today, Captcha is used on millions of websites and is an essential tool for protecting against automated attacks.

The limits of captchas

However, there are also limits to the complexity of captchas: beyond a certain level of complexity, human users can no longer solve them. Unfortunately, many captcha implementations are not barrier-free, so they make recognition difficult for some user groups or exclude them from the outset. The option to have components read aloud, which some captchas offer, helps only to a limited extent.

In 2008, image captchas were the de facto standard: text that was bent, crossed with lines, set against colored backgrounds, rendered in different fonts and sizes, and surrounded by varying distracting objects dominated the scene. The first content management systems also supported site operators with ready-made solutions, and captcha plugins are still very popular today, especially for WordPress. But the other side did not sleep either and adapted OCR to read the camouflaged texts correctly nevertheless.

In 2009, Google bought the company reCAPTCHA and set a new standard with its solution. The quality of the displayed objects (or query objects) and their deformations, combined with the simplicity of use, made these queries far more effective. reCAPTCHA was co-founded by Luis von Ahn, the scientist who had coined the term Captcha years before at the university.

Benefits of Using Captcha

There are several benefits of using Captcha, including:

  • Preventing automated attacks: Captcha is primarily used to prevent automated attacks, such as spamming, brute force attacks, and credential stuffing. By requiring users to prove that they are human, Captcha helps to ensure that only legitimate users can access the website or service.
  • Protecting user data: Captcha can help to protect user data by preventing automated bots from accessing sensitive information, such as passwords, credit card numbers, and personal data.
  • Enhancing user experience: Captcha can help to enhance the user experience by reducing the amount of spam and unwanted content on the website. By filtering out bots and automated attacks, Captcha can help to create a safer and more pleasant browsing experience for users.
  • Supporting machine learning: Some Captcha systems, such as reCAPTCHA, use machine learning algorithms to improve their accuracy over time. By using Captcha, website owners can help to train these algorithms and improve the accuracy of machine learning models.

Captcha is a simple yet effective tool for preventing automated attacks and protecting user data. While it may be an inconvenience for some users, it is an essential component of online security and helps to create a safer and more secure internet for everyone.

Some Drawbacks of Using Captcha

While Captcha provides many benefits, there are also some drawbacks to using it. Here are a few:

  • Accessibility: Captcha can be difficult or impossible for some users to solve, especially for those with disabilities, such as visual or hearing impairments. This can make it challenging for these users to access websites and services that use Captcha.
  • User frustration: Some users may find Captcha to be frustrating or time-consuming, especially if they are required to solve multiple Captchas in a short period of time.
  • False positives and negatives: Captcha is not always 100% accurate, and there is always a risk of false positives (when a legitimate user is blocked by Captcha) and false negatives (when a bot is able to bypass Captcha).
  • Security concerns: While Captcha can help to prevent automated attacks, it is not a foolproof solution. Sophisticated attackers can still find ways to bypass Captcha, such as by using machine learning algorithms or by outsourcing the task to human workers in low-wage countries.

Captcha is a useful tool for preventing automated attacks, but it should be used in conjunction with other security measures, and website owners should be mindful of the potential accessibility issues and user frustration that it may cause.

Captcha vs Other Alternatives

| | Captcha | Honeypot | 2FA (Two-Factor Authentication) |
|---|---|---|---|
| Use | Prevent automated attacks | Prevent automated attacks | Verify user identity |
| Pros | Widely used and effective | Easy to implement | Provides additional layer of security |
| | Offers accessibility options for users with disabilities | Doesn’t interfere with user experience | Can be integrated with existing login systems |
| | Can be customized for specific needs | Doesn’t require user interaction | Can use a variety of authentication methods |
| Cons | Can be frustrating for users | Not as effective against more sophisticated attacks | Can be difficult for some users to set up |
| | Potential for false positives and negatives | Limited effectiveness against human attackers | May require additional hardware or software |
| | Can be bypassed by advanced bots and AI | | Can be less secure if the second factor is lost |

  • Captcha: Captcha is a widely used and effective tool for preventing automated attacks. It requires users to solve a challenge, such as identifying distorted letters and numbers, to prove that they are human. However, it can be frustrating for some users and has a potential for false positives and negatives. Additionally, it can be bypassed by advanced bots and AI.
  • Honeypot: A honeypot is a technique that involves placing a hidden field on a form that is invisible to human users but visible to bots. If the field is filled out, the submission is blocked, as it is assumed to be from a bot. Honeypots are easy to implement and don’t interfere with user experience, but they are not as effective against more sophisticated attacks.
  • 2FA (Two-Factor Authentication): 2FA is a security method that requires users to provide two forms of authentication, such as a password and a verification code, to access a service. It provides an additional layer of security and can be integrated with existing login systems. However, it can be difficult for some users to set up and may require additional hardware or software. Additionally, it can be less secure if the second factor is lost. 2FA is not designed to prevent automated attacks, but rather to verify user identity.
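
The honeypot check described in the list above is simple enough to sketch in Python. The field name `website` and the function `looks_like_bot` are hypothetical; the only assumption is that the form contains one field that is hidden from human visitors (for example via CSS) and therefore stays empty in genuine submissions.

```python
# Hypothetical name of the hidden form field; humans never see it, so it stays empty.
HONEYPOT_FIELD = "website"


def looks_like_bot(form: dict[str, str]) -> bool:
    """Treat a submission as automated if the hidden field contains anything."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


# A human leaves the hidden field empty; a naive bot fills in every field it finds.
human_post = {"name": "Ada", "comment": "Nice article", "website": ""}
bot_post = {"name": "x", "comment": "Buy now", "website": "http://spam.example"}

print(looks_like_bot(human_post), looks_like_bot(bot_post))
```

A bot that renders the page and skips invisible fields passes this check, which is why honeypots are weaker against sophisticated attacks.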

ASIRRA

In the same period, Microsoft presented its alternative to text captchas, the ASIRRA system. Drawing on a database of millions of images, it asked users to tell dogs from cats to prove that they are human. Along the way, it also presented a stray animal for which a new home was being sought. As a stumbling block for automated systems, a time limit was built in to prevent computerized image analysis (initially 30 seconds to solve). Despite the social effect, the system did not catch on. Researchers studied how well the scheme withstood attacks and, thanks to machine learning, were able to defeat the verification with high probability. Microsoft discontinued the service in 2014.
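
Why machine learning is so damaging to an image-sorting scheme like this becomes clear from a little arithmetic. The sketch below assumes that one challenge consists of 12 images and that each image is classified independently; both the challenge size and the 80 percent per-image accuracy are illustrative assumptions, not figures taken from this article.

```python
def pass_probability(per_image_accuracy: float, images: int = 12) -> float:
    """Chance of labelling every image of one challenge correctly,
    assuming each image is classified independently."""
    return per_image_accuracy ** images


random_guessing = pass_probability(0.5)  # coin flip per image
classifier = pass_probability(0.8)       # hypothetical 80 % per-image classifier

print(f"random guessing: {random_guessing:.4%}")
print(f"80% classifier:  {classifier:.2%}")
```

Random guessing almost never succeeds, but even a moderately accurate classifier passes a noticeable share of challenges, and a bot can simply retry.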

reCAPTCHA

Google’s solution continued along the established path and asked the user to transcribe cleverly prepared on-screen text. To support web users, the following options were implemented:

  • Reading the text aloud
  • Requesting a new captcha
  • Help function

Five years after the purchase of reCAPTCHA, Google revised the tool on the basis of a risk analysis and has since offered users the new “No CAPTCHA reCAPTCHA”. It is very user-friendly because the tool analyzes the user’s behavior and decides for itself whether it is dealing with a human or a bot. Ideally, the user does not have to do anything beyond ticking a checkbox.

However, if the new reCAPTCHA is unable to determine with certainty whether the user is a human or a bot, a graphical query follows. Here, the user must either pick out, from an image divided into tiles, those tiles that show, for example, a street sign (or animals of a particular breed or species), or identify the familiar text elements that are obscured by distracting objects, text deformation, and the like.

Google is silent about the mechanisms that the new reCAPTCHA uses to identify the user as a human being. However, one may assume that it is a combination of various elements, such as:

  • Elements of the cache memory
  • Mouse movements
  • Number of saved favorites
  • Browser user agent
  • Current web history
  • System data (operating system, variant, configuration level)
  • Local settings, etc.
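
How such signals might be combined can be illustrated with a toy scoring function. This is purely a hypothetical sketch: the signal names, weights, and threshold are invented for illustration and say nothing about what Google actually measures or how it weighs the results.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Purely illustrative inputs; none of these names come from Google."""
    has_prior_cookies: bool
    mouse_path_points: int
    user_agent: str
    seconds_on_page: float


def risk_score(s: Signals) -> float:
    """Return a score between 0.0 (likely human) and 1.0 (likely bot)."""
    points = 0  # integer points out of 10 avoid floating-point rounding surprises
    if not s.has_prior_cookies:
        points += 3
    if s.mouse_path_points < 5:
        points += 3
    if not s.user_agent or "headless" in s.user_agent.lower():
        points += 3
    if s.seconds_on_page < 1.0:
        points += 1
    return points / 10


def needs_challenge(s: Signals, threshold: float = 0.5) -> bool:
    """Fall back to the graphical query only when the score is too high."""
    return risk_score(s) >= threshold
```

A visitor with an established browser profile and natural mouse movement would pass without any query, while a script with none of these traces would be sent to the graphical challenge.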

But no matter how clever a mechanism is, there will always be attempts to break or circumvent it. In 2016, at the prestigious Black Hat conference, the presentation “I’m not a human: Breaking the Google reCAPTCHA” showed how a successful attack on the algorithm could work.

Researchers Suphannee Sivakorn, Jason Polakis, and Angelos D. Keromytis estimated that an attacker running their automated solver could earn around $110 per day at a price of $2 per 1,000 solved queries, which corresponds to roughly 55,000 solved captchas per day. However, since the researchers handed over their findings to Google, the attack presented at the time no longer works today; it has presumably been incorporated into the optimization of the algorithm.

The new reCAPTCHA is offered via an API and, thanks to Google’s support, is easy to integrate. Corresponding instructions and examples can be found on the Google website. The use of reCAPTCHA is free, but you need an API key pair, which you have to request from Google.
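
On the server side, the integration essentially consists of forwarding the token that the widget submits with the form (in the field `g-recaptcha-response`) to Google’s `siteverify` endpoint together with the secret half of the key pair. The following is a minimal sketch using only the Python standard library; the helper names `build_verify_request` and `token_is_valid` are hypothetical, and error handling is omitted.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key: str, client_token: str) -> urllib.request.Request:
    """Prepare the POST request that asks Google to check a client token."""
    body = urllib.parse.urlencode({"secret": secret_key, "response": client_token})
    return urllib.request.Request(VERIFY_URL, data=body.encode(), method="POST")


def token_is_valid(secret_key: str, client_token: str, timeout: float = 5.0) -> bool:
    """Send the request and read the 'success' flag from the JSON reply."""
    request = build_verify_request(secret_key, client_token)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        result = json.load(reply)
    return bool(result.get("success"))
```

The secret key must stay on the server; only the site key is embedded in the web page.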

Alternatives

Although Google’s reCAPTCHA dominates the market, there are numerous other solutions to verify access to services. One solution is FunCaptcha, where the user has to solve small games: objects have to be rotated, fields moved, or positions confirmed (which image is upright?). The security advantage of FunCaptcha is that OCR cannot be used to break the verification. A further obstacle for bots is that the mouse must be used to solve the captcha.

Another option is a honeypot captcha, in which, for example, the web page contains a text field that is hidden from human visitors and must not be filled in (bots give themselves away by filling these fields with content as well).

Alternatively, it is also possible to integrate an audio captcha, where an input is requested acoustically, or a logic captcha (How many legs do cows have? What color is the snow?). Each system has advantages and disadvantages. Ultimately, the solution used will always depend on the implementation effort and usability, as well as on how reliably it lets humans in and keeps bots out.
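
A logic captcha of this kind needs little more than a pool of questions and a tolerant comparison of the answer. The sketch below reuses the two sample questions from the text; the names `QUESTIONS`, `pick_question`, and `check_answer` are hypothetical.

```python
import secrets

# Hypothetical question pool; each question maps to its accepted answers.
QUESTIONS = {
    "How many legs do cows have?": {"4", "four"},
    "What color is the snow?": {"white"},
}


def pick_question() -> str:
    """Choose one question at random for the current visitor."""
    return secrets.choice(list(QUESTIONS))


def check_answer(question: str, answer: str) -> bool:
    """Accept the answer regardless of case and surrounding whitespace."""
    accepted = QUESTIONS.get(question, set())
    return answer.strip().lower() in accepted
```

With a pool this small a bot can simply memorize the answers, so in practice the pool would have to be large and change regularly.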

Some Common Misconceptions About Captcha

Here are some common misconceptions about Captcha:

  • Captcha is foolproof and cannot be bypassed: While Captcha is an effective tool for preventing automated attacks, it is not foolproof and can be bypassed by advanced bots and AI. Attackers can use techniques such as machine learning and crowdsourcing to solve Captcha challenges.
  • Captcha is always difficult for users to solve: While some Captcha challenges can be difficult for some users, such as those with visual or hearing impairments, there are accessibility options available, such as audio challenges and the ability to request a new challenge.
  • Captcha only uses distorted text: While Captcha challenges often involve distorted text, there are many other types of challenges, such as identifying images or solving math problems.
  • Captcha is only used for security purposes: While Captcha is primarily used for security purposes, such as preventing spam and automated attacks, it can also be used for other purposes, such as improving machine learning algorithms and conducting online surveys.
  • Captcha is a waste of time and resources: While Captcha may be frustrating for some users and require additional resources for website owners to implement, it is an important tool for preventing automated attacks and protecting user data. Without Captcha, websites would be vulnerable to spam, brute force attacks, and credential stuffing, which can compromise user data and cause other issues.

Future prospects

One hundred percent certainty in verifying whether a human actor or a bot is visiting the website will never be achieved. Specialized tools can already bypass the checks or will again find ways to break the verification mechanisms in the near future. Most recently, the tool unCAPTCHA was able to defeat reCAPTCHA’s audio challenge with an accuracy of about 85 percent. Similarly, there are reports of an AI-based bot that is able to break well over 50 percent of the various captcha methods.

The best protection is currently to change the verification system periodically in order to constantly demand new adjustments from the bot developers. Even today, one may doubt that programs are really able to break graphical captchas, as examples tweeted by security guru Mikko Hyppönen show. Of course, it is also clear that the best captcha protection is of no use if the spammers rely on human helpers from low-wage countries.