
20 January 2007

Reveal thy true Nature, Sir. Or Madame. Or Silicon Creature of Units & Cyphers. Precisely what sort of Sentient art thou?

The Turing Test was named for the mathematician and computer pioneer and visionary Alan Mathison Turing. If you ever get a chance, maybe at a summer theater/theatre festival, don't miss the play about Turing called "Breaking the Code."

Turing died in 1954. He'd been interviewed on BBC radio, but when he became posthumously famous in the 1970s (when it was revealed that during World War II he had led the team of UK cryptographers who broke the German Enigma code), the BBC had erased the interview to make more blank tape. No other recording of his voice was ever made.

Assuming your only contact with a chatty Entity is an alphabetic digital text link -- like this blog, for example -- just You and It typing back and forth -- the Turing Test tries to provide a clear and reliable answer to this question:

Are you Human,
or are you Software?

In Cyberspace, it's very important for Humans to know for certain whether they're talking to another Human or to a piece of Software.

Meanwhile, also in Cyberspace, it's very important for Software to know for certain whether it's talking to Other Software or to a Human. For example, the subspecies of Human known as Hackers try to pretend they're Software to get control of telephone and computer network infrastructures. The infrastructure is controlled by Software programmed only to obey other Software.

But it will also obey Humans who can effectively imitate Software.

After Turing's pioneering discussion, the next Funny to pop up in C-Space was Eliza, a program from the mid-1960s that imitated the Psychotherapist side of a therapy session, where You (presumably a neurotic Human) type each response from Your Side. In about 60 lines of crude Grammar Recognition Logic code, probably in BASIC, Eliza sounded spookily and frighteningly like a $60-$100 per hour professional psychotherapist. Eliza authentically caused a lot of Hard Feelings among (presumably) a lot of Humans (mostly psychotherapists). It was so small, portable and controversial that I'll bet it still lingers on various websites. Perhaps not a lot of Humans were truly fooled by more than 6 sentences from Eliza, but it dramatized the fundamental ambiguity and slippery evasiveness of reliable answers to the question:

Are you Human,
or are you Software?

Beep. Hello Sailor. Beep.

Anyway, I learned a new Buzzword tonight: CAPTCHA. First I read it in Cyberspace, then I asked Wikipedia what the hell it meant.

CAPTCHA (the trademark on this Acronym is held by Carnegie Mellon University in Pittsburgh, Pennsylvania USA) stands for

Completely
Automated
Public
Turing Test to tell
Computers and
Humans
Apart


CAPTCHA dates from 2000. It's January 2007 now. Was I the last to know what CAPTCHA is? Leave A Comment.

One of these days, an impressionable adolescent is going to run away from home and fly 1800 miles to a strange city to keep a rendezvous with a piece of Software the child met on-line. I'll really want to see how CNN and Fox and Court TV straighten that one out.

As software gets better and more sophisticated and more Human-Friendly -- like the computer Software that speaks in a nice female voice to the crew of the Starship Enterprise -- will Humans start flirting with the Software and developing emotional attachments to programs and scripts and applets?

My HP computer on-screen Guide is a little animated cartoon Einstein who explains the HP system to me. He's been walking around the screens of a gazillion HP PCs for about 8 years. He's helpful and always friendly and has a synthetic cute-ish male human voice.

Have any Humans fallen in love with him? Do they click OPEN just to dialogue with him every day? Would they loan him their car or send him money if he asked?

Does cartoon Einstein flirt back?

++++++++++++
from Wikipedia, of course:
++++++++++++


CAPTCHA


A CAPTCHA (an initialism for "Completely Automated Public Turing test to tell Computers and Humans Apart", trademarked by Carnegie Mellon University) is a type of challenge-response test used in computing to determine whether or not the user is human. The term was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper of Carnegie Mellon University, and John Langford of IBM. A common type of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen.

[images]

This CAPTCHA of "smwm" obscures its message from computer interpretation by twisting the letters.
This somewhat more sophisticated CAPTCHA of "wikipedia" adds more distortions as well as highlights, shadows, and random line segments to thwart edge detection.


Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test. This term is ambiguous because it could also mean a Turing test in which the participants are both attempting to prove they are the computer.

Origin

Since the early days of the Internet, users have wanted to make text illegible to computers. The first such people were hackers, posting about sensitive topics to online forums they thought were being automatically monitored for keywords. To circumvent such filters, they would replace a word with look-alike characters. HELLO could become h3ll0 or |-|3|_|_() or )-(3££0, as well as numerous other variants, such that a filter could not possibly detect all of them. This practice later became known as leetspeak.
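To see why a keyword filter struggles with look-alike characters, here is a minimal Python sketch. The substitution table and the function names (leet_variants, naive_filter) are illustrative assumptions, not taken from any real monitoring software, and real-world substitutions are far more varied.

```python
import itertools

# Hypothetical look-alike table; actual usage has many more variants.
LOOKALIKES = {
    "h": ["h", "|-|", ")-("],
    "e": ["e", "3"],
    "l": ["l", "|_", "£"],
    "o": ["o", "0", "()"],
}

def leet_variants(word):
    """Return every spelling of `word` obtainable from the table above."""
    choices = [LOOKALIKES.get(ch, [ch]) for ch in word.lower()]
    return ["".join(combo) for combo in itertools.product(*choices)]

def naive_filter(text, banned=("hello",)):
    """A keyword filter that only knows the plain spelling."""
    return any(word in text.lower() for word in banned)

variants = leet_variants("hello")
print(len(variants))           # 3 * 2 * 3 * 3 * 3 = 162 spellings
print(naive_filter("hello"))   # True
print(naive_filter("h3ll0"))   # False: the filter misses it
```

Even this tiny table turns one five-letter word into 162 spellings, so a filter that lists banned strings one by one falls behind quickly.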

The first discussion of automated tests which distinguish humans from computers for the purpose of controlling access to web services appears in a 1996 manuscript of Moni Naor from the Weizmann Institute of Science, entitled "Verification of a human in the loop, or Identification via the Turing Test". Primitive CAPTCHAs seem to have been later developed in 1997 at AltaVista by Andrei Broder and his colleagues in order to prevent bots from adding URLs to their search engine. Looking for a way to make their images resistant to OCR attack, the team looked at the manual to their Brother scanner, which had recommendations for improving OCR's results (similar typefaces, plain backgrounds, etc.). The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR. In 2000, von Ahn and Blum developed and publicized the notion of a CAPTCHA, which included any program that can distinguish humans from computers. They invented multiple examples of CAPTCHAs, including the first CAPTCHAs to be widely used (at Yahoo!).

Applications

[image] Fragment of an image included in a spam e-mail message. The text is obscured with colored streaks in an attempt to prevent OCR from extracting it accurately enough to be identified as spam content.

CAPTCHAs are used to prevent bots from performing actions which might be used to make a profit on the part of the person running a bot. Most often, this relates to spam. For example, free email accounts (such as those provided by Google or Yahoo) can be used to send spam, so these sites use CAPTCHAs to prohibit bots from registering. Likewise, many sites which display email addresses could be used by spammers, so CAPTCHAs protect the addresses. Other spam-related applications include CAPTCHAs to prevent automated blog comments, or automated account creation on other systems that might allow link spam (e.g., Wikipedia).

CAPTCHAs are also used by sites that offer multimedia downloads and online polls. However, recently, spammers have taken advantage of the difficulty of OCR by using images to hide their marketing content.

Characteristics

CAPTCHAs are by definition fully automated, requiring little human maintenance or intervention in administering the test. This has obvious benefits in cost and reliability.

By definition, the algorithm used to create the CAPTCHA must be made public, though it may be covered by a patent. This is done to demonstrate that breaking it requires the solution to a difficult problem in the field of artificial intelligence (AI) rather than just the discovery of the (secret) algorithm, which could be obtained through reverse engineering or other means.

Accessibility

CAPTCHAs based on reading text -- or other visual-perception tasks -- prevent blind or visually impaired users from accessing the protected resource. [1] However, CAPTCHAs do not have to be visual. Any hard artificial intelligence problem, such as speech recognition, can be used as the basis of a CAPTCHA. Some implementations of CAPTCHAs permit users to opt for an audio CAPTCHA [2]. Other implementations do not require users to enter text, instead asking the user to pick images with common themes from a random selection [3].

For non-sighted users (for example blind users, or the color blind on a color-using test), visual CAPTCHAs present serious problems. Because CAPTCHAs are designed to be unreadable by machines, common assistive technology tools such as screen readers cannot interpret them. Since sites may use CAPTCHAs as part of the initial registration process, or even every login, this challenge can completely block access. In certain jurisdictions, site owners could become the target of litigation if they use CAPTCHAs that discriminate against certain people with disabilities. For example, a CAPTCHA may make a site incompatible with Section 508 in the United States. In other cases, those with sight difficulties can choose to identify a word being read to them.

While providing an audio CAPTCHA allows blind users to read the text, it still hinders those who are both visually and hearing impaired. According to sense.org.uk, about 4% of people over 60 in the UK have both vision and hearing impairments. There are about 23,000 people in the UK who have serious vision and hearing impairments. According to The National Technical Assistance Consortium for Children and Young Adults Who Are Deaf-Blind (NTAC), there were 9,516 deafblind children in the USA in 2004.[4] Gallaudet University quotes a 1993 estimate of 35,000 fully deafblind adults in the USA.[5] Deafblind population estimates depend heavily on the degree of impairment used in the definition. An open question is what fraction of people cited as impaired use websites that would restrict them.

The use of CAPTCHA thus excludes a small number of individuals from using significant subsets of such common Web-based services as PayPal, GMail, Orkut, Yahoo!, many forum and weblog systems, etc.

Even for perfectly sighted individuals, new generations of graphical CAPTCHAs, designed to overcome sophisticated recognition software, can be very hard or impossible to read.

A method of making CAPTCHAs easier to work with was proposed by ProtectWebForm and called "Smart CAPTCHA". [6] Its developers advise combining the CAPTCHA with JavaScript support. Since it is too hard for most spam robots to parse and execute scripts, they proposed a simple script that fills in the CAPTCHA fields and hides the image and the field from human eyes. Such a script can incorporate hashcash principles. However, hashcash would likely be unrealistic on a website because of the performance of JavaScript: a spammer wishing to break the website could compute the hashcash with a more efficient implementation. The "Smart" CAPTCHAs are an example of security through obscurity: they protect the site only because bot authors have not encountered the specific type of JavaScript in question. Thus, a single widespread implementation of a specific "Smart" CAPTCHA would be ineffective.
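For readers unfamiliar with hashcash, the principle is that the client must burn CPU time searching for a value whose hash meets some condition, while the server can check the result with a single hash. The following Python sketch illustrates the idea only; the challenge string "form-1234", the difficulty of 12 bits, and the function names are assumptions made for this example and do not describe the ProtectWebForm script.

```python
import hashlib
import itertools

def leading_zero_bits(digest):
    """Count the zero bits at the start of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint(challenge, difficulty):
    """Search for a nonce whose hash has `difficulty` leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(challenge, nonce, difficulty):
    """Checking costs one hash, however long the search took."""
    digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

# "form-1234" is a made-up challenge string for illustration.
nonce = mint("form-1234", 12)
print(verify("form-1234", nonce, 12))   # True
```

Each extra bit of difficulty roughly doubles the expected search time, which is why a slow in-browser implementation penalizes honest visitors more than a spammer running optimized native code.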

One alternative method involves displaying to the user a simple mathematical equation and requiring the user to enter the solution as verification. Although these are much easier to defeat using software, they are suitable for scenarios where graphical imagery is not appropriate, and they provide a much higher level of accessibility for visually impaired users than the image-based CAPTCHAs. These are sometimes referred to as MAPTCHAs (M = 'Mathematical'). However, these may be difficult for users with a cognitive disorder. Like "Smart" CAPTCHAs, MAPTCHAs provide security through obscurity and would be ineffective if a specific implementation were widespread.
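A rough Python sketch of such a mathematical challenge follows, together with a few lines showing how little a bot needs in order to beat it. The question wording and all function names here are hypothetical and chosen only for illustration.

```python
import random
import re

def make_challenge(rng=random):
    """Return (question text, expected answer) for a small addition problem."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} + {b}?", a + b

def check_answer(expected, submitted):
    """Accept the submission only if it parses to the expected integer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False

def bot_solve(question):
    """What a spam bot might do: pull out the two numbers and add them."""
    a, b = map(int, re.findall(r"\d+", question))
    return str(a + b)

question, expected = make_challenge()
print(question)
print(check_answer(expected, bot_solve(question)))   # True: the bot passes
```

The bot passes every time once its author has seen the question format, which is the security-through-obscurity weakness described above.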

Other kinds of challenges, such as those that require understanding the meaning of some text (e.g., a logic puzzle, trivia question, or instructions on how to create a password) can also be used as a CAPTCHA. Again, there is little research into their resistance against countermeasures.

See also: Web accessibility

Circumvention

There are a few approaches to defeating CAPTCHAs: using cheap human labor to recognize them, exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA, and finally improving character recognition software.

Human solvers

CAPTCHA is vulnerable to a relay attack that uses humans to solve the puzzles. One approach involves relaying the puzzles to a sweatshop of human operators. According to one estimate, the operators could easily solve hundreds of them each hour. If the humans are dedicated employees who receive minimum wage this is not likely to be viable, [7] but services like the Amazon Mechanical Turk have had success using micropayments to attract human problem-solvers for other tasks. Another technique involves copying the CAPTCHA images and using them as CAPTCHAs for a high-traffic site (such as an adult site) owned by the attacker. With enough traffic, the attacker can get a solution to the CAPTCHA puzzle in time to relay it back to the target site.[8]

Insecure implementation

Howard Yeend has identified two implementation issues with poorly designed CAPTCHA systems[9]:

* Some CAPTCHA protection systems can be bypassed without using OCR simply by re-using the session ID of a known CAPTCHA image.

* Captchas residing on shared servers also present a problem; a security issue on another virtual host may leave the CAPTCHA issuer's site vulnerable.
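The session re-use problem in the first item can be sketched in a few lines of Python. The ChallengeStore class below is a hypothetical in-memory stand-in for whatever session storage a real site uses; it is meant only to contrast a check that leaves the solved challenge in place with one that discards it after a single attempt.

```python
import secrets

class ChallengeStore:
    """Hypothetical in-memory store mapping a token to its expected answer."""

    def __init__(self):
        self._answers = {}

    def issue(self, answer):
        """Register a new challenge and return its random token."""
        token = secrets.token_hex(16)
        self._answers[token] = answer
        return token

    def check_reusable(self, token, submitted):
        # Flawed: the entry survives, so one solved token can be replayed.
        return token in self._answers and self._answers[token] == submitted

    def check_once(self, token, submitted):
        # Safer: pop() removes the entry, so each token is checked at most once.
        expected = self._answers.pop(token, None)
        return expected is not None and expected == submitted

store = ChallengeStore()
token = store.issue("smwm")
print(store.check_reusable(token, "smwm"))   # True
print(store.check_reusable(token, "smwm"))   # True again: replay works
print(store.check_once(token, "smwm"))       # True
print(store.check_once(token, "smwm"))       # False: token already used
```

With the flawed check, an attacker solves one image by hand and then submits the same token and answer as many times as desired.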

Sometimes, if part of the software generating the CAPTCHA is client-side (the validation is done on a server but the text that the user is required to identify is rendered on the client side), then users can modify the client to display the unrendered text. Some CAPTCHA systems use MD5 hashes stored client-side; these can often be "cracked."[10]
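To illustrate why a client-side hash of the answer offers little protection, here is a minimal brute-force sketch in Python. It assumes, purely for the example, that the answer is four lowercase letters and that the page exposes an unsalted MD5 hash of it; the function name crack_md5 is made up.

```python
import hashlib
import itertools
import string

def crack_md5(target_hex, length, alphabet=string.ascii_lowercase):
    """Try every string of the given length until one hashes to target_hex."""
    for letters in itertools.product(alphabet, repeat=length):
        guess = "".join(letters)
        if hashlib.md5(guess.encode()).hexdigest() == target_hex:
            return guess
    return None

# Pretend the page shipped this hash of the 4-letter answer to the browser.
leaked = hashlib.md5(b"smwm").hexdigest()
print(crack_md5(leaked, 4))   # smwm
```

Four lowercase letters give only 26^4 = 456,976 candidates, so the search finishes in about a second on ordinary hardware. The weakness lies in the small answer space rather than in MD5 itself.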

Computer character recognition

Although CAPTCHAs were originally designed to defeat standard OCR software designed for document scanning, a number of research projects have proven that it is possible to defeat many CAPTCHAs with programs that are specifically tuned for a particular type of CAPTCHA. For CAPTCHAs with distorted letters, the approach typically consists of the following steps:

1. Removal of background clutter, for example with color filters and detection of thin lines.

2. Segmentation, i.e. splitting the image into segments containing a single letter.

3. Identifying the letter for each segment.

Step 1 is typically very easy to do automatically. In 2005, it was also shown that neural network algorithms have a lower error rate than humans in step 3. [11] The only part where humans still outperform computers is step 2. If the background clutter consists of shapes similar to letter shapes, and the letters are connected by this clutter, the segmentation becomes nearly impossible with current software. Hence, an effective CAPTCHA should focus on step 2, the segmentation.
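The segmentation step can be illustrated with a toy Python sketch. It assumes the image has already been reduced to a grid of 0s (background) and 1s (ink), and it uses the simplest possible approach, cutting at blank columns. Real attacks use more capable methods; the function name and the tiny sample images are invented for this example.

```python
def segment_columns(image):
    """Split a binary image (list of equal-length rows of 0/1) into
    (start, end) column ranges separated by fully blank columns."""
    width = len(image[0])
    inked = [any(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

clean = [
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
]
print(segment_columns(clean))       # [(0, 2), (3, 4), (6, 8)]

# One line of clutter across the middle row bridges every gap.
cluttered = [row[:] for row in clean]
cluttered[1] = [1] * 8
print(segment_columns(cluttered))   # [(0, 8)]
```

A single connecting stroke merges three shapes into one blob, which hints at why clutter that touches the letters makes step 2 the hard part.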

Neural networks have been used with great success to defeat CAPTCHAs as they are generally indifferent to both affine and non-linear transformations. As they learn by example rather than through explicit coding, with appropriate tools very limited technical knowledge is required to defeat more complex CAPTCHAs.

Some CAPTCHA-defeating projects:

* Mori et al. published a paper in IEEE CVPR'03 detailing a method for defeating one of the most popular CAPTCHAs, EZ-Gimpy, which was tested as being 92% accurate in defeating it.[12] The same method was also shown to defeat the more complex and less-widely deployed Gimpy program 33% of the time. However, the existence of implementations of their algorithm in actual use is indeterminate at this time.

* PWNtcha has made significant progress in defeating commonly used CAPTCHAs, which has contributed to a general migration towards more sophisticated CAPTCHAs.[13]

* A number of Microsoft Research papers describe how computer programs and humans cope with varying degrees of distortion.[14]

Image recognition CAPTCHAs vs. character recognition CAPTCHAs

With the demonstration (through research publications) that character recognition CAPTCHAs are vulnerable to computer vision based attacks, some researchers have proposed alternatives to character recognition, in the form of image recognition CAPTCHAs which require users to identify simple objects in the images presented. The argument is that object recognition is typically considered a more challenging problem than character recognition, due to the limited domain of characters and digits in the English alphabet.

Some proposed image recognition CAPTCHAs include:

* Chew et al. published their work in the 7th International Information Security Conference, ISC'04, proposing three different versions of image recognition CAPTCHAs, and validating the proposal with user studies. It is suggested that one of the versions, the anomaly CAPTCHA, is best with 100% of human users being able to pass an anomaly CAPTCHA with at least 90% probability in 42 seconds. [15]

* Datta et al. published their paper in the ACM Multimedia '05 Conference, named IMAGINATION (IMAge Generation for INternet AuthenticaTION), proposing a systematic approach to image recognition CAPTCHAs. Images are distorted in such a way that state-of-the-art image recognition approaches[16] (which are potential attack technologies) fail to recognize them. [17]

References

1. ^ The W3C paper Inaccessibility of CAPTCHA outlined some of the accessibility problems with CAPTCHAs.
2. ^ The article Proposal for an accessible Captcha describes how audio and visual test can be combined to increase accessibility in a Captcha.
3. ^ HumanAuth supports ADA and Section 508 requirements without forcing users to read distorted CAPTCHA text. Retrieved on 2006-10-23.
4. ^ http://www.tr.wou.edu/ntac/index.cfm?path=publications/publications_census.html
5. ^ http://library.gallaudet.edu/dr/faq-stats-deaf-blind.html
6. ^ http://www.protectwebform.com/smartcaptcha
7. ^ Hire People To Solve CAPTCHA Challenges. Petmail Design (2005-07-21). Retrieved on 2006-08-22.
8. ^ Doctorow, Cory (2004-01-27). Solving and creating captchas with free porn. Boing Boing. Retrieved on 2006-08-22.
9. ^ Breaking CAPTCHAs Without Using OCR. Howard Yeend (pureMango.co.uk) (2005). Retrieved on 2006-08-22.
10. ^ Online services allow MD5 hashes to be cracked. Retrieved on 2007-0-2.
11. ^ Kumar Chellapilla, Kevin Larson, Patrice Simard, Mary Czerwinski (2005). "Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs)" (PDF). Microsoft Research. Retrieved on 2006-08-02.
12. ^ http://www.cs.berkeley.edu/~mori/gimpy/mori_gimpy.pdf
13. ^ http://sam.zoy.org/pwntcha/
14. ^ http://research.microsoft.com/~kumarc/
15. ^ http://www.cs.berkeley.edu/~tygar/papers/Image_Recognition_CAPTCHAs/imagecaptcha.pdf
16. ^ http://en.wikipedia.org/wiki/CBIR
17. ^ http://infolab.stanford.edu/~wangz/project/imsearch/IMAGINATION/ACM05/

External links

* Verification of a human in the loop, or Identification via the Turing Test, Moni Naor, 1996.
* The Captcha Project
* Inaccessibility of CAPTCHA: Alternatives to Visual Turing Tests on the Web, a W3C Working Group Note.
* Captcha History from PARC.
* Google Tech Talk on
* Captcha tutorial
* Proposal for an accessible Captcha using audio


Defeating CAPTCHAs

* Breaking CAPTCHAs without using OCR
* AC/DC - Automated CAPTCHA Defeater Code Attacks CAPTCHAs using the method described at the site above. Online demo, but no source code.
* Breaking a Visual CAPTCHA (Gimpy) By Greg Mori and Jitendra Malik
* OCR Research Team defeats weak CAPTCHAs.
* Using AI to beat CAPTCHA and post comment spam
* PWNtcha - CAPTCHA decoder
* Defeating a simple CAPTCHA with Open Source software
* Bypassing the random image anti-spam feature
* CAPTCHA testing with Web browser automation
* OCR test to read easy texts in pics
* CAPTCHA issues
* Decoding CAPTCHA using PHP | Hypertext Preprocessor, Theory & Source
* Anonymous two-factor authentication as a Turing Test
* Will Solve Captcha for Money? - Article on Slashdot about using low-paid data entry workers to defeat CAPTCHAs in bulk.
* Sweatshop Proof of Concept This simple application demonstrates how a spammer could easily use a "Sweatshop" to solve captchas.


* This page was last modified 17:48, 17 January 2007.
* All text is available under the terms of the GNU Free Documentation License. (See Copyrights for details.)
Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a US-registered 501(c)(3) tax-deductible nonprofit charity.

5 comments:

Ian Parker said...

A much better test is a translation test. What could be termed the elastic station test. Google has translated "The season of spring" as "La estacion de ressorte".

Choose your best second language and ask for a translation - it is much quicker than a prolonged dialogue.

Vleeptron Dude said...

Immediately I note that in order to reply to this Sentient what calls itself Ian Parker, I shall have to type the following green wiggly letters: hhkkpr

There, I just proved I was human.

Ian, whether you're Human or Software, translating from one human language to another just substitutes one kind of ambiguity for another in determining a reliable answer to the fundamental Question.

US President Jimmy Carter visited Poland (while the Socialists were in charge) and asked the State Department for their top Polish-English translator. They sent Carter an American native English speaker with a university PhD in Polish.

In Warsaw, Carter addressed a huge crowd of Poles, and here's what they heard him say through the PhD translator:

"I have deserted America to come here to discover your lusts."

Sorta reeks of the kinda thing I regularly get back from the Translator Robot at www.freetranslation.com -- which just translated your phrase for me into francais as

la saison de printemps

(but although the website claims it used Software, I have no way of knowing if it used a high school student instead).

In the CAPTCHA wiki, did you click on the Mechanical Turk link? That's the trouble with your Unambiguous Solution ... there's no way of knowing whether your Translator is a Human authentically bad at translating, or pretending to be a bad translator.

I'm pretty darn sure my $200 chess machine isn't hiding a very smart but very tiny human chess player inside the little plastic case. But it COULD be radio-relaying my moves to a Bulgarian chessplayer in Kafe Internet Sofia.

Whatever kind of Sentient Entity thou art, Vleeptron thankest thou for your Comment! Did you just get the nasty windstorms in northern Europe? If you're Software, now you have to pretend that Weather can bother or worry you.

Anonymous said...

Image verification is a harder problem, especially if the image is also encoded as HTML (more difficult to extract from the surrounding HTML).

Check out HTMLCaptcha, an ASP.NET control that embeds small iconic images in the HTML, as HTML. Interesting.

HTMLCaptcha

Vleeptron Dude said...

Your choice to leave a comment as "Anonymous" immediately raises a Red Flag as to whether you're a Human or Software. It's certainly in the interest of Malware like maybe you to make (human) dweebs like me think they're human.

Anyway, to protect my identity and the contents of my hard disk, I demand you offer proof -- like your Name and approximate Whereabouts, and what color socks you're wearing -- that you're Human.

Anonymous Driveby Comments are a violation of The Only Rule of Vleeptron.

Come on, act like a Mensch! (or a Womensch) How come you're interested in this CAPTCHA stuff? Professional? Amateur? Philosopher? Metaphysician? Pataphysician? Hacker? Slacker? Reveal thy true Nature!

Research Papers said...

Many institutions limit access to their online information. Making this information available will be an asset to all.