Automated CAPTCHA Solving: An Empirical Comparison of Selected Techniques
Abstract.
CAPTCHAs exploit the gap between human and machine ability to understand the semantics of specific multimedia content, and they are widely applied in computer security. In this paper we compare two techniques for automatically solving text-based CAPTCHA schemes: classification based on the Vector Space Model (VSM) and a popular Optical Character Recognition (OCR) engine. For each technique, we build a CAPTCHA solver and give it specific sets of text-based challenges to break. From our results we draw conclusions about whether it is efficient to create a CAPTCHA solver by applying parts of VSM theory and implementing a Vector Space Image Recognizer (VSIR).
Keywords: Image recognition; Semantic context extraction; VSM; OCR