This is a post contributed to AppCoda. Here you can read only the introduction.
Welcome to a new tutorial, where we are going to discuss two interesting and closely related concepts: how to scan images and how to perform text recognition on them. It might sound like a complicated task, but you will soon find out that this is far from true. Thanks to the VisionKit and Vision frameworks, scanning and recognizing text is nowadays quite straightforward.
Let’s briefly look at a few details regarding both tasks. To scan images with a device, the VisionKit framework provides a class called VNDocumentCameraViewController. It’s a UIKit view controller that allows us to scan one or more pages using a system-provided user interface and the camera. What we get back are images (UIImage objects), which we can handle any way we want.
Once we have scanned pages at hand, meaning images that contain text, the Vision framework comes into play. Using the scanned images as input, it performs the actual recognition and returns the text. It’s possible to configure a few aspects of the recognition task, and that way affect the overall accuracy and speed of the process. However, the details of all that will be discussed extensively later.
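To give a first taste of the recognition step, here is a minimal sketch of how a single scanned image could be passed to Vision. The function name recognizeText(in:) is hypothetical and not part of any framework, and the chosen settings (the accurate recognition level with language correction turned on) are assumptions made for illustration only, not necessarily the configuration used later in the tutorial.

```swift
import UIKit
import Vision

/// Hypothetical helper (not a framework API): returns the text that
/// Vision recognizes in one scanned page, one recognized line per row.
func recognizeText(in image: UIImage) throws -> String {
    // Vision works with CGImage here, so bail out if the UIImage has none.
    guard let cgImage = image.cgImage else { return "" }

    let request = VNRecognizeTextRequest()
    // Assumed settings: favor accuracy over speed.
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    // perform(_:) runs synchronously and throws if the request fails.
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Each observation is one detected piece of text; keep its best candidate.
    let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
    let lines = observations.compactMap { $0.topCandidates(1).first?.string }
    return lines.joined(separator: "\n")
}
```

Since the request runs synchronously, a function like this should be called from a background queue rather than the main thread. Switching the recognition level to .fast would trade some accuracy for speed.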
We are going to explore all that through a small SwiftUI application. We will obviously have to mix UIKit and SwiftUI, given that VNDocumentCameraViewController is a UIKit view controller, but we will do that and everything else together, step by step.
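As a rough preview of what mixing the two UI frameworks can look like, the sketch below wraps VNDocumentCameraViewController in a UIViewControllerRepresentable type so that it can be presented from SwiftUI. The names ScannerView and onFinish are hypothetical and chosen only for this illustration; the actual implementation in the tutorial may be structured differently.

```swift
import SwiftUI
import UIKit
import VisionKit

/// Hypothetical SwiftUI wrapper around the system document scanner.
struct ScannerView: UIViewControllerRepresentable {
    /// Called with the scanned pages, or an empty array on cancel or failure.
    var onFinish: ([UIImage]) -> Void

    func makeCoordinator() -> Coordinator {
        Coordinator(onFinish: onFinish)
    }

    func makeUIViewController(context: Context) -> VNDocumentCameraViewController {
        let controller = VNDocumentCameraViewController()
        // The coordinator receives the scanner's delegate callbacks.
        controller.delegate = context.coordinator
        return controller
    }

    func updateUIViewController(_ uiViewController: VNDocumentCameraViewController,
                                context: Context) {
        // Nothing to update; the scanner manages its own state.
    }

    final class Coordinator: NSObject, VNDocumentCameraViewControllerDelegate {
        private let onFinish: ([UIImage]) -> Void

        init(onFinish: @escaping ([UIImage]) -> Void) {
            self.onFinish = onFinish
        }

        func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                          didFinishWith scan: VNDocumentCameraScan) {
            // Collect every scanned page as a UIImage.
            let pages = (0..<scan.pageCount).map { scan.imageOfPage(at: $0) }
            onFinish(pages)
        }

        func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
            onFinish([])
        }

        func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                          didFailWithError error: Error) {
            onFinish([])
        }
    }
}
```

A view like this would typically be shown in a sheet, with the onFinish closure both storing the pages and dismissing the sheet, since the scanner does not dismiss itself. Checking VNDocumentCameraViewController.isSupported before presenting it is a sensible precaution, because document scanning is not available in every environment.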
In the next part you will get an overview of the sample app we’ll be working on, and then we’ll implement everything described above; we’ll start with document scanning, and then move on to text recognition. By the end of this tutorial, you will be able to integrate text scanning and recognition capabilities into your own apps.