My son, home this past summer from his first year of university, found himself a job doing data entry for a company processing claims forms. Every day he would receive a batch of scanned documents and transcribe their contents into an online portal, submitting the data for further processing. Although I don’t begrudge him the reasonably generous compensation he received for this work, it did strike me as a bit odd that the work couldn’t have been automated using optical character recognition (OCR) technology.

So, I did a bit of research on the topic of OCR and found that there are plenty of tools that can do a good job scanning and converting printed or typed text. OCR that can recognize handwriting, however, is another story altogether. The difference is that—especially if, like me, you have notoriously poor penmanship—handwritten characters just aren’t consistent enough for a computer algorithm to identify reliably. They may be blurry or smudged or vary too much in size or shape from one word to the next to be easily machine readable. The digit ‘1’ could be confused for a lower-case ‘l’ or upper-case ‘I’ or even, if written crookedly enough, resemble a ‘7’. The letter ‘S’ might look like a ‘5’, and, well, you get the idea. Handwriting can be ambiguous.

Humans do pretty well handling ambiguity but computers, not so much—unless we bring artificial intelligence to bear. As I continued my research, I found that handwriting recognition in particular is an area where the subset of AI called Machine Learning (ML) has proven to be especially helpful. I’ve written extensively in past posts about the opportunities and especially the limitations and risks inherent in AI, focusing mostly on generative AI, which has dominated the headlines this year. My argument has been that limited domains of AI tend to be less risky and more useful. ML is a good example of such useful AI where we can control the input domain and let the system solve specific problems with a high degree of reliability—and as we’ll see, it’s also a promising area where quantum computing can provide additional benefits.

**Deep study**

What exactly is machine learning? Wikipedia gives us a sufficient definition: it’s “an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines ‘discover’ their ‘own’ algorithms.” Or, as pithily summarized in an applied mathematics textbook I’ve been reading on the subject: “Take data and statistical techniques, throw them at a computer, and what you have is machine learning.” What it comes down to is letting a computer find patterns in a large amount of data by repetitively applying algorithms where we can tweak the parameters based on prior results, thus letting the machine learn more or less on its own.

There are three broad categories of this:

**Supervised learning:** An algorithm is given a large set of input data to be classified, along with a set of output data to test against. Think of handwriting recognition, for example. We can start with a set of input data, which might be some of my poorly scribbled notes from a meeting or phone call. The sample output data might be the letters of the alphabet and digits 0 through 9, neatly and consistently typed. Both the input and output data can be broken down into pixelated images of individual letters or digits. The algorithm could start by comparing the pixels of each handwritten character to the given output data and classify, as well as possible, which letter or digit is represented. I could check the results against what I know I had written and have the algorithm try again, repeating the process until it has been sufficiently trained to recognize my handwriting. Then I can feed it a new batch of my handwriting (or even someone else’s) with a fairly high degree of confidence that it will successfully make sense of it.
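To make the pixel-comparison idea concrete, here is a minimal sketch of one very simple supervised technique, nearest-neighbor classification. It is not how production handwriting recognition works; the 3x3 bitmaps, the `labeled_examples` list and the function names are all made up for illustration.

```python
# Hypothetical 3x3 bitmaps written row by row; "1" marks an inked pixel.
# Each example pairs a scribbled character with its known, correct label.
labeled_examples = [
    ("111101111", "0"),  # a clean zero
    ("111101110", "0"),  # a zero missing one corner
    ("010010010", "1"),  # a clean one
    ("110010010", "1"),  # a one with a small flag at the top
]

def pixel_distance(a, b):
    """Count the pixels that differ between two bitmaps of equal size."""
    return sum(pa != pb for pa, pb in zip(a, b))

def classify(image, examples):
    """Return the label of the labeled example closest to the image."""
    _, label = min(examples, key=lambda ex: pixel_distance(image, ex[0]))
    return label

# Two new scribbles that appear nowhere in the labeled examples.
print(classify("011101111", labeled_examples))  # a zero with a gap
print(classify("010010011", labeled_examples))  # a one with a foot
```

The "training" here is nothing more than storing labeled examples; checking the predictions against what was actually written, and adding more examples where it gets things wrong, is the supervised part.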

**Unsupervised learning:** Unlike supervised learning, the algorithm is given only a certain amount of input data with no sample output data, and it then finds relationships or patterns in the data without any further prompting. The algorithm tries to find common features among some of the data elements and tries to build subsets of the data based on that commonality. Those subsets can be tested for relevance or accuracy, and then continually refined. The results may not always be predictable. Customer segmentation is a good application, where a retailer may try to identify common buying patterns to better target marketing and advertising content. A classic example is the story of beer and diapers—American convenience stores discovered a long time ago that these two products were frequently purchased together, especially on Saturday mornings. The underlying pattern was the dads of young families doing weekend shopping. A possible resulting action might be for the retailer to place other related products in close proximity—potato chips or nuts, for instance—to increase the likelihood of an additional impulse purchase.
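A pattern like beer and diapers can be surfaced without any labels at all, simply by counting which products show up together. The sketch below is a bare-bones version of that idea (often called market basket analysis); the `baskets` data is invented, and a real retailer would use far more data and more careful statistics.

```python
from collections import Counter
from itertools import combinations

# Invented shopping baskets; no labels or expected answers are provided.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]

def count_pairs(baskets):
    """Count how often each pair of products appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one consistent (alphabetical) ordering.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

pair_counts = count_pairs(baskets)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Nobody told the program to look for beer and diapers; the pairing simply falls out of the data, which is the essence of the unsupervised approach.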

**Reinforcement learning:** A typical example of this is teaching a computer to play a game (Go, checkers or chess are good examples). This resembles unsupervised learning in that the computer is given no worked examples; it just needs to know the rules of the game, that is, which moves are valid or invalid. The computer can begin by making random moves until it loses or wins a game, which serves as negative or positive reinforcement. (OK, maybe not totally random: we can provide some data on what might constitute good or bad moves in the context of a game.) Moves that lead to a winning game are weighted to be used more frequently in the next round, and vice versa for losing moves. Over time the computer gets better and better until it can play the game competently.
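Here is a minimal sketch of that win-and-lose weighting, using a much smaller game than Go or chess: players alternately take one to three stones from a pile, and whoever takes the last stone wins. The game, the `train` and `best_move` names, the reward sizes and the random opponent are all assumptions chosen to keep the example short.

```python
import random

def train(games=2000, pile=7, seed=0):
    """Learn move preferences by playing against a random opponent."""
    rng = random.Random(seed)
    # weights[stones_left][move]: preference for taking `move` stones.
    weights = {s: {m: 1.0 for m in range(1, min(3, s) + 1)}
               for s in range(1, pile + 1)}
    for _ in range(games):
        stones, history, learner_won = pile, [], False
        while stones > 0:
            # Learner picks a legal move, favoring higher-weighted ones.
            moves = list(weights[stones])
            move = rng.choices(moves, weights=[weights[stones][m] for m in moves])[0]
            history.append((stones, move))
            stones -= move
            if stones == 0:
                learner_won = True
                break
            # The opponent knows only the rules and plays at random.
            stones -= rng.randint(1, min(3, stones))
        # Reinforce every move the learner made in this game.
        for state, move in history:
            if learner_won:
                weights[state][move] += 1.0
            else:
                weights[state][move] = max(0.1, weights[state][move] - 1.0)
    return weights

def best_move(weights, stones):
    """Return the move with the highest learned weight for this pile size."""
    return max(weights[stones], key=weights[stones].get)

weights = train()
print({stones: best_move(weights, stones) for stones in sorted(weights)})
```

With two stones left, taking both always wins and taking one always loses, so after training the learner should clearly prefer the winning move. Nothing beyond the rules was ever told to it.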

Note that the same use case can sometimes be tackled with more than one of these categories, using, say, reinforcement learning for handwriting recognition or supervised learning for customer segmentation. The main considerations are how much data you have, whether it comes with known answers, and how long you want to repeatedly test the algorithms.

**Time for recess**

It’s important, though, not to let the training run too long, or to lean too heavily on one narrow set of input data. With too many iterations over the same examples, you might encounter the problem of overfitting: the algorithm gets so good at the specific characteristics of the data it’s given that it can no longer generalize well to other cases. It has essentially memorized that specific data set alone. The goal is for it to learn just enough to recognize the underlying patterns and extrapolate the results to new data, which is why some data is usually held back to check how well the results carry over. Therefore, preparing the input data for machine learning is a critical first step that may take several attempts to get right.
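The difference between memorizing and generalizing can be shown with a toy example. In this sketch the data is invented: the underlying rule is "label 1 when x is above 5", but two of the training labels are wrong, as real data often is. The `memorizer` and `simple_rule` models are illustrative stand-ins, not standard library functions.

```python
# Invented 1-D data. The labels for x=3 and x=8 are deliberately noisy.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
unseen = [(1.2, 0), (2.9, 0), (6.8, 1), (8.1, 1)]

def memorizer(x):
    """Overfit model: copy the label of the closest training point."""
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label

def fit_threshold(data):
    """Simple model: the single cut-off that misclassifies the fewest points."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: sum((x > t) != bool(y) for x, y in data))

threshold = fit_threshold(train)

def simple_rule(x):
    """Predict 1 for anything above the learned cut-off."""
    return int(x > threshold)

def accuracy(model, data):
    """Fraction of points the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, accuracy(model, train), accuracy(model, unseen))
```

The memorizer is perfect on the data it has seen, noise included, and stumbles on new points that land near the noisy ones. The simpler rule gets a couple of training points "wrong" yet handles the unseen data better, which is the trade-off overfitting is all about.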

The mathematics behind all this quickly gets complicated. Each of your inputs is a data object with some number of features about which you’re trying to learn something useful. If you have data objects with two features each—for example looking at a group of people and considering height and weight—then you can represent them as vectors with two elements [x, y], which you could plot on graph paper. With three features per data object, you could work with a vector [x, y, z] and plot it in a three-dimensional space. But consider the handwriting example—if a letter was represented in even a very low-resolution image of 28x28 pixels, then each pixel is a feature of the data object and now you’re working in a space of 784 dimensions which might be just a bit harder to visualize.
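In code, the jump from two dimensions to 784 is less dramatic than it sounds, because the same arithmetic works on a vector of any length. The following sketch uses made-up height and weight values and a blank 28x28 image; the helper names `flatten` and `euclidean` are my own.

```python
import math

def flatten(image):
    """Turn a grid of pixel rows into one long feature vector."""
    return [pixel for row in image for pixel in row]

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two features per person: [height in cm, weight in kg] (invented values).
person_a = [170.0, 65.0]
person_b = [173.0, 69.0]
print(euclidean(person_a, person_b))

# A 28x28 image becomes a vector with 784 features, one per pixel.
blank = [[0] * 28 for _ in range(28)]
dotted = [row[:] for row in blank]
dotted[0][0] = 1  # ink a single pixel

print(len(flatten(blank)))
print(euclidean(flatten(blank), flatten(dotted)))
```

We may not be able to picture 784 dimensions, but "how far apart are these two data objects?" is still a question with a simple numeric answer.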

Next, you have to try and figure out how close or far off your machine learning algorithm is from the correct solution, each time. This is done using a ‘cost function’ that measures the difference between the generated and expected results for each feature of the data points being analyzed. The farther the machine algorithm’s result is from the expected, the higher the cost function. So, successful machine learning is all about minimizing the cost function. Does this look familiar? It’s another example of optimization.
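As a rough illustration of minimizing a cost function, the sketch below fits a one-parameter model, y = w * x, to three invented data points using mean squared error as the cost and plain gradient descent as the optimizer. These are common choices but only one option among many, and the learning rate and step count are arbitrary assumptions.

```python
# Invented data that follows y = 2 * x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def cost(w):
    """Mean squared error between generated (w * x) and expected (y) results."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    """Slope of the cost with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def minimize(w=0.0, learning_rate=0.05, steps=100):
    """Repeatedly nudge w in the direction that lowers the cost."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_learned = minimize()
print(cost(0.0), cost(w_learned), w_learned)
```

Starting from a poor guess of w = 0, each step lowers the cost a little, and the parameter settles close to 2, the value that generated the data.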

This is where quantum comes in. Remember that quantum computers promise to do some things very well, especially certain complex, computationally intensive math problems like the ones described above for machine learning, and related optimization scenarios. There’s a lot of research being done to apply quantum programming techniques to machine learning algorithms. As useful quantum computing draws closer, I think we’ll see significant advances in machine learning. If quantum can accelerate the execution of machine learning algorithms, we can run more iterations with more parameters to improve our accuracy. And quantum may help efforts to minimize the cost function as well. Getting better results faster: that’s what quantum computing is all about.

**Back-to-back to school**

Now, just to deepen the entanglement of the two fields, and maybe make the whole process somewhat recursive, I recently came across an article suggesting that machine learning might help improve quantum computing. A group of Japanese researchers has been studying error correction techniques for quantum computers to find ways to reduce overhead and complexity. The researchers used reinforcement learning to determine the best way to encode specific types of qubits so as to minimize the amount of measurement needed, since measurement itself often triggers errors. It’s still in the experimental stage, but the results look promising.

Quantum computing can make machine learning more accurate, and machine learning might make quantum computing less error prone. Now that’s what I’d call a virtuous cycle.