Using Microsoft OCR APIs in Windows 8.1 Store Apps

Optical Character Recognition (OCR) can be used to recognize and extract text in an image.

Adding OCR to a Windows 8.1 Store app is quite easy because Microsoft has publicly released the OCR library that it uses in its OneNote app. The OCR APIs ship as a NuGet package that you can reference directly from your project.

So without wasting more time let’s get our hands dirty and write some code.

Step 1

Start a new project. We will use the Windows 8.1 Store App Blank Template.

 

Step 2

After creating the project, go to the Tools menu and navigate to NuGet Package Manager → Manage NuGet Packages for Solution, as shown in the image below.

 

Step 3

Type “Microsoft Ocr” into the search box. Select the first result and then click Install. During the installation you will be asked to accept the terms and conditions. Click I Accept.



Once the package has been installed, a readme file opens that gives you the details about the version number.

You can also verify the installation from Solution Explorer. You will see a new reference to “WindowsPreview.Media.Ocr” and a new folder named “OcrResources” in your project.

 

Step 4

Once you have completed Step 3 it is time to add some code to the app.

The XAML

  • Add 3 Buttons, an Image and a TextBlock to your page.

  • The buttons will be “Load Image”, “Read Image” and “Clear Text”.

  • The Image will display the image from which the text is to be extracted.

  • The TextBlock will show the Text that is extracted from the image.
 

The code is given below. 
    <Page
        x:Class="OCR.MainPage"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:local="using:OCR"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d">

        <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
            <Grid Margin="0,20,0,0">
                <Grid.RowDefinitions>
                    <RowDefinition Height="*" />
                    <RowDefinition Height="10*" />
                </Grid.RowDefinitions>
                <StackPanel Orientation="Horizontal" VerticalAlignment="Stretch" HorizontalAlignment="Center">
                    <Button Content="Load Image" VerticalAlignment="Stretch" Click="LoadImage_Click" />
                    <Button Content="Read Image" VerticalAlignment="Stretch" Click="ReadImage_Click" Margin="10,0" />
                    <Button Content="Clear Text" VerticalAlignment="Stretch" Click="ClearText_Click" Margin="10,0" />
                </StackPanel>
                <Grid Grid.Row="1">
                    <Grid.ColumnDefinitions>
                        <ColumnDefinition Width="2*" />
                        <ColumnDefinition Width="*" />
                    </Grid.ColumnDefinitions>
                    <Image x:Name="imageToBeRead" />
                    <ScrollViewer VerticalScrollBarVisibility="Auto" VerticalScrollMode="Auto" Grid.Column="1">
                        <TextBlock x:Name="textRead" TextWrapping="Wrap" FontSize="15" />
                    </ScrollViewer>
                </Grid>
            </Grid>
        </Grid>
    </Page>

Here I’ve named the image “imageToBeRead” and the textblock “textRead”.

C#

Now in the code behind of your page, add some using statements.
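    using System.Runtime.InteropServices.WindowsRuntime; // ToArray() on PixelBuffer
    using System.Threading.Tasks;                        // Task
    using Windows.Storage;
    using Windows.Storage.FileProperties;
    using Windows.Storage.Pickers;
    using Windows.UI.Xaml.Media.Imaging;
    using WindowsPreview.Media.Ocr;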

After adding the using statements, declare some fields in the page class, just before the page constructor.

    OcrEngine ocrEngine;          // performs the recognition
    static WriteableBitmap img;   // the currently loaded image
    ImageProperties imgProp;      // width and height of the loaded image

And initialize the ocrEngine in the page constructor.

    public MainPage()
    {
        this.InitializeComponent();
        // Create the engine for the language of the text to be recognized.
        ocrEngine = new OcrEngine(OcrLanguage.English);
    }

When you initialize the ocrEngine, be sure to pass the language of the text that you want to read from the image.

 

On the Click event of the LoadImage button add the following code.
    private async void LoadImage_Click(object sender, RoutedEventArgs e)
    {
        // Let the user pick a single image file, starting in the Pictures library.
        FileOpenPicker filePicker = new FileOpenPicker();
        filePicker.SuggestedStartLocation = PickerLocationId.PicturesLibrary;
        filePicker.FileTypeFilter.Clear();
        filePicker.FileTypeFilter.Add(".jpg");
        filePicker.FileTypeFilter.Add(".png");
        filePicker.FileTypeFilter.Add(".jpeg");
        StorageFile file = await filePicker.PickSingleFileAsync();
        // PickSingleFileAsync returns null when the user cancels the picker.
        if (file != null)
        {
            await loadImage(file);
        }
    }

In the preceding method I created a file picker that lets the user select an image file from their computer. The picked file is passed as a parameter to another method called loadImage.

The definition of loadImage is given below.

    private async Task loadImage(StorageFile file)
    {
        // Read the image dimensions; they are needed to size the bitmap and later for OCR.
        imgProp = await file.Properties.GetImagePropertiesAsync();
        using (var streams = await file.OpenAsync(FileAccessMode.Read))
        {
            // Decode the file into a bitmap and show it in the Image control.
            img = new WriteableBitmap((int)imgProp.Width, (int)imgProp.Height);
            img.SetSource(streams);
            imageToBeRead.Source = img;
        }
    }

This method simply adds the selected image to the app.

The OCR processing occurs in the ReadImage button Click event as in the following:

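    private async void ReadImage_Click(object sender, RoutedEventArgs e)
    {
        // Nothing to read until an image has been loaded.
        if (img == null)
        {
            return;
        }

        string s = "";
        // RecognizeAsync takes the height first, then the width, then the raw pixel bytes.
        OcrResult res = await ocrEngine.RecognizeAsync(imgProp.Height, imgProp.Width, img.PixelBuffer.ToArray());
        // Lines may be null when no text is found (assumption based on the library's sample code).
        if (res.Lines != null)
        {
            foreach (var line in res.Lines)
            {
                foreach (var word in line.Words)
                {
                    s += word.Text + " ";
                }
                s += "\n";
            }
        }
        textRead.Text = s;
    }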

In the preceding method you can see that I’ve declared an OcrResult to hold the output of the RecognizeAsync() method of the OcrEngine. The RecognizeAsync() method takes 3 arguments: the image height, the image width and the image's pixel data as a byte array.
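Before calling RecognizeAsync() it can help to check the image size, because the engine does not accept arbitrary dimensions. The sketch below assumes limits of 40 to 2600 pixels per side, which is what I recall from the preview library's sample code; the constants and the OcrImageLimits helper are illustrative names, not part of the OCR API, so verify the limits against the readme of the package version you installed.

```csharp
using System;

// Hypothetical helper, not part of WindowsPreview.Media.Ocr.
public static class OcrImageLimits
{
    // Assumed limits in pixels per side; verify against the installed package.
    public const uint MinSide = 40;
    public const uint MaxSide = 2600;

    // Returns true when both sides fall inside the assumed range.
    public static bool IsSupportedSize(uint width, uint height)
    {
        return width >= MinSide && width <= MaxSide
            && height >= MinSide && height <= MaxSide;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OcrImageLimits.IsSupportedSize(800, 600));   // prints True
        Console.WriteLine(OcrImageLimits.IsSupportedSize(4000, 3000)); // prints False
    }
}
```

In ReadImage_Click you could call OcrImageLimits.IsSupportedSize(imgProp.Width, imgProp.Height) and show a message in textRead instead of calling the engine when it returns false.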

Then in the next step, a foreach loop goes through every line of the result, and a nested foreach loop goes through each word in that line.

Now we simply append the text of each word to a string.
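The same line-and-word assembly can also be written against a StringBuilder, which avoids creating a new string for every word. The sketch below is self-contained so it can run outside the app: plain lists of strings stand in for the lines and words of the OcrResult, and the OcrTextBuilder name is hypothetical. Unlike the loop in ReadImage_Click, it does not leave a trailing space at the end of each line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; lists of strings stand in for OcrResult.Lines and OcrLine.Words.
public static class OcrTextBuilder
{
    // Joins the words of each line with spaces and ends each line with "\n".
    public static string Build(IEnumerable<IEnumerable<string>> lines)
    {
        var sb = new StringBuilder();
        if (lines == null)
        {
            return sb.ToString(); // treat a missing result as empty text
        }
        foreach (var line in lines)
        {
            sb.Append(string.Join(" ", line));
            sb.Append("\n");
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var lines = new List<List<string>>
        {
            new List<string> { "Hello", "world" },
            new List<string> { "Second", "line" }
        };
        // Writes "Hello world" and "Second line" on separate lines.
        Console.Write(OcrTextBuilder.Build(lines));
    }
}
```

Inside ReadImage_Click, the input could be produced with LINQ, for example res.Lines.Select(l => l.Words.Select(w => w.Text)), which needs using System.Linq.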

And then finally show the string using the textblock.
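The XAML also wires the third button to a ClearText_Click handler, which this walkthrough does not show, and the project will not compile without it. A minimal sketch, assuming the button only needs to empty the TextBlock; it belongs inside the MainPage class next to the other two handlers:

```csharp
// Assumed implementation; the walkthrough itself does not define this handler.
private void ClearText_Click(object sender, RoutedEventArgs e)
{
    // Remove any previously recognized text from the TextBlock.
    textRead.Text = string.Empty;
}
```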

The output is in the images below:

[Screenshots: open, ocr, output]

Note: There is a small bug in the OCR API: building the project with the AnyCPU configuration does not succeed. You need to change the build configuration to x64, x86 or ARM. The library supports all of these CPUs, but only one at a time. I hope that Microsoft fixes this bug in the next update.
