Introduction
In this experiment, the Multiclass Logistic Regression algorithm is used to predict which of the 26 letters of the English alphabet a sample represents.
Description
Letter Recognition using Multiclass Classification
This experiment demonstrates how to build a multiclass classification model for letter recognition using Azure Machine Learning Studio.
Workflow
The entire workflow of the model in Azure ML is given below.
In the model, Multiclass Logistic Regression is used for modelling, and Two-Class SVM with One-vs-All Multiclass is used for comparing results such as accuracy.
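For reference, the two approaches named above can be written in their standard textbook form. This is a sketch of the usual formulation, not a description of how Azure ML Studio implements the modules internally; the symbols K, w_k, b_k, and f_k are notation introduced here for illustration.

```latex
% Multiclass (multinomial) logistic regression over K = 26 letter classes:
P(y = k \mid \mathbf{x}) =
  \frac{\exp(\mathbf{w}_k^\top \mathbf{x} + b_k)}
       {\sum_{j=1}^{K} \exp(\mathbf{w}_j^\top \mathbf{x} + b_j)},
  \qquad k = 1, \dots, K

% One-vs-all with a two-class learner: train K binary scorers f_k
% (class k against the remaining K - 1 classes) and pick the highest score:
\hat{y} = \arg\max_{k \in \{1, \dots, K\}} f_k(\mathbf{x})
```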
The basic steps to build this experiment are as follows.
- Step 1 : Get Data
- Step 2 : Data Preparation
- Step 3 : Define Features
- Step 4 : Train Model
- Step 5 : Score and Evaluate Model
Step 1 Get Data
The letter recognition data from the UCI Machine Learning Repository is used as the dataset. The dataset consists of 20,000 unique letter images generated by randomly distorting pixel images of the 26 uppercase letters from 20 different commercial fonts. The features of each of the 20,000 characters were summarized in terms of 16 primitive numerical attributes.
In order to access this dataset, drag the "Import Data" module to the experiment canvas. This module can be used to specify the data source for an experiment.
In the module's properties, set the data source to Web URL via HTTP, and provide the URL of the dataset and its data format.
The data contains 20,000 rows. Column 1 contains the label, which indicates the letter (A-Z) represented by the 16 numerical attributes (columns 2 to 17).
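Outside the Studio canvas, the column layout described above can be sketched in a few lines of Python. The example assumes the file is comma-separated with no header row, which the text does not state explicitly; the two sample rows and the function name parse_letter_rows are illustrative stand-ins, not taken from the experiment.

```python
import csv
import io

# Illustrative rows in the layout described above: the label first, then
# 16 numerical attributes. They are stand-ins for this example only.
SAMPLE = """T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
"""


def parse_letter_rows(text):
    """Split each CSV row into a label (column 1) and features (columns 2-17)."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        labels.append(row[0])
        features.append([int(value) for value in row[1:]])
    return labels, features


labels, features = parse_letter_rows(SAMPLE)
```

After parsing, labels holds one letter per row and features holds the matching list of 16 integers.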
Step 2 Data Preparation
In general, data preparation involves:
- Selecting a relevant subset of columns from the entire dataset
- Converting columns to categorical or continuous variables
- Changing the data type
- Taking care of missing values
- Normalizing the data or binning values
The dataset used for this experiment is already in a good format, so no data preparation is needed.
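One way to confirm that a dataset needs no preparation is to scan it for rows that would require cleaning. The sketch below is a hypothetical check, not part of the Azure ML experiment: it assumes each row is a list of strings with a single uppercase letter followed by 16 numeric fields, and the names is_number and find_bad_rows are made up for this example.

```python
import math


def is_number(text):
    """True if the text parses as a real number; blank or NaN counts as missing."""
    try:
        return not math.isnan(float(text))
    except ValueError:
        return False


def find_bad_rows(rows, n_features=16):
    """Return the indexes of rows that would need cleaning before training."""
    bad = []
    for index, row in enumerate(rows):
        label, values = (row[0], row[1:]) if row else ("", [])
        ok = (
            len(label) == 1 and "A" <= label <= "Z"   # label is one letter A-Z
            and len(values) == n_features             # no column is missing
            and all(is_number(v) for v in values)     # no blank or non-numeric cell
        )
        if not ok:
            bad.append(index)
    return bad


example_rows = [
    ["A"] + ["1"] * 16,         # clean row
    ["B"] + ["1"] * 15 + [""],  # missing value in the last column
    ["C"] + ["1"] * 10,         # too few columns
    ["?"] + ["1"] * 16,         # label is not a letter
]
bad_indexes = find_bad_rows(example_rows)
```

An empty result for the full dataset would support skipping the preparation step.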
Step 3 Define Features
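Feature definition is highly domain dependent. Selecting a good set of features for a model requires experimentation and sound knowledge of the problem (domain knowledge). In this experiment, the 16 numerical attributes serve as the features and the letter column serves as the label.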
Step 4 Train Model
The Split Data module is used to divide the dataset into training data and testing data. Half of the data is used for training, and the remaining half is used for testing the model.
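The same 50/50 split can be sketched in plain Python. Whether the Split Data module shuffles or stratifies the rows in this experiment is not stated, so the random shuffle and the fixed seed below are assumptions, and split_half is a hypothetical helper name.

```python
import random


def split_half(rows, seed=0):
    """Shuffle a copy of the rows and cut it into two equal halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]


# Row indexes stand in for the 20,000 dataset rows.
train_rows, test_rows = split_half(range(20000))
```

Each half ends up with 10,000 rows, and no row appears in both.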
In this experiment, we used Multiclass Logistic Regression for modelling and Two-Class SVM with One-vs-All Multiclass for comparison. The output of the Train Model module is a trained classification model, and the Score Model module is used to test that model.
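To make the train-then-score idea concrete, here is a minimal sketch of multiclass (softmax) logistic regression trained by batch gradient descent in plain Python. It is not the Azure ML module's implementation: the learning rate, epoch count, the three-class toy data, and all function names are assumptions chosen for illustration, and a real run would use the 16 attributes and 26 letter classes.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, x):
    """Linear score per class; the last weight in each row is the bias."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(wk, xb)) for wk in weights]


def train_softmax_regression(rows, labels, classes, lr=0.5, epochs=5000):
    """Fit one weight vector (plus bias) per class by batch gradient descent."""
    n_inputs = len(rows[0]) + 1
    weights = [[0.0] * n_inputs for _ in classes]
    for _ in range(epochs):
        grads = [[0.0] * n_inputs for _ in classes]
        for x, y in zip(rows, labels):
            xb = list(x) + [1.0]
            probs = softmax(class_scores(weights, x))
            for k, cls in enumerate(classes):
                # Gradient of cross-entropy: predicted probability minus target.
                error = probs[k] - (1.0 if cls == y else 0.0)
                for j, v in enumerate(xb):
                    grads[k][j] += error * v
        for k in range(len(classes)):
            for j in range(n_inputs):
                weights[k][j] -= lr * grads[k][j] / len(rows)
    return weights


def predict(weights, classes, x):
    """Score step: return the class with the highest score."""
    scores = class_scores(weights, x)
    return classes[scores.index(max(scores))]


# Toy stand-in data: three well-separated classes with two features each.
toy_rows = [(0.0, 0.0), (0.1, 0.1), (1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
toy_labels = ["A", "A", "B", "B", "C", "C"]
toy_classes = ["A", "B", "C"]

model = train_softmax_regression(toy_rows, toy_labels, toy_classes)
toy_predictions = [predict(model, toy_classes, x) for x in toy_rows]
```

Here train_softmax_regression plays the role of the Train Model step and predict plays the role of the Score Model step; in practice the scoring would be done on the held-out test half rather than on the training rows.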
Step 5 Score and Evaluate Model
Run the experiment and view the output of the Evaluate Model module by clicking the output port and selecting Visualize. The following diagram shows the resulting statistics for the model.
Results
Each column of the output matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The cells on the diagonal therefore show how accurately each letter is classified.
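The matrix layout described above can be reproduced with a short Python sketch. The labels and predictions below are made-up toy values, not results from this experiment, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def overall_accuracy(matrix):
    """Share of all instances that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Made-up toy labels for three classes.
actual = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]
matrix = confusion_matrix(actual, predicted, ["A", "B", "C"])
```

Dividing each diagonal cell by its row total gives the per-letter figure, while the sum of the diagonal over the grand total gives the overall accuracy.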