A Gentle Introduction to Machine Learning

Ankit Narang
Published in Analytics Vidhya
6 min read · Nov 1, 2020

Machine learning is an application of artificial intelligence (AI) that gives machines the ability to learn and improve from experience automatically, without being explicitly programmed.

In today’s business world, machine learning is used to solve many complex problems, helping companies save time and cost by automating their most time-consuming operations.

Data is what allows a machine to learn and make intelligent decisions. With the resources and compute power available today, we can train machines to recognise the patterns, relationships and trends in data, so that they learn automatically without being explicitly programmed to do so.

“Machine learning is all about mathematics”

Mathematics plays a huge role in machine learning, as every algorithm we come across involves mathematical equations and/or statistics in one way or another.

Some of the machine learning methods are explained below.

Supervised Machine Learning

You are getting supervised as you learn.

This is the most common type of machine learning. It involves providing data that includes both independent and dependent variables, and letting the machine find the best possible relationship between them so that it can predict unknown data points.

Independent variables, also called features and represented by x, are the inputs to the process being analysed. Dependent variables, on the other hand, are those whose values depend on the independent variables; the machine tries to find the relation between the two.

For example, suppose you are trying to predict house prices in your state and you are given two variables: the number of rooms and the price of the house. Here, the number of rooms is the independent variable, and the price, which changes as the number of rooms changes, is the dependent variable.

Hence, supervised machine learning is the setting in which the training data contains both the features and the class labels or values that we want to predict using various machine learning algorithms.

Classification

Accurate classification

Classification is one type of supervised machine learning. Here we try to predict a class label by building a predictive model from the data in hand.

Some of the most common use cases of classification include:

· E-mail spam classification.

· Predicting whether a tumour is malignant or benign.

· Predicting whether a user will churn.

For classification, we require a training dataset on which we train our model and a test dataset on which we evaluate its performance. Classification problems are also grouped by the class labels in the dataset. If there are only two class labels, we call it two-class or binary classification, which covers most use cases. Other types include multi-class classification, multi-label classification and imbalanced classification.
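The train/test idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the e-mail data is made up, and the helper names (train_test_split, fit_threshold, accuracy) and the single "suspicious word count" feature are hypothetical choices for this sketch, not part of any library.

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Pick the cut-off on the single feature that classifies the training rows best."""
    candidates = sorted({x for x, _ in train})
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        # Rule being tested: predict spam when the feature is at least t.
        correct = sum((x >= t) == bool(label) for x, label in train)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

def accuracy(threshold, rows):
    """Fraction of rows whose label matches the threshold rule."""
    hits = sum((x >= threshold) == bool(label) for x, label in rows)
    return hits / len(rows)

# Made-up data: (number of suspicious words in an e-mail, 1 = spam, 0 = not spam)
emails = [(0, 0), (1, 0), (1, 0), (2, 0), (3, 0), (2, 0),
          (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (8, 1)]

train, test = train_test_split(emails)
threshold = fit_threshold(train)          # learned from training rows only
test_accuracy = accuracy(threshold, test)  # measured on rows the model never saw
print("threshold:", threshold, "test accuracy:", test_accuracy)
```

The point of the split is that the test rows play no part in choosing the threshold, so the test accuracy is a fairer estimate of how the model behaves on new e-mails.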

Some of the well-known classification approaches include:

1. Logistic Regression

2. K Nearest Neighbours (K-NN)

3. Naive-Bayes Algorithm

4. Random Forest model

5. Support Vector Machine
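To give a feel for one of the approaches listed above, here is a minimal sketch of K Nearest Neighbours using only the Python standard library. The customer data, the two features and the function name knn_predict are assumptions made for this example; a real project would more likely use an established library implementation.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up customers: (monthly logins, support tickets) with a known outcome
customers = [
    ((1, 5), "churn"), ((2, 4), "churn"), ((1, 6), "churn"),
    ((9, 0), "stay"), ((8, 1), "stay"), ((10, 1), "stay"),
]

rare_user = knn_predict(customers, (2, 5))    # rarely logs in, many tickets
active_user = knn_predict(customers, (9, 1))  # logs in often, few tickets
print(rare_user, active_user)
```

K-NN has no real training step: it simply stores the labelled examples and lets the closest ones vote on each new query point.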

Regression

Real number matters!

This approach is similar to classification in that it involves both features and an output variable. Here, however, the outcome is not a discrete value but a continuous numerical value, e.g. the price of a house, the height of an individual, speed or temperature.

Because the outcome is continuous, the metrics used to evaluate a trained regression model also differ from those used in classification. The most common, and probably one of the oldest, techniques for solving a regression problem is linear regression.

In linear regression, we try to find a linear equation that describes the relation between the features and the outcome. Once we have found the best-fitting equation, we use it to make predictions for query points.

y = m * x_1 + c ## Linear Equation

Where x_1 is the feature value of a data point, m is the slope of the line and c is the intercept, the point where the line crosses the vertical axis.
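As a rough sketch of how m and c can be found for a single feature, the snippet below uses the standard closed-form least-squares formulas. The room counts and prices are made up so that they lie exactly on a line, and the function name fit_line is an assumption for this example.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Made-up data lying exactly on price = 50 * rooms + 50
rooms = [1, 2, 3, 4, 5]
prices = [100, 150, 200, 250, 300]

m, c = fit_line(rooms, prices)
predicted_price = m * 6 + c   # prediction for a query point: a 6-room house
print(m, c, predicted_price)
```

Real data never sits perfectly on a line, so in practice the fitted line is the one that comes closest to the points on average rather than one that passes through all of them.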

Training a machine learning model is mainly a matter of predicting values and measuring the loss, which quantifies the difference between the predicted and actual outcome values. Training tries to minimise this loss, or error term, so that the predictions are as close to the actual values as possible.
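One common way to minimise a loss is gradient descent, sketched below for the line y = m * x + c with mean squared error as the loss. The article does not name a specific loss or optimiser, so both are assumptions here, as are the learning rate, the number of steps and the made-up data.

```python
def mse(m, c, xs, ys):
    """Mean squared error between predictions m * x + c and actual values."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Start from m = c = 0 and repeatedly step against the gradient of the MSE."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        grad_m = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_c = 2 * sum(errors) / n
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c

# Made-up data generated from y = 2 * x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

initial_loss = mse(0.0, 0.0, xs, ys)
m, c = gradient_descent(xs, ys)
final_loss = mse(m, c, xs, ys)
print(m, c, initial_loss, final_loss)
```

Each step nudges m and c in the direction that reduces the loss, so the loss shrinks from its starting value towards zero as the line approaches the data.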

Unsupervised Machine Learning

Do you see those clusters?

Unsupervised machine learning refers to finding patterns and trends in unlabelled data, typically by grouping the data into clusters, which can reveal very useful insights about our dataset.

Clustering is the most common and widely used unsupervised learning technique. It tries to find structure in a collection of unlabelled data. It can be defined as:

“The process of organizing data points into groups whose members are similar in some way”

So, the members of a cluster have some characteristics and features in common, but they differ from the members of other clusters.

It is commonly used in market targeting, where we have geolocation data and try to group our customers based on their purchases or behaviours, and then run tailor-made campaigns for each group.

Unsupervised machine learning algorithms include:

· Clustering

· K-Means Clustering

· Exclusive Clustering

· Overlapping Clustering

· Hierarchical Clustering
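As an illustration of K-Means from the list above, here is a minimal sketch in plain Python. It assumes two-dimensional points, uses the first k points as starting centroids (a naive choice made only to keep the example deterministic) and runs a fixed number of iterations; the location data and the function name k_means are made up.

```python
import math

def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centring."""
    centroids = [points[i] for i in range(k)]  # naive start: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments

# Made-up customer locations forming two obvious groups
locations = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]

centroids, labels = k_means(locations, k=2)
print(centroids, labels)
```

No labels are supplied anywhere: the grouping comes purely from how close the points are to one another, which is what makes this unsupervised.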

Let us also look at a technique that has been gaining popularity in recent times.

Reinforcement Learning

Reinforcement learning (RL) is an area of machine learning concerned with how an agent (the learning machine) can learn to act in an environment that gives it a reward for every correct decision or prediction it makes.

Main points involved in reinforcement learning are given below:

Input: The input is the initial state from which the model starts.

Output: There can be many possible outputs, depending on the situation and the approach used.

Training: Training is based on the input. The model returns an output, and the environment either rewards the agent or punishes it for making a wrong choice. In this way, the confidence and accuracy of the predictions keep increasing.

The best solution is then decided based on the maximum reward.
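The reward-and-punishment loop described above can be sketched with a very small example. Everything in it is an assumption made for illustration: three made-up actions, an environment that rewards one hidden correct action with +1 and punishes the others with -1, and an agent that mostly picks the action with the best average reward so far while occasionally exploring at random.

```python
import random

ACTIONS = ["left", "straight", "right"]
CORRECT = "right"   # hidden from the agent; made up for this sketch

def environment(action):
    """Reward a correct decision with +1 and punish a wrong one with -1."""
    return 1.0 if action == CORRECT else -1.0

def train(episodes=500, epsilon=0.1, seed=0):
    """Learn the average reward of each action by trial and error."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}   # running average reward per action
    count = {a: 0 for a in ACTIONS}     # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)           # explore: try something at random
        else:
            action = max(ACTIONS, key=value.get)   # exploit: best estimate so far
        reward = environment(action)
        count[action] += 1
        # Incremental update of the running average for the chosen action.
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = train()
best_action = max(ACTIONS, key=value.get)   # the action with the maximum reward
print(best_action, value, count)
```

Real reinforcement learning problems also involve states that change in response to actions, which this sketch leaves out to keep the reward mechanism easy to see.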

It is widely used in gaming because of its strong past performance, for example in chess and Go. It is also used to improve the performance of autonomous cars.

With this, we come to the end of this article, in which I talked about machine learning and its types. I hope you got something out of it. Thank you for reading, and if you liked it, do not forget to follow me on Medium so that you do not miss any updates.
