Logistic regression is one of the most commonly used binary classification techniques. Given m training examples, we want to classify each example into one of two groups.
Let's consider a problem statement where, given a set of images, we want to know whether each image is of a dog (label y = 1) or not (label y = 0).
In a computer, these images are represented by pixel intensity values across the color channels. This vector of pixel values is the feature vector (a vector that represents the important characteristics of an object) for the image.
A given image I will therefore have a feature vector of dimension nx, where nx = the number of pixel values across the Red, Green, and Blue channels.
Each image in our data set has a value for every pixel in each of the three color channels Red, Green, and Blue; we can stack these together to produce one big column vector containing all the values.
Each value in this vector is independent of the others but is part of a single observation, i.e. in our case part of one image. We will therefore have as many vectors as we have images, each with dimension [number of pixels X 1]. If we denote the number of pixel values by nx, the dimension of each feature vector is nx X 1.
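To make the stacking concrete, here is a minimal sketch (using a made-up 4x4 RGB image) of flattening an image into a single column feature vector:

```python
import numpy as np

# Hypothetical example: a tiny 4x4 RGB image with values in 0-255.
image = np.random.randint(0, 256, size=(4, 4, 3))

# Stack every Red, Green, and Blue pixel value into one big column vector.
n_x = image.size            # 4 * 4 * 3 = 48 pixel values
x = image.reshape(n_x, 1)   # feature vector of shape (n_x, 1)

print(x.shape)              # (48, 1)
```

A real photo would work the same way, just with a much larger nx (e.g. a 64x64 RGB image gives nx = 12288).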
To determine whether a given image is of a dog, the only way is to look at the characteristics of every input image in the training set. Since these images are nothing but vectors of pixels, we need to determine from those pixel values alone whether the image is of a dog!
Imagining Logistic regression as the simplest Neural Network
We can imagine logistic regression as a one-layer neural network also known as a shallow neural network.
For a single image we will have input features (x1, x2, …, xnx), the entries of that image's feature vector, and we will learn a weight associated with each feature. The number of training examples (images) is denoted m.
Here ŷ is the probability that the input feature vector comes from a dog picture, i.e. the probability that the picture is of a dog.
ŷ = P(Y = 1 | X)
Some function = b + w1*x1 + w2*x2 + ….. + wnx*xnx
Need for the constant (bias term) b in Logistic Regression
With regression we want to approximate a function that defines a relationship between X and Y (input -> output). To get this working we need a bias term: without one, an all-zero input would force the prediction to be zero as well. Adding a bias weight that does not depend on any of the features allows the hyperplane described by the learned weights to fit data that does not pass through the origin.
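The point about the bias can be checked with a small sketch (the weights and bias here are made-up numbers): with an all-zero input, only the bias term survives in the weighted sum.

```python
import numpy as np

# Hypothetical numbers: 3 features, arbitrary weights, and a bias.
x = np.array([[0.0], [0.0], [0.0]])   # all-zero input features
w = np.array([[0.2], [-0.5], [0.1]])  # one weight per feature
b = 0.7                               # bias term

z = (w.T @ x).item() + b   # w1*x1 + w2*x2 + w3*x3 + b
print(z)                   # 0.7 -- without b this would be forced to 0
```

Without b, the model could only ever output 0 for a zero input, no matter what weights it learned.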
w ∈ R^nx, i.e. w is a real-valued vector of dimension nx
b ∈ R, i.e. b is a real-valued scalar
The question is how to predict ŷ using w and b, given the input features x1, x2, …, xnx.
Since the output of logistic regression is a probability, it must lie between 0 and 1. We therefore pass the linear combination through a sigmoid function, defined as
ŷ = σ(w^T x + b)
z = w^T x + b
ŷ = σ(z)
σ(z) = 1 / (1 + e^(-z))
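Putting the pieces together, a minimal sketch of computing ŷ = σ(w^T x + b) might look like this (the 4-feature vector, weights, and bias are made-up numbers, not learned values):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example: 4 features with arbitrary weights and bias.
x = np.array([[0.5], [1.2], [-0.3], [0.8]])   # feature vector, shape (n_x, 1)
w = np.array([[0.4], [-0.1], [0.6], [0.2]])   # weights, shape (n_x, 1)
b = 0.1

z = (w.T @ x).item() + b   # z = w^T x + b
y_hat = sigmoid(z)         # a probability in (0, 1)
print(y_hat)
```

Whatever values z takes, y_hat always lands strictly between 0 and 1, which is exactly why the sigmoid is used here.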
Case 1: when z is a large negative number, e^(-z) is a very big number, giving σ(z) ≈ 0
Case 2: when z is a large positive number, e^(-z) ≈ 0, giving σ(z) ≈ 1
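Both limiting cases are easy to check numerically; the inputs ±10 below are arbitrary stand-ins for "large negative" and "large positive" z:

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Case 1: z large and negative -> e^(-z) is huge -> sigmoid(z) ~ 0
print(sigmoid(-10))   # ~ 0.0000454
# Case 2: z large and positive -> e^(-z) ~ 0 -> sigmoid(z) ~ 1
print(sigmoid(10))    # ~ 0.9999546
```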
So, in logistic regression, the task is to learn w and b so that ŷ becomes a good estimate of y.
In the next blog post, we will discuss how to learn the values for w,b