Logistic Regression

Consider the question "Will it rain this evening?". To answer it, several inputs need to be considered: temperature, humidity, wind, and so on. The answer will be either "It is very likely" or "It is not likely". In this process, the influence of several independent variables is weighed and the probability of a discrete outcome is produced. The model does not deliver an absolute answer; it gives you the probability of one of the discrete outcomes happening.

The function used to determine the probability of the outcome is called the logistic (sigmoid) function. Its inverse is the logit function, which is why logistic regression is also known as logit regression. The output of the logistic function lies between 0 and 1, so it can be read as a probability.
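As a minimal sketch of how several inputs become a probability, the snippet below combines them into a weighted sum and passes it through the standard formula 1 / (1 + e^-z). The function name rain_probability, the weights, and the bias are hypothetical values chosen only for illustration; a real model would learn them from historical data.

```python
import math

def sigmoid(z):
    """Map a real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def rain_probability(temperature_c, humidity_pct, wind_kmh):
    # Hypothetical weights and bias, not learned from any real data.
    z = -0.10 * temperature_c + 0.08 * humidity_pct - 0.02 * wind_kmh - 3.0
    return sigmoid(z)

p = rain_probability(temperature_c=22, humidity_pct=85, wind_kmh=10)
print(f"Probability of rain: {p:.2f}")
# Turn the probability into one of the two discrete answers.
print("It is very likely" if p >= 0.5 else "It is not likely")
```

The 0.5 cut-off is also an assumption; a different threshold can be chosen depending on how costly a wrong answer is.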

Some common machine learning algorithms

A few commonly used machine learning algorithms

Linear Regression

Linear regression is one of the simplest algorithms in machine learning. It is used to predict the value of a dependent variable (Y) based on an independent variable (X).

Consider the problem of predicting a person's weight from their height. The relationship between the two is not exact: we cannot say that every additional inch of height adds exactly 0.5 lbs of weight. When we plot a sample of such data, we see a cluster of points rather than a perfect line.

The model derives the best-fit line through these points, and the slope (a) and intercept (b) of that line are the values it learns. For any new value of the independent variable (X), the dependent variable (Y) is predicted using the equation Y = aX + b.
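The fitting step can be sketched in a few lines, assuming that "best fit" means ordinary least squares (the text does not name a criterion). The function name fit_line and the height and weight figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line Y = aX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of X and Y, and variance of X (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Invented sample: heights in inches, weights in lbs.
heights = [60, 62, 64, 66, 68, 70]
weights = [115, 120, 130, 138, 150, 155]

a, b = fit_line(heights, weights)
predicted = a * 65 + b  # predict the weight for a new height of 65 inches
print(f"slope={a:.2f}, intercept={b:.2f}, predicted weight={predicted:.1f}")
```

With more than one independent variable the same idea applies, but the closed-form solution is usually computed with a linear algebra library instead of by hand.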

Types of Machine Learning

Broadly there are 3 types of machine learning:

Supervised Machine Learning

Take the example of college admissions. For simplicity, let us assume that the admission board makes a decision based on your school grade, SAT score, and interview outcome. Given a dataset of past applicants in which each record carries these inputs along with the known decision (the label), the algorithm learns and tunes the model until it achieves a desired level of accuracy.

Supervised learning problems are broadly classified into 2 categories:

  1. Regression - The model tries to predict results with a continuous output.
  2. Classification - The model tries to predict results in a discrete set of outputs.
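To make the discrete-output case concrete, here is a small sketch that learns from labeled admission records. It uses a nearest-neighbor rule, which is just one simple technique picked for illustration, and every record, score range, and function name in it is invented.

```python
# Labeled examples: (school grade, SAT score, interview score) -> admitted?
# All values are hypothetical.
training = [
    ((3.9, 1500, 8), True),
    ((3.2, 1100, 5), False),
    ((3.7, 1400, 7), True),
    ((2.8, 1000, 6), False),
]

def scale(features):
    """Bring each input to a comparable 0..1 range (assumed maximums)."""
    grade, sat, interview = features
    return (grade / 4.0, sat / 1600, interview / 10)

def predict_admission(features):
    """Return the label of the most similar known applicant."""
    target = scale(features)

    def distance(example):
        known = scale(example[0])
        return sum((u - v) ** 2 for u, v in zip(known, target))

    nearest = min(training, key=distance)
    return nearest[1]

print(predict_admission((3.8, 1450, 8)))
print(predict_admission((3.0, 1050, 5)))
```

The output here is one of two labels, which makes it a discrete-output problem. Predicting a number such as an expected first-year grade from the same inputs would instead be a regression problem.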

Unsupervised Machine Learning

Think of building the decision control for an unmanned vehicle going to an alien planet. To start with, there is insufficient or no data available to train on. Secondly, the factors that influence the outcome are not yet determined. In such a case you need an algorithm that can take in a growing number of data sources, identify the influence and magnitude of influence of each one, and revise its learning as the data sets grow. Visually, this could be an example of generating a cluster graph of decisions.
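One common way to find such clusters without any labels is k-means. The sketch below is a simplified one-dimensional version; the function name k_means_1d, the readings, and the starting centers are assumptions made for illustration only.

```python
def k_means_1d(values, centers, iterations=10):
    """Group unlabeled numbers around centers that move toward the data."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assign every value to its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical sensor readings with no labels attached.
readings = [1.0, 1.2, 0.8, 9.7, 10.1, 10.2]
centers, clusters = k_means_1d(readings, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

Nothing tells the algorithm what the two groups mean; it only discovers that the readings fall into two separate clumps.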

Reinforcement Machine Learning

This can be seen as a step beyond unsupervised learning: the algorithm receives a reward or penalty for the outcomes it generates and, based on this feedback, decides on the next steps for tuning its model.
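The reward-and-penalty loop can be sketched with a toy agent that chooses between two actions. The environment, the action names, the learning rate, and the exploration rate are all hypothetical, and the update rule shown is a simple running estimate, not a full reinforcement learning algorithm.

```python
import random

# Hypothetical environment: "left" is penalized, "right" is rewarded.
rewards = {"left": -1.0, "right": 1.0}

values = {action: 0.0 for action in rewards}  # the agent's current estimates
learning_rate = 0.5
exploration_rate = 0.1
rng = random.Random(0)  # seeded so runs are repeatable

for step in range(50):
    if rng.random() < exploration_rate:
        action = rng.choice(list(values))       # explore: try a random action
    else:
        action = max(values, key=values.get)    # exploit: use the best estimate
    reward = rewards[action]
    # Nudge the estimate for this action toward the feedback received.
    values[action] += learning_rate * (reward - values[action])

best_action = max(values, key=values.get)
print(values)
print(best_action)
```

After the first penalty, the estimate for that action drops below the other one, so the agent shifts its choices toward the rewarded action.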

Python - date and time formatting cheatsheet

The following format codes can be used with the strftime() method in the datetime library:

Format code  Description
%a           Locale’s abbreviated weekday name.
%A           Locale’s full weekday name.
%b           Locale’s abbreviated month name.
%B           Locale’s full month name.
%c           Locale’s appropriate date and time representation.
%d           Day of the month as a decimal number [01,31].
%f           Microsecond as a decimal number [0,999999], zero-padded on the left.
%H           Hour (24-hour clock) as a decimal number [00,23].
%I           Hour (12-hour clock) as a decimal number [01,12].
%j           Day of the year as a decimal number [001,366].
%m           Month as a decimal number [01,12].
%M           Minute as a decimal number [00,59].
%p           Locale’s equivalent of either AM or PM.
%S           Second as a decimal number [00,61].
%U           Week number of the year (Sunday as the first day of the week).
%w           Weekday as a decimal number [0(Sunday),6].
%W           Week number of the year (Monday as the first day of the week).
%x           Locale’s appropriate date representation.
%X           Locale’s appropriate time representation.
%y           Year without century as a decimal number [00,99].
%Y           Year with century as a decimal number.
%z           UTC offset in the form +HHMM or -HHMM.
%Z           Time zone name (empty string if the object is naive).
%%           A literal '%' character.
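
A short usage sketch with a fixed, made-up timestamp shows a few of these codes in action. Only locale-independent codes carry an expected-output comment; codes such as %a, %b, and %p depend on the locale of the machine running the code.

```python
from datetime import datetime, timezone

# An arbitrary, timezone-aware example moment.
moment = datetime(2024, 3, 5, 14, 7, 9, 123456, tzinfo=timezone.utc)

iso_like = moment.strftime("%Y-%m-%d %H:%M:%S")
short_date = moment.strftime("%d/%m/%y")
day_of_year = moment.strftime("%j")
weekday_number = moment.strftime("%w")
microseconds = moment.strftime("%f")
utc_offset = moment.strftime("%z")
percent = moment.strftime("100%%")

print(iso_like)        # 2024-03-05 14:07:09
print(short_date)      # 05/03/24
print(day_of_year)     # 065
print(weekday_number)  # 2
print(microseconds)    # 123456
print(utc_offset)      # +0000
print(percent)         # 100%
print(moment.strftime("%A, %d %B %Y %I:%M %p"))  # locale-dependent
```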