# Statistics – Probability Concepts

**Probability**

Probability refers to the likelihood that an event will occur. In data science, it is typically quantified as a number between 0 and 1, where 0 means the event will not occur and 1 indicates certainty that it will. The higher an event's probability, the more likely it is to actually occur.

The probability of an event can only be between 0 and 1 and can also be written as a percentage.

- The probability of event **A** is often written as **P(A)**.
- If **P(A) > P(B)**, then event **A** has a higher chance of occurring than event **B**.
- If **P(A) = P(B)**, then events **A** and **B** are equally likely to occur.

**Conditional Probability: P( A | B )** is the likelihood of event A occurring, given that event B has already occurred.

**Independent events:** Events whose outcomes do not influence the probability of each other's outcomes; **P( A | B ) = P( A )**.

**Mutually Exclusive Events:** Events that cannot occur simultaneously; **P( A ∩ B ) = 0**, and hence **P( A | B ) = 0**.
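As a quick sketch of conditional probability, we can compute P(A | B) by enumerating outcomes. The two-dice setup below is illustrative, not from the original text:

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if sum(o) == 8}   # event A: the sum is 8
B = {o for o in outcomes if o[0] == 3}     # event B: the first die shows 3

p_b = len(B) / len(outcomes)
p_a_and_b = len(A & B) / len(outcomes)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/6: given a first-die 3, the second die must show 5
```

Note that P(A) alone is 5/36, so knowing B occurred changes the probability of A, meaning these events are not independent.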

**Bayes’ theorem**

In probability theory and statistics, **Bayes' theorem** describes the probability of an event, based on prior knowledge of conditions that might be related to the event. It is as follows:

**P( A | B ) = [ P( B | A ) × P( A ) ] / P( B )**
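A minimal sketch of Bayes' theorem in code, using hypothetical disease-screening numbers chosen only to illustrate the formula:

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
# Hypothetical screening numbers (assumptions, not from the text).
p_disease = 0.01            # P(A): prior probability of having the disease
p_pos_given_disease = 0.95  # P(B | A): test sensitivity
p_pos_given_healthy = 0.05  # false positive rate

# Total probability of a positive test, P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even with a fairly accurate test, the posterior probability stays low because the prior P(A) is small, which is exactly the kind of update Bayes' theorem captures.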

**Probability Distribution Functions**

A probability distribution is a function that represents the likelihood of obtaining each of the possible values that a random variable can assume. It is used to indicate the likelihood of an event or outcome.

Random variables come in two types: **Continuous** and **Discrete**.

**Continuous Data Probability Distributions ( C P D )**

A continuous distribution describes the probabilities of the possible values of a continuous random variable. A continuous random variable is a random variable whose set of possible values (known as the range) is infinite and uncountable.

The following are the types of distributions that come under continuous probability distributions:

- Uniform Distribution

- Normal/Gaussian Distribution

- T – Distribution

**Discrete Data Probability Distributions ( D P D )**

A discrete distribution describes the probability of occurrence of each value of a discrete random variable. A discrete random variable is a random variable that has countable values, such as a list of non-negative integers. With a discrete probability distribution, each possible value of the discrete random variable is associated with a nonzero probability.

Thus, a discrete probability distribution is often presented in tabular form.

- Poisson Distribution

- Binomial Distribution

**Uniform Distribution – C P D**

Uniform Distribution is a probability distribution where all outcomes are equally likely.

When you roll a fair die, the outcomes are 1 to 6. The probabilities of getting these outcomes are equally likely and that is the basis of a uniform distribution.

A variable X is said to be uniformly distributed over [a, b] if its density function is:

**f( x ) = 1 / ( b – a )** for **a ≤ x ≤ b**, and **f( x ) = 0** otherwise

The graph of a uniform distribution is flat: the density is constant between a and b and zero elsewhere. Because of this rectangular shape, the uniform distribution is also called the rectangular distribution.

The standard uniform density has parameters a = 0 and b = 1, so the PDF for the standard uniform density is given by:

**f( x ) = 1** for **0 ≤ x ≤ 1**, and **f( x ) = 0** otherwise
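The density function above can be sketched directly in code; the `uniform_pdf` helper below is an illustrative implementation, not a library function:

```python
def uniform_pdf(x, a=0.0, b=1.0):
    """Density of the Uniform(a, b) distribution: 1/(b-a) on [a, b], else 0."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Standard uniform (a=0, b=1): the density is 1 everywhere on [0, 1]
print(uniform_pdf(0.5))          # 1.0
# Uniform(2, 6): the density is 1/4 inside [2, 6] and 0 outside
print(uniform_pdf(3, a=2, b=6))  # 0.25
print(uniform_pdf(7, a=2, b=6))  # 0.0
```

The constant height 1/(b − a) is what makes the total area under the curve equal 1, as any density must.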

**Normal/Gaussian Distribution- C P D**

It is a type of continuous probability distribution for a real-valued random variable and represents the distribution of many random variables as a symmetrical bell-shaped graph.

Here the parameter **μ** is the mean of the distribution, and the **standard deviation σ** is a measure of how spread out the values are.

When we calculate the standard deviation we find that **generally**:

**68%** of values are within **1 standard deviation** of the mean.

**95%** of values are within **2 standard deviations** of the mean.

**99.7%** of values are within **3 standard deviations** of the mean.

The rule is also called the **68-95-99.7 Rule** or the **Three Sigma Rule**.

The probability density of the standard Gaussian distribution is often denoted with the Greek letter ϕ phi. The alternative form of the Greek letter phi φ is also used quite often.

**X ∼ N( μ, σ² )**
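The 68-95-99.7 rule can be checked empirically by sampling from a normal distribution; the sample size and seed below are arbitrary choices for illustration:

```python
import random

random.seed(42)

# Draw from N(mu=0, sigma=1); with a large sample the empirical fractions
# should land close to the 68-95-99.7 rule.
mu, sigma = 0.0, 1.0
sample = [random.gauss(mu, sigma) for _ in range(100_000)]

for k in (1, 2, 3):
    within = sum(1 for x in sample if abs(x - mu) <= k * sigma) / len(sample)
    print(f"within {k} sigma: {within:.3f}")
# Expect roughly 0.683, 0.954, 0.997
```

Deviations from the theoretical fractions shrink as the sample size grows, which is the usual behavior of empirical frequencies.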

**T – Distribution – C P D**

The T distribution, also known as the Student’s t-distribution, is a type of probability distribution that is similar to the normal distribution with its bell shape but has heavier tails. T distributions have a greater chance for extreme values than normal distributions, hence the fatter tails.

Sample sizes are sometimes small, and often we do not know the standard deviation of the population. When either of these problems occurs, statisticians rely on the distribution of the **t statistic** (also known as the **t score**), whose values are given by

**t = [ x – μ ] / [ s / sqrt( n ) ]**

Where x is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size. The distribution of the *t* statistic is called the **t distribution** or the **Student t distribution**.
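The t statistic defined above is straightforward to compute by hand; the sample data below is hypothetical, chosen only to exercise the formula:

```python
import math

def t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical small sample tested against a claimed population mean of 10
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
print(round(t_statistic(data, mu=10.0), 3))  # 1.342
```

Note the divisor n − 1 in the sample variance, which matches the degrees of freedom (n − 1 = 5 here) used to pick the particular t distribution.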

The t distribution allows us to conduct statistical analyses on certain data sets that are not appropriate for analysis using the normal distribution.

**Degrees of Freedom**

There are actually many different t distributions. The particular form of the t distribution is determined by its **degrees of freedom**. The degrees of freedom refers to the number of independent observations in a set of data; for the one-sample t statistic above, it equals n – 1.

**Poisson Distribution – D P D**

It is a probability distribution that expresses the probability of a given number of events occurring within a fixed interval of time or space.

Poisson Distribution is applicable in situations where events occur at random points of time and space wherein our interest lies only in the number of occurrences of the event.

A distribution is called Poisson distribution when the following assumptions are valid:

- Any successful event should not influence the outcome of another successful event.

- The probability of success must be the same for any two intervals of equal length.

- The probability of success in an interval approaches zero as the interval becomes smaller.

Now, if any distribution validates the above assumptions then it is a Poisson distribution. Some notations used in Poisson distribution are:

- λ is the rate at which an event occurs,

- t is the length of a time interval,

- And X is the number of events in that time interval.

Here, X is called a Poisson Random Variable and the probability distribution of X is called Poisson distribution.

Let µ denote the mean number of events in an interval of length t.

**µ = λ · t**

The PMF of X following a Poisson distribution is given by:

**P( X = k ) = ( e^–µ · µ^k ) / k!** for **k = 0, 1, 2, …**
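A small sketch of the Poisson PMF; the call-center numbers are a hypothetical example, not from the original text:

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) = e^(-mu) * mu^k / k! for a Poisson random variable X."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical example: calls arrive at rate lambda = 2 per hour.
# Over an interval of t = 1.5 hours, mu = lambda * t = 3 expected calls.
mu = 2 * 1.5
print(round(poisson_pmf(2, mu), 4))  # P(exactly 2 calls) = 0.224
```

Summing the PMF over all k gives 1, which is a quick sanity check that the function really defines a probability distribution.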

**Binomial Distribution – D P D**

A distribution where only two outcomes are possible, such as success or failure, gain or loss, win or lose and where the probability of success and failure is the same for all the trials is called a Binomial Distribution.

When you toss a coin, the probability of getting a head is p = 0.5, and the probability of a tail can be easily computed as q = 1 – p = 0.5.

The parameters of a binomial distribution are n and p where n is the total number of trials and p is the probability of success in each trial.

On the basis of the above explanation, the properties of a Binomial Distribution are

- Each trial is independent.

- There are only two possible outcomes in a trial – either a success or a failure.

- A total number of n identical trials are conducted.

- The probability of success and failure is the same for all trials. (Trials are identical.)

The mathematical representation of the binomial distribution is given by:

**P( X = k ) = C( n, k ) · p^k · ( 1 – p )^( n – k )** for **k = 0, 1, …, n**
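The binomial PMF can be sketched with the standard library's `math.comb`; the ten-toss coin example is illustrative:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Fair coin tossed 10 times: probability of exactly 5 heads
print(round(binomial_pmf(5, n=10, p=0.5), 4))  # 0.2461
```

As with the Poisson PMF, the probabilities over k = 0 … n sum to 1, reflecting the two-outcome structure of each independent trial.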

**Central limit theorem (CLT) **

Given a dataset with an unknown distribution (it could be uniform, binomial, or completely irregular), the distribution of the sample means will approximate the normal distribution.

These samples should be sufficient in size. The distribution of sample means, calculated from repeated sampling, will tend to normality as the size of your samples gets larger.
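The CLT can be demonstrated empirically: draw repeated samples from a distinctly non-normal distribution and look at their means. The sample size and number of repetitions below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)

# Uniform(0, 1) is flat, not bell-shaped, yet the means of repeated
# samples cluster around the population mean 0.5.
sample_size = 50
sample_means = [
    statistics.mean(random.random() for _ in range(sample_size))
    for _ in range(2000)
]

print(round(statistics.mean(sample_means), 3))   # close to 0.5
print(round(statistics.stdev(sample_means), 3))  # close to sigma/sqrt(n) ~ 0.041
```

The spread of the sample means shrinks like σ/√n, so larger samples give a tighter, more nearly normal distribution of means.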

**Normalisation & Standardisation**

Standardization / normalization is done to put variables on comparable scales, so that no single feature dominates the distances between data points in a multidimensional space. Such scaling is helpful in many techniques that use distances between data points, like regression, classification, clustering, PCA, LDA, etc.

**Normalization** usually means scaling a variable to have values between 0 and 1: **x′ = ( x – min ) / ( max – min )**

**Standardization** transforms data to have a mean of zero and a standard deviation of 1: **z = ( x – μ ) / σ**
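Both transformations are a few lines of code; the helper names below are illustrative rather than any library's API:

```python
import statistics

def normalize(values):
    """Min-max scaling: maps values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Z-score scaling: result has mean 0 and sample standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

data = [10, 20, 30, 40, 50]
print(normalize(data))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(data))  # mean 0, standard deviation 1
```

Normalization preserves the shape of the data within a fixed range but is sensitive to outliers at the min/max; standardization is the usual choice when the downstream method assumes roughly centered, unit-variance features.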