Thursday, 8 June 2023

Random Variables - X

 


Random Variables

  • Random Variables
    • A random variable is a variable whose value is determined by the outcome of a random experiment. Formally, it is a function that assigns a numerical value to each possible result of the experiment.
    • Types of Random Variables
      • Discrete - Takes on only a countable set of specific values. Example: the number of days in June with a temperature greater than 70 degrees is a discrete random variable, since the possible outcomes are the integers from 0 to 30.
      • Bernoulli - A discrete variable with only two possible values, zero and one, such as a coin toss that always gives either heads or tails.
      • Continuous - Takes any value within a range, so there are uncountably many possible outcomes: between any two values we can always find another, e.g. (6.95 + 6.94)/2 = 6.945. Examples of continuous random variables are average salary and temperature. Here probabilities are stated over ranges, such as P(20 ≤ X ≤ 21).
    • Random variables are used to quantify results of random events.
  • Expectations
    • This concept is about a mathematical expectation of a random variable.
    • The expected value (EV) is the weighted average of the possible outcomes of a random variable, where the weights are the probabilities that the outcomes will occur. 
    • The EV of a random variable is denoted by E[X]. EV formula (discrete case): E[X] = Σ x·P(X=x)
    • The EV of a random variable gives a measure of the center of the distribution of the variable.  Essentially, the EV is the long-term average value of the variable. Because of the law of large numbers, the average value of the variable converges to the EV as the number of repetitions approaches infinity. 
    • Expectation of the random variable is often called the mean or expected value or the first moment. The difference between expected value and arithmetic mean is that the first involves a distribution of probability and the second involves a distribution of occurrence.
    • Discrete Random Variable
      • To calculate the EV for a single discrete random variable, as said above, we multiply each value of the variable by the probability of that value occurring and sum the results.
      • For example, a normal six-sided die. Once you roll the die, it has an equal one-sixth chance of landing on one, two, three, four, five, or six. 
      • Given this information, the calculation is straightforward:
        • (1/6 * 1) + (1/6 * 2) + (1/6 * 3) + (1/6 * 4) + (1/6 * 5) + (1/6 * 6) = 3.5
      • If we were to roll a six-sided die a very large number of times, the average value would converge to 3.5.
      • In some cases the probabilities of the outcomes are not equal; we then calculate the expected value as the weighted sum of the outcomes, where the weights are the probabilities of each outcome.
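The die calculation above can be sketched in a few lines of Python (standard library only); the outcomes and probabilities are taken straight from the example:

```python
# Expected value of one roll of a fair six-sided die:
# the probability-weighted sum of the outcomes.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # equal weights for a fair die

ev = sum(x * p for x, p in zip(outcomes, probs))
print(ev)  # 3.5 (up to floating-point rounding)
```

Replacing `probs` with unequal probabilities that sum to 1 gives the weighted-sum case mentioned above.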
    • Continuous Random Variable
      • Integrals must be used, integral is used to compute an area under a curve.
      • Formula
        • E[X] = ∫ x·f(x) dx
      • In the definition of a continuous variable, the integral is the area under the probability density function in the interval between 'a' and 'b'.
        • f(x) is a probability density
      • So f(x) dx represents the probability that X is in an infinitesimal range of width dx around x. Thus we can interpret the formula for E(X) as a weighted integral of the values x of X, where the weights are the probabilities f(x) dx.
      • Example: The height of people in a population can be said as a continuous random variable.
      • Note: the actual evaluation of the integral is not covered here.
    • The following are two useful properties of expected values:
      • If c is any constant: E(cX) = cE(X)
      • If X and Y are any random variables: E(X + Y) = E(X) + E(Y)
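These two properties can be checked empirically with a quick simulation; a minimal sketch, where X and Y are two independent simulated dice:

```python
import random

random.seed(0)  # reproducible runs
n = 100_000
xs = [random.randint(1, 6) for _ in range(n)]
ys = [random.randint(1, 6) for _ in range(n)]

ex = sum(xs) / n                                 # E(X) estimate
ey = sum(ys) / n                                 # E(Y) estimate
e_cx = sum(3 * x for x in xs) / n                # E(3X) estimate
e_sum = sum(x + y for x, y in zip(xs, ys)) / n   # E(X + Y) estimate

print(abs(e_cx - 3 * ex))      # ~0: E(cX) = cE(X)
print(abs(e_sum - (ex + ey)))  # ~0: E(X + Y) = E(X) + E(Y)
```

Note that for sample averages these identities hold exactly, not just in the limit, which is why the differences are zero up to floating-point rounding.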
  • Population vs Sample
    • Population means having the entire data set, whereas a sample is typically only a subset of the population.
    • By convention, population statistics are denoted by Greek letters and sample statistics by Roman letters.

Probability Distribution Functions

  • Probability Mass Function - PMF
    • Used for discrete distributions.
    • The PMF is given by P(X=x); it gives the probability that a discrete random variable takes a particular value.
    • Example: PMF is f(x) = 1/6, which is the probability that one roll of a six-sided die will take on one of the possible outcomes one through six. Each of the possible outcomes has the same probability of occurring (1/6 = 16.67%)
    • For all the PMFs the sum of probabilities of all the possible outcomes is 100%.
    • Visualization, the PMF can be visualized as a bar chart, with each bar representing the probability of a specific value.
  • Probability Density Function - PDF
    • Used for continuous distributions.
    • The PDF is used to compute probabilities over ranges of outcomes, such as P(r1 < X < r2).
    • Formula
      • f(x) = dF(x)/dx = F'(x)
    • Visualization, the PDF can be visualized as a curve, with the area under the curve representing the probability.
Credits: https://faculty.nps.edu/rbassett/_book/statsbook_files/figure-html/unnamed-chunk-288-1.png
  • Cumulative Distribution Function - CDF
    • Cumulative probability associated with a function.
    • The CDF gives the probability that a random variable will be less than or equal to some value, P(X <= x). 
    • In other words, the value of the CDF at a specific point indicates the probability that the random variable will take on a value less than or equal to that point.
    • The CDF ranges from 0 to 1.
    • CDF gives you the cumulative probability up to a certain point.
    • Discrete Distribution
      • The CDF for discrete random variable is the probability that the random variable is less than or equal to a specific value. It is the sum of the probabilities up to that value.
      • Formula:
        • Fx(x) = P(X ≤ x)
      • As seen above, the CDF of a discrete random variable is a step function.
      • Example: 
        • The probability of getting an outcome by rolling a six-sided die is given as:
          • Probability of getting 1 = P(X≤ 1 ) = 1 / 6
          • Probability of getting 2 = P(X≤ 2 ) = 2 / 6
          • Probability of getting 3 = P(X≤ 3 ) = 3 / 6
          • Probability of getting 4 = P(X≤ 4 ) = 4 / 6
          • Probability of getting 5 = P(X≤ 5 ) = 5 / 6
          • Probability of getting 6 = P(X≤ 6 ) = 6 / 6 = 1
      • In the discrete example, we calculated the CDF by summing up the probabilities.
    • Continuous Distribution
      • The CDF for a continuous random variable is the integral of the probability density function up to a specific point.
      • Formula:
        • F(x) = P(X ≤ x) = ∫_(-∞)^x f(t) dt
      • As seen above, the CDF of a continuous random variable is typically an S-curve (for a normal PDF).
      • In the continuous example, we used the integral of the PDF to calculate the CDF.
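Both behaviours can be seen with Python's standard-library `statistics.NormalDist` (Python 3.8+), which exposes the CDF of a normal distribution:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal N(0, 1)

print(z.cdf(0))             # 0.5: half the mass lies at or below the mean
print(z.cdf(1.645))         # ~0.95
print(z.cdf(2) - z.cdf(1))  # P(1 < X <= 2), an interval probability from the CDF
```

Differencing the CDF at two points, as in the last line, is exactly how interval probabilities are obtained for continuous variables.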

Moments

  • Moments in mathematical statistics involve a basic calculation. These calculations can be used to find a probability distribution's mean, variance, skewness and kurtosis.
  • Moments are defined as the expected values that briefly describe the features of a distribution, they provide a way to summarize the shape and characteristics of a probability distribution or a dataset.
  • Sample moments are those that are utilized to approximate the unknown population moments. Sample moments are calculated from the sample data. They provide a way to make inferences about the population based on limited sample data.
  • Need of Moments:
    • Estimation of Population Moments
    • Summary of Statistics
    • Comparison of Distributions
    • Interpret Results and Make Inferences
  • Note: Each measure has its strengths and limitations, and it's often valuable to consider them together for a more complete analysis.

Mean

  • The first moment of a distribution or dataset is a measure of its center or location, which is typically the mean.
  • For the mean of a population we use the notation µ. (µ is the Greek letter mu.)
  • For the mean of a sample we use the notation x̄. (x-bar, a Roman letter.)
  • Formula: mean = (sum of all values) / (number of values)
  • There are two types of first moments: the raw first moment and the central first moment.
  • Raw First Moment - μ1:
    • Raw First Moment is calculated by taking the average of the data points without any adjustment for the mean.
    • Formula:
        • μ1 = (1/n) Σ_{i=1}^{n} Xi
    • Example:
      • Sample of scores: {75,82,90,68,88}
      • Calculation: (75+82+90+68+88)/5 = 80.6
      • The raw first moment is 80.6, which represents the average position of the exam scores.
    • The mean is the balance point of the dataset. It is the point around which the data tends to cluster.
  • Central First Moment - μ1:
    • The Central First Moment is calculated by averaging the deviations of the data points from the mean.
    • Because the positive and negative deviations cancel exactly, the central first moment is always zero. The related mean absolute deviation averages the absolute deviations |Xi − X̄| instead, and is generally non-zero.
    • Formula:
      • μ1 = (1/n) Σ_{i=1}^{n} (Xi − X̄)
    • Example:
      • Sample of scores: {75,82,90,68,88}
      • First calculate the raw first moment - simple mean : (75+82+90+68+88)/5 = 80.6
      • Second, average the deviations from the mean : [(75−80.6)+(82−80.6)+(90−80.6)+(68−80.6)+(88−80.6)] / 5 = 0 / 5 = 0
      • The central first moment is 0, as it must be. The mean absolute deviation, by contrast, is (5.6+1.4+9.4+12.6+7.4)/5 = 7.28: on average, the scores deviate from the mean by 7.28 points.
    • The mean absolute deviation indicates the average distance of data points from the mean. A higher value suggests greater variability in the dataset.
  • Median: 
    • The median is the middle value (or midpoint) after all the data points have been arranged in value order as a list of numbers. 
    • The median of a discrete random variable is the value such that the probability that a value is less than or equal to the median is equal to 50%. 
    • Working from the other end of the distribution, we can also define the median such that 50% of the values are greater than or equal to the median. 
    • For a random variable, X, if we denote the median as m, we have: 
      • P[X ≥ m] = P[X ≤ m] = 0.50
    • Example:
      • Sample of scores: {75,82,90,68,88}
      • Sorted: 68, 75, 82, 88, 90. Here 82 is the median.
    • If the number of observations is odd, the median is the middle value.
    • If the number of observations is even, the median is the average of the two middle values.
  • Mode:
    • The mode is the value that appears the most number of times in a data set. 
    • For a discrete random variable, the mode is the value associated with the highest probability. 
    • As with population and sample data sets, the mode of a discrete random variable need not be unique.
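The three averages can be computed directly with Python's standard `statistics` module, reusing the score sample from the mean example:

```python
from statistics import mean, median, mode

scores = [75, 82, 90, 68, 88]
print(mean(scores))        # 80.6
print(median(scores))      # 82: middle value of the sorted data 68, 75, 82, 88, 90
print(mode([2, 4, 6, 6]))  # 6: the most frequent value
```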
  • These averages offer a one-dimensional view of the data and tell the centre of the data. Averages give us a way of determining where the centre of a set of data is, but they don't tell us how the data varies.
  • Range:
    • The range is the difference between the largest value and the smallest value.
    • The range is a simple way of saying what is the spread of a set of data.
    • Range does not explain how the data is distributed with the range.
    • If the data has outliers, using the range to describe how the values are dispersed can be very misleading because of its sensitivity to outliers.


Quantiles

  • Quartiles
    • One way of constructing a mini range is to use just the values around the centre of the data. We can construct such a range by first lining up the values in ascending order and then splitting the data into four equally sized chunks, each containing one quarter of the data.
    • Eg:
      • 1 1 1 2 2 Q1 2 2 3 3 3 Q2 3 3 4 4 4 Q3 4 5 5 5 10
    • The values that split the data into equal chunks are known as quartiles, as they split the data into quarters. 
    • Finding the quartiles is like finding the median, instead of finding the value that splits the data in half, here the data will be split into quarters.
  • Interquartile Range
    • The lowest quartile is known as the lower quartile or first quartile (Q1), and the highest quartile is known as the upper quartile or third quartile (Q3). The quartile in the middle (Q2) is the median as it splits the data in half.
    • The range of values between the lower and upper quartiles is called the interquartile range (IQR).
    • Formula: IQR = Upper Quartile - Lower Quartile
    • The IQR is the range of the central 50% of the data.
    • The IQR is less sensitive to outliers than the range.
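Quartiles and the IQR can be computed with `statistics.quantiles` from the standard library; its default "exclusive" method interpolates slightly differently from the hand method above, but agrees on this data:

```python
from statistics import quantiles

# The 20-value dataset from the quartile example above.
data = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 10]

q1, q2, q3 = quantiles(data, n=4)  # three cut points -> four quarters
print(q1, q2, q3)  # 2.0 3.0 4.0

iqr = q3 - q1  # spread of the central 50% of the data
print(iqr)     # 2.0
```

Note how the outlier 10 inflates the full range (10 − 1 = 9) but leaves the IQR untouched.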
  • Deciles
    • Similar to quartiles, but here the values split the data into ten equal chunks, hence they are called deciles.
    • Each segment here contains 10% of the data.
  • Centile
    • A centile takes this further, splitting the data into one hundred equal chunks; centiles are also known as percentiles.
    • Each percentile is referred to by the percentage with which it splits the data, so the 10th percentile is the value that is 10% of the way through the data. 
    • In general the kth percentile is the value that is k% of the way through the data.
    • Formula: Percentile = (Number of Values Below “x” / Total Number of Values) × 100
    • Example 1: The scores obtained by 10 students are 38, 47, 49, 58, 60, 65, 70, 79, 80, 92. Using the percentile formula, calculate the percentile for score 70?
      • Given:
        • Scores obtained by students are 38, 47, 49, 58, 60, 65, 70, 79, 80, 92
        • Number of scores below 70 = 6
        • Percentile of 70
          • = (6/10) × 100
          • = 0.6 × 100 = 60
        • Therefore, the percentile for score 70 = 60%
    • The percentile is useful for benchmarking and determining the rank or position, in particular to how relative to all the others.
    • From the above example, someone has scored 70 in the test. By itself this number does not show how well they did relative to anyone else. But if the student were told that a score of 70 puts them at the 60th percentile, they would understand that they did as well as or better than 60% of the other students.
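The percentile formula above translates directly into a small helper (the function name `percentile_rank` is just illustrative):

```python
def percentile_rank(data, x):
    """Percentage of values in data strictly below x."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)

scores = [38, 47, 49, 58, 60, 65, 70, 79, 80, 92]
print(percentile_rank(scores, 70))  # 60.0: a score of 70 is at the 60th percentile
```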
  • Inverse Cumulative Distribution Function - Q(a)
    • The Inverse Cumulative Distribution Function is also called the quantile function.
    • The Inverse CDF takes a probability as input and returns the corresponding value from the distribution. 
    • By cumulative distribution function we denote the function that returns probabilities of X  being smaller than or equal to some value 𝑥.
      • Pr(X ≤ x) = F(x)
    • This function takes as input x and returns values from the [0,1] interval (probabilities); let's denote them as p. The inverse of the cumulative distribution function (or quantile function) tells you what x would make F(x) return some value p:
      • F⁻¹(p) = x
    • In simple words, for a given probability value p we are looking for the x that makes F(x) return a value greater than or equal to p. Since there could be multiple values of x that meet this condition (e.g. F(x) ≥ 0 is true for any x), we take the smallest such x.
    • We can examine the inverse cumulative distribution function by applying it to the standard normal distribution, N(0, 1). For example:
      • F⁻¹(95%) = 1.645
      • F⁻¹(99%) = 2.326
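In Python, `statistics.NormalDist.inv_cdf` plays the role of F⁻¹ for a normal distribution; a quick check of the values quoted above:

```python
from statistics import NormalDist

z = NormalDist(0, 1)  # standard normal

print(round(z.inv_cdf(0.95), 3))  # 1.645
print(round(z.inv_cdf(0.99), 3))  # 2.326

# inv_cdf and cdf undo each other:
print(round(z.cdf(z.inv_cdf(0.95)), 3))  # 0.95
```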
  • Box and whisker plots
    • Box and whisker plots, also known as box plots, specialize in showing different types of ranges.
    • A box and whisker diagram shows the range, interquartile range and median of a set of data. More than one set of data can be represented on the same chart, which makes it a great way of comparing datasets.
    • A single box plot shows the range: the whiskers on either side of the box show the lower and upper bounds and the extent of the range. Some box and whisker plots deliberately have shorter whiskers and explicitly show outliers as dots or stars extending beyond the whiskers. This makes it easier to see how many outliers there are and how extreme they really are.
Credits: https://flowingdata.com/wp-content/uploads/2008/02/box-plot-explained.gif
    • On the box and whisker diagram, the length of the whiskers grows in line with the upper and lower bounds. One can get an idea of how the data is skewed by looking at the whiskers on the plot.
    • If the box and whisker diagram is symmetric, this means that the underlying data is likely to be fairly symmetric too.
    • But if the data is skewed to the right, the mean will be to the right of the median and the whisker on the right will be longer than the one on the left. If the data is skewed to the left, the mean will be to the left of the median and the whisker on the left will be longer.
    • More than one set of data can be shown on the same chart, so they are useful for comparisons.
Credits: https://miro.medium.com/v2/resize:fit:432/1*CqdMwNZY5TZh42SgLaBCnA.png

Variance

  • Say we have two different samples that give the same mean: how do we differentiate them? Variance and standard deviation can help us with this.
  • Variance (σ2 for a population, s2 for a sample) is a measure of the average squared deviation of each data point from the mean.
  • The variance is a measure of the spread of the data. A higher variance indicates greater variability, with squared deviations emphasizing the impact of larger deviations.
  • One reason we square the deviations is that their plain sum with respect to the mean is always zero. We could take absolute values instead, but that would not give extra significance to the larger deviations; squaring (the second moment) emphasizes points far from the mean.
  • Formula for a population:
    • σ² = (1/N) Σ_{i=1}^{N} (Xi − μ)²
  • Formula for a sample:
    • s² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)²
    • Note: For a sample the formula is the same, but we divide by n−1 to correct the bias.
  • Unbiased Variance, concept of n-1 in sample:
    • The use of n−1 in the denominator only when calculating the sample variance (s²) is based on Bessel's correction.
    • Bessel's correction is applied to correct the bias in the estimation of the population variance when using a sample and it helps provide a more accurate and unbiased estimate of the true population variance.
    • The reason this correction uses n−1 in the denominator instead of n is rooted in the concept of degrees of freedom.
    • Because the deviations are measured from the sample mean, which is itself fitted to the data, the raw average of squared deviations tends to underestimate the spread around the (unknown) population mean. Making the denominator smaller naturally increases the estimate, just enough to make it unbiased.
    • This video brilliantly explains why the variance (and not the mean) is biased and why n−1 fixes it: https://www.youtube.com/watch?v=bVB4X5CUWTg
  • Example:
    • Sample of scores: {75,82,90,68,88}
    • First calculate the raw first moment - simple mean : (75+82+90+68+88)/5 = 80.6
    • Second, find the variance by measuring the average squared deviation : [(75−80.6)²+(82−80.6)²+(90−80.6)²+(68−80.6)²+(88−80.6)²] / 5 = 67.04
    • Dividing by n = 5 gives the population variance σ² = 67.04; dividing by n−1 = 4 would give the sample variance s² = 83.8. Either way, the value indicates the spread of the data with respect to the mean.
  • Interpret:
    • Variance involves squaring the deviations from the mean.
    • Variance tends to be more sensitive to extreme values because squaring amplifies the impact of larger deviations. Here this gives more weightage to larger deviations.
    • Variance is measured in squared units of the original data.
  • Standard Deviation
    • As variance is the square of the standard deviation, the standard deviation is the square root of the variance. Basically the standard deviation says how much the values vary from the mean, on average.
    • Example:
      • Sample of scores: {75,82,90,68,88}
      • Variance: 67.04
      • Std: 8.19
    • The population standard deviation is denoted by σ (the Greek letter sigma); the sample standard deviation by s.
    • A standard deviation close to zero indicates that data points are very close to the mean, whereas a larger standard deviation indicates data points are spread further away from the mean.
    • The standard deviation is 0 if all the values are same.
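The variance example above can be reproduced with the `statistics` module, which implements both denominator conventions:

```python
from statistics import pvariance, variance, pstdev

scores = [75, 82, 90, 68, 88]

print(pvariance(scores))         # 67.04: divide by n (population)
print(variance(scores))          # 83.8:  divide by n-1 (Bessel-corrected sample)
print(round(pstdev(scores), 2))  # 8.19:  square root of the population variance
```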
  • Standard scores or z-score
    • Standard scores give you a way of comparing values across different sets of data where the mean and standard deviation differ. They're a way of comparing related data values in different circumstances, as if they came from the same set of data or distribution.
    • Standard scores work by transforming sets of data into a new, theoretical distribution with a mean of 0 and a standard deviation of 1. It's a generic distribution that can be used for comparisons. Standard scores effectively transform data so that it fits the graph below, which shows how far a given x is from the mean.
    • Positive z-scores mean that the given value is above mean and negative z-scores mean the value is below mean. If the z-score is 0 it means the value is mean itself.
    • The standard deviation and the mean together can tell you where most of the values in your frequency distribution lie if they follow a normal distribution.
      • The empirical rule, or the 68-95-99.7 rule, tells you where your values lie:
        • Around 68% of scores are within 1 standard deviation of the mean,
        • Around 95% of scores are within 2 standard deviations of the mean,
        • Around 99.7% of scores are within 3 standard deviations of the mean.
    • Sometimes, outliers are defined as being more than 3 standard deviations of the mean.
    • Formula: z = ( x - mean ) /  𝜎
    • Standard Score = Number of standard deviations from the mean
    • Credits: https://algebra2.thinkport.org/module3/images/xyz-page6-graph1.jpg
    • Example:
      • Imagine you have two classes of students, and you want to understand how tall or short a particular student in each class is compared to the entire class. You collect data on the heights of all the students in Class I and find that the average height is 160 cm with a standard deviation of 10 cm. You then collect data on the heights of all the students in Class II and find that the average height is 180 cm with a standard deviation of 20 cm.
      • Hypothetically, each class can send one student to represent it in a height competition.
      • The formula for calculating the Z-score is z = (x − μ) / σ, where:
        • x is the individual's height (Alex's height in this case)
        • μ is the mean (average) height of the class
        • σ is the standard deviation of the class heights
      • Now in Class I, let's say there's a student named Alex, and you want to calculate Alex's Z-score to see how his height compares to the class average.
        • Alex's height = 170 cm 
        • Class average height = 160 cm 
        • Standard deviation of class heights = 10 cm
        • Now, plug in the values: z = (170 − 160) / 10 = 1
          • So, Alex's Z-score is 1.
          • This means that Alex's height is 1 standard deviation above the class average.
          • In other words, Alex is taller than about 84.13% of the students in the class: the 50% below the mean plus the 34.13% that fall within one standard deviation above it.
      • In Class II, let's say there's a student named Bob, and you want to calculate Bob's Z-score to see how his height compares to the class average.
        • Bob's height = 220 cm 
        • Class average height = 180 cm 
        • Standard deviation of class heights = 20 cm
        • Now, plug in the values: z = (220 − 180) / 20 = 2
          • So, Bob's Z-score is 2.
          • This means that Bob's height is 2 standard deviations above the class average.
          • In other words, Bob is taller than about 97.72% of the students in the class: the 50% below the mean plus the 47.72% that fall within two standard deviations above it.
      • Though Class I and Class II have different means and standard deviations, Alex and Bob can still be compared. From the above analysis, Alex is closer to his class mean, while Bob is taller than most of the students in his class. Bob should represent Class II, as he is taller relative to his classmates.
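The Alex and Bob comparison can be sketched in Python; `NormalDist` converts each z-score into the fraction of a (assumed normal) class that is shorter than the student:

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    # how many standard deviations x lies above the mean
    return (x - mean) / sd

z_alex = z_score(170, 160, 10)  # Class I: mean 160, sd 10
z_bob = z_score(220, 180, 20)   # Class II: mean 180, sd 20
print(z_alex, z_bob)  # 1.0 2.0

std_normal = NormalDist(0, 1)
print(round(std_normal.cdf(z_alex) * 100, 2))  # 84.13% of Class I is shorter than Alex
print(round(std_normal.cdf(z_bob) * 100, 2))   # 97.72% of Class II is shorter than Bob
```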
  • Covariance
    • Covariance is used to determine the relationship between the movements of two random variables.
    • If the covariance is positive, it suggests that when one variable increases, the other tends to increase as well, and vice versa.
    • Formula: 
      • cov(X, Y) = (1/N) Σ_{i=1}^{N} (xᵢ − X̄)(yᵢ − Ȳ)
    • Example:
      • Dataset X: [2,4,6,8,10]
      • Dataset Y: [1,3,5,7,9]
      • Calculate the means: X̄ = (2+4+6+8+10)/5 = 6, Ȳ = (1+3+5+7+9)/5 = 5
      • Calculate the covariance: cov(X, Y) = [(2−6)(1−5)+(4−6)(3−5)+(6−6)(5−5)+(8−6)(7−5)+(10−6)(9−5)] / 5 = (16+4+0+4+16) / 5 = 40 / 5 = 8
      • So, cov(X, Y) is 8. (With the sample convention of dividing by n−1 = 4 instead, cov(X, Y) = 10.)
    • Since the covariance is positive, as values of X increase, values of Y tend to increase as well, and vice versa.
    • However, the magnitude of covariance is not standardized.
    • So it doesn't provide a clear measure of the strength of the relationship.
  • Correlation
    • Correlation is a standardized measure of the linear relationship between two variables.
    • The value ranges from -1 to 1, with 1 indicating a perfect positive linear relationship, 0 indicating no linear relationship, and -1 indicating a perfect negative linear relationship.
    • A coefficient that is close to zero indicates that there is only a weak relationship between the two variables.
    • Formula:
      • corr(X, Y) = cov(X, Y) / (σX · σY)
    • Example:
      • Calculate the standard deviations: σX = √[(16+4+0+4+16) / 5] = √8 ≈ 2.828, and likewise σY = √8 ≈ 2.828
      • Calculate the correlation: corr(X, Y) = 8 / (2.828 × 2.828) = 8 / 8 = 1
      • So, corr(X, Y) is exactly 1.
    • A correlation of 1 indicates a perfect positive linear relationship between X and Y, which makes sense here since Y = X − 1.
    • Covariance is also distinct from correlation, another statistical metric often used to measure the relationship between two variables. While covariance measures the direction of a relationship between two variables, correlation measures the strength of that relationship. 


Skewness

  • Karl Pearson introduced the use of the third moment about the mean in calculating skewness.
  • Skewness is a measure of a distribution's asymmetry. We standardize it by dividing by the standard deviation cubed.
  • Formula for population:
    • skewness = Σᵢ (Xᵢ − μ)³ / (N · σ³)
  • Formula for sample:
    • skewness = Σᵢ (Xᵢ − X̄)³ / ((n−1) · s³)
  • Because we both subtract the mean and divide by the standard deviation cubed, skewness is unaffected by differences in the mean or the variance of the random variable. This allows us to compare the skewness of two different distributions directly.
  • Why the power 3?
    • A negative deviation cubed stays negative and large: (−ve)³ = large −ve value
    • A positive deviation cubed stays positive and large: (+ve)³ = large +ve value
    • Unlike squaring, cubing preserves the sign of each deviation, so the sum reveals which side of the mean dominates instead of letting the positive and negative contributions blur together.
  • Types of Skewness
    • Positive Skewness
      • The extreme data values are higher in a positive skew distribution, which increases the mean value of the data set. To put it another way, a positive skew distribution has the tail on the right side.
      • It means that, Mean > Median > Mode in positive skewness
    • Negative Skewness
      • The extreme data values are smaller in negative skewness, which lowers the dataset’s mean value. A negative skew distribution is one with the tail on the left side.
      • Hence, in negative Skewness, Mean <  Median < Mode.
    • Zero Skewness
      • A distribution with skew = 0 is perfectly symmetric.
    • Visualization
Credits: https://www.biologyforlife.com/uploads/2/2/3/9/22392738/c101b0da6ea1a0dab31f80d9963b0368_orig.png
  • Example: Find the skewness of the data (2, 4, 6, 6) using Pearson's median skewness formula, Skewness = 3(Mean − Median) / S.D.
    • Mean of Data = (2 + 4 + 6 + 6) / 4 = 18 / 4 = 4.5
    • Number of terms (n) = 4 (even), so the median is the average of the two middle values: Median = (2nd term + 3rd term) / 2 = (4 + 6) / 2 = 5
    • Mode of Data = highest-frequency term = 6 (frequency 2)
    • Standard Deviation = √{[(2 − 4.5)² + (4 − 4.5)² + (6 − 4.5)² + (6 − 4.5)²] / 4} = √(11 / 4) = √2.75 ≈ 1.658
    • Skewness = 3(Mean − Median) / S.D. = 3(4.5 − 5) / 1.658 = −1.5 / 1.658 ≈ −0.905
    • So, the skewness of these data is negative.
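Both skewness measures can be checked in Python: the moment-based formula from the section above and the Pearson median formula used in the worked example (deviations taken from the mean, 4.5):

```python
from statistics import mean, median, pstdev

data = [2, 4, 6, 6]
m = mean(data)     # 4.5
sd = pstdev(data)  # population standard deviation, ~1.658

# Third-moment (population) skewness: sum of cubed deviations / (N * sd^3)
moment_skew = sum((x - m) ** 3 for x in data) / (len(data) * sd ** 3)
print(round(moment_skew, 3))  # -0.493

# Pearson's median skewness: 3 * (mean - median) / sd
pearson_skew = 3 * (m - median(data)) / sd
print(round(pearson_skew, 3))  # -0.905
```

The two formulas give different magnitudes, but both agree the distribution is negatively skewed.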


  • Coskewness
    • Coskewness, in statistics, measures how much two random variables change together.
    • If they exhibit positive coskewness, they will tend to undergo positive deviations at the same time. 
    • But if they exhibit negative coskewness, they will tend to undergo negative deviations at the same time.

Kurtosis

  • Kurtosis uses the fourth moment about the mean.
  • Kurtosis is a measure of the shape of a distribution, in particular the total probability in the tails of the distribution relative to the probability in the rest of the distribution.
  • In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.
  • The higher the kurtosis, the greater the probability in the tails of the distribution; such distributions are sometimes referred to as fat-tailed or heavy-tailed.
  • Formula for population:
    • kurtosis = Σᵢ (Xᵢ − μ)⁴ / (N · σ⁴)
  • Formula for sample:
    • kurtosis = Σᵢ (Xᵢ − X̄)⁴ / ((n−1) · s⁴)
  • Specifically, the kurtosis of a random variable is commonly benchmarked against that of a normally distributed random variable, which is 3. 
  • Random variables with kurtosis greater than 3, are described as being heavy-tailed/fat-tailed.
  • Like skewness, kurtosis is naturally unit-free and can be directly compared across random variables with different means and variances.
  • Types
    • Distributions with medium kurtosis (medium tails) are mesokurtic. Kurtosis = 3.0
    • Distributions with low kurtosis (thin tails) are platykurtic. Kurtosis < 3.0
    • Distributions with high kurtosis (fat tails) are leptokurtic. Kurtosis > 3.0
  • Visualization
Credits: https://keytodatascience.com/wp-content/uploads/2021/11/Kurtosis1.jpg
  • Simple Example:
    • Dataset A: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
    • Mean (μ) = (1 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 4 + 4) / 10 = 30 / 10 = 3
    • Variance (σ²) = [(1-3)² + (2-3)² + (2-3)² + (3-3)² + (3-3)² + (3-3)² + (4-3)² + (4-3)² + (4-3)² + (4-3)²] / 10 = (4 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 1) / 10 = 10 / 10 = 1
    • Kurtosis = [Σ(xi - μ)⁴ / N] / [σ⁴] - 3 = [(1-3)⁴ + (2-3)⁴ + (2-3)⁴ + (3-3)⁴ + (3-3)⁴ + (3-3)⁴ + (4-3)⁴ + (4-3)⁴ + (4-3)⁴ + (4-3)⁴] / 10 - 3 = (16 + 1 + 1 + 0 + 0 + 0 + 1 + 1 + 1 + 1) / 10 - 3 = 22 / 10 - 3 = 2.2 - 3 = -0.8
    • The kurtosis of Dataset A is 2.2; subtracting the normal benchmark of 3 gives an excess kurtosis of −0.8.
    • The typical interpretation of kurtosis involves subtracting 3 from the computed value, so that a normal distribution sits at 0.
    • Since the excess kurtosis is negative (kurtosis 2.2 < 3), Dataset A is platykurtic: its tails are thinner than those of a normal distribution.
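The Dataset A calculation can be verified in a few lines (population convention, matching the formulas above):

```python
from statistics import mean, pvariance

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
m = mean(data)         # 3
var = pvariance(data)  # 1.0

# Population kurtosis: fourth central moment divided by variance squared.
kurt = sum((x - m) ** 4 for x in data) / (len(data) * var ** 2)
excess = kurt - 3  # benchmark against the normal distribution's kurtosis of 3

print(kurt)              # 2.2
print(round(excess, 1))  # -0.8 -> platykurtic (thinner tails than normal)
```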
  • Cokurtosis
    • Cokurtosis measures the degree of "tailedness" or the presence of outliers in the joint distribution of variables.
    • A positive cokurtosis suggests that the two variables jointly have heavier tails than would be expected under a normal distribution.

Conclusion

Though these concepts might look simple, it must have taken a great deal of research to derive them all. A big shout-out to all the statisticians who brilliantly designed each of these concepts. To conclude, let's revise all the formulas mentioned above:
  • P(X = x) ----> PMF
  • f(x) = P(a < X < b); f(x) = dF(x)/dx = F'(x) ----> PDF
  • F(x) = P(X ≤ x) ----> CDF
    • F(x) = Σ_{t ≤ x} P(X = t) ----> CDF Discrete
    • F(x) = ∫_(-∞)^x f(t) dt ----> CDF Continuous
  • E(X) = μ = Σ x·P(x) ----> E[X] Discrete
  • E[X] = ∫ x·f(x) dx ----> E[X] Continuous

  • μ1 = (1/n) Σ_{i=1}^{n} Xi ----> Mean
  • μ1 = (1/n) Σ_{i=1}^{n} (Xi − X̄) ----> Central First Moment after adjusting with Mean

  • σ² = (1/N) Σ_{i=1}^{N} (Xi − μ)² ----> Variance population
  • s² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² ----> Variance sample
  • σ = √σ² ----> Standard Deviation population
  • s = √s² ----> Standard Deviation sample
  • z = (x − mean) / σ ----> Z Score
  • Quantiles
    • Pr(X ≤ x) = F(x)
    • F⁻¹(p) = x
  • Σᵢ (Xᵢ − μ)³ / (N · σ³) ----> Skewness population
  • Σᵢ (Xᵢ − X̄)³ / ((n−1) · s³) ----> Skewness sample
  • Σᵢ (Xᵢ − μ)⁴ / (N · σ⁴) ----> Kurtosis population
  • Σᵢ (Xᵢ − X̄)⁴ / ((n−1) · s⁴) ----> Kurtosis sample
  • F⁻¹(95%) = 1.645 ----> Inverse CDF (Excel formula: NORMINV(probability, mean, std))
  • F⁻¹(97.5%) = 1.96 ----> Inverse CDF (Excel e.g.: =NORMINV(0.975, 0, 1))
  • F⁻¹(99%) = 2.326 ----> Inverse CDF
  • cov(X, Y) = (1/N) Σ_{i=1}^{N} (xᵢ − X̄)(yᵢ − Ȳ) ----> Covariance
  • corr(X, Y) = cov(X, Y) / (σX · σY) ----> Correlation

Credits and References:

https://chat.openai.com/ - Big help with examples while understanding these concepts
https://www.scribbr.com/statistics/standard-deviation/#:~:text=Around%2068%25%20of%20values%20are,standard%20deviations%20of%20the%20mean.
https://www.statlect.com/glossary/absolutely-continuous-random-variable
https://www.geeksforgeeks.org/skewness-formula/
https://www.investopedia.com/*
