Chapter 1

descriptive statistics = summarize observed data and present it graphically.

Types of data

| Data | Description |
| --- | --- |
| Nominal Data | Categories, no ordering or direction |
| Ordinal Data | Ordered categories (rankings, order) |
| Interval Data | Differences between measurements, but no true zero |
| Ratio Data | Differences between measurements, true zero exists |

Graphs

Dot Diagram

A number line on which each observation is plotted as a dot; equal observations are stacked on top of each other.

Histogram

frequency = the count.

  • Choose a division into intervals: neither too many nor too few observations per interval.
  • Count the number of observations in each interval (the frequency), or determine the relative frequency $= \frac{\text{frequency}}{n}$.
  • Build a rectangle above each interval and choose as height either the frequency or the relative frequency.
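
The three steps above can be sketched in a few lines of Python. The sample values and the interval edges below are assumptions chosen purely for illustration; the text histogram printed at the end uses the frequency as bar height.

```python
# Hypothetical sample; both the data and the interval choice are illustrative.
data = [1.2, 1.9, 2.3, 2.8, 3.1, 3.4, 3.7, 4.5, 4.9, 5.6]
edges = [1, 2, 3, 4, 5, 6]  # intervals [1,2), [2,3), ..., [5,6)

n = len(data)

# Step 2: count the observations that fall in each interval.
frequency = [0] * (len(edges) - 1)
for x in data:
    for k in range(len(edges) - 1):
        if edges[k] <= x < edges[k + 1]:
            frequency[k] += 1
            break

# Relative frequency = frequency / n.
relative_frequency = [f / n for f in frequency]

# Step 3: one "rectangle" per interval, drawn here with '#' characters.
for k, f in enumerate(frequency):
    label = f"[{edges[k]}, {edges[k + 1]})"
    print(f"{label}: {'#' * f}  frequency={f}, relative={relative_frequency[k]:.1f}")
```

The relative frequencies always sum to 1, which makes histograms of samples with different n comparable.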

Bar Graph

  • for bar graphs the variable has to be quantitative and discrete.

Measures of Center

  • Mean: arithmetic average $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Median: the middle observation when the observations are arranged from small to large. If n is even, compute the mean of the two middle observations.
  • Mode: the most frequently occurring observation.
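
As a small illustration, the three measures can be computed with Python's standard `statistics` module. The sample is hypothetical and was chosen with an even n so that the median is the mean of the two middle observations.

```python
import statistics

# Hypothetical sample with an even number of observations.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(data)  # mean of the two middle values, 3 and 5
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```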

Percentiles and quartiles

  • The median m is also the 50th percentile: about 50% of the observations are smaller than m and about 50% are greater than m.
  • The quartiles Q1, m and Q3 are the 25th, 50th and 75th percentiles. They split the observations into 4 roughly equal quarters.

Box-Plot

The box plot graphs the 5-number summary of the observations:

  • quartiles (Q1, m, Q3)
  • smallest observation.
  • largest observation.
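
A minimal sketch of the 5-number summary in Python, assuming a hypothetical sample. Several quartile conventions exist; `method="inclusive"` of `statistics.quantiles` is only one of them, and other conventions can give slightly different values for Q1 and Q3.

```python
import statistics

# Hypothetical sample of n = 9 observations (illustrative values only).
data = [22, 15, 40, 12, 28, 17, 31, 20, 25]

# n=4 asks for the three cut points that split the data into quarters.
q1, m, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Smallest observation, quartiles, largest observation.
five_number = (min(data), q1, m, q3, max(data))
print(five_number)
```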

Measures of Variability

  • Range: r = largest observation − smallest observation
  • Inter-quartile range: $IQR = Q_3 - Q_1$
  • Variance: the sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

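sample variance != population variance: $s^2$ is computed from the sample with divisor $n-1$ and estimates the population variance $\sigma^2 = E(X-\mu)^2$.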
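
The effect of the divisor can be seen on a small hypothetical sample. The sketch below computes the range and the sum of squared deviations by hand, then divides by n − 1 (sample variance) and by n (the formula for a fully observed population), and compares both with `statistics.variance` and `statistics.pvariance`.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)  # divisor n - 1
divide_by_n = squared_deviations / n            # divisor n

print(value_range, sample_variance, divide_by_n)
print(statistics.variance(data), statistics.pvariance(data))
```

The two variances differ for every finite sample, and the gap shrinks as n grows.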

  • resistant to outliers: median, IQR
  • not resistant to outliers: $\bar{x}$, $s$, $s^2$
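
A quick way to see the difference is to replace one value of a hypothetical sample by an extreme one and recompute the measures. The numbers are made up for illustration.

```python
import statistics

clean = [10, 12, 13, 15, 16]         # hypothetical sample
contaminated = [10, 12, 13, 15, 160]  # same sample with one outlier

median_clean = statistics.median(clean)
median_dirty = statistics.median(contaminated)
mean_clean = statistics.mean(clean)
mean_dirty = statistics.mean(contaminated)
stdev_clean = statistics.stdev(clean)
stdev_dirty = statistics.stdev(contaminated)

print("median:", median_clean, "->", median_dirty)  # unchanged
print("mean:  ", mean_clean, "->", mean_dirty)      # pulled towards the outlier
print("s:     ", round(stdev_clean, 2), "->", round(stdev_dirty, 2))
```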

Chebyshev's rule: $P(|X - \mu_X| \ge c) \le \frac{\mathrm{var}(X)}{c^2}$

The empirical rule

Only valid for bell-shaped histograms.

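| Interval | Empirical rule | General |
| --- | --- | --- |
| $(\bar{x}-s,\ \bar{x}+s)$ | 68% | at least 0% |
| $(\bar{x}-2s,\ \bar{x}+2s)$ | 95% | at least 75% |
| $(\bar{x}-3s,\ \bar{x}+3s)$ | 99.7% | at least 89% |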
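
Both columns can be recomputed. Taking c = k standard deviations in Chebyshev's rule gives a general lower bound of 1 − 1/k² for the share of observations within k standard deviations of the mean. For the empirical rule, the sketch below assumes an exactly normal distribution and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

rows = []
for k in (1, 2, 3):
    # General lower bound from Chebyshev's rule with c = k standard deviations.
    chebyshev_lower_bound = 1 - 1 / k**2
    # Coverage of (mean - k sd, mean + k sd) for an exactly normal distribution.
    normal_coverage = standard_normal.cdf(k) - standard_normal.cdf(-k)
    rows.append((k, normal_coverage, chebyshev_lower_bound))
    print(f"within {k} sd: bell shaped about {normal_coverage:.1%}, "
          f"any distribution at least {chebyshev_lower_bound:.0%}")
```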

The z-scores

For samples with mean $\bar{x}$ and standard deviation $s$:

the z-score of an observation $x$ is $z = \frac{x - \bar{x}}{s}$

Interpretation: the signed distance between the value and the mean, measured in standard deviations.

For populations with mean $\mu$ and standard deviation $\sigma$: the z-score of an observation or value $x$ is $z = \frac{x - \mu}{\sigma}$
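
A short sketch of the sample version, using a hypothetical data set. After standardizing, the z-scores have mean 0 and standard deviation 1, which the last two lines print as a sanity check.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

x_bar = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)

# z-score of each observation: distance to the mean in standard deviations.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x}: z = {z:+.2f}")

print("mean of z-scores:", round(statistics.mean(z_scores), 12))
print("sd of z-scores:  ", round(statistics.stdev(z_scores), 12))
```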

Empirical Rule Applied Backwards

  • about 68% of observations have a z-score in [-1, +1]
  • about 95% of observations have a z-score in [-2, +2]
  • about 99.7% of observations have a z-score in [-3, +3]
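
Going backwards means starting from a coverage percentage and asking which z-score bound it corresponds to. The sketch below does this for an exactly normal distribution (an assumption) with `NormalDist.inv_cdf`; it shows that the bounds 1, 2 and 3 are rounded values.

```python
from statistics import NormalDist

standard_normal = NormalDist()

bounds = {}
for coverage in (0.68, 0.95, 0.997):
    # A central interval with this coverage leaves (1 - coverage) / 2 in each tail.
    upper_tail_point = (1 + coverage) / 2
    bounds[coverage] = standard_normal.inv_cdf(upper_tail_point)
    print(f"{coverage:.1%} of observations: z-score within +/- {bounds[coverage]:.2f}")
```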

Skewness

normal distribution skewness = 0

  • Positive Skew
  • Symmetrical Distribution
  • Negative Skew

Kurtosis

normal distribution kurtosis = 3

  • Negative Kurtosis
  • Normal Distribution
  • Positive Kurtosis

Sample Estimators

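| Measure | Population Distribution | Sample Estimate |
| --- | --- | --- |
| Mean | $\mu = E(X)$ | $\bar{x} = \frac{1}{n}\sum x_i$ |
| Variance | $\sigma^2 = E(X-\mu)^2$ | $S^2 = \frac{1}{n-1}\sum (x_i-\bar{x})^2$ |
| Standard Deviation | $\sigma$ | $S = \sqrt{S^2}$ |
| Skewness | $\gamma_1 = \frac{E(X-\mu)^3}{\sigma^3}$ | $b_1 = \frac{\frac{1}{n}\sum (x_i-\bar{x})^3}{\left(\frac{1}{n}\sum (x_i-\bar{x})^2\right)^{3/2}}$ |
| Kurtosis | $\gamma_2 = \frac{E(X-\mu)^4}{\sigma^4}$ | $b_2 = \frac{\frac{1}{n}\sum (x_i-\bar{x})^4}{\left(\frac{1}{n}\sum (x_i-\bar{x})^2\right)^{2}}$ |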
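
The skewness and kurtosis estimates are ratios of central sample moments, each with divisor n. A minimal sketch follows; the helper names `central_moment`, `sample_skewness` and `sample_kurtosis` are hypothetical and not taken from any library, and the two data sets are made up.

```python
def central_moment(xs, k):
    """k-th central sample moment with divisor n: (1/n) * sum((x - mean)^k)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def sample_skewness(xs):
    """Third central moment divided by the second central moment to the power 3/2."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def sample_kurtosis(xs):
    """Fourth central moment divided by the squared second central moment."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical sample with a long right tail

print(sample_skewness(symmetric), sample_kurtosis(symmetric))
print(sample_skewness(right_skewed), sample_kurtosis(right_skewed))
```

A symmetric sample gives skewness 0, while a long right tail gives a positive value.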

Normality Check

  • Graphs: on a histogram the data look approximately normal (bell shaped).
  • Numerically:
    • Skewness coefficient: (close to 0)
    • Kurtosis coefficient: (close to 3)
  • Q-Q plot: no systematic deviations from the line y = x.
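
The points of a normal Q-Q plot can be computed without a plotting library. The sketch below pairs the sorted z-scores of a hypothetical sample with the matching standard normal quantiles. The plotting positions (i − 0.5)/n are one common convention and are an assumption here.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample (illustrative values only).
data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.5, 5.6, 6.2, 4.9, 5.2]
n = len(data)

# Standardize and sort the observations.
x_bar, s = mean(data), stdev(data)
sample_z = sorted((x - x_bar) / s for x in data)

# Standard normal quantiles at the plotting positions (i - 0.5) / n.
theoretical_z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each pair is one point of the Q-Q plot; points near y = x suggest normality.
qq_pairs = list(zip(theoretical_z, sample_z))
largest_gap = max(abs(t - z) for t, z in qq_pairs)

for t, z in qq_pairs:
    print(f"theoretical {t:+.2f}   sample {z:+.2f}")
print("largest vertical distance to the line y = x:", round(largest_gap, 2))
```

What matters is the pattern rather than a single distance: a curve or an S-shape in the points indicates skewness or heavy tails.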

Exponential Distribution Check

  • Graph: histogram:
    • no negative values
    • peak at 0
    • skew right.
  • Numerically
    • skewness (close to 2)
    • kurtosis (close to 9, i.e. excess kurtosis close to 6)
  • Q-Q plot: no systematic deviations from the line y = x.
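
The numerical checks can be tried on simulated data. The sketch below draws a large sample from an exponential distribution with rate 1 (the rate, the seed and the sample size are arbitrary assumptions) and computes skewness and kurtosis from central moments with divisor n. With the convention that the normal distribution has kurtosis 3, an exponential distribution has skewness 2 and kurtosis 9, which is an excess kurtosis of 6.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.expovariate(1.0) for _ in range(100_000)]


def central_moment(xs, k):
    """k-th central sample moment with divisor n (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


m2 = central_moment(sample, 2)
skewness = central_moment(sample, 3) / m2 ** 1.5
kurtosis = central_moment(sample, 4) / m2 ** 2

print("smallest value:", min(sample))    # no negative values
print("skewness:", round(skewness, 2))   # expected to be near 2
print("kurtosis:", round(kurtosis, 2))   # expected to be near 9
print("excess kurtosis:", round(kurtosis - 3, 2))
```

Kurtosis estimates are noisy because they depend heavily on the few largest observations, so a small sample can land far from 9 even when the data really are exponential.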