Evaluating distributions
One purpose of distplyr is to handle the menial distribution-related calculations for you. Just specify a distribution once, and there is no need to manage its components anymore.
For example, suppose you want to compute the variance of a Uniform(-1, 1) distribution, get its 0.25- and 0.75-quantiles, and generate a sample of size 10.
Without distplyr:
a <- -1
b <- 1
# Look up formula for variance:
(b - a) ^ 2 / 12
#> [1] 0.3333333
# Get quantiles:
qunif(c(0.25, 0.75), min = a, max = b)
#> [1] -0.5 0.5
# Get sample of size 10:
runif(10, min = a, max = b)
#> [1] -0.838499725 0.668666075 0.201521772 -0.685583117 -0.985201118
#> [6] -0.067213005 -0.004445223 -0.420465511 0.465763974 0.545043022
With distplyr:
d <- dst_unif(-1, 1)
variance(d)
#> [1] 0.3333333
eval_quantile(d, at = c(0.25, 0.75))
#> [1] -0.5 0.5
realise(d, 10)
#> [1] 0.7492013 -0.6501187 -0.9315173 -0.3592285 -0.1953435 -0.6086603
#> [7] -0.1929238 -0.8726771 -0.2225974 0.9510957
Functional Representations of a Distribution
A distribution can be represented by different functions, such as a density function, a cumulative distribution function, and others. In distplyr, you can:
- evaluate the function directly by calling eval_*;
- evaluate the function and enframe the results together with the function arguments with enframe_*; or
- get the function itself using get_*.
Here are the representations and the corresponding distplyr functions:
| Quantity | distplyr Functions |
|---|---|
| Cumulative Distribution Function | eval_cdf(), get_cdf(), enframe_cdf() |
| Survival Function | eval_survival(), get_survival(), enframe_survival() |
| Quantile Function | eval_quantile(), get_quantile(), enframe_quantile() |
| Hazard Function | eval_hazard(), get_hazard(), enframe_hazard() |
| Cumulative Hazard Function | eval_chf(), get_chf(), enframe_chf() |
| Probability Density Function | eval_density(), get_density(), enframe_density() |
| Probability Mass Function | eval_pmf(), get_pmf(), enframe_pmf() |
These functions all take a distribution object as their first argument, and eval_* and enframe_* have a second argument named at indicating where to evaluate the function. The at argument is vectorized.
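The representations in the table are linked by standard identities: the survival function is one minus the CDF, the hazard is the density divided by the survival function, and the cumulative hazard is the negative log of the survival function. The following is a minimal sketch of those identities for a Uniform(-1, 1) distribution, using only base R (punif() and dunif()) rather than distplyr. It is not a statement about how distplyr computes these quantities internally.

```r
# Evaluation points, including the upper endpoint and a point outside the support
x <- c(-0.5, 0, 0.5, 1, 2)

cdf  <- punif(x, min = -1, max = 1)  # cumulative distribution function
dens <- dunif(x, min = -1, max = 1)  # probability density function

surv <- 1 - cdf      # survival function: P(X > x)
haz  <- dens / surv  # hazard function: density / survival
chf  <- -log(surv)   # cumulative hazard function: -log(survival)

# One row per evaluation point, one column per representation
data.frame(x, cdf, surv, dens, haz, chf)
```

At the upper endpoint the survival function is 0 while the density is still 0.5, so the hazard is Inf. Beyond the endpoint the ratio is 0/0, which R reports as NaN.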
Here is an example of evaluating the hazard function of a Uniform(-1, 1) distribution and enframing its density:
eval_hazard(d, at = 0:10)
#> [1] 1 Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN
enframe_density(d, at = 0:10)
#> # A tibble: 11 × 2
#> .arg density
#> <int> <dbl>
#> 1 0 0.5
#> 2 1 0.5
#> 3 2 0
#> 4 3 0
#> 5 4 0
#> 6 5 0
#> 7 6 0
#> 8 7 0
#> 9 8 0
#> 10 9 0
#> 11 10 0
set.seed(10)
The enframe_* functions work particularly well with tibbles and tidyr::unnest():
# half_marathon <- tribble(
# ~ person, ~ race_time_min,
# "Vincenzo", dst_norm(130, 25),
# "Colleen", dst_norm(110, 13),
# "Regina", dst_norm(115, 20)
# )
# half_marathon %>%
# mutate(quartiles = map(race_time_min, enframe_quantile, at = 1:3 / 4)) %>%
# unnest(quartiles)
Drawing a random sample
To draw a random sample from a distribution, use the realise() or realize() function:
realise(d, n = 5)
#> [1] 0.01495641 -0.38646299 -0.14618467 0.38620416 -0.82972806
You can read this call as “realise distribution d five times”. By default, n is set to 1, so that realizing a distribution converts it to a numeric draw:
realise(d)
#> [1] -0.5491268
This default is especially useful when working with distributions in a tibble:
# half_marathon %>%
# mutate(actual_time_min = map_dbl(race_time_min, realise))
Perhaps surprisingly, distplyr does not consider realise() a functional representation of a distribution, even though random sampling falls into the same family as the stats::p*/d*/q*/r* functions. This is because it’s impossible to perfectly describe a distribution based on a finite sample.
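To illustrate that last point, the sketch below compares the empirical CDF of a sample of size 10 with the true CDF of a Uniform(-1, 1) distribution. It uses base R only, not distplyr, and the seed and evaluation grid are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Draw a small sample and build its empirical CDF (a step function)
draws <- runif(10, min = -1, max = 1)
empirical_cdf <- ecdf(draws)

# Compare with the true CDF on a grid of points
grid <- seq(-1, 1, by = 0.25)
gap <- abs(empirical_cdf(grid) - punif(grid, min = -1, max = 1))

# Largest discrepancy on the grid
max(gap)
```

The empirical CDF is a step function while the true CDF is continuous, so the two cannot agree everywhere, whatever the sample size. A sample therefore only approximates the distribution, whereas each functional representation determines it.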
Properties of Distributions
Distributions have various numeric properties. Common examples are the mean and variance, but there are many others as well.
Below is a table of the properties incorporated in distplyr:
| Property | distplyr Function |
|---|---|
| Mean | mean() |
| Median | median() |
| Variance | variance() |
| Standard Deviation | sd() |
| Skewness | skewness() |
| Excess Kurtosis | kurtosis_exc() |
| Kurtosis | kurtosis_raw() |
| Extreme Value (Tail) Index | evi() |
Here are some properties of our original Uniform(-1, 1) distribution:
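Several of these values can be cross-checked without distplyr. The sketch below uses only base R and the standard moment definitions, integrating numerically against the Uniform(-1, 1) density. The helper central_moment() is a hypothetical name introduced for this illustration and is not part of distplyr.

```r
a <- -1
b <- 1

# k-th central moment of Uniform(a, b) by numerical integration of the density.
# Hypothetical helper for illustration; not a distplyr function.
central_moment <- function(k, mu) {
  integrate(
    function(x) (x - mu)^k * dunif(x, min = a, max = b),
    lower = a, upper = b
  )$value
}

mu       <- (a + b) / 2                       # mean
med      <- qunif(0.5, min = a, max = b)      # median
sigma2   <- central_moment(2, mu)             # variance
sigma    <- sqrt(sigma2)                      # standard deviation
skew     <- central_moment(3, mu) / sigma^3   # skewness
kurt_raw <- central_moment(4, mu) / sigma^4   # kurtosis
kurt_exc <- kurt_raw - 3                      # excess kurtosis

c(mean = mu, median = med, variance = sigma2, sd = sigma,
  skewness = skew, kurtosis = kurt_raw, excess_kurtosis = kurt_exc)
```

Under these definitions the variance is 1/3, matching the 0.3333333 computed earlier, the skewness is 0 by symmetry, and the excess kurtosis is -6/5.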