Evaluating distributions
One purpose of distplyr is to handle the menial distribution-related calculations for you. Just specify a distribution once, and there is no need to manage its components anymore.
Example: want to compute the variance of a Uniform(-1, 1) distribution, get the 0.25- and 0.75-quantiles, and generate a sample of size 10?
Without distplyr:
a <- -1
b <- 1
# Look up formula for variance:
(b - a) ^ 2 / 12
#> [1] 0.3333333
# Get quantiles:
qunif(c(0.25, 0.75), min = a, max = b)
#> [1] -0.5  0.5
# Get sample of size 10:
runif(10, min = a, max = b)
#> [1] 0.86476621 0.78168169 0.45430932 0.32927048 0.15404244 0.37334487
#> [7] 0.04029762 0.86620646 0.07053946 0.05104954
With distplyr:
d <- dst_unif(-1, 1)
variance(d)
#> [1] 0.3333333
eval_quantile(d, at = c(0.25, 0.75))
#> [1] -0.5  0.5
realise(d, 10)
#> [1] 0.37342530 0.75218917 0.77445849 0.97073452 0.03683637 0.43664341
#> [7] 0.81486884 0.72929486 0.62879755 0.58206294
Functional Representations of a Distribution
A distribution can be represented by different functions, such as a density function, a cumulative distribution function, and others. In distplyr, you can:
- evaluate the function directly by calling eval_*;
- evaluate the function and enframe results together with the function arguments with enframe_*; or
- get the function itself using get_*.
Here are the representations and the corresponding distplyr functions:
Quantity                            distplyr Functions
Cumulative Distribution Function    eval_cdf(), get_cdf(), enframe_cdf()
Survival Function                   eval_survival(), get_survival(), enframe_survival()
Quantile Function                   eval_quantile(), get_quantile(), enframe_quantile()
Hazard Function                     eval_hazard(), get_hazard(), enframe_hazard()
Cumulative Hazard Function          eval_chf(), get_chf(), enframe_chf()
Probability Density Function        eval_density(), get_density(), enframe_density()
Probability Mass Function           eval_pmf(), get_pmf(), enframe_pmf()
These functions all take a distribution object as their first argument, and eval_* and enframe_* have a second argument named at indicating where to evaluate the function. The at argument is vectorized.
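To make the three access patterns concrete, here is a minimal base R sketch that mimics them for the CDF of a Uniform(-1, 1) distribution using stats::punif(). The names cdf_fun and cdf_frame are hypothetical and are not part of distplyr; the sketch only illustrates the idea of evaluating, getting, and enframing one function, and assumes nothing about how distplyr implements them.

```r
# Base R analogue of the eval / get / enframe pattern (not distplyr's implementation).
at <- c(-1, 0, 0.5, 1)

# "eval": evaluate the CDF directly at a vector of points.
cdf_values <- punif(at, min = -1, max = 1)

# "get": hold on to the function itself for later use (hypothetical name).
cdf_fun <- function(x) punif(x, min = -1, max = 1)

# "enframe": pair each argument with its evaluated value (hypothetical name).
cdf_frame <- data.frame(.arg = at, cdf = cdf_fun(at))

print(cdf_values)
print(cdf_frame)
```

Because at is a vector, all three forms return one value per evaluation point.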
Here is an example of evaluating the hazard function of a Uniform(-1, 1) distribution, and enframing the density:
eval_hazard(d, at = 0:10)
#> [1] 1 Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN
enframe_density(d, at = 0:10)
#> # A tibble: 11 × 2
#>     .arg density
#>    <int>   <dbl>
#>  1     0     0.5
#>  2     1     0.5
#>  3     2     0
#>  4     3     0
#>  5     4     0
#>  6     5     0
#>  7     6     0
#>  8     7     0
#>  9     8     0
#> 10     9     0
#> 11    10     0
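The 1, Inf, NaN pattern in the hazard output can be reproduced by hand. Here is a minimal base R sketch, assuming the usual definitions (survival = 1 - CDF, hazard = density / survival, cumulative hazard = -log(survival)). It uses stats::dunif() and stats::punif() rather than distplyr, and the variable names are illustrative.

```r
# Hand computation for Uniform(-1, 1) at a few points.
at <- 0:3
dens <- dunif(at, min = -1, max = 1)                      # density
surv <- punif(at, min = -1, max = 1, lower.tail = FALSE)  # survival = 1 - CDF
hazard <- dens / surv                                     # 0.5/0.5, 0.5/0, 0/0, 0/0
chf <- -log(surv)                                         # cumulative hazard

print(hazard)
print(chf)
```

At 0 the ratio is 0.5 / 0.5 = 1. At the upper endpoint 1 the survival probability is 0 while the density is still 0.5, giving Inf. Beyond the support both are 0, giving NaN.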
set.seed(10)
The enframe_* functions work particularly well with tibbles and tidyr::unnest():
# half_marathon <- tribble(
#   ~ person, ~ race_time_min,
#   "Vincenzo", dst_norm(130, 25),
#   "Colleen", dst_norm(110, 13),
#   "Regina", dst_norm(115, 20)
# )
# half_marathon %>%
#   mutate(quartiles = map(race_time_min, enframe_quantile, at = 1:3 / 4)) %>%
#   unnest(quartiles)
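Since the chunk above is commented out, here is a rough base R sketch of the long-format result that enframing followed by unnesting aims for: one row per person and quantile level. It uses stats::qnorm() on the same means and standard deviations instead of distplyr objects, and the column names .arg and quantile are assumptions for illustration only.

```r
# One row per runner, holding the parameters of a Normal race-time distribution.
half_marathon <- data.frame(
  person = c("Vincenzo", "Colleen", "Regina"),
  mean = c(130, 110, 115),
  sd = c(25, 13, 20)
)
probs <- 1:3 / 4

# Build one small data frame of quartiles per runner, then stack them.
quartiles <- do.call(rbind, lapply(seq_len(nrow(half_marathon)), function(i) {
  data.frame(
    person = half_marathon$person[i],
    .arg = probs,
    quantile = qnorm(probs, mean = half_marathon$mean[i], sd = half_marathon$sd[i])
  )
}))

print(quartiles)
```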
Drawing a random sample
To draw a random sample from a distribution, use the realise() or realize() function:
realise(d, n = 5)
#> [1] 0.01495641 0.38646299 0.14618467 0.38620416 0.82972806
You can read this call as “realise distribution d five times”. By default, n is set to 1, so that realizing a distribution converts it to a numeric draw:
realise(d)
#> [1] 0.5491268
This default is especially useful when working with distributions in a tibble:
# half_marathon %>%
# mutate(actual_time_min = map_dbl(race_time_min, realise))
Perhaps surprisingly, distplyr does not consider realise() to be a functional representation of a distribution, even though random sampling falls into the same family as the stats::p*/d*/q*/r* functions. This is because it is impossible to perfectly describe a distribution based on a sample.
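A small base R sketch of that last point: the empirical CDF built from a finite sample is a step function, so it cannot coincide with the continuous true CDF, and the sample cannot stand in for the distribution. The seed, sample size, and grid below are arbitrary choices for illustration.

```r
set.seed(1)
x <- runif(50, min = -1, max = 1)   # a sample from Uniform(-1, 1)
empirical_cdf <- ecdf(x)            # step function estimated from the sample

# Compare the estimate with the true CDF on a fine grid over the support.
grid <- seq(-1, 1, length.out = 201)
gap <- abs(empirical_cdf(grid) - punif(grid, min = -1, max = 1))
max_gap <- max(gap)                 # largest discrepancy found on the grid

print(max_gap)
```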
Properties of Distributions
Distributions have various numeric properties. Common examples are the mean and variance, but there are many others as well.
Below is a table of the properties incorporated in distplyr:
Property                      distplyr Function
Mean                          mean()
Median                        median()
Variance                      variance()
Standard Deviation            sd()
Skewness                      skewness()
Excess Kurtosis               kurtosis_exc()
Kurtosis                      kurtosis_raw()
Extreme Value (Tail) Index    evi()
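Several of these properties are integrals against the density. As a rough illustration (not necessarily how distplyr computes them), the following base R sketch recovers the mean, variance, skewness, and excess kurtosis of a Uniform(-1, 1) distribution by numerical integration with stats::integrate(). The helper name moment_about is hypothetical.

```r
dens <- function(x) dunif(x, min = -1, max = 1)

# Integrate (x - center)^k against the density over the support [-1, 1].
moment_about <- function(k, center = 0) {
  integrate(function(x) (x - center)^k * dens(x), lower = -1, upper = 1)$value
}

mu <- moment_about(1)                                    # mean
sigma2 <- moment_about(2, center = mu)                   # variance
skew <- moment_about(3, center = mu) / sigma2^1.5        # skewness
kurt_exc <- moment_about(4, center = mu) / sigma2^2 - 3  # excess kurtosis

print(c(mean = mu, variance = sigma2, skewness = skew, kurtosis_exc = kurt_exc))
```

The variance agrees with the closed form (b - a)^2 / 12 = 1/3 used at the start, and the excess kurtosis of any uniform distribution is -6/5.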
Here are some properties of our original Uniform(-1, 1) distribution: