evaluate.Rmd
```r
library(distionary)
```
One purpose of distplyr is to handle the menial distribution-related calculations for you. Just specify a distribution once, and there is no need to manage its components anymore.

For example, suppose you want to compute the variance of a Uniform(-1, 1) distribution, get its 0.25- and 0.75-quantiles, and generate a sample of size 10.
Without distplyr:

```r
a <- -1
b <- 1
# Look up formula for variance:
(b - a) ^ 2 / 12
#> [1] 0.3333333
# Get quantiles:
qunif(c(0.25, 0.75), min = a, max = b)
#> [1] -0.5  0.5
# Get sample of size 10:
runif(10, min = a, max = b)
#> [1] 0.48651330 0.08763024 0.49995474 0.70864176 0.39992650 0.44163056
#> [7] 0.22744847 0.78707611 0.23053899 0.94330538
```
With distplyr:

```r
d <- dst_unif(-1, 1)
variance(d)
#> [1] 0.3333333
eval_quantile(d, at = c(0.25, 0.75))
#> [1] -0.5  0.5
realise(d, 10)
#> [1] 0.04504622 0.19644052 0.96917160 0.17264643 0.38818268 0.55088825
#> [7] 0.07600935 0.90234684 0.99492568 0.81887116
```
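A distribution can be represented by different functions, such as a density function, a cumulative distribution function, and others. In distplyr, you can:

- evaluate a representation at chosen points with `eval_*()`;
- evaluate it and place the results in a data frame with `enframe_*()`; or
- get the representation itself as a function with `get_*()`.

Here are the representations and the corresponding distplyr functions: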
| Quantity | distplyr Functions |
|---|---|
| Cumulative Distribution Function | `eval_cdf()`, `get_cdf()`, `enframe_cdf()` |
| Survival Function | `eval_survival()`, `get_survival()`, `enframe_survival()` |
| Quantile Function | `eval_quantile()`, `get_quantile()`, `enframe_quantile()` |
| Hazard Function | `eval_hazard()`, `get_hazard()`, `enframe_hazard()` |
| Cumulative Hazard Function | `eval_chf()`, `get_chf()`, `enframe_chf()` |
| Probability Density Function | `eval_density()`, `get_density()`, `enframe_density()` |
| Probability Mass Function | `eval_pmf()`, `get_pmf()`, `enframe_pmf()` |
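The representations in the table are tied together by standard identities: the survival function is S(x) = 1 - F(x), the hazard function is h(x) = f(x) / S(x), the cumulative hazard function is H(x) = -log S(x), and the quantile function inverts the CDF. As a minimal sketch, the identities can be checked for Uniform(-1, 1) at a few interior points using base R's `dunif()`, `punif()` and `qunif()` (so nothing about distplyr's internals is assumed here):

```r
# Uniform(-1, 1) representations built from base R's stats functions
a <- -1
b <- 1
x <- c(-0.5, 0, 0.5)  # interior points of the support

cdf    <- punif(x, min = a, max = b)                      # F(x)
surv   <- punif(x, min = a, max = b, lower.tail = FALSE)  # S(x) = 1 - F(x)
dens   <- dunif(x, min = a, max = b)                      # f(x)
hazard <- dens / surv                                     # h(x) = f(x) / S(x)
chf    <- -log(surv)                                      # H(x) = -log S(x)

# The quantile function inverts the CDF, so Q(F(x)) recovers x
q_of_cdf <- qunif(cdf, min = a, max = b)

data.frame(x, cdf, surv, dens, hazard, chf, q_of_cdf)
```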
These functions all take a distribution object as their first argument, and `eval_*()` and `enframe_*()` have a second argument named `at` indicating where to evaluate the function. The `at` argument is vectorized.
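To make the difference between the three prefixes concrete, here is a rough base-R sketch for a Uniform(a, b) CDF. The helpers `get_cdf_sketch()`, `eval_cdf_sketch()` and `enframe_cdf_sketch()` are hypothetical stand-ins written only for illustration; they are not distplyr functions and hard-code the uniform CDF via `punif()`. The idea shown: "get" returns a function, "eval" returns a vector of values at `at`, and "enframe" returns those values next to `at` in a data frame.

```r
# Hypothetical stand-ins, NOT distplyr's implementation.

# "get": return the CDF itself, as a function of x
get_cdf_sketch <- function(a, b) {
  function(x) punif(x, min = a, max = b)
}

# "eval": evaluate the CDF at every element of the vector `at`
eval_cdf_sketch <- function(a, b, at) {
  get_cdf_sketch(a, b)(at)
}

# "enframe": pair each evaluation point with its CDF value
enframe_cdf_sketch <- function(a, b, at) {
  data.frame(.arg = at, cdf = eval_cdf_sketch(a, b, at))
}

cdf_fun <- get_cdf_sketch(-1, 1)
cdf_fun(0)                                   # returns 0.5
eval_cdf_sketch(-1, 1, at = c(-1, 0, 1))     # returns c(0, 0.5, 1)
enframe_cdf_sketch(-1, 1, at = c(-1, 0, 1))  # 3-row data frame
```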
Here is an example of evaluating the hazard function of a Uniform(-1, 1) distribution, and enframing its density:

```r
eval_hazard(d, at = 0:10)
#>  [1]   1 Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN
enframe_density(d, at = 0:10)
#> # A tibble: 11 × 2
#>     .arg density
#>    <int>   <dbl>
#>  1     0     0.5
#>  2     1     0.5
#>  3     2     0  
#>  4     3     0  
#>  5     4     0  
#>  6     5     0  
#>  7     6     0  
#>  8     7     0  
#>  9     8     0  
#> 10     9     0  
#> 11    10     0  
```
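The `Inf` and `NaN` values in the hazard output follow from h(x) = f(x) / S(x): at the upper endpoint x = 1 the density is still 0.5 but the survival probability is 0, and beyond the support both are 0. A base-R sketch with `dunif()` and `punif()` reproduces the same values; it only illustrates the arithmetic and is not necessarily how distplyr computes the hazard.

```r
# Hazard of Uniform(-1, 1) at 0:10 from base R: h(x) = f(x) / S(x)
at <- 0:10
f <- dunif(at, min = -1, max = 1)                      # density
s <- punif(at, min = -1, max = 1, lower.tail = FALSE)  # survival
h <- f / s
h
# x = 0:  0.5 / 0.5 = 1
# x = 1:  0.5 / 0   = Inf  (density is still 0.5 at the closed endpoint)
# x >= 2: 0 / 0     = NaN  (outside the support)
```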
```r
set.seed(10)
```

The `enframe_*()` functions work particularly well with tibbles and `tidyr::unnest()`:
```r
# half_marathon <- tribble(
#   ~ person,   ~ race_time_min,
#   "Vincenzo", dst_norm(130, 25),
#   "Colleen",  dst_norm(110, 13),
#   "Regina",   dst_norm(115, 20)
# )
# half_marathon %>%
#   mutate(quartiles = map(race_time_min, enframe_quantile, at = 1:3 / 4)) %>%
#   unnest(quartiles)
```
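The commented-out pipeline above produces a long table with one row per person and quartile. For readers without the tidyverse loaded, here is a minimal base-R sketch of the same reshaping idea. It assumes (unconfirmed by the text above) that the second parameter of `dst_norm()` is a standard deviation, and it uses `qnorm()` in place of `enframe_quantile()`:

```r
# Assumption: the second dst_norm() parameter is a standard deviation.
race <- data.frame(
  person = c("Vincenzo", "Colleen", "Regina"),
  mean   = c(130, 110, 115),
  sd     = c(25, 13, 20)
)
at <- 1:3 / 4

# Build one small data frame of quartiles per person, then stack them
# into long format (the role played by enframe_quantile() + unnest())
quartiles <- do.call(rbind, lapply(seq_len(nrow(race)), function(i) {
  data.frame(
    person   = race$person[i],
    .arg     = at,
    quantile = qnorm(at, mean = race$mean[i], sd = race$sd[i])
  )
}))
quartiles
```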
To draw a random sample from a distribution, use the `realise()` or `realize()` function:

```r
realise(d, n = 5)
#> [1] 0.01495641 0.38646299 0.14618467 0.38620416 0.82972806
```
You can read this call as "realise distribution `d` five times". By default, `n` is set to 1, so that realising a distribution converts it to a single numeric draw:

```r
realise(d)
#> [1] 0.5491268
```
This default is especially useful when working with distributions in a tibble:

```r
# half_marathon %>%
#   mutate(actual_time_min = map_dbl(race_time_min, realise))
```
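The same "one draw per row" pattern can be sketched in base R with `vapply()`, which, like `map_dbl()`, insists on exactly one number per element. This is an illustration only: `rnorm()` stands in for `realise()`, and the second normal parameter is assumed to be a standard deviation.

```r
set.seed(10)  # for reproducibility

race <- data.frame(
  person = c("Vincenzo", "Colleen", "Regina"),
  mean   = c(130, 110, 115),
  sd     = c(25, 13, 20)  # assumed to be standard deviations
)

# One random draw per row, mirroring map_dbl(race_time_min, realise)
race$actual_time_min <- vapply(
  seq_len(nrow(race)),
  function(i) rnorm(1, mean = race$mean[i], sd = race$sd[i]),
  numeric(1)
)
race
```

Since `rnorm()` is vectorised over `mean` and `sd`, base R could also draw all three at once; the per-row form is shown because it matches the `map_dbl()` pattern, where each row may hold a different kind of distribution.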
Perhaps surprisingly, distplyr does not consider `realise()` to be a functional representation of a distribution, even though random sampling belongs to the same family as the `stats::p*()`/`d*()`/`q*()`/`r*()` functions. This is because a distribution cannot be perfectly described by a sample.
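A quick illustration of why: quartiles estimated from a sample approximate the true quartiles but almost never equal them, and they change from one sample to the next. A minimal base-R sketch with Uniform(-1, 1):

```r
# True quartiles of Uniform(-1, 1): -0.5 and 0.5
true_q <- qunif(c(0.25, 0.75), min = -1, max = 1)

# Sample quartiles from two independent samples of size 10
set.seed(1)
est1 <- quantile(runif(10, min = -1, max = 1), c(0.25, 0.75), names = FALSE)
est2 <- quantile(runif(10, min = -1, max = 1), c(0.25, 0.75), names = FALSE)

# Compare: each row is a different description of "the" quartiles
rbind(true_q, est1, est2)
```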
Distributions have various numeric properties. Common examples are the mean and variance, but there are many others as well. Below is a table of the properties incorporated in distplyr:

| Property | distplyr Function |
|---|---|
| Mean | `mean()` |
| Median | `median()` |
| Variance | `variance()` |
| Standard Deviation | `sd()` |
| Skewness | `skewness()` |
| Excess Kurtosis | `kurtosis_exc()` |
| Kurtosis | `kurtosis_raw()` |
| Extreme Value (Tail) Index | `evi()` |
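Most of these properties follow directly from the density. As a worked check of the definitions (not a claim about how distplyr computes them), the base-R sketch below obtains them for Uniform(-1, 1) by numerical integration with `stats::integrate()`. The exact values are mean 0, median 0, variance 1/3, skewness 0, kurtosis 9/5 and excess kurtosis -6/5; the tail index is left out.

```r
# Density of Uniform(-1, 1)
f <- function(x) dunif(x, min = -1, max = 1)

# k-th moment about `centre`, integrating over the support [-1, 1]
moment <- function(k, centre = 0) {
  integrate(function(x) (x - centre)^k * f(x), lower = -1, upper = 1)$value
}

mu       <- moment(1)                     # mean
med      <- qunif(0.5, min = -1, max = 1) # median
v        <- moment(2, centre = mu)        # variance
s        <- sqrt(v)                       # standard deviation
skew     <- moment(3, centre = mu) / s^3  # skewness
kurt_raw <- moment(4, centre = mu) / v^2  # kurtosis
kurt_exc <- kurt_raw - 3                  # excess kurtosis

c(mean = mu, median = med, variance = v, sd = s, skewness = skew,
  kurtosis = kurt_raw, excess_kurtosis = kurt_exc)
```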
Here are some properties of our original Uniform(-1, 1) distribution: