Skip to contents

One purpose of distplyr is to handle the menial distribution-related calculations for you. Just specify a distribution once, and there is no need to manage its components anymore.

Example: want to compute the variance of a Uniform(-1, 1) distribution, get the 0.25- and 0.75-quantiles, and generate a sample of size 10?

Without distplyr:

a <- -1
b <- 1
# Look up formula for variance:
(b - a) ^ 2 / 12
#> [1] 0.3333333
# Get quantiles:
qunif(c(0.25, 0.75), min = a, max = b)
#> [1] -0.5  0.5
# Get sample of size 10:
runif(10, min = a, max = b)
#>  [1]  0.86476621 -0.78168169 -0.45430932 -0.32927048  0.15404244 -0.37334487
#>  [7] -0.04029762 -0.86620646  0.07053946 -0.05104954

With distplyr:

d <- dst_unif(-1, 1)
variance(d)
#> [1] 0.3333333
eval_quantile(d, at = c(0.25, 0.75))
#> [1] -0.5  0.5
realise(d, 10)
#>  [1] -0.37342530 -0.75218917 -0.77445849 -0.97073452  0.03683637  0.43664341
#>  [7]  0.81486884 -0.72929486  0.62879755 -0.58206294

Functional Representations of a Distribution

A distribution can be represented by different functions, such as a density function, a cumulative distribution function, and others. In distplyr, you can:

  • evaluate the function directly by calling eval_*;
  • evaluate the function and enframe results together with the function arguments with enframe_*; or
  • get the function itself using get_*.

Here are the representations and the corresponding distplyr functions:

Quantity distplyr Functions
Cumulative Distribution Function eval_cdf(), get_cdf(), enframe_cdf()
Survival Function eval_survival(), get_survival(), enframe_survival()
Quantile Function eval_quantile(), get_quantile(), enframe_quantile()
Hazard Function eval_hazard(), get_hazard(), enframe_hazard()
Cumulative Hazard Function eval_chf(), get_chf(), enframe_chf()
Probability density function eval_density(), get_density(), enframe_density()
Probability mass function eval_pmf(), get_pmf(), enframe_pmf()

These functions all take a distribution object as their first argument, and eval_* and enframe_* have a second argument named at indicating where to evaluate the function. The at argument is vectorized.

Here is an example of evaluating the hazard function and the random sample generator of a Uniform(-1,1) distribution, and enframing the density:

eval_hazard(d, at = 0:10)
#>  [1]   1 Inf NaN NaN NaN NaN NaN NaN NaN NaN NaN
enframe_density(d, at = 0:10)
#> # A tibble: 11 × 2
#>     .arg density
#>    <int>   <dbl>
#>  1     0     0.5
#>  2     1     0.5
#>  3     2     0  
#>  4     3     0  
#>  5     4     0  
#>  6     5     0  
#>  7     6     0  
#>  8     7     0  
#>  9     8     0  
#> 10     9     0  
#> 11    10     0
set.seed(10)

enframe() works particularly well with tibbles and tidyr::unnest():

# half_marathon <- tribble(
#   ~ person, ~ race_time_min,
#   "Vincenzo", dst_norm(130, 25),
#   "Colleen", dst_norm(110, 13),
#   "Regina", dst_norm(115, 20)
# ) 
# half_marathon %>% 
#   mutate(quartiles = map(race_time_min, enframe_quantile, at = 1:3 / 4)) %>% 
#   unnest(quartiles)

Drawing a random sample

To draw a random sample from a distribution, use the realise() or realize() function:

realise(d, n = 5)
#> [1]  0.01495641 -0.38646299 -0.14618467  0.38620416 -0.82972806

You can read this call as “realise distribution d five times”. By default, n is set to 1, so that realizing a distribution converts it to a numeric draw:

realise(d)
#> [1] -0.5491268

This default is especially useful when working with distributions in a tibble:

# half_marathon %>% 
#   mutate(actual_time_min = map_dbl(race_time_min, realise))

Perhaps surprisingly, distplyr does not consider realise() as a functional representation of a distribution, even though random sampling falls into the same family as the stats::p*/d*/q*/r* functions. This is because it’s impossible to perfectly describe a distribution based on a sample.

Properties of Distributions

Distributions have various numeric properties. Common examples are the mean and variance, but there are many others as well.

Below is a table of the properties incorporated in distplyr:

Property distplyr Function
Mean mean()
Median median()
Variance variance()
Standard Deviation sd()
Skewness skewness()
Excess Kurtosis kurtosis_exc()
Kurtosis kurtosis_raw()
Extreme Value (Tail) Index evi()

Here are some properties of our original Uniform(-1, 1) distribution:

mean(d)
#> [1] 0
stdev(d)
#> [1] 0.5773503
evi(d)
#> [1] -1