Skip to contents

This vignette covers the second goal of distionary: to evaluate probability distributions, even when that property is not specified in the distribution’s definition.

Distributional Representations

A distributional representation is a function that fully describes the distribution, such that any property can be calculated from it. Here is a list of representations recognised by distionary, and the functions for accessing them.

Representation distionary Functions
Cumulative Distribution Function eval_cdf(), enframe_cdf()
Survival Function eval_survival(), enframe_survival()
Quantile Function eval_quantile(), enframe_quantile()
Hazard Function eval_hazard(), enframe_hazard()
Cumulative Hazard Function eval_chf(), enframe_chf()
Probability density Function eval_density(), enframe_density()
Probability mass Function (PMF) eval_pmf(), enframe_pmf()
Odds Function eval_odds(), enframe_odds()
Return Level Function eval_return(), enframe_return()

All representations can either be accessed by the eval_*() family of functions, providing a vector of the evaluated representation.

d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144

Alternatively, the enframe_*() family of functions provides the results in a tibble or data frame paired with the inputs, useful in a data wrangling workflow.

enframe_pmf(d1, at = 0:5)
#> # A tibble: 6 × 2
#>    .arg     pmf
#>   <int>   <dbl>
#> 1     0 0.6    
#> 2     1 0.24   
#> 3     2 0.096  
#> 4     3 0.0384 
#> 5     4 0.0154 
#> 6     5 0.00614

The enframe_*() functions allow for insertion of multiple distributions, placing a column for each distribution. The column names can be changed in three ways:

  1. The input column .arg can be renamed with the arg_name argument.
  2. The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
  3. The distribution names can be changed by assigning name-value pairs for the input distributions.

Let’s practice this with the addition of a second distribution.

d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures", fn_prefix = "probability"
)
#> # A tibble: 6 × 3
#>   num_failures probability_model1 probability_model2
#>          <int>              <dbl>              <dbl>
#> 1            0            0.6                 0.4   
#> 2            1            0.24                0.24  
#> 3            2            0.096               0.144 
#> 4            3            0.0384              0.0864
#> 5            4            0.0154              0.0518
#> 6            5            0.00614             0.0311

Drawing a random sample

To draw a random sample from a distribution, use the realise() or realize() function:

set.seed(42)
realise(d1, n = 5)
#> [1] 0 0 0 0 0

You can read this call as “realise distribution d five times”. By default, n is set to 1, so that realising converts a distribution to a numeric draw:

realise(d1)
#> [1] 0

While random sampling falls into the same family as the p*/d*/q*/r* functions from the stats package (e.g., rnorm()), this function is not a distributional representation, hence does not have a eval_*() or enframe_*() counterpart. This is because it’s impossible to perfectly describe a distribution based on a sample.

Properties of Distributions

distionary refers to a distribution property as any value that can be calculated from a distribution, such as the mean and variance. Whereas a distributional representation must fully define a distribution, a property need not.

Below is a table of the properties incorporated in distionary, and the corresponding functions for accessing them.

Property distionary Function
Mean mean()
Median median()
Variance variance()
Standard Deviation sd()
Skewness skewness()
Excess Kurtosis kurtosis_exc()
Kurtosis kurtosis()

Here’s the mean and variance of our original distribution.

mean(d1)
#> [1] 0.6666667
variance(d1)
#> [1] 1.111111

Some properties are easy to make yourself. Here is an example of a function that calculates interquartile range.

# Make a function that takes a distribution as input, and returns the
# interquartile range.
iqr <- function(distribution) {
  diff(eval_quantile(distribution, at = c(0.25, 0.75)))
}

Apply the function.

iqr(d2)
#> [1] 2

For properties that are not handled by distionary (e.g., extreme value index, or moment generating function), one option is to build these properties into your own distribution. A future version of distionary will make user-defined properties easier to work with.