This vignette covers the second goal of distionary: to
evaluate properties of probability distributions, even when a property is not
specified in the distribution’s definition.
Distributional Representations
A distributional representation is a function that fully
describes the distribution, such that any property can be calculated
from it. Here is a list of representations recognised by
distionary, and the functions for accessing them.
| Representation | distionary Functions |
|---|---|
| Cumulative Distribution Function | eval_cdf(), enframe_cdf() |
| Survival Function | eval_survival(), enframe_survival() |
| Quantile Function | eval_quantile(), enframe_quantile() |
| Hazard Function | eval_hazard(), enframe_hazard() |
| Cumulative Hazard Function | eval_chf(), enframe_chf() |
| Probability Density Function | eval_density(), enframe_density() |
| Probability Mass Function (PMF) | eval_pmf(), enframe_pmf() |
| Odds Function | eval_odds(), enframe_odds() |
| Return Level Function | eval_return(), enframe_return() |
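To illustrate why any one representation is enough to recover the others, here is a minimal base R sketch for a Geometric(0.6) distribution. It uses the stats functions rather than distionary itself, and the cumulative hazard shown is the textbook definition (negative log of the survival function); distionary's own convention for discrete distributions is an assumption here and may differ.

```r
# Geometric(0.6): number of failures before the first success.
# This parameterisation is assumed to match dst_geom(0.6).
p <- 0.6
x <- 0:5

cdf      <- pgeom(x, prob = p)   # P(X <= x)
survival <- 1 - cdf              # P(X > x), derived from the CDF
chf      <- -log(survival)       # cumulative hazard, textbook definition
pmf      <- dgeom(x, prob = p)   # P(X = x)

# The quantile function inverts the CDF: feeding the CDF values back in
# recovers the points they were evaluated at.
recovered <- qgeom(cdf, prob = p)

data.frame(x, cdf, survival, chf, pmf, recovered)
```

Each column is computed from the CDF or PMF alone, which is the sense in which a representation "fully describes" the distribution.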
All representations can be accessed through the
eval_*() family of functions, which return a vector of the
evaluated representation.
d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144

Alternatively, the enframe_*() family of functions
provides the results in a tibble or data frame paired with the inputs,
useful in a data wrangling workflow.
enframe_pmf(d1, at = 0:5)
#> # A tibble: 6 × 2
#> .arg pmf
#> <int> <dbl>
#> 1 0 0.6
#> 2 1 0.24
#> 3 2 0.096
#> 4 3 0.0384
#> 5 4 0.0154
#> 6 5 0.00614

The enframe_*() functions accept
multiple distributions, placing a column for each distribution. The
column names can be changed in three ways:
- The input column .arg can be renamed with the arg_name argument.
- The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
- The distribution names can be changed by assigning name-value pairs for the input distributions.
Let’s practice this with the addition of a second distribution.
d2 <- dst_geom(0.4)
enframe_pmf(
model1 = d1, model2 = d2, at = 0:5,
arg_name = "num_failures", fn_prefix = "probability"
)
#> # A tibble: 6 × 3
#> num_failures probability_model1 probability_model2
#> <int> <dbl> <dbl>
#> 1 0 0.6 0.4
#> 2 1 0.24 0.24
#> 3 2 0.096 0.144
#> 4 3 0.0384 0.0864
#> 5 4 0.0154 0.0518
#> 6 5 0.00614 0.0311

Drawing a random sample
To draw a random sample from a distribution, use the
realise() or realize() function.
You can read a call such as realise(d1, n = 5) as “realise distribution d1 five
times”. By default, n is set to 1, so that realising
converts a distribution to a numeric draw:
realise(d1)
#> [1] 0

While random sampling falls into the same family as the
p*/d*/q*/r* functions from the stats package
(e.g., rnorm()), it is not a distributional
representation, and so has no eval_*() or
enframe_*() counterpart. This is because it is impossible to
perfectly describe a distribution based on a sample.
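The point about samples can be sketched in base R, assuming stats::rgeom() and stats::dgeom() use the same parameterisation as dst_geom(). Even a large sample only approximates the PMF, whereas the PMF itself is exact.

```r
set.seed(1)  # for reproducibility of this sketch only
p <- 0.6

# Five draws, analogous to realising the distribution five times.
draws <- rgeom(5, prob = p)

# Estimate P(X = 0) from a much larger sample and compare it with the
# exact value given by the PMF.
big_sample <- rgeom(10000, prob = p)
empirical  <- mean(big_sample == 0)
exact      <- dgeom(0, prob = p)

c(empirical = empirical, exact = exact)
```

The empirical proportion lands near the exact probability but carries sampling error, so it cannot stand in for a representation.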
Properties of Distributions
distionary refers to a distribution property as
any value that can be calculated from a distribution, such as the mean
and variance. Whereas a distributional representation must fully define
a distribution, a property need not.
Below is a table of the properties incorporated in
distionary, and the corresponding functions for accessing
them.
| Property | distionary Function |
|---|---|
| Mean | mean() |
| Median | median() |
| Variance | variance() |
| Standard Deviation | stdev() |
| Skewness | skewness() |
| Excess Kurtosis | kurtosis_exc() |
| Kurtosis | kurtosis() |
The mean and variance of our original distribution are obtained with mean(d1) and variance(d1).
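As a cross-check that does not rely on distionary, the same two quantities can be computed in base R for a Geometric(0.6) distribution, both from the PMF and from the closed-form expressions. This is a sketch that assumes the failures-before-first-success parameterisation implied by the PMF values printed earlier; the truncation point of 200 is an arbitrary choice that leaves a negligible tail.

```r
p <- 0.6

# Numerical moments from the PMF on a truncated support.
x   <- 0:200
pmf <- dgeom(x, prob = p)
mean_numeric <- sum(x * pmf)
var_numeric  <- sum((x - mean_numeric)^2 * pmf)

# Closed-form moments of the geometric distribution.
mean_closed <- (1 - p) / p      # approximately 0.6667
var_closed  <- (1 - p) / p^2    # approximately 1.1111

c(mean = mean_numeric, variance = var_numeric)
```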
Some properties are easy to make yourself. Here is an example of a function that calculates the interquartile range.
# Make a function that takes a distribution as input, and returns the
# interquartile range.
iqr <- function(distribution) {
diff(eval_quantile(distribution, at = c(0.25, 0.75)))
}

Apply the function.
iqr(d2)
#> [1] 2

For properties that are not handled by distionary (e.g.,
extreme value index, or moment generating function), one option is to
build these properties into your own distribution. A future version of
distionary will make user-defined properties easier to work
with.
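Another route, in the spirit of the iqr() helper, is to compute such a property from a representation. The sketch below approximates the moment generating function of a Geometric(0.4) distribution from its PMF in base R. The helper name mgf_from_pmf() and the truncation point are hypothetical choices for illustration, not part of distionary; with distionary, the PMF values could instead come from eval_pmf(d2, at = support).

```r
# Hypothetical helper: approximate E[exp(t * X)] from PMF values on a
# (truncated) support. Not a distionary function.
mgf_from_pmf <- function(t, pmf, support) {
  sum(exp(t * support) * pmf)
}

p       <- 0.4
support <- 0:200                      # truncation; the tail is negligible here
pmf     <- dgeom(support, prob = p)

t <- 0.2                              # must satisfy (1 - p) * exp(t) < 1
mgf_numeric <- mgf_from_pmf(t, pmf, support)

# Closed form for the geometric distribution, for comparison.
mgf_closed <- p / (1 - (1 - p) * exp(t))

c(numeric = mgf_numeric, closed = mgf_closed)
```

The sum only converges when (1 - p) * exp(t) is below 1, so the helper should not be used blindly for larger values of t.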
