This vignette covers the second goal of distionary: to evaluate
properties of probability distributions, even when a property is not
specified in the distribution’s definition.
Distributional Representations
A distributional representation is a function that fully
describes the distribution, such that any property can be calculated
from it. Here is a list of representations recognised by
distionary, and the functions for accessing them.
Representation | distionary Functions |
---|---|
Cumulative Distribution Function | eval_cdf(), enframe_cdf() |
Survival Function | eval_survival(), enframe_survival() |
Quantile Function | eval_quantile(), enframe_quantile() |
Hazard Function | eval_hazard(), enframe_hazard() |
Cumulative Hazard Function | eval_chf(), enframe_chf() |
Probability Density Function | eval_density(), enframe_density() |
Probability Mass Function (PMF) | eval_pmf(), enframe_pmf() |
Odds Function | eval_odds(), enframe_odds() |
Return Level Function | eval_return(), enframe_return() |
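The representations in the table are linked by standard identities, which is why any one of them can stand in for the others. The base R sketch below illustrates a few of these identities for an Exponential distribution with rate 2, using only the stats package. It assumes the textbook definitions (survival as one minus the CDF, hazard as density over survival, cumulative hazard as the negative log of survival, and the return level for a period T as the quantile at 1 - 1/T), which may differ in detail from distionary's own conventions. All variable names are illustrative.

```r
# Base R sketch: textbook identities linking representations, shown for an
# Exponential(rate = 2) distribution. No distionary functions are used.
rate <- 2
x <- c(0.1, 0.5, 1, 2)

cdf      <- pexp(x, rate = rate)
survival <- 1 - cdf                 # S(x) = 1 - F(x)
density  <- dexp(x, rate = rate)
hazard   <- density / survival      # h(x) = f(x) / S(x)
chf      <- -log(survival)          # H(x) = -log(S(x))

# The quantile function inverts the CDF, so this recovers x.
recovered_x <- qexp(cdf, rate = rate)

# Return level for a return period of 10: the quantile at 1 - 1/10.
return_level_10 <- qexp(1 - 1 / 10, rate = rate)

data.frame(x, cdf, survival, hazard, chf, recovered_x)
```

For this particular distribution the hazard is constant at the rate, and the cumulative hazard grows linearly in x.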
Every representation can be accessed with the eval_*() family of
functions, which return a vector of the evaluated representation.
d1 <- dst_geom(0.6)
eval_pmf(d1, at = 0:5)
#> [1] 0.600000 0.240000 0.096000 0.038400 0.015360 0.006144
Alternatively, the enframe_*() family of functions
provides the results in a tibble or data frame paired with the inputs,
useful in a data wrangling workflow.
enframe_pmf(d1, at = 0:5)
#> # A tibble: 6 × 2
#> .arg pmf
#> <int> <dbl>
#> 1 0 0.6
#> 2 1 0.24
#> 3 2 0.096
#> 4 3 0.0384
#> 5 4 0.0154
#> 6 5 0.00614
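As a sanity check on the PMF values above, the same numbers can be reproduced in base R. This sketch assumes that dst_geom(0.6) uses the same parameterisation as stats::dgeom(), that is, the number of failures before the first success with success probability 0.6, which is consistent with the values shown.

```r
# Base R cross-check of the PMF values above (no distionary required).
p <- 0.6
k <- 0:5

pmf_closed_form <- p * (1 - p)^k   # P(X = k) = p * (1 - p)^k
pmf_stats <- dgeom(k, prob = p)

# Summing the PMF from 0 up to k gives the CDF at k.
cdf_from_pmf <- cumsum(pmf_stats)

pmf_stats
```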
The enframe_*() functions accept multiple distributions, placing a
column for each distribution. The column names can be changed in three
ways:
- The input column .arg can be renamed with the arg_name argument.
- The pmf prefix on the evaluation columns can be changed with the fn_prefix argument.
- The distribution names can be changed by assigning name-value pairs for the input distributions.
Let’s practice this with the addition of a second distribution.
d2 <- dst_geom(0.4)
enframe_pmf(
  model1 = d1, model2 = d2, at = 0:5,
  arg_name = "num_failures", fn_prefix = "probability"
)
#> # A tibble: 6 × 3
#> num_failures probability_model1 probability_model2
#> <int> <dbl> <dbl>
#> 1 0 0.6 0.4
#> 2 1 0.24 0.24
#> 3 2 0.096 0.144
#> 4 3 0.0384 0.0864
#> 5 4 0.0154 0.0518
#> 6 5 0.00614 0.0311
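To see what the renamed output corresponds to, the same table can be assembled by hand. This is only a base R sketch of the resulting layout using stats::dgeom(), under the assumption that dst_geom() shares its parameterisation; it is not how enframe_pmf() is implemented, and it yields a plain data frame instead of a tibble.

```r
# Hand-built equivalent of the table above: one input column plus one
# evaluation column per distribution.
num_failures <- 0:5
by_hand <- data.frame(
  num_failures = num_failures,
  probability_model1 = dgeom(num_failures, prob = 0.6),
  probability_model2 = dgeom(num_failures, prob = 0.4)
)
by_hand
```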
Drawing a random sample
To draw a random sample from a distribution, use the realise() or
realize() function. For example, the call realise(d1, n = 5) can be
read as “realise distribution d1 five times”. By default, n is set to 1,
so that realising converts a distribution to a single numeric draw:
realise(d1)
#> [1] 0
While random sampling falls into the same family as the p*/d*/q*/r*
functions from the stats package (e.g., rnorm()), it is not a
distributional representation, and hence does not have an eval_*() or
enframe_*() counterpart. This is because it’s impossible to
perfectly describe a distribution based on a sample.
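The point about samples can be illustrated in base R. The sketch below uses stats::rgeom() as a stand-in for realising a geometric distribution (an assumption about the sampling distribution only, not about the random number stream) and compares the empirical frequencies with the true PMF. The seed and sample size are arbitrary choices.

```r
# Empirical frequencies from a finite sample only approximate the PMF.
set.seed(1)                        # arbitrary seed, for reproducibility
draws <- rgeom(200, prob = 0.6)    # 200 draws from a Geometric(0.6)

k <- 0:5
empirical <- as.numeric(table(factor(draws, levels = k))) / length(draws)
true_pmf <- dgeom(k, prob = 0.6)

data.frame(k, empirical, true_pmf, abs_error = abs(empirical - true_pmf))
```

The absolute errors shrink as the sample grows but never vanish for a finite sample, so the sample cannot replace a representation such as the PMF.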
Properties of Distributions
distionary refers to a distribution property as
any value that can be calculated from a distribution, such as the mean
and variance. Whereas a distributional representation must fully define
a distribution, a property need not.
Below is a table of the properties incorporated in
distionary, and the corresponding functions for accessing
them.
Property | distionary Function |
---|---|
Mean | mean() |
Median | median() |
Variance | variance() |
Standard Deviation | sd() |
Skewness | skewness() |
Excess Kurtosis | kurtosis_exc() |
Kurtosis | kurtosis() |
Here’s the mean and variance of our original distribution.
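With distionary, these would presumably be obtained with mean(d1) and variance(d1), following the table above. As an independent check, the closed-form mean and variance of a geometric distribution with success probability p (counting failures before the first success, as assumed here for dst_geom()) are (1 - p) / p and (1 - p) / p^2. The base R sketch below evaluates both and confirms them numerically.

```r
# Closed-form mean and variance of Geometric(p = 0.6), counting failures
# before the first success.
p <- 0.6
geom_mean <- (1 - p) / p
geom_variance <- (1 - p) / p^2

# Numerical confirmation by summing over a long stretch of the support.
k <- 0:1000
pmf <- dgeom(k, prob = p)
mean_numeric <- sum(k * pmf)
variance_numeric <- sum((k - mean_numeric)^2 * pmf)

c(mean = geom_mean, variance = geom_variance)
```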
Some properties are easy to make yourself. Here is an example of a function that calculates the interquartile range.
# Make a function that takes a distribution as input, and returns the
# interquartile range.
iqr <- function(distribution) {
  diff(eval_quantile(distribution, at = c(0.25, 0.75)))
}
Apply the function.
iqr(d2)
#> [1] 2
For properties that are not handled by distionary (e.g.,
extreme value index, or moment generating function), one option is to
build these properties into your own distribution. A future version of
distionary will make user-defined properties easier to work
with.