
This vignette covers the first goal of distionary: to provide a framework for creating a distribution, either one of the built-in distributions, or your own.

Built-In Distribution Families

All distribution families found in the stats package are made available with distionary, along with a few others. All families are shown in the following table.

Distribution                      distionary Function   Has counterpart in stats
Bernoulli                         dst_bern()            Yes
Beta                              dst_beta()            Yes
Binomial                          dst_binom()           Yes
Cauchy                            dst_cauchy()          Yes
Chi Squared                       dst_chisq()           Yes
Degenerate                        dst_degenerate()      No
Exponential                       dst_exp()             Yes
F                                 dst_f()               Yes
Gamma                             dst_gamma()           Yes
Geometric                         dst_geom()            Yes
Generalised Extreme Value (GEV)   dst_gev()             No
Generalised Pareto (GPD)          dst_gpd()             No
Hypergeometric                    dst_hyper()           Yes
Log Normal                        dst_lnorm()           Yes
Log Pearson Type III              dst_lp3()             No
Negative Binomial                 dst_nbinom()          Yes
Normal                            dst_norm()            Yes
Pearson Type III                  dst_pearson3()        No
Poisson                           dst_pois()            Yes
Student t                         dst_t()               Yes
Uniform                           dst_unif()            Yes
Weibull                           dst_weibull()         Yes

In addition, there is a special “Null” distribution object akin to a missing or unknown distribution. This is useful, for example, if an algorithm fails to return a distribution: instead of throwing an error, a Null distribution can be returned.

# Make a Null distribution.
null <- dst_null()
# Inspect
null
#> Null distribution

A Null distribution always evaluates to NA:

mean(null)
#> [1] NA
eval_pmf(null, at = 1:4)
#> [1] NA NA NA NA
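
The fallback pattern mentioned above can be sketched with tryCatch(). This is a minimal sketch rather than anything provided by distionary: safe_norm() is a hypothetical helper name, and the example relies only on behaviour visible in this vignette, namely that dst_norm() throws an error when sd is negative.

```r
library(distionary)

# Hypothetical helper (not part of distionary): try to build a Normal
# distribution, and fall back to a Null distribution if construction fails.
safe_norm <- function(mean, sd) {
  tryCatch(
    dst_norm(mean, sd),
    error = function(e) dst_null()
  )
}

ok <- safe_norm(0, 1)       # a Normal distribution
failed <- safe_norm(0, -1)  # dst_norm() errors on a negative sd, so Null is returned

mean(ok)
mean(failed)  # NA, as for any Null distribution
```

Downstream code can then keep running and detect the failure later, for example by checking whether a property evaluates to NA.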

Distributions with a counterpart in the stats package wrap the corresponding stats functions and keep the same parameter names. For instance, the source code defining the Normal distribution shows that its representations wrap the stats::*norm() family of functions.

dst_norm
#> function (mean, sd) 
#> {
#>     if (is.na(mean) || is.na(sd)) {
#>         return(dst_null())
#>     }
#>     if (length(mean) != 1 || length(sd) != 1) {
#>         stop("Input parameters must have length 1.")
#>     }
#>     if (sd == 0) {
#>         return(dst_degenerate(mean))
#>     }
#>     if (sd < 0) {
#>         stop("'sd' parameter must be non-negative.")
#>     }
#>     distribution(parameters = list(mean = mean, sd = sd), density = function(x) stats::dnorm(x, 
#>         mean = mean, sd = sd), cdf = function(x) stats::pnorm(x, 
#>         mean = mean, sd = sd), quantile = function(p) stats::qnorm(p, 
#>         mean = mean, sd = sd), realise = function(n) stats::rnorm(n, 
#>         mean = mean, sd = sd), survival = function(x) stats::pnorm(x, 
#>         mean = mean, sd = sd, lower.tail = FALSE), mean = mean, 
#>         median = mean, variance = sd^2, stdev = sd, skewness = 0, 
#>         kurtosis_exc = 0, range = c(-Inf, Inf), .name = "Normal", 
#>         .vtype = "continuous")
#> }
#> <bytecode: 0x56257eb4e6e8>
#> <environment: namespace:distionary>
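
The checks at the top of this source suggest how dst_norm() treats edge cases. The following sketch exercises them using only functions that appear in this vignette. Printed output is omitted because it may differ between versions.

```r
library(distionary)

# Missing parameter: the first check returns a Null distribution.
dst_norm(NA, 1)

# Zero standard deviation: a Degenerate distribution at the mean is returned.
dst_norm(3, 0)

# Otherwise the representations delegate to stats, so these two values
# should agree.
from_distionary <- eval_cdf(dst_norm(0, 1), at = 1.96)
from_stats <- stats::pnorm(1.96, mean = 0, sd = 1)
all.equal(from_distionary, from_stats)
```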

User-Defined Distributions

You can make your own distribution using the distribution() function. Provide it with name-value pairs that will be stored with the object in a list. Some names have a special interpretation in distionary in that they may be leveraged to compute other properties that aren’t specified in distribution(). These special names are:

  • Any distributional representation invoked by an eval_*() function (e.g., quantile for eval_quantile()).
  • Any property (e.g., mean for mean()).

See the "Evaluate a Distribution" vignette for more details on these evaluation functions.

For this version of distionary, cdf must be specified for non-specified properties to be evaluated. Depending on the variable type, density (for continuous distributions) or pmf (for discrete distributions) may also be needed.

Optionally, you can also specify an entry for parameters, which should be a named list of parameters that define the distribution. These parameters are never used for distribution calculations, but are sometimes useful to keep track of for analysis. (A future version of distionary will make heavier use of these.)

Here is an example distribution.

# Make a continuous distribution
linear <- distribution(
  parameters = list(a = 1),
  density = function(x) {
    d <- 2 * (1 - x)
    d[x < 0 | x > 1] <- 0
    d
  },
  cdf = function(x) {
    p <- 2 * x * (1 - x / 2)
    p[x < 0] <- 0
    p[x > 1] <- 1
    p
  },
  g = 9.81,
  another_representation = function(x) x^2,
  .vtype = "continuous",
  .name = "My Linear"
)
# Inspect
linear
#> My Linear distribution (continuous)
#> --Parameters--
#> a 
#> 1
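
This vignette does not say whether distribution() checks that the supplied density and cdf agree with each other, so it can be worth confirming this yourself before building the object. The sketch below uses only base R. The names density_linear and cdf_linear are illustrative copies of the two functions in the example above.

```r
# Same density and CDF as in the example above, as plain R functions.
density_linear <- function(x) {
  d <- 2 * (1 - x)
  d[x < 0 | x > 1] <- 0
  d
}
cdf_linear <- function(x) {
  p <- 2 * x * (1 - x / 2)
  p[x < 0] <- 0
  p[x > 1] <- 1
  p
}

# The density should integrate to 1 over its support [0, 1].
total <- stats::integrate(density_linear, lower = 0, upper = 1)$value

# The CDF at q should match the integral of the density from 0 to q:
# the integral of 2 * (1 - t) from 0 to x is 2x - x^2 = 2x(1 - x/2).
q <- c(0.2, 0.5, 0.7)
by_integration <- sapply(
  q,
  function(u) stats::integrate(density_linear, lower = 0, upper = u)$value
)

all.equal(total, 1)
all.equal(by_integration, cdf_linear(q))
```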

The usual evaluation framework can now be accessed. For example, the CDF and mean can be evaluated, even though the mean has not been specified.

eval_cdf(linear, at = c(0.2, 0.5, 0.7))
#> [1] 0.36 0.75 0.91
mean(linear)
#> [1] 0.3333333
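
As an independent check on the derived mean, the expectation can be computed directly from the density with base R. This is only a sanity check under the density defined above. It is not a description of how distionary computes the mean internally, which this vignette does not cover.

```r
# Density of the example distribution, repeated so this check is self-contained.
density_linear <- function(x) {
  d <- 2 * (1 - x)
  d[x < 0 | x > 1] <- 0
  d
}

# E[X] is the integral of x * f(x) over the support [0, 1].
# Analytically: the integral of 2x - 2x^2 from 0 to 1 is 1 - 2/3 = 1/3.
mean_by_integration <- stats::integrate(
  function(x) x * density_linear(x),
  lower = 0,
  upper = 1
)$value

all.equal(mean_by_integration, 1 / 3)
```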

These quantities can be invoked by the more general function eval_property().

eval_property(linear, "cdf", c(0.2, 0.5, 0.7))
#> [1] 0.36 0.75 0.91
eval_property(linear, "mean")
#> [1] 0.3333333

eval_property() is also useful for accessing quantities that are not built into distionary, like g and another_representation in this example.

eval_property(linear, "another_representation", 1:4)
#> [1]  1  4  9 16
eval_property(linear, "g")
#> [1] 9.81