Tracks
/
Julia
Julia
/
Syllabus
/
Statistics
St

Statistics in Julia

0 exercises

About Statistics

"Statistics" is a large and complex subject with many aspects.

Some aspects were touched on in the Randomness concept, which is a separate but related subject.

At its simplest, statistics can involve ways to summarize an iterable collection of data in a single value. Newcomers to Julia may be surprised at how few functions are built in to the language by default:

  • sum, to add up numbers.
  • min and max, to get the extreme values of the given parameters.
  • extrema, to get the (min, max) tuple of an iterable.
  • length, to count all values.
  • count, to count only values meeting some criterion.
julia> v = collect(1:0.2:3)
11-element Vector{Float64}:
 1.0
 1.2
 ... # display truncated
 3.0

julia> mean(v)
2.0

julia> extrema(v)
(1.0, 3.0)

julia> length(v)
11

julia> min('g', 'a', 'c')
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> count(isodd, 1:5)  # 1, 3, and 5 are odd
3

This limited set of functions is a deliberate decision to minimize bloat in base Julia by moving other functions out to a cascade of other modules and packages.

The Statistics module

As part of the standard library, Statistics is likely to be pre-installed, so just needs using Statistics added to the top of the program to bring it into the namespace.

This module contains the next tier of common functions, which are likely to be widely used by a subset of programmers.

Many of the functions in Statistics assume some background knowledge of the subject. The simplest examples include:

  • mean, which many people would call the average (same result as sum / length).
  • median, the middle value after sorting.
  • std, standard deviation: a measure of how widely spread the values are.
  • var, variance: the square of std, which is generally quicker to calculate.

Each function tends to have various options, and there are many more functions, so check the documentation if interested.

Other packages

As you might expect, statistics is a big use-case for Julia, and there many available packages to support more specialized work.

The JuliaStats group helpfully maintain an online list.

Edit via GitHub The link opens in a new window or tab