Lib/statistics.py (part 3)
Source:
cpython 3.14 @ ab2d84fe1023/Lib/statistics.py
This annotation covers variance and the NormalDist class. See lib_statistics2_detail for mean, median, mode, and harmonic_mean.
Map
| Lines | Symbol | Role |
|---|---|---|
| 1-80 | variance / pvariance | Sample and population variance |
| 81-160 | stdev / pstdev | Standard deviations |
| 161-260 | NormalDist.__init__ | Normal distribution object |
| 261-380 | NormalDist.cdf / pdf | Cumulative distribution and density |
| 381-500 | NormalDist.overlap / samples | Distribution operations |
Reading
variance
```python
# CPython: Lib/statistics.py:610 variance
def variance(data, xbar=None):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 2:
        raise StatisticsError('variance requires at least two data points')
    if xbar is None:
        xbar = mean(data)
    # Two-pass: compute the sum of squared deviations from the mean
    total = sum((x - xbar)**2 for x in data)
    return total / (n - 1)  # Bessel's correction
```
variance applies Bessel's correction (dividing by n-1) to produce the unbiased sample estimator. If the caller supplies a precomputed xbar, the internal mean() pass is skipped; note that an inaccurate xbar silently skews the result. In the actual CPython source, the sum of squared deviations is computed with exact Fraction arithmetic for int and Fraction data to avoid float cancellation errors; the simplified listing above elides that machinery.
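A quick sanity check of the two-pass computation and the xbar shortcut (plain stdlib usage, not part of the CPython source):

```python
from statistics import mean, variance

data = [2, 4, 6, 8]                 # mean is 5.0
xbar = mean(data)
# Squared deviations: 9 + 1 + 1 + 9 = 20; sample variance = 20 / 3
v = variance(data)
v_fast = variance(data, xbar=xbar)  # reuses the precomputed mean
print(v, v == v_fast)
```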
pvariance
```python
# CPython: Lib/statistics.py:660 pvariance
def pvariance(data, mu=None):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('pvariance requires at least one data point')
    if mu is None:
        mu = mean(data)
    total = sum((x - mu)**2 for x in data)
    return total / n  # population: divide by n, not n-1
```
pvariance divides by n (the population variance); variance divides by n-1 (the Bessel-corrected sample variance). Use pvariance when data is the entire population, and variance when data is a sample drawn from a larger population.
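The n versus n-1 divisor is easiest to see side by side; a small sketch using the public API:

```python
from statistics import pvariance, variance

sample = [1, 3, 5, 7]     # mean 4; squared deviations sum to 9+1+1+9 = 20
sv = variance(sample)     # 20 / 3: Bessel-corrected sample estimate
pv = pvariance(sample)    # 20 / 4 = 5.0: whole-population variance
print(sv, pv)
```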
NormalDist
```python
# CPython: Lib/statistics.py:820 NormalDist
class NormalDist:
    __slots__ = ('_mu', '_sigma')

    def __init__(self, mu=0.0, sigma=1.0):
        if sigma < 0:
            raise StatisticsError('sigma must be non-negative')
        self._mu = float(mu)
        self._sigma = float(sigma)

    @classmethod
    def from_samples(cls, data):
        if len(data) < 2:
            raise StatisticsError(...)
        return cls(mean(data), stdev(data))
```
NormalDist is a value object. from_samples fits a normal distribution to data via sample mean and standard deviation. __slots__ avoids the instance __dict__ overhead for a class expected to be instantiated in loops.
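A fitting sketch: from_samples recovers mu and sigma from generated data. The use of random.gauss and the seed are illustrative choices, not from the source:

```python
import random
from statistics import NormalDist

random.seed(42)
samples = [random.gauss(100.0, 15.0) for _ in range(10_000)]
fitted = NormalDist.from_samples(samples)
print(fitted.mean, fitted.stdev)   # close to 100 and 15
```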
NormalDist.cdf
```python
# CPython: Lib/statistics.py:920 cdf
def cdf(self, x):
    # Cumulative distribution function P(X <= x)
    return 0.5 * (1.0 + math.erf((x - self._mu) / (self._sigma * _SQRT2)))
```
_SQRT2 is the module-level constant math.sqrt(2.0). The CDF is written in terms of the error function; math.erf delegates to the C library's erf. cdf(mu) returns 0.5 by the symmetry of the normal distribution about its mean.
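The symmetry claim is easy to verify on a standard normal (usage sketch):

```python
from statistics import NormalDist

Z = NormalDist()                   # standard normal: mu=0.0, sigma=1.0
print(Z.cdf(0.0))                  # 0.5 exactly at the mean (erf(0) == 0)
print(Z.cdf(1.96))                 # ~0.975, the familiar 95% quantile
print(Z.cdf(-1.96) + Z.cdf(1.96))  # ~1.0, since cdf(-x) == 1 - cdf(x)
```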
NormalDist.overlap
```python
# CPython: Lib/statistics.py:980 overlap
def overlap(self, other):
    # Overlapping coefficient (OVL): 0.0 = no overlap, 1.0 = identical
    X, Y = self, other
    if (Y._sigma, Y._mu) < (X._sigma, X._mu):  # sort to force commutativity
        X, Y = Y, X
    X_var, Y_var = X.variance, Y.variance
    if not X_var or not Y_var:
        raise StatisticsError('overlap() not defined when sigma is zero')
    dv = Y_var - X_var
    dm = math.fabs(Y._mu - X._mu)
    if not dv:
        # Equal variances: the pdfs cross midway between the means
        return 2.0 * NormalDist(dm, 2.0 * X._sigma).cdf(0)
    # Unequal variances: the pdfs intersect at two points, x1 and x2
    a = X._mu * Y_var - Y._mu * X_var
    b = X._sigma * Y._sigma * math.sqrt(dm**2 + dv * math.log(Y_var / X_var))
    x1, x2 = (a + b) / dv, (a - b) / dv
    return 1.0 - (math.fabs(Y.cdf(x1) - X.cdf(x1)) + math.fabs(Y.cdf(x2) - X.cdf(x2)))
```
overlap measures the agreement of two normal distributions with the overlapping coefficient (OVL): the shared area under the two probability density functions. (This is the OVL, not the Bhattacharyya coefficient.) It returns values near 0.0 for widely separated distributions, 1.0 for identical ones, and is symmetric in its arguments; a zero sigma raises StatisticsError. Useful in ML for comparing class-conditional distributions.
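A few spot checks of overlap (usage sketch; the specific mus and sigmas are arbitrary):

```python
from statistics import NormalDist

a = NormalDist(mu=0.0, sigma=1.0)
b = NormalDist(mu=0.0, sigma=1.0)
c = NormalDist(mu=5.0, sigma=1.0)
print(a.overlap(b))                  # 1.0 for identical distributions
print(a.overlap(c))                  # tiny: means are five sigmas apart
print(a.overlap(c) == c.overlap(a))  # symmetric by construction
```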
gopy notes
variance/pvariance are module/statistics.Variance / module/statistics.PVariance in module/statistics/module.go. NormalDist is module/statistics.NormalDist backed by mu float64 and sigma float64. cdf uses math.Erf. from_samples calls Mean then Stdev.