Basis Functions

Basis functions control how smooth terms behave in a GAM. Different basis types suit different kinds of data and modeling requirements:

ThinPlateSpline

ThinPlateSpline(
    *,
    shrinkage: bool | None = False,
    m: int | None = None,
    max_knots: int | None = None,
)

Thin plate regression spline basis.

Parameters:

shrinkage (bool | None, default: False ) –

If True, the penalty is modified so that the term is shrunk to zero for a high enough smoothing parameter.
m (int | None, default: None ) –

The order of the derivative in the thin plate spline penalty. If \(d\) is the number of covariates for the smooth term, this must satisfy \(m>(d+1)/2\). If left as None, the smallest value satisfying \(m>(d+1)/2\) is used, which creates "visually smooth" functions.
max_knots (int | None, default: None ) –

The maximum number of knots to use. If None, defaults to 2000.
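
The default for m can be worked out by hand. The helper below is a hypothetical illustration (it is not part of the library) that returns the smallest integer satisfying \(m>(d+1)/2\) for a smooth of d covariates:

```python
def smallest_valid_m(d: int) -> int:
    """Smallest integer m with m > (d + 1) / 2 (illustrative helper only)."""
    if d < 1:
        raise ValueError("d must be a positive number of covariates")
    # floor((d + 1) / 2) is the largest integer not exceeding the bound,
    # so adding one gives the smallest integer strictly above it.
    return (d + 1) // 2 + 1


for d in (1, 2, 3, 4):
    print(d, smallest_valid_m(d))
```

Under this rule, one- and two-dimensional smooths get m=2, while three- and four-dimensional smooths get m=3.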

CubicSpline

CubicSpline(*, shrinkage: bool = False, cyclic: bool = False)

Cubic regression spline basis.

Cubic splines use piecewise cubic polynomials with knots placed throughout the data range. They tend to be computationally efficient, but often perform slightly worse than thin plate splines and are limited to univariate smooths. Note that this restriction to one-dimensional smooths does not prevent their use in multivariate T smooths, which are constructed from marginal bases.

Parameters:

cyclic (bool, default: False ) –

If True, creates a cyclic spline where the function values and derivatives match at the boundaries. Use for periodic data like time of day, angles, or seasonal patterns. Default is False.
shrinkage (bool, default: False ) –

If True, adds a penalty to the null space (linear component). Helps with model selection and identifiability. Default is False. Cannot be used with cyclic=True.

Raises:

ValueError –

If both cyclic and shrinkage are True (incompatible options)
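
The incompatibility can be pictured with a small stand-in. The class below is a hypothetical sketch that only mirrors the documented rule; the name CubicOptions and its internals are assumptions, not the library's implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubicOptions:
    """Hypothetical stand-in that mirrors the documented option check."""

    shrinkage: bool = False
    cyclic: bool = False

    def __post_init__(self) -> None:
        # The documentation states these two options cannot be combined.
        if self.cyclic and self.shrinkage:
            raise ValueError("cyclic and shrinkage cannot both be True")


CubicOptions(cyclic=True)        # fine: periodic smooth
CubicOptions(shrinkage=True)     # fine: penalized null space
try:
    CubicOptions(cyclic=True, shrinkage=True)
except ValueError as err:
    print(f"rejected: {err}")
```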

BSpline

BSpline(*, degree: int = 3, penalty_orders: Iterable[int] | None = None)

B-spline basis with derivative-based penalties.

These are univariate (although univariate bases can serve as the marginals of multivariate smooths constructed with T). BSpline(degree=3, penalty_orders=[2]) constructs a conventional cubic spline.

Parameters:

degree (int, default: 3 ) –

The degree of the B-spline basis (e.g. 3 for a cubic spline).
penalty_orders (Iterable[int] | None, default: None ) –

The derivative orders to penalize. Defaults to [degree - 1].

PSpline

PSpline(*, degree: int = 3, penalty_order: int | None = None)

P-spline (penalized spline) basis as proposed by Eilers and Marx (1996).

Uses B-spline bases penalized by discrete penalties applied directly to the basis coefficients. Note that for most use cases, splines with derivative-based penalties (e.g. ThinPlateSpline or CubicSpline) tend to yield better MSE performance. PSpline(degree=3, penalty_order=2) is cubic-spline-like.

Parameters:

degree (int, default: 3 ) –

Degree of the B-spline basis (e.g. 3 for cubic).
penalty_order (int | None, default: None ) –

The difference order to penalize. Order 0 gives a ridge penalty. Defaults to degree - 1.
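
To make the "discrete penalty on the coefficients" concrete, here is a minimal pure-Python sketch of a difference penalty in the style of Eilers and Marx. The function names are hypothetical, and this is not how mgcv or this library builds the penalty internally; it only illustrates what a given penalty_order penalizes:

```python
def difference_matrix(n_coefs: int, order: int) -> list[list[float]]:
    """Rows of the order-th difference operator acting on n_coefs coefficients."""
    # Start from the identity (order 0), then difference consecutive rows.
    rows = [[1.0 if i == j else 0.0 for j in range(n_coefs)] for i in range(n_coefs)]
    for _ in range(order):
        rows = [
            [b - a for a, b in zip(rows[i], rows[i + 1])]
            for i in range(len(rows) - 1)
        ]
    return rows


def penalty_value(coefs: list[float], order: int) -> float:
    """Sum of squared order-th differences of the coefficients."""
    d = difference_matrix(len(coefs), order)
    return sum(sum(w * c for w, c in zip(row, coefs)) ** 2 for row in d)


print(difference_matrix(4, 2))          # second-difference rows
print(penalty_value([1, 2, 3, 4], 2))   # linear coefficients: no penalty
print(penalty_value([1, 2, 3, 4], 0))   # order 0: plain sum of squares (ridge)
```

With order 2, coefficients that lie on a straight line incur zero penalty, whereas order 0 penalizes every coefficient's size directly.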

DuchonSpline

DuchonSpline(*, m: int = 2, s: float | int = 0)

Duchon spline basis - a generalization of thin plate splines.

These smoothers allow the use of lower orders of derivative in the penalty than conventional thin plate splines, while still yielding continuous functions.

The description, adapted from mgcv, is as follows: Duchon’s (1977) construction generalizes the usual thin plate spline penalty. The usual thin plate spline penalty is given by the integral of the squared Euclidean norm of a vector of mixed partial \(m\)-th order derivatives of the function w.r.t. its arguments. Duchon re-expresses this penalty in the Fourier domain, and then weights the squared norm in the integral by the Euclidean norm of the Fourier frequencies, raised to the power \(2s\), where \(s\) is a user-selected constant.

If \(d\) is the number of arguments of the smooth:

It is required that \(-d/2 < s < d/2\).
If \(s=0\) then the usual thin plate spline is recovered.
To obtain continuous functions we further require that \(m + s > d/2\).

For example, DuchonSpline(m=1, s=d/2) can be used in order to use first derivative penalization for any \(d\), and still yield continuous functions.
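
The constraints listed above are easy to check numerically. The helper below is a hypothetical sketch (not part of the library) that evaluates them exactly as written. Read literally, the strict inequality \(s < d/2\) excludes \(s = d/2\) itself, so the sketch uses \(s = (d-1)/2\) to demonstrate first-derivative penalization; how the boundary value is treated in practice is not covered here.

```python
def duchon_constraints(m: int, s: float, d: int) -> dict[str, bool]:
    """Evaluate the documented Duchon spline conditions for d arguments."""
    return {
        # Required range for the Fourier-domain weighting power.
        "s_in_range": -d / 2 < s < d / 2,
        # Needed for the resulting functions to be continuous.
        "continuous": m + s > d / 2,
    }


d = 2
print(duchon_constraints(m=2, s=0, d=d))            # usual thin plate spline
print(duchon_constraints(m=1, s=0, d=d))            # m=1 alone is not enough
print(duchon_constraints(m=1, s=(d - 1) / 2, d=d))  # first-derivative penalty
```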

Parameters:

m –

Order of derivative to penalize.
s –

\(s\) as described above. Should be an integer divided by 2 (i.e. a multiple of 1/2).

SplineOnSphere

SplineOnSphere(*, m: int = 0)

Isotropic smooth for data on a sphere (latitude/longitude coordinates).

This should be used with exactly two variables, where the first represents latitude on the interval [-90, 90] and the second represents longitude on the interval [-180, 180].

Parameters:

m –

An integer in [-1, 4]. Setting m=-1 uses DuchonSpline(m=2,s=1/2). Setting m=0 signals to use the 2nd order spline on the sphere, computed by Wendelberger’s (1981) method. For m>0, (m+2)/2 is the penalty order, with m=2 equivalent to the usual second derivative penalty.
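
Two details here are easy to get wrong: the coordinate order and ranges, and how m maps to a penalty order. The helpers below are hypothetical illustrations of the rules stated above and are not part of the library:

```python
def check_lat_lon(lat: float, lon: float) -> None:
    """Raise if a (latitude, longitude) pair falls outside the documented ranges."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude {lat} is outside [-90, 90]")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")


def sphere_penalty_order(m: int) -> float:
    """Penalty order (m + 2) / 2, which the text gives for m > 0."""
    if not 1 <= m <= 4:
        raise ValueError("the (m + 2) / 2 rule applies to m in 1..4")
    return (m + 2) / 2


check_lat_lon(51.5, -0.1)            # latitude first, then longitude
print(sphere_penalty_order(2))       # 2.0, the usual second-derivative penalty
```

A swapped pair such as (120, 45) fails the latitude check, which is a cheap way to catch columns passed in the wrong order.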

RandomEffect

RandomEffect()

Random effect basis for correlated grouped data.

This can be used with any mixture of numeric or categorical variables. Acts similarly to an Interaction but penalizes the corresponding coefficients with a multiple of the identity matrix (i.e. a ridge penalty), corresponding to an assumption of i.i.d. normality of the parameters.

Warning

Numeric variables (int/float) will be treated as a linear term with a single penalized slope parameter. Do not use an integer variable to encode categorical groups!
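
One way to avoid this pitfall is to turn integer group codes into string labels before building the data. The helper and the "group_" prefix below are hypothetical; storing the labels as a categorical column depends on your data frame library and is not shown:

```python
def codes_to_labels(codes: list[int], prefix: str = "group_") -> list[str]:
    """Convert integer group codes to string labels (illustrative helper)."""
    return [f"{prefix}{code}" for code in codes]


group_codes = [1, 2, 2, 3, 1]          # would be read as a numeric slope term
group_labels = codes_to_labels(group_codes)
print(group_labels)
```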

Example

For an example, see the supplement vs placebo example.

MarkovRandomField

MarkovRandomField(
    *,
    polys: dict[str, ndarray] | None = None,
    neighbours: dict[str, ndarray] | dict[str, list[str]] | None = None,
)

Intrinsic Gaussian Markov random field for discrete spatial data.

The smoothing penalty encourages similar values in neighboring locations. The variable used in the corresponding smooth should be a categorical variable with strings representing the area labels.

Should be constructed using either polygons or neighborhood structure. For plotting, the polygon structure is required.

Example

For an example, see the markov random field crime example.

Parameters:

polys (dict[str, ndarray] | None, default: None ) –

Dictionary mapping levels of the categorical variable to polygon arrays. Each array should have two columns, representing the coordinates of the vertices.
neighbours (dict[str, ndarray] | dict[str, list[str]] | None, default: None ) –

Dictionary mapping levels of the categorical variable to a list or numpy array of strings giving the neighbours of that level.
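
As a sketch of the dict[str, list[str]] form of neighbours, the snippet below builds the mapping from a list of adjacent pairs. The helper names and the area labels are hypothetical, and the symmetry check reflects the natural assumption that adjacency goes both ways rather than a documented requirement:

```python
def neighbours_from_edges(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Build a symmetric area -> neighbours mapping from adjacent pairs."""
    neighbours: dict[str, list[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # an area is not its own neighbour
        for area, other in ((a, b), (b, a)):
            listed = neighbours.setdefault(area, [])
            if other not in listed:
                listed.append(other)
    return {area: sorted(others) for area, others in sorted(neighbours.items())}


def is_symmetric(neighbours: dict[str, list[str]]) -> bool:
    """True if every listed neighbour also lists the area back."""
    return all(
        area in neighbours.get(other, [])
        for area, others in neighbours.items()
        for other in others
    )


edges = [("north", "east"), ("east", "south"), ("south", "west"), ("west", "north")]
neighbours = neighbours_from_edges(edges)
print(neighbours)
print(is_symmetric(neighbours))
```

The resulting dictionary has one key per area label, matching the levels of the categorical variable used in the smooth.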

FactorSmooth

FactorSmooth(bs: AbstractBasis = <factory>)

A separate S smooth for each level of a categorical variable.

When using this basis, the first variable of the smooth should be a numeric variable, and the second should be a categorical variable.

Unlike using a categorical by variable, e.g. S(x, by="group"):

The terms share a smoothing parameter.
The terms are fully penalized, with separate penalties on each null space component (e.g. intercepts).
The terms are non-centered and, thanks to the penalization, can be used with an intercept without introducing indeterminacy.

Parameters:

bs (AbstractBasis, default: <factory> ) –

Any singly penalized basis function. Defaults to ThinPlateSpline. Only the type of the basis is passed to mgcv (i.e. what is returned by str(bs)). This is a limitation of mgcv, which provides no way to pass further details for setting up the basis function.

AbstractBasis

Abstract class defining the interface for GAM basis functions.

All basis function classes must implement this protocol to be usable with smooth terms. The protocol ensures basis functions can be converted to appropriate mgcv R syntax and provide any additional parameters needed.
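
The shape of such an interface can be sketched with typing.Protocol. Everything below is a hypothetical stand-in: the names BasisLike, extra_args and ToyThinPlate are assumptions for illustration, and the library's actual method names may differ. The only grounded parts are that str(bs) yields the basis type and that "tp" and "ts" are mgcv's identifiers for thin plate splines without and with shrinkage.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class BasisLike(Protocol):
    """Hypothetical stand-in for the interface, not the library's definition."""

    def __str__(self) -> str:
        """Return the mgcv basis identifier, e.g. "tp"."""
        ...

    def extra_args(self) -> dict[str, Any]:
        """Return further arguments for the smooth (hypothetical method name)."""
        ...


class ToyThinPlate:
    """Toy basis satisfying the sketch above."""

    def __init__(self, shrinkage: bool = False, m: Optional[int] = None) -> None:
        self.shrinkage = shrinkage
        self.m = m

    def __str__(self) -> str:
        # mgcv uses "ts" for the shrinkage variant of the thin plate basis.
        return "ts" if self.shrinkage else "tp"

    def extra_args(self) -> dict[str, Any]:
        # Only forward m when the user set it explicitly.
        return {} if self.m is None else {"m": self.m}


basis = ToyThinPlate(shrinkage=True, m=2)
print(str(basis), basis.extra_args(), isinstance(basis, BasisLike))
```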