Title: | LS-PLS Models |
---|---|
Description: | Implements the LS-PLS (least squares - partial least squares) method described in for instance Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) "A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables" Journal of Chemometrics, 18(10), 451--464, <doi:10.1002/cem.890>. |
Authors: | Bjørn-Helge Mevik [aut, cre] |
Maintainer: | Bjørn-Helge Mevik <[email protected]> |
License: | GPL-2 |
Version: | 0.2-2 |
Built: | 2025-03-02 02:47:21 UTC |
Source: | https://github.com/bhmevik/lspls |
The DESCRIPTION file:
Package: | lspls |
Title: | LS-PLS Models |
Version: | 0.2-2 |
Date: | 2018-07-26 |
Authors@R: | c(person("Bjørn-Helge", "Mevik", role = c("aut", "cre"), email = "[email protected]")) |
Author: | Bjørn-Helge Mevik [aut, cre] |
Maintainer: | Bjørn-Helge Mevik <[email protected]> |
Encoding: | UTF-8 |
Depends: | pls (>= 2.2.0) |
Imports: | grDevices, graphics, methods, stats |
Description: | Implements the LS-PLS (least squares - partial least squares) method described in for instance Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) "A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables" Journal of Chemometrics, 18(10), 451--464, <doi:10.1002/cem.890>. |
License: | GPL-2 |
URL: | http://mevik.net/work/software/lspls.html, https://github.com/bhmevik/lspls |
BugReports: | https://github.com/bhmevik/lspls/issues |
Repository: | https://bhmevik.r-universe.dev |
RemoteUrl: | https://github.com/bhmevik/lspls |
RemoteRef: | HEAD |
RemoteSha: | af5bf5e2bf0ca27c59ca774a092edc9db1e6db1e |
Index of help topics:
MSEP.lsplsCv | MSEP, RMSEP and R^2 for LS-PLS |
lspls | Fit LS-PLS Models |
lspls-package | LS-PLS Models |
lsplsCv | Cross-Validate LS-PLS Models |
orthlspls.fit | Underlying LS-PLS Fit Function |
orthlsplsCv | Low Level Cross-Validation Function |
plot.lspls | Plots of LS-PLS Models |
plot.lsplsCv | Plot Method for Cross-Validations |
predict.lspls | Predict Method for LS-PLS Models |
project | Projection and Orthogonalisation |
LS-PLS (least squares–partial least squares) models are written on the form
$$Y = X\beta + T_1\gamma_1 + \cdots + T_k\gamma_k + E,$$
where the terms $T_i$ are one or more matrices $Z_{i,1}, \ldots, Z_{i,l_i}$ separated by a colon (:), i.e., $Z_{i,1}:Z_{i,2}:\cdots:Z_{i,l_i}$. Multi-response models are possible, in which case $Y$ should be a matrix.
The model is fitted from left to right. First $Y$ is fitted to $X$ using least squares (LS) regression and the residuals calculated. For each $i$, the matrices $Z_{i,1}$, ..., $Z_{i,l_i}$ are orthogonalised against the variables used in the regression so far (when $i = 1$, this means $X$).
The residuals from the LS regression are used as the response in PLS
regressions with the orthogonalised matrices as predictors (one PLS
regression for each matrix), and the desired number of PLS components
from each matrix are included among the LS prediction variables.
The LS regression is then refit with the new variables, and new
residuals calculated.
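The fitting steps above can be sketched in base R for a single PLS matrix and a single PLS component. This is only an illustration under simplifying assumptions (simulated data, one response, one component, made-up variable names); it is not the code used by the package.

```r
set.seed(1)
n <- 30
X <- cbind(1, rnorm(n))             # LS matrix: constant column plus one design variable
Z <- matrix(rnorm(n * 5), n, 5)     # one PLS matrix (e.g. spectra)
y <- drop(X %*% c(1, 2) + Z %*% c(1, -1, 0.5, 0, 0) + rnorm(n, sd = 0.1))

# Step 1: fit y to X with least squares and keep the residuals
fit_ls <- lm.fit(X, y)
res <- fit_ls$residuals

# Step 2: orthogonalise Z against the variables used so far (here: X)
Z_orth <- Z - X %*% solve(crossprod(X), crossprod(X, Z))

# Step 3: one PLS component, with the LS residuals as response
w <- crossprod(Z_orth, res)         # weight vector, proportional to cov(Z_orth, res)
w <- w / sqrt(sum(w^2))
score <- Z_orth %*% w               # PLS score vector

# Step 4: refit the LS regression with the score added, and update the residuals
fit_new <- lm.fit(cbind(X, score), y)
res_new <- fit_new$residuals

rss_before <- sum(res^2)
rss_after <- sum(res_new^2)
```

Because the score is orthogonal to the columns of X, adding it cannot increase the residual sum of squares.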
The function to fit LS-PLS models is lspls. A typical usage to fit the model y ~ X + Z + V1:V2 + W would be
mod <- lspls(y ~ X + Z + V1:V2 + W, ncomp = list(3, c(2,1), 2), data = mydata)
The first argument is the formula describing the model. X is fit first, using LS. Then PLS scores from Z (orthogonalised) are added. Then PLS scores from V1 and V2 are added (simultaneously), and finally PLS scores from W. The next argument, ncomp, specifies the number of components to use from each PLS: 3 Z score vectors, 2 V1 score vectors, 1 V2 score vector and 2 W score vectors. Finally, mydata should be a data frame with matrices y, X, Z, V1, V2 and W (for single-response models, y can be a vector).
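One way to build a data frame whose columns are whole matrices is to protect each matrix with I(). The sketch below uses simulated data and only two predictor matrices; the sizes and values are assumptions for illustration.

```r
set.seed(2)
n <- 20
X <- cbind(1, rnorm(n))               # LS matrix with a constant column
Z <- matrix(rnorm(n * 10), n, 10)     # a PLS matrix
y <- rnorm(n)                         # single response, so a vector is enough

# I() keeps each matrix as one data frame column instead of splitting it
mydata <- data.frame(y = y, X = I(X), Z = I(Z))
dim(mydata)
```

A data frame built this way has one column per matrix, so it can be supplied through the data argument.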
Currently, score plots and loading plots of fitted models are implemented. plot(mod, "scores") gives score plots for each PLS regression, and plot(mod, "loadings") gives loading plots.
There is a predict method to predict response or score values from new data:
predict(mod, newdata = mynewdata)
(This predicts response values. Use type = "scores" to get scores.) Also, the standard functions resid and fitted can be used to extract the residuals and fitted values.
In order to determine the number of components to use from each matrix, one can use cross-validation:
cvmod <- lsplsCv(y ~ X + Z + V1:V2 + W, ncomp = list(4, c(3,4), 3), segments = 12, data = mydata)
In lsplsCv, ncomp gives the maximal number of components to test. The argument segments specifies the number of segments to use. One can specify the type of segments to use (random (default), consecutive or interleaved) with the argument segment.type. Alternatively, one can supply the segments explicitly with segments. See lsplsCv for details.
One can plot cross-validated RMSEP values with plot(cvmod). (Similarly, plot(cvmod, "MSEP") plots MSEP values.) This makes it easier to determine the optimal number of components for each PLS. See plot.lsplsCv for details. To calculate the RMSEP or MSEP values explicitly, one can use the function RMSEP or MSEP.
Bjørn-Helge Mevik [aut, cre]
Maintainer: Bjørn-Helge Mevik <[email protected]>
Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.
Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)
Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)
lspls, lsplsCv, plot.lspls, plot.lsplsCv
A function to fit LS-PLS (least squares–partial least squares) models.
lspls(formula, ncomp, data, subset, na.action, model = TRUE, ...)
formula | model formula. See Details. |
ncomp | list or vector of positive integers, giving the number of components to use for each ‘pls-matrix’. See Details. |
data | an optional data frame with the data to fit the model from. |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
na.action | a function which indicates what should happen when the data contain missing values. |
model | logical. If TRUE, the model frame is returned. |
... | additional arguments, passed to the underlying PLSR fit function. |
lspls fits LS-PLS models, in which matrices are added successively to the model. The first matrix is fit with ordinary least squares (LS) regression. The rest of the matrices are fit with partial least squares regression (PLSR), using the residuals from the preceding model as response. See lspls-package or the references for more details, and lspls-package for typical usage.
The model formula is specified as resp ~ term1 + term2 + .... If resp is a matrix (with more than one column), a multi-response model is fitted. term1 specifies the first matrix to be fitted, using LS. Each of the remaining terms will be added sequentially in the order specified in the formula (from left to right). Each term can either be a single matrix, which will be added by itself, or several matrices separated with :, e.g., Z:V:W, which will be added simultaneously (these will be denoted parallel matrices).
The first matrix, term1, is called the LS matrix, and the rest of the predictor matrices (whether parallel or not) are called PLS matrices.
Note that an intercept is not automatically added to the model. It should be included as a constant column in the LS matrix, if desired. (If no intercept is included, the PLS matrices should be centered. This happens automatically if the LS matrix includes the intercept.)
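The remark about centering can be checked with a small base R sketch: when the LS matrix contains a constant column, orthogonalising a PLS matrix against it also removes the column means. The data and names below are made up for illustration.

```r
set.seed(3)
n <- 15
design <- rnorm(n)
X <- cbind(1, design)                        # constant column acts as the intercept
Z <- matrix(rnorm(n * 4, mean = 5), n, 4)    # PLS matrix with non-zero column means

# Orthogonalise Z against X (least squares residuals of Z regressed on X)
Z_orth <- Z - X %*% solve(crossprod(X), crossprod(X, Z))

col_means <- colMeans(Z_orth)                # numerically zero
```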
The number of components to use in each of the PLSR models is specified with the ncomp argument, which should be a list. Each element of the list gives the number of components to use for the corresponding term in the formula. If the term specifies parallel matrices (separated with :), the list element should be a vector with one integer for each matrix. Otherwise, it should be a number.
To simplify the specification of ncomp, the following conversions are made: if ncomp is a vector, it will be converted to a list. ncomp will also be recycled as necessary to get one element for each term. Finally, for a parallel term, the list element will be recycled as needed. Thus, ncomp = 4 will result in 4 components being fit for every PLS matrix.
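The conversions described above can be mimicked with a few lines of base R. The helper expand_ncomp and its argument n_matrices are hypothetical names introduced here for illustration; they are not part of lspls, and the package may perform the conversion differently internally.

```r
# Hypothetical helper (not part of lspls): mimic the documented conversions.
# n_matrices gives the number of matrices in each PLS term, e.g. c(1, 2, 1)
# for the terms Z, V1:V2 and W.
expand_ncomp <- function(ncomp, n_matrices) {
  # vector -> list, then recycle to one element per term
  ncomp <- rep(as.list(ncomp), length.out = length(n_matrices))
  # recycle each element to one integer per matrix in the term
  Map(function(nc, k) rep(nc, length.out = k), ncomp, n_matrices)
}

full <- expand_ncomp(4, c(1, 2, 1))              # 4 components everywhere
partial <- expand_ncomp(list(3, 2), c(1, 2, 1))  # list(3, 2) recycled over three terms
```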
Currently, the function lspls itself handles the formula and the data, and calls the underlying fit function orthlspls.fit to do the actual fitting. This implements the orthogonalized version of the LS-PLS algorithm, without splitting of parallel matrices into common and unique components (see the references). Extensions to non-orthogonalized algorithms, and splitting of parallel matrices, are planned.
An object of class "lspls". The object contains all components returned by the underlying fit function (currently orthlspls.fit). In addition, it contains the following components:
fitted.values | matrix with fitted values, one column per response |
na.action | if observations with missing values were removed, na.action contains a vector with their indices. |
ncomp | the list of number of components used in the model. |
call | the function call. |
terms | the model terms. |
model | if model = TRUE, the model frame. |
The user interface (e.g. the model handling) is experimental, and might well change in later versions.
The handling of formula (especially :) is non-standard.
Note that the order of the terms is significant; terms are added
from left to right.
Bjørn-Helge Mevik
Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.
Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)
Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)
lspls-package, lsplsCv, plot.lspls
Calculate cross-validated predictions for LS-PLS models.
lsplsCv(formula, ncomp, data, subset, na.action, segments = 10, segment.type = c("random", "consecutive", "interleaved"), length.seg, model = TRUE, ...)
formula | model formula. See Details. |
ncomp | list or vector of positive integers, giving the number of components to use for each PLS matrix. See Details. |
data | an optional data frame with the data to fit the model from. |
subset | an optional vector specifying a subset of observations to be used in the fitting process. |
na.action | a function which indicates what should happen when the data contain missing values. |
segments | the number of segments to use, or a list with segments (see Details). |
segment.type | the type of segments to use. Ignored if segments is a list. |
length.seg | Positive integer. The length of the segments to use. If specified, it overrides segments. |
model | logical. If TRUE, the model frame is returned. |
... | additional arguments, passed to the underlying cross-validation function (currently orthlsplsCv). |
The function performs a cross-validation, using the model and segments specified in the call. It returns an object of class "lsplsCv", which has a plot method (see plot.lsplsCv). See lspls-package for typical usage and more about LS-PLS models.
See lspls for details about specifying the model with formula and ncomp. Note that lsplsCv cross-validates all models from 0 components up to the numbers of components specified with ncomp.
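To get a feel for how many models this implies, the sketch below works out the shape of the prediction array for the ncomp used in the package overview, assuming (as the Value section states) one array dimension per PLS matrix and assuming 20 observations and a single response.

```r
ncomp <- list(4, c(3, 4), 3)   # maximal number of components per PLS matrix
n_obs <- 20                    # assumed number of observations
n_resp <- 1                    # assumed number of responses

# 0..k components are tested for each PLS matrix, i.e. k + 1 levels each
levels_per_matrix <- unlist(ncomp) + 1
pred_dims <- c(n_obs, n_resp, levels_per_matrix)
n_models <- prod(levels_per_matrix)
```

The number of combinations grows multiplicatively with the number of PLS matrices, which is worth keeping in mind when choosing ncomp for cross-validation.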
If segments is a list, the arguments segment.type and length.seg are ignored. The elements of the list should be integer vectors specifying the indices of the segments. See cvsegments for details.
Otherwise, segments of type segment.type are generated. How many segments to generate is selected by specifying the number of segments in segments, or giving the segment length in length.seg. If both are specified, segments is ignored.
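The three segment types can be illustrated with base R. This is only a sketch of what such segments look like for 10 observations and 3 segments; it is not the cvsegments implementation, which may distribute the observations differently.

```r
n <- 10   # number of observations (assumed)
k <- 3    # number of segments (assumed)

# Consecutive: neighbouring observations end up in the same segment
consecutive <- split(seq_len(n), sort(rep(seq_len(k), length.out = n)))

# Interleaved: observations are dealt out to the segments in turn
interleaved <- split(seq_len(n), rep(seq_len(k), length.out = n))

# Random: a random permutation is dealt out to the segments
set.seed(4)
random <- split(sample(n), rep(seq_len(k), length.out = n))
```

Each result is a list of integer index vectors, which is the form described above for supplying segments explicitly.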
An object of class "lsplsCv", with components
pred | the cross-validated predictions. An array with one dimension for the observations, one for the responses, and one for each of the PLS matrices. |
segments | the list of segments used in the cross-validation. |
na.action | if observations with missing values were removed, na.action contains a vector with their indices. |
ncomp | the list of number of components used in the model. |
call | the function call. |
terms | the model terms. |
model | if model = TRUE, the model frame. |
Currently, lsplsCv handles the formula and the data, and calls orthlsplsCv for the actual cross-validation. The formula interface is experimental, and might change in future versions.
Bjørn-Helge Mevik
Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.
Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)
Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)
lspls, plot.lsplsCv, cvsegments, orthlsplsCv, lspls-package
(Root) Mean Squared Error of Prediction ((R)MSEP) and R^2 methods for LS-PLS cross-validations ("lsplsCv" objects).
## S3 method for class 'lsplsCv'
MSEP(object, scale = FALSE, ...)
## S3 method for class 'lsplsCv'
RMSEP(object, scale = FALSE, ...)
## S3 method for class 'lsplsCv'
R2(object, ...)
object | an "lsplsCv" object |
scale | logical. Whether the responses and predicted values should be divided by the standard deviation of the response prior to calculating the measure. This is most useful when comparing several responses. Default is not to scale. Note that this argument is ignored by the R2 method. |
... | Further arguments. Currently unused. |
An array. The first dimension corresponds to the responses (for single-response models, the length of this dimension is 1). The rest of the dimensions correspond to the number of components from the PLS matrices.
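As an illustration of the array layout, the sketch below computes MSEP and RMSEP by hand from a made-up prediction array with one response and one PLS matrix (0 to 2 components). It assumes MSEP is the mean squared difference between observed and cross-validated predicted values; the numbers are random and carry no meaning.

```r
set.seed(5)
n <- 12
y <- matrix(rnorm(n), n, 1)                    # one response

# Made-up cross-validated predictions: observations x responses x
# (0, 1 or 2 components from a single PLS matrix)
pred <- array(rnorm(n * 1 * 3), dim = c(n, 1, 3))

# Squared prediction errors; y is recycled over the component dimension
sq_err <- (pred - c(y))^2

# Average over observations: result is responses x number of components
msep <- apply(sq_err, c(2, 3), mean)
rmsep <- sqrt(msep)
```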
Bjørn-Helge Mevik
Fits orthogonalized LS-PLS models.
orthlspls.fit(Y, X, Z, ncomp)
Y | matrix. Response matrix. |
X | matrix. The first predictor matrix (typically a design matrix). |
Z | list. List of predictor matrices. |
ncomp | list. The number of components to fit from each matrix. |
orthlspls.fit is not meant to be called by the user. It is called by lspls to do the actual fitting. See lspls for details about LS-PLS and ncomp. Each element of the list Z should either be a matrix or a list of matrices.
A list with components
coefficients | matrix with the final prediction coefficients |
predictors | matrix with variables and scores used in the final regression |
orthCoefs | list of coefficient generating matrices, to be used when predicting new predictors. |
models | list of fitted PLS models for the matrices |
ncomp | list with the number of components used |
scores | list of score matrices |
loadings | list of loading matrices |
residuals | matrix with fit residuals, one column per response |
The interface (arguments and return values) is likely to change in a future version.
Bjørn-Helge Mevik
Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.
Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)
Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)
Low-level function to perform the cross-validation in lsplsCv.
orthlsplsCv(Y, X, Z, ncomp, segments, trace = FALSE, ...)
Y | matrix. Response matrix. |
X | matrix. The first predictor matrix (typically a design matrix). |
Z | list. List of predictor matrices. |
ncomp | list. The number of components to fit from each matrix. |
segments | list. The segments to use. |
trace | logical; if TRUE, the segment number is printed for each segment. |
... | Further arguments. Currently not used. |
This function is not meant to be called directly by the user. It performs cross-validation of orthogonalized LS-PLS models without splitting of parallel matrices into common and unique components. See the references for details.
An array of cross-validated predictions. The first dimension corresponds to the observations, the second to the responses, and the rest to the number of components of the PLS models.
Bjørn-Helge Mevik
Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.
Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)
Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)
Plot method for "lspls" objects.
## S3 method for class 'lspls'
plot(x, plottype = c("scores", "loadings"), ...)
## S3 method for class 'lspls'
scoreplot(object, ...)
## S3 method for class 'lspls'
loadingplot(object, ...)
x, object | Object of class "lspls". |
plottype | character string. What type of plot to generate. |
... | Further arguments, passed on to underlying plot functions. |
The plot method simply calls scoreplot.lspls or loadingplot.lspls depending on the plottype argument.
scoreplot.lspls gives a series of score plots, one for each PLS model. The user is asked to press Return between each plot.
loadingplot.lspls shows a series of loading plots, one for each PLS model. All plots are shown in the same plot window.
The functions return whatever the (last) underlying plot function returns.
Bjørn-Helge Mevik
lspls, scoreplot, loadingplot, plot.lsplsCv
Plot method for "lsplsCv" objects. It plots the cross-validated (R)MSEP or R^2 against the total number of components or the matrices included in the model.
## S3 method for class 'lsplsCv'
plot(x, which = c("RMSEP", "MSEP", "R2"), ncomp, separate = TRUE, scale = !isTRUE(separate), ...)
x | object of class "lsplsCv". |
which | character string. Which measure to plot. |
ncomp | list. The number of components to use when plotting, for each PLS matrix in the model. See Details. |
separate | logical. Whether separate plots should be generated for each response (default) or one plot with the sum of the measure for all responses. |
scale | logical. Whether the responses and predicted values should be divided by the standard deviation of the response prior to calculating the measure. Default is to scale when producing a combined plot (separate = FALSE). |
... | Further arguments, sent to the underlying plot function. |
If ncomp is not specified, the plot method generates a plot of the cross-validated (R)MSEP or R^2 values for all combinations of number of components. The values are plotted against the total number of components. Each point is labelled with the combination of number of components. E.g., for a model with three PLS matrices, ‘132’ means one component from the first matrix, three from the second and two from the third.
Also, the lowest (R)MSEP or highest R^2 values for each total number of components are joined by a line.
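The labelling scheme can be reproduced with base R. The sketch below enumerates all component combinations for three PLS matrices with assumed maxima of 1, 3 and 2 components, and derives the label and the total number of components for each; it does not use the package's plotting code.

```r
# Assumed maximal numbers of components for three PLS matrices
ncomp <- c(1, 3, 2)

# All combinations of 0..k components per matrix
combos <- expand.grid(lapply(ncomp, function(k) 0:k))

# Label such as "132": one, three and two components, respectively
labels <- do.call(paste0, combos)

# Position on the horizontal axis: total number of components
total <- rowSums(combos)
```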
If ncomp is specified, the plot method plots (R)MSEP or R^2 for models with the first matrix, with the first two matrices, etc. ncomp should be specified as when running lsplsCv, and is used for selecting the number of components for each PLS matrix. For instance
mod <- lsplsCv(Y ~ X + Z + V:W, ...)
plot(mod, ncomp = list(2, c(1,3)))
would plot the RMSEPs for Y ~ X, Y ~ X + Z and Y ~ X + Z + V:W, using 2, 1 and 3 components for Z, V and W, respectively.
If separate is TRUE, a separate plot panel is produced for each response. Otherwise the measure is added for all responses and shown in one plot. If scale is TRUE (the default when producing a combined plot), the measures for each response are standardised by dividing the responses and predicted values by the standard deviation of the (corresponding) response prior to calculating the measure. (Note that scale is ignored when which is "R2", because R^2 is independent of scale.)
The function returns whatever the (last) underlying plot function returns.
Bjørn-Helge Mevik
Predict method for "lspls" objects. It predicts response values or scores from new data.
## S3 method for class 'lspls'
predict(object, newdata, type = c("response", "scores"), na.action = na.pass, ...)
object | object of class "lspls". |
newdata | data frame. The new data. |
type | character. Whether to predict responses or scores. |
na.action | function determining what should be done with missing values in newdata. |
... | further arguments. Currently not used. |
If type = "response", a matrix with predicted response values is returned. If type = "scores", a matrix with predicted score values is returned.
Bjørn-Helge Mevik
Functions to project one matrix onto another, or to orthogonalise it against the other.
project(M, N)
orth(M, N)
Corth(M, N)
M | matrix to be projected or orthogonalised |
N | matrix to be projected onto or orthogonalised against |
project(M, N) calculates the projection of M onto N, i.e., $N (N^t N)^{-1} N^t M$.
orth(M, N) orthogonalises M with respect to N, i.e., it calculates the projection of M onto the orthogonal space of N: $M - N (N^t N)^{-1} N^t M$.
Corth(M, N) calculates the coefficient matrix needed to orthogonalise future matrices, that is, $(N^t N)^{-1} N^t M$. Future matrices m and n can be orthogonalised with m - n %*% Corth(M, N).
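The three operations can be written out directly in base R. The functions below carry a _sketch suffix to make clear that they are illustrative stand-ins, not the package's own implementations, and they assume N has full column rank.

```r
# Illustrative stand-ins for project(), orth() and Corth() (assumed full-rank N)
Corth_sketch <- function(M, N) solve(crossprod(N), crossprod(N, M))
project_sketch <- function(M, N) N %*% Corth_sketch(M, N)
orth_sketch <- function(M, N) M - project_sketch(M, N)

set.seed(6)
N <- matrix(rnorm(20 * 2), 20, 2)
M <- matrix(rnorm(20 * 3), 20, 3)
M_orth <- orth_sketch(M, N)

# New observations are orthogonalised with the stored coefficient matrix
n_new <- matrix(rnorm(5 * 2), 5, 2)
m_new <- matrix(rnorm(5 * 3), 5, 3)
m_new_orth <- m_new - n_new %*% Corth_sketch(M, N)
```

Storing the coefficient matrix is what allows new data to be orthogonalised in the same way as the training data when predicting.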
A matrix.
The functions need to be optimised, both for speed and numerical accuracy.
Bjørn-Helge Mevik