INTRODUCTION
A new multi-way regression method, called N-way partial least squares (N-PLS), is presented. The model and algorithm are an extension of ordinary two-way PLS regression to the multi-way case. The algorithm is available from the Internet at
http://www.models.life.ku.dk/source/nwaytoolbox/.
In chemometrics there is some confusion between multi-way methods and multi-way data. Bilinear two-way methods such as PLS and PCA can handle multi-way data by unfolding the data arrays to matrices, but the methods themselves are not multi-way and do not take advantage of any multi-way structure in the data (Smilde 92).
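As a concrete illustration of unfolding, the following minimal Python sketch (standard library only, with the array stored as nested lists) rearranges an I × J × K array into an I × JK matrix by laying out the J × K slab of each sample as one row. The function name `unfold` and the column ordering (k varying fastest) are illustrative assumptions, not part of the N-way toolbox.

```python
def unfold(x):
    """Unfold a three-way array x (nested lists, I x J x K) to an I x (J*K) matrix.

    Row i is the J x K slab of sample i laid out row by row, so the entry
    x[i][j][k] ends up in column j*K + k (k varies fastest).
    """
    return [[value for row in slab for value in row] for slab in x]


# Two samples (I = 2), two excitation (J = 2) and three emission (K = 3) variables.
x = [[[1, 2, 3], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12]]]
print(unfold(x))  # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
```

The unfolded matrix can be passed to ordinary two-way PLS, but the fact that columns 0 to 2 (and 3 to 5) share an excitation wavelength is no longer visible to the method.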
Unfolding can be unfavorable for several reasons:
- Unfold models are complex (many parameters).
- Unfold models are difficult to interpret (confounding of modes).
- Multi-way information is thrown away.
- Risk of poor predictive power.
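The first point can be made concrete by counting weight elements per component. For an I × J × K array X, unfold-PLS estimates one weight vector of length JK, whereas the trilinear model estimates separate vectors wj (length J) and wk (length K); the same reasoning extends to N-way arrays, with I_n denoting the dimension of mode n and mode 1 holding the samples. This sketch counts weight elements only; the scores and regression coefficients are the same size in both models.

```latex
\underbrace{J\,K}_{\text{unfold-PLS}} \quad\text{vs.}\quad \underbrace{J + K}_{\text{trilinear PLS}},
\qquad
\underbrace{\prod_{n=2}^{N} I_n}_{\text{unfold-PLS}} \quad\text{vs.}\quad \underbrace{\sum_{n=2}^{N} I_n}_{\text{N-PLS}}
```

Since JK − (J + K) = (J − 1)(K − 1) − 1, the product is at least as large as the sum whenever J, K ≥ 2, and the gap grows quickly with the number of variables.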
THEORY
For trilinear PLS1, a trilinear decomposition of X (I × J × K) is sought, as shown in figure 1. From X and y the weight vectors wj and wk are determined; these, in turn, define the score vector t as the least-squares model of X (or its residual) given the weights. Components are extracted one at a time, each with weights chosen so that its score vector has maximal covariance with the dependent variable (or its residual). The scores are related to the dependent variable by regression.
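The description above can be written out explicitly. Below, the weights wj and wk are written w^J and w^K, X and y are assumed centered, and the first component is shown (u = y). This is a standard derivation, given here as a sketch of why the algorithm uses an SVD:

```latex
x_{ijk} = \sum_{f=1}^{F} t_{if}\, w^{J}_{jf}\, w^{K}_{kf} + e_{ijk}

% Least-squares score for given unit-norm weights (\|w^J\| = \|w^K\| = 1):
t_i = \sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk}\, w^{J}_{j}\, w^{K}_{k}

% Covariance criterion (X and y centered), with z_{jk} = \sum_i x_{ijk}\, y_i:
\max_{\|w^J\| = \|w^K\| = 1} \sum_{i=1}^{I} t_i\, y_i
  = \max_{\|w^J\| = \|w^K\| = 1} (w^{J})^{\top} Z\, w^{K}
```

The maximum equals the largest singular value of Z and is attained at its first left and right singular vectors, which is what steps 1 and 2 of the algorithm compute.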
Figure 1. Two-component PLS decomposition of a three-way data cube.
The trilinear PLS1 algorithm is outlined below.
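Center X and y. Set f = 1 and u = y.
1. Z = X'u, i.e. z_jk = Σ_i x_ijk u_i (a J × K matrix)
2. wj and wk = first left and right singular vectors from the SVD of Z
3. t = least-squares model of X given wj and wk, i.e. t_i = Σ_j Σ_k x_ijk wj_j wk_k
4. b = (T'T)^(-1)T'u, where T holds all scores computed so far
5. Calculate residuals: x_ijk = x_ijk − t_i wj_j wk_k and u = u − Tb
6. f = f + 1. Go to step 1 with X and u replaced by their residuals.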
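The six steps above can be sketched in plain Python (standard library only, arrays as nested lists). This is an illustrative reading of the outline, not the N-way toolbox code: the names `tri_pls1`, `first_singular_pair` and `solve` are hypothetical, the SVD of step 2 is replaced by power iteration for the first singular pair to avoid external dependencies, and "residuals" in step 5 is interpreted as deflating X by t_i wj_j wk_k and u by Tb. Only calibration is shown; prediction of new samples is omitted.

```python
import math
import random


def first_singular_pair(z, n_iter=1000, tol=1e-12):
    """First left/right singular vectors of the J x K matrix z by power iteration.

    Starts from the largest row of z (a vector in z's row space), so z v is
    non-zero; assumes z is not identically zero.
    """
    v = list(max(z, key=lambda row: sum(a * a for a in row)))
    for _ in range(n_iter):
        u = [sum(zj[k] * v[k] for k in range(len(v))) for zj in z]
        nu = math.sqrt(sum(a * a for a in u))
        u = [a / nu for a in u]
        v_new = [sum(z[j][k] * u[j] for j in range(len(z))) for k in range(len(v))]
        nv = math.sqrt(sum(a * a for a in v_new))
        v_new = [a / nv for a in v_new]
        done = max(abs(a - b) for a, b in zip(v, v_new)) < tol
        v = v_new
        if done:
            break
    return u, v


def solve(a, rhs):
    """Solve the small system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def tri_pls1(x, y, n_comp):
    """Trilinear PLS1 following steps 1-6; x is I x J x K nested lists, y has length I."""
    n_i, n_j, n_k = len(x), len(x[0]), len(x[0][0])
    # Center X across samples (each j, k pair) and center y.
    mean = [[sum(x[i][j][k] for i in range(n_i)) / n_i for k in range(n_k)] for j in range(n_j)]
    x = [[[x[i][j][k] - mean[j][k] for k in range(n_k)] for j in range(n_j)] for i in range(n_i)]
    y_mean = sum(y) / n_i
    u = [yi - y_mean for yi in y]
    scores, weights, b_total = [], [], []
    for _ in range(n_comp):
        # 1. Z = X'u, i.e. z_jk = sum_i x_ijk u_i
        z = [[sum(x[i][j][k] * u[i] for i in range(n_i)) for k in range(n_k)] for j in range(n_j)]
        # 2. wj, wk = first left/right singular vectors of Z (unit norm)
        wj, wk = first_singular_pair(z)
        # 3. Least-squares score given unit-norm weights: t_i = sum_jk x_ijk wj_j wk_k
        t = [sum(x[i][j][k] * wj[j] * wk[k] for j in range(n_j) for k in range(n_k))
             for i in range(n_i)]
        scores.append(t)
        weights.append((wj, wk))
        # 4. b = (T'T)^-1 T'u over all scores extracted so far
        gram = [[sum(p * q for p, q in zip(ta, tb)) for tb in scores] for ta in scores]
        b = solve(gram, [sum(p * q for p, q in zip(ta, u)) for ta in scores])
        # 5. Residuals: x_ijk -= t_i wj_j wk_k and u -= T b
        x = [[[x[i][j][k] - t[i] * wj[j] * wk[k] for k in range(n_k)] for j in range(n_j)]
             for i in range(n_i)]
        u = [u[i] - sum(bc * tc[i] for bc, tc in zip(b, scores)) for i in range(n_i)]
        # Accumulated coefficients equal the regression of centered y on all scores.
        b_total = [old + new for old, new in zip(b_total + [0.0], b)]
    fitted = [y_mean + sum(bc * tc[i] for bc, tc in zip(b_total, scores)) for i in range(n_i)]
    return {"weights": weights, "scores": scores, "b": b_total, "fitted": fitted, "residual": u}


# Usage on random data: 8 samples, J = 3, K = 4, two components.
random.seed(0)
x_demo = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(3)] for _ in range(8)]
y_demo = [random.gauss(0, 1) for _ in range(8)]
model = tri_pls1(x_demo, y_demo, n_comp=2)
print(sum(r * r for r in model["residual"]))
```

Regressing the residual u on all scores in step 4 and accumulating the coefficients gives the same fit as regressing the centered y on all scores at once, so the returned `fitted` values are consistent with the outline.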
The algorithm is easily extended to higher orders of both the dependent and the independent variables (see Bro 96 for details).
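To sketch what changes for higher orders, consider a four-way X (I × J × K × L), as in the application below. The score and the covariance criterion gain a third weight vector (the symbols w^L and z_jkl are introduced here for illustration), and Z becomes a three-way array, so step 2 can no longer use an SVD:

```latex
t_i = \sum_{j=1}^{J}\sum_{k=1}^{K}\sum_{l=1}^{L} x_{ijkl}\, w^{J}_{j}\, w^{K}_{k}\, w^{L}_{l},
\qquad
z_{jkl} = \sum_{i=1}^{I} x_{ijkl}\, u_i

\max_{\|w^J\| = \|w^K\| = \|w^L\| = 1} \sum_{i=1}^{I} t_i\, u_i
  = \max_{\|w^J\| = \|w^K\| = \|w^L\| = 1} \sum_{j,k,l} z_{jkl}\, w^{J}_{j}\, w^{K}_{k}\, w^{L}_{l}
```

Maximizing this trilinear form over unit vectors is equivalent to finding the best rank-one (one-component PARAFAC) approximation of Z, which plays the role that the first singular pair plays in the three-way case.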
APPLICATION OF N-PLS
The data were obtained from the final stage of a sugar plant, where white sugar is produced (figure 2). Samples were taken at 8 different locations (marked with Xs in the figure) over three days. For each sample, fluorescence was measured at 4 excitation and 371 emission wavelengths.
Figure 2. Sugar plant. Samples taken at X1 - X8.
The important quality parameters of the sugar can be predicted from its fluorescence (Nørgaard 95). The fluorescence of the sugar can hence be regarded as a global quality indicator.
To investigate how changes in the process affect the quality of the sugar, the following model is sought: the fluorescence of the sugar (X1) is to be predicted from the fluorescence of the intermediate juices (X2 - X8).
In this example the array of independent variables is four-way (samples × excitation × emission × location) and the array of dependent variables is three-way (samples × excitation × emission); hence the model is termed quadri-PLS3 (figure 3). The results are compared with those of unfold-PLS (i.e. bi-PLS2).
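The difference in model complexity between quadri-PLS3 and bi-PLS2 can be quantified from the dimensions given above (4 excitation and 371 emission wavelengths, and the 7 locations X2 to X8 in X). The sketch below counts only the weight elements per component for the non-sample modes; the helper name `weight_elements` is hypothetical.

```python
import math


def weight_elements(mode_sizes, unfolded):
    """Weight elements per component for the non-sample modes of an array.

    Unfolding merges the modes into one long vector (product of the sizes);
    the multilinear model keeps one vector per mode (sum of the sizes).
    """
    return math.prod(mode_sizes) if unfolded else sum(mode_sizes)


x_modes = (4, 371, 7)  # excitation, emission, location (X2 to X8)
y_modes = (4, 371)     # excitation, emission of the sugar (X1)
for name, modes in (("X", x_modes), ("Y", y_modes)):
    print(name, weight_elements(modes, unfolded=True), weight_elements(modes, unfolded=False))
# X 10388 382
# Y 1484 375
```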
Figure 3. The quadri-PLS3 model is based on the arrays X (independent) and Y (dependent). In bi-PLS2 these are unfolded to matrices.
RESULTS
Both unfold-PLS and quadri-PLS3 explain 83% of the variance in Y (cross-validation, one component). However, the loadings of the two models differ markedly in interpretability.
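The 83% figure is a cross-validated explained variance. The exact formula is not stated here; a common definition, assumed in the sketch below, is 100 × (1 − PRESS / SS), where PRESS sums the squared cross-validation prediction errors and SS sums the squared deviations of Y from its per-variable mean. Each sample's Y (excitation × emission) is assumed to be passed unfolded as one row, and the function name is hypothetical.

```python
def cv_explained_variance(y_true, y_cv):
    """Percent of variance in Y explained in cross-validation: 100 * (1 - PRESS / SS).

    y_true, y_cv: one row per sample (multi-way Y unfolded per sample);
    y_cv holds the predictions made while that sample was left out.
    """
    n = len(y_true)
    col_means = [sum(col) / n for col in zip(*y_true)]
    ss = sum((v - m) ** 2 for row in y_true for v, m in zip(row, col_means))
    press = sum((v - p) ** 2 for row, prow in zip(y_true, y_cv) for v, p in zip(row, prow))
    return 100.0 * (1.0 - press / ss)


y_true = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y_cv = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
print(cv_explained_variance(y_true, y_cv))  # 93.75
```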
Figure 4. Loadings from unfold-PLS. Interpretation is difficult due to confounded orders.
Figure 5. Loadings from quadri-PLS3 in the three orders. These clearly show which wavelengths and locations are important in predicting the quality of the sugar (the wash syrup, for example, is known to be very informative with respect to sugar quality).
CONCLUSION
The proposed N-PLS is a natural extension of the bilinear PLS model. The method has already proven useful in QSAR, kinetics, chromatography and fluorescence spectroscopy; other obvious applications include, for example, dynamic systems and sensory data.
In this case no real improvement in the predictions was achieved. However, the model was greatly simplified, making it much more interpretable. With noisy data, explicitly using the multi-way structure often also yields more predictive models.
REFERENCES
Bro, R. J. Chemom. 1996, 10, 47.
Nørgaard, L. Zuckerindustrie 1995, 120, 970.
Smilde, A. K. Chemom. Intell. Lab. Syst. 1992, 5, 143.