SUGAR PROCESS DATA
Sugar was sampled continuously over eight hours to give a mean sample representative of one 'shift' (eight-hour period). Samples were taken during the three months of operation (the so-called campaign) in late autumn from a sugar plant in Scandinavia, giving a total of 268 samples. The sugar was sampled directly from the final unit operation (centrifuge) of the process.
Get the data
The data are available in zipped MATLAB 4.2 format. Download the data and type load data in MATLAB. If you use the data, we would appreciate it if you reported the results to us as a courtesy, given the work involved in producing and preparing the data. You may also want to refer to the data by citing
R. Bro, Exploratory study of sugar production using fluorescence spectroscopy and multi-way analysis, Chemom. Intell. Lab. Syst., 1999, 46, 133-147.
The data have also been described in
- R. Bro, Multi-way Analysis in the Food Industry. Models, Algorithms and Applications, Ph.D. thesis, University of Amsterdam, 1998.
- L. Munck, L. Nørgaard, S. B. Engelsen, R. Bro, C. A. Andersson, Chemometrics in food science – a demonstration of the feasibility of a highly exploratory, inductive evaluation strategy of fundamental scientific significance, Chemom. Intell. Lab. Syst., 1998, 44, 31-60.
- L. Eriksson, J. Trygg, E. Johansson, R. Bro, S. Wold, Orthogonal Signal Correction, Wavelet Analysis, and Multivariate Calibration of Complicated Process Fluorescence Data, Analytica Chimica Acta, 2000, 420(2), 181-195.
There are 268 samples, corresponding approximately to one sample every eight hours during the three months of operation (autumn 1995). The sample number can be converted to actual time using the matrix time. For sample (row) number i, the actual date of measurement is time(i). This gives the time of sampling as a number xyyz, where x is the month (2:Oct, 3:Nov, 4:Dec, 5:Jan), yy the day, and z the time of day (1:Morning, 2:Afternoon, 3:Night).
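The xyyz code can be unpacked with integer division. The following is a minimal Python sketch, not part of the supplied MATLAB files; the function name decode_time_code is hypothetical, and only the month and shift codes listed above are mapped, so any other digit is returned as None.

```python
# Month and shift keys exactly as listed in the description of the time code.
MONTHS = {2: "Oct", 3: "Nov", 4: "Dec", 5: "Jan"}
SHIFTS = {1: "Morning", 2: "Afternoon", 3: "Night"}


def decode_time_code(code):
    """Split a four-digit code xyyz into (month, day, shift).

    Codes whose month or shift digit is not in the tables above
    yield None for that field instead of a guessed value.
    """
    x, rest = divmod(int(code), 1000)  # x = leading (month) digit
    yy, z = divmod(rest, 10)           # yy = day, z = trailing (shift) digit
    return MONTHS.get(x), yy, SHIFTS.get(z)


print(decode_time_code(3172))  # ('Nov', 17, 'Afternoon')
```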
The following matrices are supplied
|Name|Size|Description|
|---|---|---|
|X|268 x 571 x 7|Fluorescence data|
|DimX||Size of X|
|EmAx|1 x 571|Emission labels|
|ExAx|1 x 7|Excitation labels|
|Proc|268 x 39 x 7|Process data|
|DimProc||Size of Proc|
|ProcNumber|1 x 39|Code for process variables|
|Lab|268 x 48 x 7|Lab measurements|
|DimLab||Size of Lab|
|LabNumber|1 x 48|Code for lab measurements|
|y|268 x 3|Two main quality measurements|
|Yidx||Description of y|
|time||Code for time|
|readmetime||Description of time|
The sugar was dissolved in un-buffered water (2.25g/15mL) and the solution was measured spectrofluorometrically in a 10 by 10 mm cuvette on a PE LS50B spectrofluorometer. Raw non-smoothed data was output from the fluorometer. For every sample the emission spectra from 275-560 nm were measured in 0.5 nm intervals (571 wavelengths) at seven excitation wavelengths (230, 240, 255, 290, 305, 325, 340 nm). The data of all the 268 samples can be arranged in an I × J × K three-way array of specific size 268 × 571 × 7. The first mode refers to samples, the second to emission wavelengths, and the third to excitation wavelengths. The ijkth element in this array corresponds to the measured emission intensity from sample i, excited at wavelength k, and measured at wavelength j. The reason for the large number of emission wavelengths is that the fluorometer used in this study only allows emission to be measured in half-nanometer steps. One may, of course, simply use a subset of the wavelengths if physical computer memory is limited.
The fluorescence data are held in the matrix X, which is a 268 x 3997 matrix (3997 = 571 x 7). The first 571 columns are the emission measurements for the first excitation wavelength (230 nm). The last 571 columns are the emission measurements for the last excitation wavelength (340 nm). The vector EmAx contains the 571 emission wavelengths in nm and the vector ExAx contains the corresponding seven excitation wavelengths. If you use MATLAB ver. 5 and want to convert the unfolded matrix back to a three-way array, simply type X = reshape(X,DimX).
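The unfolded layout means that a column number encodes both an emission and an excitation wavelength: each block of 571 consecutive columns belongs to one excitation wavelength. The Python sketch below illustrates that arithmetic only. The function names are hypothetical, the emission axis is rebuilt from the stated 275-560 nm range rather than read from EmAx, and the excitation order is assumed to follow the supplied ExAx vector, so check ExAx before relying on it.

```python
N_EM = 571  # emission wavelengths per excitation block

# Assumed content of ExAx; pass the real vector if its order differs.
EXCITATION_NM = [230, 240, 255, 290, 305, 325, 340]


def emission_axis(start_nm=275.0, step_nm=0.5, n=N_EM):
    """Rebuild the emission axis from the stated range and step."""
    return [start_nm + step_nm * j for j in range(n)]


def column_to_wavelengths(col, ex_axis=EXCITATION_NM):
    """Map a 0-based column of the unfolded matrix to (emission nm, excitation nm).

    A MATLAB (1-based) column c corresponds to col = c - 1 here.
    """
    if not 0 <= col < N_EM * len(ex_axis):
        raise IndexError("column outside the unfolded matrix")
    k, j = divmod(col, N_EM)  # k = excitation block, j = position within block
    return emission_axis()[j], ex_axis[k]


print(len(emission_axis()), emission_axis()[-1])  # 571 560.0
print(column_to_wavelengths(571))                 # (275.0, 240)
```

The same block arithmetic is what reshape relies on in column-major storage: column j + 571*k of the unfolded matrix becomes element (i, j, k) of the three-way array.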
Also available are laboratory determinations of the quality of the produced sugar, sampled at the same rate as the fluorometrically measured sugar samples. These quality measures are ash content and color. Ash content is determined by conductivity and is a measure of the amount of inorganic impurities in the refined sugar. It is given in percentages. Color is determined as the absorption at 420 nm of a membrane-filtered solution of sugar adjusted to pH 7. The color is given as a unit derived from the absorbance, where 45 is the maximum allowed color of standard sugar. It gives an indication of the miscoloring of the sugar. The color is so low that it is of no importance for the consumer, but it is of interest for process control and for retailers.
The matrix y holds the color and ash measurements.
Auxiliary laboratory measurements
A number of auxiliary variables are available. These are laboratory measurements and are given in a 268 x 48 x 7 three-way array where 268 is the number of samples, 48 the number of variables, and 7 the lags. That is, the first 268 x 48 matrix holds the measurements at the same time as given in the time variable. The second 268 x 48 matrix gives the values measured one hour before etc. NaN is used for missing values.
Automatically sampled process variables
The process variables include temperature, flow, and pH determinations at different points in the process. Typically these variables are noisy and sampled at quite different rates. The process data are given in a 268 x 39 x 7 three-way array where 268 is the number of samples, 39 the number of variables, and 7 the lags. That is, the first 268 x 39 matrix holds the measurements at the same time as given in the time variable. The second 268 x 39 matrix gives the values measured one hour before etc. NaN is used for missing values.
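Because both the laboratory and the process arrays carry seven lags and use NaN for missing values, one simple way to handle a gap is to fall back to the most recent lag that was actually measured. The Python sketch below shows that idea for the seven lagged values of a single sample and variable. It is one possible treatment under stated assumptions, not a procedure prescribed by the data set, and the function name latest_valid is hypothetical.

```python
import math


def latest_valid(lagged_values):
    """Return (lag, value) for the first non-NaN entry.

    lagged_values is ordered as in the third mode of the array:
    index 0 is the value at the sampling time, index 1 the value
    one hour before, and so on. If every lag is missing, the
    result is (None, nan).
    """
    for lag, value in enumerate(lagged_values):
        if not math.isnan(value):
            return lag, value
    return None, float("nan")


nan = float("nan")
print(latest_valid([nan, nan, 7.1, 7.0, nan, 6.9, 6.8]))  # (2, 7.1)
```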
Known upsets in production
Breakdowns are known to have occurred on the following dates:
30/9, 1/10, 2/10, 4/10, 5/10, 11/10, 18/10, 25/10, 3/11, 4/11, 8/11, 9/11, 16/11, 17/11, 18/11, 23/11, 5/12, 6/12, 7/12, 9/12, 13/12, 22/12
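To relate these dates to samples, the xyyz time code can be compared against the list. The Python sketch below assumes the dates are written day/month and uses only the month codes given in the description of the time variable (2:Oct, 3:Nov, 4:Dec, 5:Jan). A code with any other month digit is reported as not matching, so the September date cannot be matched without knowing its code. The names parse_dates and is_upset_day are hypothetical.

```python
UPSETS_TEXT = (
    "30/9, 1/10, 2/10, 4/10, 5/10, 11/10, 18/10, 25/10, 3/11, 4/11, 8/11, "
    "9/11, 16/11, 17/11, 18/11, 23/11, 5/12, 6/12, 7/12, 9/12, 13/12, 22/12"
)

# Month digit of the time code -> calendar month, as far as documented.
CODE_TO_MONTH = {2: 10, 3: 11, 4: 12, 5: 1}


def parse_dates(text):
    """Turn 'd/m, d/m, ...' into a set of (day, month) tuples."""
    dates = set()
    for item in text.split(","):
        day, month = item.strip().split("/")
        dates.add((int(day), int(month)))
    return dates


def is_upset_day(code, upsets):
    """True if the day encoded in time code xyyz is a known breakdown date."""
    x, rest = divmod(int(code), 1000)
    day = rest // 10                  # drop the trailing shift digit
    month = CODE_TO_MONTH.get(x)      # None for undocumented month digits
    return (day, month) in upsets


UPSET_DAYS = parse_dates(UPSETS_TEXT)
print(len(UPSET_DAYS))                 # 22
print(is_upset_day(2041, UPSET_DAYS))  # True (4 October, morning shift)
```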