FC 155 - Regression

The regression block correlates up to four independent variables to a single dependent variable. Data can be collected on a time or trigger basis, and buffered either sequentially or in bins. The size of the data buffer is configurable.

 

A desired goodness of fit is specified (S23). When the mismatch between the collected data and the estimated curve exceeds the specified goodness of fit, updating of the parameter estimates is disabled.

 

A reset input, when flagged, marks all buffered data as bad quality so that data collection starts over; the default parameters are output in the meantime.

 

Calculation and edit are the two operating modes. In both modes, the first four outputs are dedicated to the computed coefficients. The remaining outputs are dependent on the mode. The calculation mode outputs information about the current calculation. The edit mode identifies the inputs to the calculation and allows the operator to change the quality of a row of data in the regression matrix.

 

 

Outputs:

Blk   Type  Description (Calculate Mode)               Description (Edit Mode)
N     R     Parameter 1                                Parameter 1
N+1   R     Parameter 2                                Parameter 2
N+2   R     Parameter 3                                Parameter 3
N+3   R     Parameter 4                                Parameter 4
N+4   R     Goodness of fit                            Dependent variable, y
N+5   R     Maximum model mismatch                     First independent variable, x1
N+6   R     Row no. producing maximum model mismatch   Second independent variable, x2
N+7   R     No. of rows with good quality              Third independent variable, x3
N+8   R     Time of last computation in mmddhh format  Fourth independent variable, x4
N+9   B     State of outputs:                          Quality of data point:
            0 = computed                               0 = bad, excluded from computation
            1 = default                                1 = good, included in computation

 

 

Specifications:

Spec  Tune  Default   Type  Range          Description
S1    N     5         I     Note 1         Block address of dependent variable
S2    N     5         I     Note 1         Block address of first independent variable
S3    N     5         I     Note 1         Block address of second independent variable
S4    N     5         I     Note 1         Block address of third independent variable
S5    N     5         I     Note 1         Block address of fourth independent variable
S6    N     1         I     1 - 4          Number of independent variables to use in calculation
S7    N     5         I     5 - 32         Number of sets of data to buffer
S8    N     5         I     5 - 32         Minimum number of good sets of data required for calculation
S9    N     1         B     Full           Trigger or time mode flag:
                                           0 = trigger
                                           1 = time
S10   N     1.000     R     0.0 - 9.2E18   Time interval between calculations (minutes)
S11   N     0         I     Note 1         Block address of external trigger
S12   N     0         B     Full           Data storage mode flag:
                                           0 = sequential
                                           1 = bins
S13   N     100.000   R     Full           High range of first independent variable for bin storage
S14   N     0.000     R     Full           Low range of first independent variable for bin storage
S15   N     0         I     Note 1         Block address of edit mode switch:
                                           0 = calculate
                                           1 = edit
S16   N     5         I     Note 1         Block address of set to edit value
S17   N     0         I     Note 1         Block address of set quality toggle flag:
                                           1 = toggle quality
S18   N     0         I     Note 1         Block address of reset flag:
                                           1 = reset
S19   Y     0.000     R     Full           Initial default for first parameter
S20   Y     0.000     R     Full           Initial default for second parameter
S21   Y     0.000     R     Full           Initial default for third parameter
S22   Y     0.000     R     Full           Initial default for fourth parameter
S23   Y     1.000     R     0.000 - 1.000  Desired goodness of fit
S24   N     0.000     R     Full           Default update
S25   N     0.000     R     Full           Spare
S26   N     0.000     R     Full           Spare
S27   N     0.000     R     Full           Spare

 

NOTES:

1. Maximum values are: 9,998 for the BRC-100 and IMMFP11/12; 31,998 for the HAC.

 

155.1   Explanation

 

The regression block has two modes of operation. The first is calculation of parameter estimates, and the second is editing of data contained in the data table. Specification S15 gives the block address of the edit mode switch, which selects the mode of operation. A value of zero selects the calculation mode and a value of one selects the edit mode.

 

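In the calculation mode, the regression block stores the measurements x1, x2, x3, x4 (independent variables) and y (dependent variable) in a data table. The matrix X and the vector Y represent this data table as shown below. Each row of the matrix and the corresponding element in Y contain data from one sampling period.

 

\[
Xa = Y, \qquad
X = \begin{bmatrix}
x_{11} & x_{21} & x_{31} & x_{41} \\
x_{12} & x_{22} & x_{32} & x_{42} \\
\vdots & \vdots & \vdots & \vdots \\
x_{1N} & x_{2N} & x_{3N} & x_{4N}
\end{bmatrix}, \qquad
a = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}, \qquad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
\]

 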

where:

 

an = The values of parameter n, where n = 1 to 4.

 

X = Matrix of input values for the independent variables. Each column contains the group of samples for one of the four independent variables; x1N = values for independent variable x1, etc.

 

Y = Vector of values for the dependent variable. The number of rows is the number of samples taken.

 

 

 

The regression algorithm solves the equation Xa = Y. If the number of samples (rows in X) equals the number of parameters to find (columns in X), creating a square matrix, the solution is a = X^(-1)Y. However, the matrix X is not always invertible. If the rows of X are not linearly independent (for example, when two rows are identical), the matrix is singular and the inverse does not exist. The internal logic of the regression block prevents entry of data that creates a singular matrix.

 

When collecting live data, there is always uncertainty in the values collected, resulting from the influence of uncontrollable effects in the surrounding environment. To counteract this influence, more data points are collected to increase confidence in the model parameters. When this is done, the matrix X is not square. This leaves more equations than unknown parameters to specify, and the simple algebraic solution explained above is not possible.

 

Rearranging the equation Xa = Y gives Xa - Y = r, where r is the vector of residuals. Generally, any a selected leaves a nonzero vector of residuals, indicating the mismatch between model and data. To solve this problem, the regression block uses the least squares method to minimize the sum of the squares of the residuals. The solution takes the form X'(Xa - Y) = 0, that is, X'Xa = X'Y, where X' is the transpose of X. This is a set of linear equations, solved by the Gaussian elimination method. This method provides numerically stable solutions while requiring less processing time than more direct solution techniques.
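The least squares step can be sketched as follows. This is a minimal illustration in Python, not the module's internal code: the function names solve_gauss and least_squares and the sample data are assumptions made for the example. It forms the linear system X'Xa = X'Y and solves it by Gaussian elimination with partial pivoting.

```python
def solve_gauss(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(X, Y):
    """Return a that minimizes the sum of squared residuals of X a - Y."""
    cols = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(cols)]
           for i in range(cols)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(cols)]
    return solve_gauss(XtX, XtY)


# Hypothetical data: five noise-free samples of y = 2*x1 + 3*x2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
Y = [2.0, 3.0, 5.0, 7.0, 8.0]
a = least_squares(X, Y)
```

With more rows than columns, X itself cannot be inverted, but X'X is square, so the same elimination routine applies.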

 

A minimum number of sets of data with good quality must be present in the data table before the parameters can be calculated. Specification S8 sets this minimum, which must be equal to or greater than five. The data set can be viewed and changed in the edit mode. Each time a good quality data set is entered, the values of a1 through a4 are recalculated.

 

If the calculation of a is valid, and the goodness of fit is less than that specified with S23, then the values of a1 through a4 are output from the block. The goodness of fit is defined as the mean relative residual:

 

 

where:

 

J = Number of independent variables used in the calculation (S6).

 

K = Number of data sets used in the calculation (S7).

 

a(K) = Value of a determined when K data sets are used for calculation.

 

X = Matrix of input values for independent variables.

 

y(j) = Value of the dependent variable associated with independent variable number j.

 

 

The block also performs a test on residuals r after accepting new data. The old set of data is always buffered for the duration of the calculation, and replaces the new set of data in the event that the block is unable to calculate valid parameters. When the computed goodness of fit is greater than the tolerance limit (S23), the new data set is removed from the data table X and the old data is reinstated.
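The accept-or-reinstate behavior can be sketched as follows. This is an illustration under stated assumptions, not the block's actual implementation: the goodness of fit is taken here to be the mean of |y - model| / |y| over the buffered rows (one plausible reading of "mean relative residual"), the model is reduced to a single parameter so the fit has a closed form, and the names fit_one_parameter, goodness_of_fit, and offer_data_set are hypothetical.

```python
def fit_one_parameter(rows):
    """Least squares for y = a*x with one independent variable (closed form)."""
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return sxy / sxx


def goodness_of_fit(rows, a):
    """Assumed mean relative residual: mean of |y - a*x| / |y| (y must be nonzero)."""
    return sum(abs(y - a * x) / abs(y) for x, y in rows) / len(rows)


def offer_data_set(table, slot, new_row, s23):
    """Place new_row in table[slot]; keep it only if the fit stays within s23."""
    old_row = table[slot]        # old data is buffered during the calculation
    table[slot] = new_row
    a = fit_one_parameter(table)
    if goodness_of_fit(table, a) > s23:
        table[slot] = old_row    # mismatch beyond tolerance: reinstate old data
        return False
    return True


# Hypothetical table of (x1, y) rows that follow y = 2*x1 exactly.
table = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
accepted_good = offer_data_set(table, 0, (6.0, 12.0), s23=0.05)
accepted_bad = offer_data_set(table, 1, (2.0, 40.0), s23=0.05)
```

In this example the consistent point is kept, while the outlier raises the assumed goodness of fit well above the tolerance and the previous row is restored.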

 

Data can be collected in two ways: on a time basis or on a transition of an external trigger. Specification S9 selects the mode. If data is collected on a time basis, the collection interval is specified in minutes with S10.

 

The data can be stored in the data table in one of two ways: sequentially or in bin mode. Specification S12 selects the mode.

 

S12 = zero = sequential storage

S12 = one = bin storage

 

The bin mode of data storage allows the system to maintain a spread of data over a range of the independent variable X1.

The bin mode of data storage should be used when the correlation is not expected to change (for example, due to sensor contamination) or when the independent variable changes over a wide range. In this mode the data is sent to the appropriate bin and calculated as shown:

 

 

where:

 

<S2> = First independent variable.

 

S7 = N, the number of data sets used for calculation.

 

<S13> = High end of range of allowable X1 values.

 

<S14> = Low end of range of allowable X1 values.
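The bin assignment can be sketched as follows. The exact bin formula is not reproduced in this text, so the sketch assumes S7 equal-width bins spanning the range from <S14> to <S13>, numbered from one; the function name bin_index is hypothetical.

```python
def bin_index(x1, s7, s13, s14):
    """Return a 1-based bin number for x1, or None if x1 is out of range.

    Assumes s7 equal-width bins between s14 (low) and s13 (high).
    """
    if x1 > s13 or x1 < s14:
        return None                  # out-of-range inputs are discarded
    width = (s13 - s14) / s7
    index = int((x1 - s14) / width) + 1
    return min(index, s7)            # x1 equal to s13 falls in the last bin


# Hypothetical configuration: 5 bins over 0.0 to 100.0.
first = bin_index(19.9, 5, 100.0, 0.0)
second = bin_index(20.0, 5, 100.0, 0.0)
last = bin_index(100.0, 5, 100.0, 0.0)
rejected = bin_index(101.0, 5, 100.0, 0.0)
```

Under this assumption each bin holds the data set for one slice of the X1 range, which is what keeps the stored data spread across the whole range.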

 

 

 

Any input greater than S13 or less than S14 is discarded. In the sequential mode of data collection, the data table is first in first out (FIFO queue). The new points fill the data table row by row, and the newest data set replaces the oldest data set. This mode is used when the correlation is expected to change.
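Sequential storage behaves like a fixed-length queue. A minimal sketch in Python, assuming a table size of S7 = 5 and hypothetical (x1, y) samples:

```python
from collections import deque

S7 = 5                               # number of data sets to buffer
table = deque(maxlen=S7)             # oldest row is dropped when full

# Collect seven hypothetical samples; only the newest five remain.
for sample in range(1, 8):
    table.append((float(sample), 2.0 * sample))

oldest = table[0]
newest = table[-1]
```

After the seventh sample, the first two rows have been displaced, so the parameters are computed only from the most recent S7 data sets.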

 

In the edit mode, a set of data from the data table can be selected and displayed by entering a number between one and N (the number of buffered data sets, S7) at the block addressed by S16. The set of data selected is then available in output blocks N+4 through N+8. Block N+9 contains a quality bit that indicates if the data selected is currently in use in the calculation.

 

1 = good data (included in the calculation)

0 = bad data (excluded from the calculation)

 

The operator can change the quality associated with a set of data by toggling S17 to one. Marking an erroneous set of data as bad eliminates it from the parameter calculation.
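The quality toggle can be sketched as a rising-edge detector, following the zero-to-one transition described for S17. The class name QualityEditor and its method are hypothetical and only illustrate the behavior.

```python
class QualityEditor:
    """Toggle a row's quality bit when the switch goes from 0 to 1."""

    def __init__(self, quality):
        self.quality = list(quality)   # 1 = good, 0 = bad, one entry per row
        self._last_switch = 0

    def scan(self, row_number, switch):
        """Process one scan; row_number is 1-based, switch is 0 or 1."""
        if self._last_switch == 0 and switch == 1:
            i = row_number - 1
            self.quality[i] = 1 - self.quality[i]
        self._last_switch = switch


editor = QualityEditor([1, 1, 1])
editor.scan(2, 0)
editor.scan(2, 1)                      # rising edge: row 2 becomes bad
after_first_toggle = list(editor.quality)
editor.scan(2, 1)                      # switch held at 1: no further change
after_hold = list(editor.quality)
```

Acting only on the transition means that holding the switch at one does not flip the quality back and forth on every scan.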

 

Default values for each of the four parameters are specified with S19 through S22. The default values can be periodically updated from the data tables by selecting the update time in hours with S24. If S24 is set to 0.0, there is no updating of the default parameters. The minimum update time is 18.0 hours. The default update is an important feature because the data table is stored in RAM and is lost on power down, module reset, or entering configuration mode. The default parameters are stored in NVRAM, which is not affected by these interruptions of normal operation. Thus, when the module is started, realistic values are available. The default parameters are output after start-up and until the specified minimum number of good quality data sets (S8) has been collected.

 

A reset input is also available. If it is set to one, it marks all sets of data in the table as bad and makes the default parameters S19 through S22 available at the outputs of the block.

 

 

155.1.1  Specifications

 

S1 - Y

Block address of dependent variable.

 

S2 - X1

Block address of independent variable X1.

 

S3 - X2

Block address of independent variable X2.

 

S4 - X3

Block address of independent variable X3.

 

S5 - X4

Block address of independent variable X4.

 

S6 - J

Number of independent variables (one to four) used in the calculation.

 

S7 - K

Number of sets of data used for calculation. This is the number of sets of data held in the data table and drawn on to perform the calculation. There can be from five to 32 sets.

 

S8 - MD

Minimum number of good sets of data required for calculation. This value must be at least five.

 

S9 - MD1

Time and trigger mode flag. This specification defines the mode of data collection used. In the time mode, data is collected at a fixed interval of time specified with S10. In the trigger mode, data is collected each time the externally controlled collection trigger (S11) goes to one.

0 = trigger mode

1 = time mode

 

S10 - DT

Time in minutes between collections of data when the regression block is in the time collection mode (S9 equals one).

 

S11 - ET

Block address of the external collection trigger. This input determines when collections of data occur in the trigger mode (S9 equals zero). When this input makes a zero to one transition, the block reads the incoming data.

 

S12 - MD2

Data storage mode flag. This specification defines the data storage mode. In the bin mode, the system maintains a spread of data over a range of the independent variable X1 (S2). In the sequential mode, the newest set of data replaces the oldest set of data in the data table.

0 = sequential

1 = bin

 

S13 - HR

High end of the range of X1 for bin storage. If there is data stored in the bin mode, any input values greater than this number are discarded. If data storage is in the sequential mode, retain the default value.

 

S14 - LR

Low end of the range of X1 for bin storage. If storing data in the bin mode, input values less than this number are discarded. If storing data in the sequential mode, retain the default value.

 

S15 - MD3

Block address of calculate and edit mode switch. This value controls the operating mode of the regression block.

0 = calculate mode

1 = edit mode

 

S16 - EDN

Block address of the number (from one to N) of the data set to view in the edit mode. This specification is active only in the edit mode (S15 equals one). When in edit mode, the variables in the set selected with S16 are output to blocks N+4 through N+8.

 

S17 - SQ

Block address of the quality switch. This specification is active only in edit mode. When <S17> changes from zero to one, the quality of the row selected with S16 changes to the opposite value: good quality is forced bad, and bad quality is forced good.

 

1 = change quality

 

S18 - RS

Block address of the reset switch. When this value goes to one, all rows in the data table are marked bad quality, and the default parameter values from S19 through S22 are output from the block.

1 = reset

0 = normal

 

S19 - D1

Initial default value for parameter a1. If S24 is not equal to zero, the calculated value replaces the initial value at the interval specified with S24. If S24 equals zero, S19 equals default value.

 

S20 - D2

Initial default value for parameter a2. If S24 is not equal to zero, the calculated value replaces the initial value at the interval specified with S24. If S24 equals zero, S20 equals default value.

 

S21 - D3

Initial default value for parameter a3. If S24 is not equal to zero, the calculated value replaces the initial value at the interval specified with S24. If S24 equals zero, S21 equals default value.

 

S22 - D4

Initial default value for parameter a4. If S24 is not equal to zero, the calculated value replaces the initial value at the interval specified with S24. If S24 equals zero, S22 equals default value.

 

S23 - GF

Desired goodness of fit parameter. If the computed goodness of fit is not less than this value, the newly calculated parameter values are not output from the block. They are discarded and the last set of successfully calculated values is output instead. This input can be used to reject noisy data.

 

S24 - DEFUP

Default update period in hours. At the end of this time, the calculated values of the parameters a1 to a4 are copied to the default parameters. The minimum update period is 18 hours.

 

S24 = 0: no update of default values.

S24 ≥ 18: the default values of the parameters update at the end of each update period.

 

S25 to S27

Spare.

 

 

 

155.1.2  Outputs

 

N

Value of the first calculated parameter in both calculation and edit modes.

 

N+1

Value of the second calculated parameter in both calculation and edit modes.

 

N+2

Value of the third calculated parameter in both calculation and edit modes.

 

N+3

Value of the fourth calculated parameter in both calculation and edit modes.

 

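N+4

Calculation Mode

Goodness of fit (mean relative residual).

 

Edit Mode

Value of dependent variable Y.

 

N+5

Calculation Mode

Maximum model mismatch.

 

Edit Mode

Value of first independent variable X1.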

 

N+6

Calculation Mode

Row number of maximum mismatch.

 

Edit Mode

Value of second independent variable X2.

 

N+7

Calculation Mode

Number of data rows with good quality.

 

Edit Mode

Value of third independent variable X3.

 

N+8

Calculation Mode

Time of last successful computation in mmddhh format, with hours in 24-hour time.

 

Edit Mode

Value of fourth independent variable X4.

 

N+9

Calculation Mode

State of outputs:

1 = computed

0 = default; when the module is reset, all values in the data table are marked bad quality and the default values specified by S19 through S22 are output.

 

Edit Mode

Quality of the current data set (selected with S16):

1 = good quality included in computation

0 = bad quality excluded from computation

The current quality can be changed by setting S17 to one. This toggles the quality input to the opposite value.

 

 

155.2   Applications

 

The regression block can be used for economic optimization. It operates on functions described as linear, which means y is a linear function of a. This does not imply that y is a linear function of the measurements forming X. For instance, to identify the cost function of a steam generating unit, a quadratic form is employed.

 

y = cost

X1 = steam flow

X2 = (steam flow)^2

X3 = 1

 

y = a(1)X1 + a(2)X2 + a(3)X3
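The quadratic cost model can be sketched as follows to show how one steam flow measurement becomes a row of X. The parameter values and the function names design_row and predict are hypothetical, chosen only for illustration.

```python
def design_row(steam_flow):
    """Build one row [X1, X2, X3] of the regression matrix."""
    return [steam_flow, steam_flow ** 2, 1.0]


def predict(a, row):
    """Evaluate y = a(1)X1 + a(2)X2 + a(3)X3 for one row."""
    return sum(coef * x for coef, x in zip(a, row))


a = [3.0, 0.5, 10.0]          # hypothetical parameters a(1), a(2), a(3)
row = design_row(4.0)         # steam flow of 4.0 in arbitrary units
cost = predict(a, row)        # 3.0*4 + 0.5*16 + 10.0*1
```

The cost is quadratic in steam flow, but each term is a parameter multiplied by a known input, which is why the model is still linear in a.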

 

 

 

The equation provides a steady state economic model used by an optimization program to minimize operating expenses.  Another application is modeling of the kinetic parameters in a batch reactor.

 

extent = a(1) × f(time, temperature, concentration) + a(2)

 

The extent of the reaction is a laboratory measurement, and f(t,T,C) is a dimensionless group representing relative reaction rate computed by the module. The lab data is entered through the console or control station. The identification of the parameters allows on-line prediction of required batch reaction time, given measurements for temperature and initial concentration.

 

The two preceding applications represent models as power series. Linear regression can also compute model parameters for more complex function forms. The regression block correlates up to four independent variables to a single dependent variable.

 

For instance, the model m = b(1) × p^b(2) × q^b(3) contains two independent variables in a nonlinear relationship. The following equation results after taking the log of both sides, making the model linear in the parameters:

 

log(m) = log(b(1)) + b(2) log(p) + b(3) log(q)

 

making the definitions:

 

y = log(m), X1 = 1, X2 = log(p), and X3 = log(q)

 

The regression block finds the best parameter set a for the equation:

 

y = a(1)X1 + a(2)X2 + a(3)X3
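The log transformation can be checked numerically with a short sketch. The values of b, p, and q and the function names linearize and recover_b are hypothetical; the mapping a(1) = log(b(1)), a(2) = b(2), a(3) = b(3) follows from comparing the two equations above.

```python
import math


def linearize(m, p, q):
    """Map one (m, p, q) sample to y and the row [X1, X2, X3]."""
    return math.log(m), [1.0, math.log(p), math.log(q)]


def recover_b(a):
    """Convert fitted a back to the original model parameters b."""
    return [math.exp(a[0]), a[1], a[2]]


b = [2.0, 1.5, -0.5]                     # hypothetical b(1), b(2), b(3)
p, q = 4.0, 9.0
m = b[0] * p ** b[1] * q ** b[2]         # nonlinear model output

y, row = linearize(m, p, q)
a = [math.log(b[0]), b[1], b[2]]         # parameters of the linear model
y_model = sum(coef * x for coef, x in zip(a, row))
b_back = recover_b(a)
```

Here y and y_model agree to within rounding, so a regression on the transformed inputs recovers b(2) and b(3) directly and b(1) through the exponential of a(1).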

 

 

Figure 155-1 shows the contours of equal m plotted in the p – q plane.

 

 

 

The type of data storage used depends on the situation. Sequential storage retains the last S7 data sets and calculates the parameters from them. Use this mode when the correlation is expected to change. Bin storage retains the data sets that are evenly spaced across the entire range of the independent variable. Use it whenever the correlation is not expected to change due to such things as sensor contamination. Bin storage should also be used when the independent variable changes over a wide range but is not expected to assume all or nearly all of the values in that range. For example, a machine that commonly runs at 60 to 80 percent load for extended periods of time is best modeled with bin storage. Bin storage retains values that fall within zero to 60 percent and 80 to 100 percent load while frequently updating the values that fall between 60 and 80 percent load.

 

 

155.2.1  Regression Block Application Considerations

 

The regression block is very flexible. Determine which combination of data collection and storage techniques is needed for the application. A balance must be maintained when setting data collection, storage and acceptance specifications. More certainty and stability in the calculated coefficients is generally obtained at the expense of speedy adaptation to significant changes in process behavior.

 

Number of Data Sets

Specifying 32 data sets in the data table (the maximum for S7) provides more information about the process and stabilizes the values of the calculated model coefficients. However, specifying 16 data sets speeds up adaptation of the coefficients if process behavior is expected to change rapidly.

 

Binned Data Storage

Segregating the data into bins according to the value of x1 (at S2) significantly increases the reliability of the calculated model coefficients. The bins ensure that data collected over the entire range of interest for x1 is included in the coefficients, not just the most recent data. The most recent data may be concentrated around a long term operating point of the process, and the coefficients calculated from this data may not be representative of the process outside this operating point.

 

Figure 155-2 shows the effect of the data storage technique. This example shows two models, each fitted to 20 sampled points of a known function that was distorted by the addition of random noise. The first model uses bins for data storage; the second uses sequential data storage. The first model adequately represents the actual function over the entire range of interest, while the second represents the function only in the local range of the data collected.

 

The increased reliability provided by binned data storage comes at the price of slower adaptation to process changes. Data points collected in operating ranges that are rarely entered stay in the table for a long time, so the curve fit is partly based on old data. Narrowing the collection range speeds the adaptation process, but renders the coefficients inaccurate outside the range of data collection.

 

Maximum Residual

The maximum residual (the desired goodness of fit specified in S23) also has a strong effect on the ability of the regression block to adapt and on its rate of adaptation. To reject noisy data, a small residual is desirable. However, if the residual is set too small, all new data is rejected. To give the regression block some flexibility, a larger residual must be specified.