# LIFE PREDICTION OF LOGARITHMIC NORMAL DISTRIBUTION BASED ON LSSVM

1. Introduction

When dealing with small data samples in lifetime prediction, there are fundamentally two different analysis methods: the Bayes method and the neural network. The Bayes method relies on prior information to compensate for the limited reliability data [1]. However, it is difficult to obtain prior knowledge for modern highly reliable integrated circuits (ICs). Accordingly, there is little research on the application of the Bayes method to IC life prediction. Way Kuo [2] first used a neural network to estimate Weibull distribution parameters for small reliability data samples. However, overfitting was observed when using a neural network with small samples. For small samples, a proper lifetime prediction method should be selected based on its generalization, which is defined as a method's ability not only to fit the current data but also to predict future data [3]. The support vector machine (SVM) can be used for life prediction instead of the neural network; it overcomes the overfitting problem and generalizes well for small data samples [4]. The least squares support vector machine (LSSVM) algorithm, originally proposed by Suykens [5], is a modification of SVM that is easier to use because it adopts a least squares linear system as its loss function. Based on LSSVM, this paper develops a lifetime prediction method for ICs when only small data samples that obey the logarithmic normal distribution are available.

In the rest of this paper, the LSSVM-based lifetime prediction approach is described in section 2. In section 3, Monte Carlo simulations are used to demonstrate the method; for comparison, an error back propagation (BP) neural network is also evaluated. The discussion and conclusions are given in section 4.

2. Prediction based on LSSVM

2.1. Problem description

In the conventional method of lifetime prediction, the first step is to obtain failure data from a test that places no constraints on the mean, variance, or range. Based on the test results, a proper constraint is then imposed such that the data may statistically fit one of the known distributions, and the parameters of that distribution can be evaluated. Once the distribution parameters are evaluated accurately, the lifetime can be predicted according to the known distribution. However, when dealing with small data samples, accurate parameter estimates cannot be guaranteed, and any inaccuracy in the distribution parameters may increase the lifetime uncertainty [6]. Thus, in this paper, the LSSVM-based prediction method predicts the lifetime of small data samples directly, by building a least squares support vector regression model without evaluating the distribution parameters.
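
For contrast with the LSSVM approach developed below, a minimal Python sketch of this conventional procedure might look as follows; the use of scipy's lognormal fit and the B10 life as the predicted quantity are illustrative choices, not taken from the original paper:

```python
import numpy as np
from scipy import stats

# a handful of observed failure times (placeholder values)
failure_times = np.array([92.9, 98.7, 100.1, 134.6, 136.5])

# fit the assumed lognormal distribution (location fixed at 0)
shape, loc, scale = stats.lognorm.fit(failure_times, floc=0)

# predict lifetime from the fitted distribution, e.g. the B10 life
# (the time by which 10% of devices are expected to fail)
b10_life = stats.lognorm.ppf(0.10, shape, loc=loc, scale=scale)
```

With only a few samples, the fitted shape and scale can vary widely from one sample to the next, which is exactly the parameter uncertainty the LSSVM approach avoids.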

2.2. Neural network

Neural networks have been studied by many researchers [7-11]. It is well known that a neural network is a universal estimator. In general, however, its learning process has two main drawbacks:

(1) The architecture, including the number of hidden neurons, has to be determined a priori or modified heuristically during training.

(2) The training process of a neural network can easily get trapped in local minima. Various ways of preventing local minima, such as early stopping and weight decay, are employed. However, those methods greatly affect the generalization of the estimated function, i.e., its capacity to handle new input cases.

2.3. Support vector machines

SVM is an interdisciplinary technique drawing on optimization, statistical learning theory, machine learning, and data mining [12-14]. Basically, it can be used for function estimation and pattern classification. Since the application of SVM to lifetime prediction concerns function estimation, the discussion here is restricted to function estimation issues.

SVM is a very useful methodology for formulating the mathematical program for the training error function used in any application. Regardless of the application, SVM formulates the training process as a quadratic programming problem for the weights, with a regularization factor included. Since the quadratic programming problem is convex, the returned solution is global rather than one of many local solutions, unlike neural networks. This ensures the higher generalization of trained SVM regression models over neural networks.

Another important advantage of SVM over traditional regression methods is its ability to handle very high nonlinearity. Similar to nonlinear regression, SVM transforms the low-dimensional nonlinear input data space into a high-dimensional linear feature space through a nonlinear mapping; linear function estimation can then be performed over the feature space. The problem then turns to finding this nonlinear mapping for the primal formulation. Nevertheless, the SVM dual formulation provides an inner-product kernel trick, which entirely eliminates the effort of finding the nonlinear mapping that the primal formulation, like traditional nonlinear regression methods, would require.

2.4. Least squares support vector regression

The major drawback of SVM is its higher computational burden due to the required constrained optimization programming. A major breakthrough on this point is a least squares version of SVM, called LSSVM. In LSSVM, one works with equality instead of inequality constraints and with a sum-of-squared-errors cost function, as is frequently used in the training of classical neural networks. This reformulation greatly simplifies the problem: the solution is characterized by a linear system, more precisely a Karush-Kuhn-Tucker system, which takes a form similar to the linear system solved in every iteration step by interior point methods for standard SVM. This linear system can be solved efficiently by iterative methods such as conjugate gradient. Therefore, this paper uses LSSVM instead of SVM to predict lifetime.

For least squares support vector regression, what matters most is the generalization ability of the learning machine. Good generalization ability means that the machine can not only fit the current data but also predict future data. The generalization ability of LSSVM rests on the factors described by structural risk minimization, which yields greater generalization ability and is superior, for regression estimation, to the empirical risk minimization adopted in neural networks and developed for large samples. For LSSVM, one minimizes (1) in order to find the regression estimate:

$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \qquad (1)$$

where the first term controls the confidence interval of the estimate and the second term is an estimate of the empirical risk. There may be overfitting if one minimizes only the empirical risk: even if the empirical risk could be driven down to zero, the error on the test set could still be large. In order to avoid overfitting and to generalize well, LSSVM minimizes both the empirical risk and the confidence interval. For LSSVM, the regression estimation problem is formulated as the optimization problem:

$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \qquad (2)$$

subject to the equality constraints

$$y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N \qquad (3)$$

With the application of Mercer's theorem to the kernel matrix $\Omega$ (where $\Omega_{ij} = \varphi(x_i)^T \varphi(x_j) = K(x_i, x_j)$, $i, j = 1, \ldots, N$), it is not required to compute the nonlinear mapping $\varphi(\cdot)$ explicitly, as this is done implicitly through the use of positive definite kernel functions $K(x_i, x_j)$. Usually, several choices for $K(x_i, x_j)$ are possible:

(1) Linear kernel: $K(x_i, x_j) = x_i^T x_j$;

(2) Polynomial kernel: $K(x_i, x_j) = (x_i^T x_j / c + 1)^d$ (polynomial of degree $d$, with $c$ a tuning parameter);

(3) Gaussian radial basis function (RBF) kernel: $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ ($\sigma$ is a tuning parameter).
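
For illustration, the three kernels can be implemented in a few lines; the following Python sketch (not part of the original paper) evaluates each kernel for all pairs of rows of two input matrices:

```python
import numpy as np

def linear_kernel(X, Z):
    # K(x_i, z_j) = x_i^T z_j for all pairs of rows
    return X @ Z.T

def polynomial_kernel(X, Z, c=1.0, d=3):
    # K(x_i, z_j) = (x_i^T z_j / c + 1)^d
    return (X @ Z.T / c + 1.0) ** d

def rbf_kernel(X, Z, sigma=1.0):
    # K(x_i, z_j) = exp(-||x_i - z_j||^2 / (2 sigma^2))
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))
```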

The solution of this optimization problem is obtained from the Lagrangian function (4):

$$L_{LSSVM}(w, b, e; \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 - \sum_{i=1}^{N} \alpha_i \left( w^T \varphi(x_i) + b + e_i - y_i \right) \qquad (4)$$

where $\alpha_i \in \mathbb{R}$ are the Lagrange multipliers. The conditions for optimality are given by:

$$\begin{cases}
\dfrac{\partial L_{LSSVM}}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i \varphi(x_i) \\
\dfrac{\partial L_{LSSVM}}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i = 0 \\
\dfrac{\partial L_{LSSVM}}{\partial e_i} = 0 \;\Rightarrow\; \alpha_i = \gamma e_i, \quad i = 1, \ldots, N \\
\dfrac{\partial L_{LSSVM}}{\partial \alpha_i} = 0 \;\Rightarrow\; y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N
\end{cases} \qquad (5)$$

After elimination of $w$ and $e_i$, the following linear system is obtained:

$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (6)$$

where $y = [y_1, \ldots, y_N]^T$, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, and $\mathbf{1} = [1, \ldots, 1]^T$.

The LSSVM regression function is then constructed as follows:

$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \qquad (7)$$

where $\alpha$ and $b$ are the solutions to (6).

As can be seen from (7), the regression function that best predicts the supervisor's response is obtained once $\alpha$, $b$, and the kernel function $K(x, x_i)$ are known.
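
To make equations (6) and (7) concrete, here is a minimal Python sketch of LSSVM training and prediction; it is an illustrative implementation, not the Matlab toolbox used in section 3:

```python
import numpy as np

def lssvm_train(X, y, kernel, gamma):
    """Solve the KKT linear system (6) for alpha and b."""
    n = len(y)
    Omega = kernel(X, X)                    # Omega_ij = K(x_i, x_j)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                          # 1^T
    A[1:, 0] = 1.0                          # 1
    A[1:, 1:] = Omega + np.eye(n) / gamma   # Omega + I/gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                  # alpha, b

def lssvm_predict(X_new, X_train, alpha, b, kernel):
    """Evaluate equation (7): y(x) = sum_i alpha_i K(x, x_i) + b."""
    return kernel(X_new, X_train) @ alpha + b
```

Training reduces to a single dense linear solve; for large $N$, an iterative method such as conjugate gradient would replace `np.linalg.solve`, as noted above.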

2.5. Prediction with LSSVM

There are three steps in using LSSVM to predict

lifetime, as follows:

(1) Obtain failure data.

(2) Calculate the empirical reliability at each ordered failure time $t_i$ from

$$\hat{R}(t_i) = \frac{\hat{R}_{i-1} + \hat{R}_i}{2} = \frac{2(n-i)+1}{2n}, \qquad \hat{R}_i = \frac{n-i}{n}, \quad \hat{R}_0 = 1, \quad i = 1, 2, \ldots, n \qquad (8)$$

(3) Build the LSSVM regression model. In this step, we need to select the optimal kernel function and kernel parameters. Once the optimal kernel function is selected, the kernel parameters are obtained using k-fold cross-validation and a grid-search method. In k-fold cross-validation, the training set is first divided into k subsets of equal size. Each subset is tested in turn using the model trained on the remaining subsets. Thus, each instance of the whole training set is predicted once, and the cross-validation accuracy is the mean square error (MSE) between the actual output values and the predicted values:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (9)$$

where $y_i$ represents the true value and $\hat{y}_i$ the estimated value.
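
A short Python sketch of steps (2) and (3) follows; the mid-point reliability formula mirrors the reconstruction of (8) above and should be treated as an assumption:

```python
import numpy as np

def empirical_reliability(times):
    """Step (2): empirical reliability at each ordered failure time,
    using the mid-point estimate assumed in (8)."""
    t = np.sort(np.asarray(times))
    n = len(t)
    i = np.arange(1, n + 1)
    return t, (2 * (n - i) + 1) / (2 * n)

def mse(y_true, y_pred):
    """Equation (9): mean square error of actual vs. predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)
```

The LSSVM model of section 2.4 is then trained on the resulting $(t_i, \hat{R}(t_i))$ pairs.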

3. Monte Carlo simulation

In this simulation, we use the Matlab-based LSSVM software package to perform the model training and testing. A complete sample is generated from the logarithmic normal distribution. The sample size is 15, and the failure times are shown in Table 1.

TABLE 1. Failure data generated from the logarithmic normal distribution

| N | Failure time (s) | N  | Failure time (s) |
|---|------------------|----|------------------|
| 1 | 92.8841          | 9  | 154.4818         |
| 2 | 98.6804          | 10 | 156.4329         |
| 3 | 100.1298         | 11 | 188.8626         |
| 4 | 134.6422         | 12 | 192.9138         |
| 5 | 136.5316         | 13 | 199.7002         |
| 6 | 137.1000         | 14 | 203.9489         |
| 7 | 147.9441         | 15 | 216.1558         |
| 8 | 154.1209         |    |                  |
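
For reference, a complete sample of this kind can be drawn as follows; the lognormal parameters are placeholders, since the paper does not state the values used to generate Table 1:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# mu and sigma of the underlying normal distribution are assumed values
mu, sigma, n = 5.0, 0.25, 15
failure_times = np.sort(rng.lognormal(mean=mu, sigma=sigma, size=n))
```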

3.1. Kernel function selection

In order to select the optimal kernel function, we conducted two simulation experiments, with the linear kernel and the RBF kernel respectively. Simulation 1 used 10 training data points and 5 testing data points. Simulation 2 used 5 training data points and 10 testing data points.

The parameters $\sigma^2$ and $\gamma$ of the RBF kernel are estimated by the 10-fold cross-validation technique using the following steps (a Python sketch of the resulting grid search follows this list):

1. Set aside 2/3 of the data for the training/validation set and the remaining 1/3 for testing.

2. Starting from i = 0, perform 10-fold cross-validation on the training/validation data for each $(\sigma^2, \gamma)$ combination from the initial candidate tuning sets $\Sigma_0 = \{0.5, 5, 10, 15, 25, 50, 100, 250, 500\}$ and $\Gamma_0 = \{0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000\}$.

3. Choose the optimal $(\sigma^2, \gamma)$ from the tuning sets $\Sigma_i$ and $\Gamma_i$ by looking at the best cross-validation performance for each $(\sigma^2, \gamma)$ combination.

4. If i = i_max (usually i_max = 3), go to step 5; else set i := i + 1, construct a locally refined grid $\Sigma_i \times \Gamma_i$ around the optimal parameters $(\sigma^2, \gamma)$, and go to step 3.

5. Construct the regression LS-SVM using the total data set for the optimal choice of the tuned parameters $(\sigma^2, \gamma)$.

6. Assess the test set accuracy by means of the mean square error (MSE).
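
A minimal illustration of this tuning loop (coarse grid only, without the local refinement of step 4) is sketched below. It reuses `rbf_kernel`, `lssvm_train`, and `lssvm_predict` from the sketches in section 2, and the training pairs are an assumed split of Table 1, since the paper does not list which points form the training set:

```python
import itertools
import numpy as np

# assumed training pairs (t_i, R_hat_i): first 10 ordered failure times of
# Table 1 with the mid-point reliability estimate reconstructed in (8)
t = np.array([92.8841, 98.6804, 100.1298, 134.6422, 136.5316,
              137.1000, 147.9441, 154.1209, 154.4818, 156.4329])
n = 15                                   # full sample size
i = np.arange(1, len(t) + 1)
X_train = t[:, None]
y_train = (2 * (n - i) + 1) / (2 * n)

def cv_mse(X, y, sigma2, gamma, k=10):
    """k-fold cross-validation MSE for one (sigma^2, gamma) combination."""
    folds = np.array_split(np.arange(len(y)), k)
    kern = lambda A, B: rbf_kernel(A, B, sigma=np.sqrt(sigma2))
    errs = []
    for fold in folds:
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False                # hold out the current fold
        alpha, b = lssvm_train(X[mask], y[mask], kern, gamma)
        y_hat = lssvm_predict(X[fold], X[mask], alpha, b, kern)
        errs.append(np.mean((y[fold] - y_hat) ** 2))
    return np.mean(errs)

# step 2: coarse search over the initial candidate tuning sets
sigma2_grid = [0.5, 5, 10, 15, 25, 50, 100, 250, 500]
gamma_grid = [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000]
best_sigma2, best_gamma = min(itertools.product(sigma2_grid, gamma_grid),
                              key=lambda p: cv_mse(X_train, y_train, *p))
```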

The linear kernel parameter $\gamma$ is estimated by the 10-fold cross-validation technique using the following steps:

1. Set aside 2/3 of the data for the training/validation set and the remaining 1/3 for testing.

2. Starting from i = 0, perform 10-fold cross-validation on the training/validation data for each $\gamma$ from the initial candidate tuning set $\Gamma_0 = \{0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 12000, 15000, 20000\}$.

3. Choose the optimal $\gamma$ from the tuning set $\Gamma_i$ by looking at the best cross-validation performance for each $\gamma$.

4. If i = i_max (usually i_max = 3), go to step 5; else set i := i + 1, construct a locally refined grid $\Gamma_i$ around the optimal parameter $\gamma$, and go to step 3.

5. Construct the regression LS-SVM using the total data set for the optimal choice of the tuned parameter $\gamma$.

6. Assess the test set accuracy by means of the mean square error (MSE).

The predicted values for the testing data set of simulation experiment 1 are shown in Fig. 1, and those for simulation experiment 2 in Fig. 2.

FIGURE 1. Predicted values for the testing data set of simulation 1

FIGURE 2. Predicted values for the testing data set of simulation 2

4. Conclusions

The logarithmic normal distribution is often used to describe the failure characteristics of ICs, and it is not unusual to have only a small set of reliability data samples for microelectronic devices in the nano era. The Monte Carlo simulations indicate that the LSSVM prediction method is effective for analyzing small-sample reliability data. Compared with the BP neural network, the LSSVM prediction method has two advantages:

(1) It improves the effectiveness and accuracy of lifetime prediction for small reliability data samples: the estimation accuracy of the LSSVM method is higher than that of the BP neural network method for small samples. The reason lies in the different statistical principles underlying LSSVM and the BP neural network. The BP neural network is based on the empirical risk minimization principle, whereas LSSVM is based on the structural risk minimization principle, which minimizes both the empirical risk and the confidence interval.

(2) The simulation experiments show that the LSSVM-based lifetime prediction method is suitable for predicting the lifetime of small failure data samples that obey the logarithmic normal distribution. When 10 failure data samples are used to build the LSSVM model, the estimates of the LSSVM prediction method correspond closely to the actual values. Even when only 5 failure data samples are used to build the LSSVM model, the estimates still correspond closely to the actual values.

In conclusion, a method for the life prediction of small data samples from the logarithmic normal distribution has been developed based on LSSVM. Once an optimal regression model is trained, high estimation accuracy can be obtained even for a small data sample. The LSSVM method can therefore be used for life prediction studies of failure data from the logarithmic normal distribution.
