LIFE PREDICTION OF LOGARITHMIC NORMAL DISTRIBUTION BASED ON LSSVM
1. Introduction
When dealing with small data samples in lifetime prediction, there are fundamentally two different analysis methods: the Bayes method and neural networks. The Bayes method relies on prior information to compensate for the limited reliability data [1]. However, it is difficult to obtain prior knowledge for modern highly reliable integrated circuits (ICs). Accordingly, there has been little research on the application of the Bayes method to IC life prediction. Way Kuo [2] first used a neural network to estimate Weibull distribution parameters from small reliability data samples. However, neural networks were found to overfit when trained on small samples. For small samples, a proper lifetime prediction method should be selected based on its generalization, which is defined as a method's ability both to fit the current data and to predict future data [3]. The support vector machine (SVM) can be used for life prediction instead of a neural network; it overcomes the overfitting problem and generalizes well on small data samples [4]. The least squares support vector machine (LSSVM) algorithm, originally proposed by Suykens [5], is a modification of SVM that is easier to use because it adopts a least squares linear system as its loss function. Based on LSSVM, this paper develops a lifetime prediction method for ICs when only small data samples obeying the logarithmic normal distribution are available.
The rest of this paper is organized as follows. Section 2 describes the LSSVM-based lifetime prediction approach. Section 3 demonstrates the method with Monte Carlo simulations; for comparison, an error back propagation (BP) neural network is evaluated as well. The discussion and conclusions are given in section 4.
2. Prediction based on LSSVM
2.1. Problem description
In the conventional method of lifetime prediction, the first step is to obtain failure data from a test that places no constraints on the mean, variance, or range. Based on the test results, a proper constraint is then imposed such that the data may statistically fit one of the known distributions, whose parameters can then be evaluated. Once the distribution parameters are evaluated accurately, the lifetime can be predicted according to the known distribution. However, with small data samples, accurate parameter estimates cannot be guaranteed, and any inaccuracy in the distribution parameters may increase the uncertainty of the predicted lifetime [6]. Thus, the LSSVM-based prediction method presented in this paper predicts lifetime from small data samples directly, by building a least squares support vector regression model, without evaluating distribution parameters.
2.2. Neural networks
Neural networks have been studied by many researchers [7-11]. It is well known that a neural network is a universal estimator. In general, however, its learning process has two main drawbacks:
(1) The architecture, including the number of hidden neurons, has to be determined a priori or modified during training by heuristics.
(2) Training a neural network can easily become stuck in local minima. Various ways of preventing local minima, such as early stopping and weight decay, are employed. However, those methods greatly affect the generalization of the estimated function, i.e., its capacity to handle new input cases.
2.3. Support vector machines
SVM is an interdisciplinary technique drawing on optimization, statistical learning theory, machine learning, and data mining [12-14]. Basically, it can be used for function estimation and pattern classification. Since the application of SVM to lifetime prediction concerns function estimation, the discussion here is restricted to function estimation issues.
SVM is a very useful methodology for formulating the mathematical program behind the training error function in any application. Regardless of the application, SVM formulates the training process as a quadratic programming problem for the weights, with a regularization factor included. Since the quadratic programming problem is convex, the solution returned is global rather than one of many local optima, unlike in neural networks. This ensures the higher generalization of trained SVM regression models over neural networks.
Another important advantage of SVM over traditional regression methods is its ability to handle very high nonlinearity. Similar to nonlinear regression, SVM transforms the low-dimensional nonlinear input data space into a high-dimensional linear feature space through a nonlinear mapping; linear function estimation can then be performed over the feature space. The problem then becomes finding this nonlinear mapping for the primal formulation. Nevertheless, the SVM dual formulation provides an inner-product kernel trick, which entirely eliminates the effort of finding the nonlinear mapping that traditional nonlinear regression methods require.
2.4. Least squares support vector regression
The major drawback of SVM is its high computational burden due to the required constrained optimization programming. A major breakthrough on this point is a least squares version of SVM, called LSSVM. In LSSVM, one works with equality instead of inequality constraints and with a sum-of-squared-errors cost function, as is frequently used in training classical neural networks. This reformulation greatly simplifies the problem: the solution is characterized by a linear system, more precisely a Karush-Kuhn-Tucker system, which takes a form similar to the linear system solved in every iteration step of interior point methods for standard SVM. This linear system can be solved efficiently by iterative methods such as conjugate gradient. This paper therefore uses LSSVM instead of SVM to predict lifetime.
For least squares support vector regression, what matters most is the generalization ability of the learning machine. Good generalization means that the model not only fits the current data but also predicts future data well. The generalization ability of LSSVM rests on the factors described in structural risk minimization, which yields greater generalization ability and is superior, for regression estimation, to the empirical risk minimization developed for large samples and adopted in neural networks. For LSSVM, one minimizes (1) in order to find the regression estimate:
$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \qquad (1)$$
where the first term is an estimate of the empirical risk and the second is the confidence interval of the estimate. Minimizing the empirical risk alone may cause overfitting: even if the empirical risk were driven down to zero, the error on the test set could still be large. In order to avoid overfitting and to generalize well, LSSVM minimizes both the empirical risk and the confidence interval. For LSSVM, the regression estimation problem is formulated as the optimization problem
$$\min_{w,b,e} J(w,e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 \qquad (2)$$
subject to the equality constraints
$$y_i = w^T \varphi(x_i) + b + e_i, \quad i = 1, \ldots, N. \qquad (3)$$
With the application of Mercer's theorem on the kernel matrix $\Omega$ (as $\Omega_{ij} = K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$, $i, j = 1, \ldots, N$), it is not required to compute the nonlinear mapping $\varphi(\cdot)$ explicitly, as this is done implicitly through the use of positive definite kernel functions $K(x_i, x_j)$. Usually, several choices for $K(x_i, x_j)$ are possible:
(1) Linear kernel: $K(x_i, x_j) = x_i^T x_j$;
(2) Polynomial kernel: $K(x_i, x_j) = (x_i^T x_j / c + 1)^d$ (polynomial of degree $d$, with $c$ a tuning parameter);
(3) Gaussian radial basis function (RBF) kernel: $K(x_i, x_j) = \exp(-\| x_i - x_j \|^2 / 2\sigma^2)$ ($\sigma$ is a tuning parameter).
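For concreteness, the three kernels can be written down directly as functions. The following is a minimal Python sketch, illustrative only (the paper itself uses a Matlab toolbox in section 3); the default values of $c$, $d$, and $\sigma^2$ are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def linear_kernel(xi, xj):
    """Linear kernel: K(x_i, x_j) = x_i^T x_j."""
    return float(np.dot(xi, xj))

def polynomial_kernel(xi, xj, c=1.0, d=2):
    """Polynomial kernel of degree d with tuning parameter c (defaults arbitrary)."""
    return (float(np.dot(xi, xj)) / c + 1.0) ** d

def rbf_kernel(xi, xj, sigma2=1.0):
    """Gaussian RBF kernel with tuning parameter sigma^2 (default arbitrary)."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma2)))
```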
The solution of this optimization problem is obtained via the Lagrangian function (4):
$$L_{LS\text{-}SVM}(w, b, e; \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^2 - \sum_{i=1}^{N} \alpha_i \left( w^T \varphi(x_i) + b + e_i - y_i \right) \qquad (4)$$
where $\alpha_i \in \mathbb{R}$ are the Lagrange multipliers. The conditions for optimality are given by:
$$\begin{cases} \dfrac{\partial L_{LS\text{-}SVM}}{\partial w} = 0 \;\rightarrow\; w = \sum_{i=1}^{N} \alpha_i \varphi(x_i) \\[2mm] \dfrac{\partial L_{LS\text{-}SVM}}{\partial b} = 0 \;\rightarrow\; \sum_{i=1}^{N} \alpha_i = 0 \\[2mm] \dfrac{\partial L_{LS\text{-}SVM}}{\partial e_i} = 0 \;\rightarrow\; \alpha_i = \gamma e_i, \quad i = 1, \ldots, N \\[2mm] \dfrac{\partial L_{LS\text{-}SVM}}{\partial \alpha_i} = 0 \;\rightarrow\; y_i = w^T \varphi(x_i) + b + e_i \end{cases} \qquad (5)$$
After elimination of $w$ and $e_i$ (substituting $w = \sum_j \alpha_j \varphi(x_j)$ and $e_i = \alpha_i / \gamma$ into the last condition gives $y_i = \sum_j \alpha_j K(x_i, x_j) + b + \alpha_i / \gamma$, which together with $\sum_i \alpha_i = 0$ can be written in matrix form), the following linear system is obtained:
$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \Omega + I / \gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (6)$$
where $y = [y_1, \ldots, y_N]^T$, $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, and $\mathbf{1} = [1, \ldots, 1]^T$.
The LSSVM regression formulation is then constructed as follows:
$$y(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i) + b \qquad (7)$$
where $\alpha$ and $b$ are the solutions to (6). As (7) shows, the most suitable regression function for predicting the supervisor's response is obtained once $\alpha$, $b$, and the kernel function $K(x, x_i)$ are determined.
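Since the whole training procedure reduces to solving the single linear system (6), an LSSVM regressor fits in a few lines. The sketch below is an illustration of the formulation above, not the Matlab toolbox used in section 3; for the small sample sizes considered in this paper it uses a direct dense solve in place of the conjugate gradient iteration mentioned earlier.

```python
import numpy as np

def lssvm_train(X, y, gamma, kernel):
    """Train LSSVM regression by solving the linear system (6) for b and alpha."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    N = len(y)
    # Kernel matrix: Omega_ij = K(x_i, x_j)
    Omega = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                       # top row    [0, 1^T]
    A[1:, 0] = 1.0                       # left col   [0; 1]
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)        # direct solve; iterative CG is an option for large N
    return sol[1:], sol[0]               # alpha, b

def lssvm_predict(x, X, alpha, b, kernel):
    """Evaluate the regression function (7): y(x) = sum_i alpha_i K(x, x_i) + b."""
    X = np.asarray(X, dtype=float)
    return sum(a * kernel(x, xi) for a, xi in zip(alpha, X)) + b
```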
2.5. Prediction with LSSVM
There are three steps in using LSSVM to predict lifetime, as follows:
(1) Obtain failure data.
(2) Calculate the empirical reliability from
$$\hat{R}(t_i) = \frac{n - i + \frac{1}{2}}{n - i + 1} \, \hat{R}_{i-1}, \quad \hat{R}_0 = 1, \quad i = 1, 2, \ldots, n \qquad (8)$$
(3) Build the LSSVM regression model. In this step, we need to select the optimal kernel function and kernel parameters (a sketch of the complete three-step procedure is given at the end of this subsection). Once the optimal kernel function is selected, the kernel parameters are obtained using the k-fold cross-validation technique and a grid-search method. In k-fold cross-validation, the training set is first divided into k subsets of equal size; each subset in turn is tested using the model trained on the remaining subsets. Thus each instance of the whole training set is predicted once, and the cross-validation accuracy is the mean square error (MSE) between the actual and predicted values:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (9)$$
where $y_i$ is the true value and $\hat{y}_i$ is the estimated value.
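Putting the three steps together, the following minimal sketch computes the empirical reliability of equation (8) (as reconstructed above) and the cross-validation MSE of equation (9), reusing the lssvm_train, lssvm_predict, and rbf_kernel sketches from earlier. The regression direction (mapping reliability estimates to failure times) and the gamma value in the commented usage are assumptions, since the text does not state them explicitly.

```python
import numpy as np

def empirical_reliability(times):
    """Empirical reliability from equation (8) for a complete ordered sample:
    R_hat(t_i) = (n - i + 1/2) / (n - i + 1) * R_hat(t_{i-1}), with R_hat_0 = 1."""
    t = np.sort(np.asarray(times, dtype=float))
    n = len(t)
    R, prev = np.empty(n), 1.0
    for i in range(1, n + 1):
        prev *= (n - i + 0.5) / (n - i + 1.0)
        R[i - 1] = prev
    return t, R

def mse(y_true, y_pred):
    """Equation (9): mean square error between true and estimated values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical end-to-end use (gamma and the R -> t pairing are assumptions):
# t, R = empirical_reliability(failure_times)
# alpha, b = lssvm_train(R, t, gamma=10.0, kernel=rbf_kernel)
# t_hat = [lssvm_predict(r, R, alpha, b, rbf_kernel) for r in R]
# print(mse(t, t_hat))
```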
3. Monte Carlo simulation
In this simulation, we use a Matlab-based LSSVM software package to perform the model training and testing. A complete sample of size 15 is generated from the logarithmic normal distribution; the failure times are shown in Table 1 (a sketch of such sample generation follows the table):
TABLE 1. Failure data generated from the logarithmic normal distribution
N   Failure time (s)    N   Failure time (s)
1   92.8841             9   154.4818
2   98.6804            10   156.4329
3   100.1298           11   188.8626
4   134.6422           12   192.9138
5   136.5316           13   199.7002
6   137.1000           14   203.9489
7   147.9441           15   216.1558
8   154.1209
3.1. Kernel function selection
In order to select the optimal kernel function, we conducted two simulation experiments, one with the linear kernel function and one with the RBF kernel. Simulation 1 used a training set of 10 samples and a testing set of 5 samples; simulation 2 used a training set of 5 samples and a testing set of 10 samples.
The parameters $\sigma^2$ and $\gamma$ of the RBF kernel are estimated by the 10-fold cross-validation technique using the following steps:
1. Set aside 2/3 of the data for the training/validation set and the remaining 1/3 for testing.
2. Starting from i = 0, perform 10-fold cross-validation on the training/validation data for each $(\sigma^2, \gamma)$ combination from the initial candidate tuning sets $\Sigma_0 = \{0.5, 5, 10, 15, 25, 50, 100, 250, 500\}$ and $\Gamma_0 = \{0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000\}$.
3. Choose the optimal $(\sigma^2, \gamma)$ from the tuning sets $\Sigma_i$ and $\Gamma_i$ by looking at the best cross-validation performance for each $(\sigma^2, \gamma)$ combination.
4. If i = i_max (usually i_max = 3), go to step 5; else set i := i + 1, construct a locally refined grid $\Sigma_i \times \Gamma_i$ around the optimal parameters $(\sigma^2, \gamma)$, and go to step 3.
5. Construct the LS-SVM regression using the total data set for the optimal choice of the tuned parameters $(\sigma^2, \gamma)$.
6. Assess the test set accuracy by means of the mean square error (MSE).
The linear kernel parameter $\gamma$ is estimated by the same 10-fold cross-validation technique (sketched in code below) using the following steps:
1. Set aside 2/3 of the data for the training/validation set and the remaining 1/3 for testing.
2. Starting from i = 0, perform 10-fold cross-validation on the training/validation data for each $\gamma$ from the initial candidate tuning set $\Gamma_0 = \{0.5, 1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 12000, 15000, 20000\}$.
3. Choose the optimal $\gamma$ from the tuning set $\Gamma_i$ by looking at the best cross-validation performance for each $\gamma$.
4. If i = i_max (usually i_max = 3), go to step 5; else set i := i + 1, construct a locally refined grid $\Gamma_i$ around the optimal parameter $\gamma$, and go to step 3.
5. Construct the LS-SVM regression using the total data set for the optimal choice of the tuned parameter $\gamma$.
6. Assess the test set accuracy by means of the mean square error (MSE).
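A minimal sketch of this coarse-to-fine search for the RBF case is given below, reusing the lssvm_train, lssvm_predict, and rbf_kernel sketches from section 2. The shape of the locally refined grid (a factor-of-two window with five points per parameter) is an assumption, since the paper does not specify how the refined grid is constructed.

```python
import numpy as np
from itertools import product

def cv_mse(X, y, gamma, kernel, k=10):
    """k-fold cross-validation MSE for one parameter setting (k capped at n)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    folds = np.array_split(np.arange(n), min(k, n))
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        alpha, b = lssvm_train(X[train], y[train], gamma, kernel)
        pred = np.array([lssvm_predict(X[j], X[train], alpha, b, kernel) for j in fold])
        errs.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errs))

def grid_search_rbf(X, y, sigma2_grid, gamma_grid, i_max=3):
    """Coarse-to-fine search over (sigma^2, gamma), following steps 2-4 above."""
    for _ in range(i_max + 1):
        scores = {(s2, g): cv_mse(X, y, g, lambda a, b, s2=s2: rbf_kernel(a, b, s2))
                  for s2, g in product(sigma2_grid, gamma_grid)}
        s2_best, g_best = min(scores, key=scores.get)
        # Locally refined grid around the current optimum (window is an assumption)
        sigma2_grid = np.linspace(s2_best / 2.0, s2_best * 2.0, 5)
        gamma_grid = np.linspace(g_best / 2.0, g_best * 2.0, 5)
    return s2_best, g_best
```

The linear-kernel case follows the same pattern with a one-dimensional grid over $\gamma$ only.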
The predicted values for the testing data set of simulation experiment 1 are shown in Fig. 1, and those for simulation experiment 2 are shown in Fig. 2.
FIGURE 1. Predicted values for the testing data set of simulation 1
FIGURE 2. Predicted values for the testing data set of simulation 2
4. Conclusions
The logarithmic normal distribution is often used to describe the failure characteristics of ICs, and in the nano era it is not unusual to have only a small set of reliability data samples for microelectronic devices. Monte Carlo simulations indicate that the LSSVM prediction method is effective for analyzing small-sample reliability data. Compared with the BP neural network, the LSSVM prediction method has two advantages:
(1) It improves the effectiveness and accuracy of lifetime prediction from small reliability data samples: the estimation accuracy of the LSSVM method is higher than that of the BP neural network method for small samples. The reason lies in the different statistical principles underlying LSSVM and the BP neural network. The BP neural network is based on the empirical risk minimization principle, whereas LSSVM is based on the structural risk minimization principle, which minimizes both the empirical risk and the confidence interval.
(2) The simulation experiments show that the LSSVM-based lifetime prediction method is suitable for predicting the lifetime of small failure data samples that obey the logarithmic normal distribution. When 10 failure data samples are used to build the LSSVM model, the LSSVM estimates correspond closely to the actual values; even when only 5 failure data samples are used, the estimates still correspond closely to the actual values.
In conclusion, a method for life prediction of small data samples from the logarithmic normal distribution has been developed based on LSSVM. Once an optimal regression model is trained, high estimation accuracy can be obtained even for a small data sample. The LSSVM method can therefore be used for life prediction of failure data from the logarithmic normal distribution.