
Volume 8, Issue 5, May – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Probability of Misclassification Under Nelly Distribution Using Optimal Rule

Nnabude Chinelo Ijeoma, S. I. Onyeagu, C. H. Nwankwo
Department of Statistics, Faculty of Physical Sciences, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria

Abstract:- Errors of misclassification and their probabilities are studied for classification problems associated with the univariate Nelly distribution. The effect of applying the linear discriminant function (LDF), derived under normality, to Nelly populations is assessed by comparing the optimum probabilities of misclassification based on the LDF with those based on the likelihood ratio (LR) rule for the Nelly distribution. Both theoretical and empirical results are presented.

Keywords:- Errors of Misclassification, Nelly Distribution, Linear Discriminant Function, Likelihood Ratio Rule, Error Rate.
I. INTRODUCTION
The linear discriminant function (LDF), when used to categorize an observation that belongs to one of two normal populations, has numerous advantageous qualities in classification problems. Most such studies assume that the parent populations are multivariate normal. In the univariate case (denoted by N(μ, σ²)), the classification problem has been studied by John [6] and Sedransk and Okamoto [10]. The effects of numerous kinds of non-normality have also been studied: the resilience of the linear discriminant function when the underlying distribution belongs to Johnson's system was investigated by Lachenbruch et al. [7], and, further exploring this issue, Ching'anda [4] and Ching'anda and Subrahmaniam [5] developed large-sample distributions for both the conditional and unconditional probabilities of misclassification. Similar studies have been done for the inverse Gaussian distribution by Amoh and Kocherlakota [2] and for the gamma distribution by Mahmoud and Moustafa [9].

In this paper, we consider the probability of misclassification under the optimal rule when there are two classes and sampling is from Nelly distributions. The distribution of the probability of misclassification is examined. All parameters are assumed known, and an observation X in population π_i has the Nelly density

$$f_i(x) = \frac{(2\lambda_i)^{\alpha} x^{\alpha-1} e^{-2\lambda_i x}}{\Gamma(\alpha)}, \quad i = 1, 2 \tag{1}$$

 The Robustness of the Linear Discriminant Function will be Examined in two ways

 Supposing that an observation X from (1) is classified with the linear discriminant function (LDF) derived under the assumption of normality, how are the optimum (all parameters known) probabilities of misclassification affected?
 The optimum probabilities of misclassification based on the likelihood ratio rule will be compared with those obtained from the linear discriminant function.

II. THE CLASSIFICATION RULES

A classification rule, or classifier, is a process by which each element of a population is assigned to one of a number of different sets or classes. In a flawless categorization, every element of the population is assigned to the class to which it actually belongs; if the classification is imperfect, statistical analysis must be used to examine it. The general approach is to choose the classification rule that minimizes the total probability of misclassification (Anderson, 1958).

Suppose that f_i(x) is the density function of X if it comes from population Π_i (i = 1, 2), and that we assign X to Π_1 if X falls in some region R_1 and to Π_2 if X falls in some region R_2. We assume R_1 ∩ R_2 = ∅ and R_1 ∪ R_2 = R. Let P_i (i = 1, 2) be the prior proportion of population Π_i (Bayes assumption), with P_1 + P_2 = 1. The total probability of misclassification is

$$E = P_1 \int_{R_2} f_1(x)\,dx + P_2 \int_{R_1} f_2(x)\,dx$$
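As a numerical illustration (ours, not part of the original derivation), E can be evaluated by quadrature once the densities and regions are fixed. The sketch below assumes SciPy is available, reads the Nelly density (1) as a gamma density with shape α and rate 2λ_i, and uses an arbitrary illustrative cutoff c to define R_1 = [c, ∞):

```python
# Minimal sketch: evaluate E = P1*∫_{R2} f1 + P2*∫_{R1} f2 by quadrature.
# The parameter values and the cutoff c are illustrative only.
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

alpha, lam1, lam2 = 3.0, 1.0, 2.0
P1 = P2 = 0.5
f1 = gamma(alpha, scale=1.0 / (2 * lam1)).pdf   # density (1) with rate 2*lam1
f2 = gamma(alpha, scale=1.0 / (2 * lam2)).pdf   # density (1) with rate 2*lam2

c = 1.0                                          # R2 = [0, c), R1 = [c, inf)
E = P1 * quad(f1, 0, c)[0] + P2 * quad(f2, c, np.inf)[0]
print(f"E = {E:.4f}")                            # E ≈ 0.2807 for these choices
```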


Since ∫_{R_2} f_1(x) dx = 1 − ∫_{R_1} f_1(x) dx, this can be rewritten as

$$E = P_1\left[1 - \int_{R_1} f_1(x)\,dx\right] + P_2 \int_{R_1} f_2(x)\,dx = P_1 + \int_{R_1}\left[P_2 f_2(x) - P_1 f_1(x)\right]dx \tag{2}$$

By the Neyman-Pearson lemma, E is minimized if R_1 includes the points x for which P_2 f_2(x) − P_1 f_1(x) < 0 and excludes the points for which P_2 f_2(x) − P_1 f_1(x) > 0. Thus, the classification rule is

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \frac{P_2}{P_1}, \qquad R_2: \frac{f_1(x)}{f_2(x)} < \frac{P_2}{P_1}$$
In what follows we assume P_1 = P_2 = 1/2. It is well known that if P_1 = P_2 and f_i(x) is univariate normal, the classification rule given above is equivalent to Fisher's linear discriminant function (Lachenbruch, 1975).
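A minimal sketch of this rule (the helper name and the normal densities below are our illustration, not code from the paper):

```python
# Assign x to pi_1 exactly when P1*f1(x) >= P2*f2(x), i.e. f1/f2 >= P2/P1.
from scipy.stats import norm

def classify(x, f1, f2, P1=0.5, P2=0.5):
    return 1 if P1 * f1(x) >= P2 * f2(x) else 2

f1 = norm(0.0, 1.0).pdf       # illustrative equal-variance normal densities
f2 = norm(2.0, 1.0).pdf
print(classify(0.3, f1, f2))  # -> 1, since 0.3 lies below the midpoint 1.0
```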

 Linear Discriminant Function for the Univariate Normal Distribution (known μ1 ≠ μ2 and common variance σ²)

Let the probability density function of X in π_i (i = 1, 2) be

$$f_i(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_i}{\sigma}\right)^2\right], \quad -\infty < x < \infty, \; i = 1, 2$$

If θ is the mean of the observation X, consider the hypotheses H_0: θ = μ_1 versus H_a: θ = μ_2.

Then, when μ_1 < μ_2, the likelihood ratio is

$$L = \frac{f_1(x)}{f_2(x)} = \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma}\right)^2 + \frac{1}{2}\left(\frac{x-\mu_2}{\sigma}\right)^2\right]$$

and

$$L' = \ln L = -\frac{1}{2\sigma^2}\left[2x - (\mu_1+\mu_2)\right](\mu_2-\mu_1) = \left[x - \frac{1}{2}(\mu_1+\mu_2)\right]\left(\frac{\mu_1-\mu_2}{\sigma^2}\right) \tag{3}$$

Equation (3) is the Anderson discriminant function when the distributions in the populations are univariate normal with the same variance but different means. H_0 is rejected if L < k, where k is a constant; with k = 1 (equal priors), equation (3) yields the following rule.

The classification rule is:

Classify X into π_1 if X < (μ_1 + μ_2)/2, and
Classify X into π_2 if X ≥ (μ_1 + μ_2)/2.

Similarly, when μ_1 > μ_2 the classification rule becomes:

Classify X into π_2 if X < (μ_1 + μ_2)/2, and
Classify X into π_1 if X ≥ (μ_1 + μ_2)/2.
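A short sketch of the midpoint rule and its error rates, assuming SciPy; the parameter values are illustrative. For equal priors the two error probabilities average to Φ(−δ/2), where δ = |μ_1 − μ_2|/σ:

```python
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 2.0, 1.0               # illustrative values, mu1 < mu2
cut = 0.5 * (mu1 + mu2)

def classify_normal(x):
    return 1 if x < cut else 2                # mu1 < mu2: small x favours pi_1

e12 = norm.sf((cut - mu1) / sigma)            # P(X >= cut | pi_1)
e21 = norm.cdf((cut - mu2) / sigma)           # P(X <  cut | pi_2)
print(classify_normal(0.3), (e12 + e21) / 2)  # -> 1  0.1587  (= Phi(-1))
```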

 Derivation of the Classification Rule for the Nelly Distribution

We assume that the distribution of X in π_i is the Nelly distribution of equation (1),

$$f_i(x) = \frac{(2\lambda_i)^{\alpha} x^{\alpha-1} e^{-2\lambda_i x}}{\Gamma(\alpha)}$$

so the likelihood ratio is

$$L = \frac{f_1(x)}{f_2(x)} = \frac{(2\lambda_1)^{\alpha} x^{\alpha-1} e^{-2\lambda_1 x}/\Gamma(\alpha)}{(2\lambda_2)^{\alpha} x^{\alpha-1} e^{-2\lambda_2 x}/\Gamma(\alpha)} \tag{4}$$

$$= \left(\frac{\lambda_1}{\lambda_2}\right)^{\alpha} e^{2(\lambda_2 - \lambda_1)x} \tag{5}$$

X is assigned to π_2 when ln(f_1/f_2) ≤ ln k, that is, when

$$2(\lambda_2 - \lambda_1)x \le \ln k - \alpha \ln\left(\frac{\lambda_1}{\lambda_2}\right)$$

or, assuming λ_2 > λ_1,

$$x \le \frac{1}{2(\lambda_2 - \lambda_1)}\left[\ln k - \alpha \ln\left(\frac{\lambda_1}{\lambda_2}\right)\right] = \frac{\ln k}{2(\lambda_2 - \lambda_1)} - \frac{\alpha}{2(\lambda_2 - \lambda_1)}\ln\left(\frac{\lambda_1}{\lambda_2}\right)$$

With k = 1, the classification rule is therefore

$$R_1 = \{x : x \ge B\} \quad \text{if } \mu_1 \ge \mu_2$$

where

$$B = -\frac{\alpha}{2(\lambda_2 - \lambda_1)}\ln\left(\frac{\lambda_1}{\lambda_2}\right) = \frac{\alpha}{2(\lambda_2 - \lambda_1)}\ln\left(\frac{\lambda_2}{\lambda_1}\right)$$
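For equal priors (k = 1) the cutoff reduces to a one-line computation; a sketch with illustrative parameter values:

```python
# B = alpha * ln(lam2/lam1) / (2*(lam2 - lam1)), valid for lam2 > lam1.
import math

def nelly_cutoff(alpha, lam1, lam2):
    return alpha * math.log(lam2 / lam1) / (2.0 * (lam2 - lam1))

print(nelly_cutoff(2.0, 1.0, 2.0))   # ≈ 0.6931; assign X to pi_1 when X >= B
```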

 The Optimum Probabilities of Misclassification for the Nelly Distribution Using the Linear Discriminant Function (LDF) and the Likelihood Ratio (LR)

For the LDF we have

$$E_{12} = P\{X < A \mid X \in \pi_1; \mu_1, \mu_2\} \quad \text{if } \mu_1 > \mu_2$$

with E_{21}(μ_1, μ_2) similarly defined.

The cumulative distribution function of the Nelly distribution with parameters μ and α (where μ_i = α/(2λ_i)) is given by

$$F(x; \alpha, \mu) = \frac{\gamma(\alpha, \mu x)}{\Gamma(\alpha)} = P(\alpha, \mu x) \tag{6}$$

where

$$\gamma(\alpha, \mu x) = \sum_{n=0}^{\infty} \frac{(-1)^n (\mu x)^{\alpha+n}}{n!\,(\alpha+n)}$$
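Equation (6) is the regularized lower incomplete gamma function, so it can be evaluated either from the series above or with a library routine. A sketch assuming SciPy (scipy.special.gammainc computes P(α, z)); the truncation length is an illustrative choice adequate for moderate μx:

```python
import math
from scipy.special import gammainc

def nelly_cdf_series(x, alpha, mu, terms=60):
    # Alternating series for gamma(alpha, mu*x), divided by Gamma(alpha).
    z = mu * x
    g = sum((-1) ** n * z ** (alpha + n) / (math.factorial(n) * (alpha + n))
            for n in range(terms))
    return g / math.gamma(alpha)

x, alpha, mu = 0.75, 2.0, 1.0
print(nelly_cdf_series(x, alpha, mu))   # ≈ 0.1734
print(gammainc(alpha, mu * x))          # same value from SciPy
```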

Then, using equation (6), we have

$$E_{12} = F(A; \alpha, \mu_1), \quad \mu_1 > \mu_2, \qquad A = \frac{\mu_1 + \mu_2}{2}$$

Therefore, the total optimum probability of misclassification using the LDF is given as

$$E = \frac{E_{12} + E_{21}}{2} \quad \text{for } P_1 = P_2 = 0.5 \tag{7}$$
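The following sketch assembles equations (6) and (7). The text defines E_21 only as "similarly defined"; here it is computed from μ_2 exactly as E_12 is computed from μ_1, a reading under which the α = 2 values agree with the first row of Table 1 to rounding:

```python
from scipy.special import gammainc    # regularized lower incomplete gamma P(a, z)

def ldf_errors(alpha, lam1, lam2):    # assumes lam1 < lam2, so mu1 > mu2
    mu1, mu2 = alpha / (2 * lam1), alpha / (2 * lam2)
    A = 0.5 * (mu1 + mu2)             # LDF cutoff: midpoint of the two means
    e12 = gammainc(alpha, mu1 * A)    # E12 = F(A; alpha, mu1), equation (6)
    e21 = gammainc(alpha, mu2 * A)    # our reading of "similarly defined"
    return e12, e21, 0.5 * (e12 + e21)

print(ldf_errors(2.0, 1.0, 2.0))      # ≈ (0.1734, 0.0550, 0.1142)
```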

 For the Likelihood Ratio

$$E_{12}^{*} = P\{X < B \mid X \in \pi_1\} = F(B; \alpha, \mu_1) \quad \text{if } \mu_1 > \mu_2$$

Recall that

$$B = \frac{\alpha}{2(\lambda_2 - \lambda_1)}\ln\left(\frac{\lambda_2}{\lambda_1}\right)$$

Therefore, the total probability of misclassification using the LR is given as

$$E^{*} = \frac{E_{12}^{*} + E_{21}^{*}}{2} \quad \text{for } P_1 = P_2 = 0.5 \tag{8}$$
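A companion sketch for the LR rule plugs the cutoff B into the CDF (6); as in the LDF sketch above, E*_21 is taken from μ_2 by symmetry, which is our assumption since the text leaves it implicit:

```python
import math
from scipy.special import gammainc

def lr_errors(alpha, lam1, lam2):     # assumes lam1 < lam2, so mu1 > mu2
    mu1, mu2 = alpha / (2 * lam1), alpha / (2 * lam2)
    B = alpha * math.log(lam2 / lam1) / (2 * (lam2 - lam1))
    e12 = gammainc(alpha, mu1 * B)    # E12* = F(B; alpha, mu1)
    e21 = gammainc(alpha, mu2 * B)    # assumed symmetric definition of E21*
    return e12, e21, 0.5 * (e12 + e21)

print(lr_errors(2.0, 1.0, 2.0))
```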

The probabilities of misclassification based on the linear discriminant function (LDF) and likelihood ratio (LR) rules were computed for various combinations of the parameters λ_1, λ_2 and α. The values chosen are λ_1 = 1.0; λ_2 = 2.0, 3.0, 4.0; and α = 2.0, 3.0, 4.0, 5.0, …, 15.0.

III. RESULTS

Table 1 Comparison of the optimum probability of misclassification based on the LDF and LR when λ1 = 1 and λ2 = 2

α      E12 (LDF)   E21 (LDF)   E (LDF)     E12* (LR)   E21* (LR)   E* (LR)
2      0.1734      0.0549      0.1142      0.1398      0.0431      0.0914
3      0.4789      0.1078      0.2934      0.3674      0.0771      0.2223
4      0.1156      0.0075      0.0095      0.0830      0.0556      0.0693
5      0.0115      0.0087      0.0101      0.1708      0.1314      0.1511
6      0.0072      0.0052      0.0062      0.1981      0.1474      0.1727
7      0.0059      0.0055      0.0057      0.3880      0.3605      0.3743
8      0.0053      0.0037      0.0045      0.5117      0.3650      0.4382
9      0.0010      0.0007      0.0009      0.2141      0.1572      0.1856
10     0.0005      0.0006      0.0005      0.3322      0.3431      0.3376
11     0.0004      0.0003      0.0004      0.4437      0.3603      0.4020
12     0.00001     0.00004     0.00001     0.1039      0.0282      0.0661
13     0.00001     0.000002    0.000006    0.1171      0.0303      0.0737
14     0.000005    0.000001    0.000003    0.1210      0.0279      0.0745
15     0.0000003   0.0000006   0.0000001   0.1113      0.0212      0.0663

Table 1 compares the linear discriminant function and the likelihood ratio rule for the Nelly distribution when all the parameters are known, that is, λ1 = 1, λ2 = 2 and α = 2 to 15.

Table 2 Comparison of the optimum probability of misclassification based on the LDF and LR when λ1 = 1 and λ2 = 3

α      E12 (LDF)   E21 (LDF)   E (LDF)     E12* (LR)   E21* (LR)   E* (LR)
2      0.1443      0.0213      0.0828      0.0370      0.0047      0.0209
3      0.3823      0.0288      0.2055      0.0626      0.0032      0.0329
4      0.0093      0.0005      0.0049      0.2030      0.0047      0.1041
5      0.0029      0.0010      0.0019      0.0228      0.0084      0.0156
6      0.0022      0.0011      0.0017      0.0484      0.0256      0.0370
7      0.0018      0.0009      0.0014      0.0759      0.0412      0.0585
8      0.0009      0.0006      0.0007      0.1000      0.0699      0.0850
9      0.0008      0.0005      0.0007      0.1702      0.1134      0.1418
10     0.0007      0.0006      0.0007      0.3588      0.3036      0.3312
11     0.0004      0.0003      0.0004      0.3955      0.3245      0.3599
12     0.0003      0.0002      0.0003      0.5845      0.4462      0.5154
13     0.0002      0.0001      0.0002      0.6956      0.5181      0.6069
14     0.000001    0.000001    0.000001    0.9051      0.6947      0.7999
15     0.000001    0.0000006   0.000001    0.000004    0.000006    0.000003

Table 2 compares the linear discriminant function and the likelihood ratio rule for the Nelly distribution when all the parameters are known, that is, λ1 = 1, λ2 = 3 and α = 2 to 15.

Table 3 Comparison of the optimum probability of misclassification based on the LDF and LR when λ1 = 1 and λ2 = 4

α      E12 (LDF)   E21 (LDF)   E (LDF)     E12* (LR)   E21* (LR)   E* (LR)
2      0.0161      0.0071      0.0116      0.0237      0.0105      0.0171
3      0.3360      0.0111      0.1736      0.0106      0.0002      0.0054
4      0.0034      0.0010      0.0022      0.0123      0.0039      0.0081
5      0.0053      0.0013      0.0033      0.0262      0.0065      0.0164
6      0.0040      0.0008      0.0024      0.0326      0.0071      0.0199
7      0.0013      0.0003      0.0008      0.0260      0.0069      0.0165
8      0.0007      0.0002      0.0005      0.0383      0.0136      0.0259
9      0.0006      0.0002      0.0004      0.0711      0.0268      0.0489
10     0.0006      0.0004      0.0005      0.2196      0.1469      0.1833
11     0.0004      0.0001      0.0003      0.1660      0.0664      0.1162
12     0.0003      0.0002      0.0002      0.3967      0.2600      0.3283
13     0.0002      0.0001      0.0002      0.6532      0.5239      0.5886
14     0.000001    0.000006    0.000008    0.6103      0.4085      0.5094
15     0.0000002   0.0000005   0.0000001   0.0000003   0.0000008   0.0000006

Table 3 compares the linear discriminant function and the likelihood ratio rule for the Nelly distribution when all the parameters are known, that is, λ1 = 1, λ2 = 4 and α = 2 to 15.

IV. CONCLUSION

Based on the results of the analysis, it can be concluded that the likelihood ratio rule for the Nelly distribution performs better than its linear discriminant function.

REFERENCES

[1]. Abramowitz, M. and Stegun, I. A. (Eds.) (1970). Handbook of Mathematical Functions. Dover, New York.
[2]. Amoh, R. K. and Kocherlakota, K. (1986). Errors of misclassification associated with the inverse Gaussian distribution. Communications in Statistics - Theory and Methods, 15(2), 589-612. doi:10.1080/03610928608829139
[3]. Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis, 1st edition. John Wiley and Sons, Inc., New York.
[4]. Ching'anda, E. F. and Subrahmaniam, K. (1976). Misclassification errors and their distribution: robustness. M.Sc. thesis, University of Manitoba.
[5]. Ching'anda, E. F. and Subrahmaniam, K. (1979). Robustness of the linear discriminant function to non-normality: Johnson's system. Journal of Statistical Planning and Inference, 3, 69-77.
[6]. John, S. (1961). Errors in discrimination. The Annals of Mathematical Statistics, 32(4), 1125-1144. https://www.jstor.org/stable/2237911
[7]. Lachenbruch, P. A., Sneeringer, C. and Revo, L. T. (1973). Robustness of the linear and quadratic discriminant function to certain types of non-normality. Communications in Statistics, 1, 39-57.
[8]. Lachenbruch, P. A. (1975). Discriminant Analysis, 1st edition. Hafner Press, New York.
[9]. Mahmoud and Moustafa (1993). Estimation of a discriminant function from a mixture of two gamma distributions when the sample size is small. Mathematical and Computer Modelling, 18(5), 87-95.
[10]. Sedransk, N. and Okamoto, M. (1971). Estimation of the probability of misclassification for a linear discriminant function in the univariate normal case. Annals of the Institute of Statistical Mathematics, 23, 419-435.
