Necessity Of Confusion Matrix In Cyber Crime

khushi thareja
3 min read · Jun 6, 2021

What is a Confusion Matrix?

When we get the data, after data cleaning, pre-processing and wrangling, the first step is to feed it to a model and, of course, get output in probabilities. But hold on! How can we measure the effectiveness of our model? The better the effectiveness, the better the performance, and that's exactly what we want. This is where the confusion matrix comes into the limelight. A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. In the binary case, it is a table with 4 different combinations of predicted and actual values.
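As a quick, hypothetical illustration (the labels below are invented, and scikit-learn's confusion_matrix is just one common way to build the table):

```python
# A minimal sketch, not from the article: building a confusion matrix
# for a toy binary classifier. 1 = positive class, 0 = negative class.
from sklearn.metrics import confusion_matrix

actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # ground truth (made up)
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]   # model output after thresholding (made up)

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(actual, predicted)
print(cm)
```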

It is extremely useful for measuring Recall, Precision, Specificity, Accuracy and, most importantly, the AUC-ROC curve; a small sketch of these calculations follows the definitions below.

True Positive: you predicted positive and it's true.
True Negative: you predicted negative and it's true.
False Positive (Type I error): you predicted positive and it's false.
False Negative (Type II error): you predicted negative and it's false.
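Continuing the sketch above, the four cells can be unpacked to compute the metrics mentioned earlier; the formulas are the standard textbook definitions, and the toy matrix below matches the one produced in the previous snippet.

```python
import numpy as np

cm = np.array([[4, 1],
               [1, 4]])               # [[TN, FP], [FN, TP]] from the toy example
TN, FP, FN, TP = cm.ravel()

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)          # of everything flagged positive, how much was right
recall      = TP / (TP + FN)          # of all actual positives, how many were caught (sensitivity)
specificity = TN / (TN + FP)          # of all actual negatives, how many were correctly rejected

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")
```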

In technical terms, a type I error is the rejection of a true null hypothesis (also known as a "false positive" finding or conclusion; example: "an innocent person is convicted"), while a type II error is the non-rejection of a false null hypothesis (also known as a "false negative" finding or conclusion; example: "a guilty person is not convicted"). Both can be extremely dangerous.

Intuitively, type I errors can be thought of as errors of commission, i.e. the researcher wrongly concludes that something is a fact. For instance, consider a study where researchers compare a drug with a placebo. If the patients given the drug happen, purely by chance, to improve more than the patients given the placebo, it may appear that the drug is effective, but the conclusion is in fact incorrect. Conversely, type II errors are errors of omission. In the example above, if the patients who got the drug did not get better at a higher rate than the ones who got the placebo, but this was a random fluke, that would be a type II error. The consequence of a type II error depends on the size and direction of the missed effect and the circumstances. An expensive cure for one in a million patients may be inconsequential even if true.
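The drug-versus-placebo scenario can be simulated in a few lines. This is only an illustrative sketch under simple assumptions (normally distributed outcomes, a two-sample t-test, a 5% significance level); since both groups are drawn from the same distribution, every "significant" result it finds is a type I error.

```python
# Simulate many trials where the drug has NO real effect, and count how
# often a t-test still declares a significant difference by pure chance.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
trials = 1000
false_positives = 0

for _ in range(trials):
    placebo = rng.normal(loc=0.0, scale=1.0, size=30)   # no real effect
    drug    = rng.normal(loc=0.0, scale=1.0, size=30)   # no real effect either
    _, p_value = ttest_ind(drug, placebo)
    if p_value < 0.05:
        false_positives += 1

# With alpha = 0.05 we expect roughly 5% of trials to be type I errors.
print(f"type I error rate = {false_positives / trials:.3f}")
```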

The trade-off between type I and type II errors is very critical in cyber security. Let's take an example. Consider a face-recognition system installed in front of a database that holds critical information for the company; here, a "positive" means the system flags a person as unauthorized and blocks access. The manager walks up and the system fails to recognize him, so he is turned away. He tries to log in again and is allowed in.
This seems a pretty normal scenario, just a minor inconvenience. But let's consider another situation. A new person comes and tries to log himself in. The recognition system makes an error and allows him in. Now this is very dangerous: an unauthorized person has made an entry, which could be very damaging to the whole company.
In both cases the security system made an error. But the tolerance for a False Negative here (an intruder waved through) is zero, while we can still bear a False Positive (a legitimate user briefly locked out).
This shows how the acceptable trade-off between the two types of error varies from use case to use case.
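To make the trade-off concrete, here is a toy sketch; the scores and threshold rule are invented for illustration and not taken from any real face-recognition product. The idea is to pick a decision threshold so that no intruder slips through, accepting that this occasionally locks out a legitimate employee.

```python
# Hypothetical "intruder likelihood" scores from a face-recognition system.
# Positive class = "unauthorized person"; any score >= threshold keeps the door shut.
import numpy as np

scores_authorized   = np.array([0.05, 0.10, 0.20, 0.35, 0.42])  # hypothetical employees
scores_unauthorized = np.array([0.40, 0.55, 0.70, 0.90])        # hypothetical intruders

# To guarantee zero false negatives, the threshold must not exceed the
# lowest score any intruder produced.
threshold = scores_unauthorized.min()

false_negatives = (scores_unauthorized < threshold).sum()   # intruders let in
false_positives = (scores_authorized >= threshold).sum()    # employees locked out

print(f"threshold={threshold:.2f}  FN={false_negatives}  FP={false_positives}")
```

With these made-up numbers, driving the false negatives to zero costs us one false positive: the manager gets rejected once in a while, which the company can live with.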
