▶️ Demo | Anomaly Detection - find anomalies in server operational parameters like latency and threshold
The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.
Let’s say:

$$x \in \mathbb{R}$$

If $x$ is normally distributed, this is written as follows:

$$x \sim \mathcal{N}(\mu, \sigma^2)$$

- $\mu$ - mean value,
- $\sigma^2$ - variance,
- “~” means that “x is distributed as …”
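Then the Gaussian distribution (the probability density of some $x$ under a distribution with a given mean $\mu$ and variance $\sigma^2$) is given by:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right)$$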
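As a minimal sketch, the univariate Gaussian density can be evaluated in plain Python. The function name `gaussian_pdf` and its argument names are illustrative assumptions, not part of the original demo.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma2: float) -> float:
    """Density of a normal distribution with mean mu and variance sigma2 at x.

    sigma2 is the variance (not the standard deviation) and must be positive.
    """
    normalizer = math.sqrt(2.0 * math.pi * sigma2)
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / normalizer


# Density at the mean of a standard normal, and three standard deviations away.
peak = gaussian_pdf(0.0, 0.0, 1.0)
tail = gaussian_pdf(3.0, 0.0, 1.0)
```

Here `peak` equals 1/sqrt(2π), the maximum of the standard normal density, and `tail` is much smaller, which is what makes the density usable as an "how typical is this value" score.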
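We may use the following formulas to estimate the Gaussian parameters (mean and variance) for the ith feature:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

$$i = 1, 2, \dots, n$$

- $m$ - number of training examples.
- $n$ - number of features.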
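The per-feature estimates can be sketched as follows, assuming the training set is a list of examples where each example is a list of feature values. The name `estimate_gaussian` is hypothetical, and the variance uses the 1/m normalisation (not 1/(m-1)).

```python
def estimate_gaussian(X):
    """Return per-feature means and variances for a training set.

    X is a list of m examples, each a list of n feature values.
    """
    m = len(X)      # number of training examples
    n = len(X[0])   # number of features
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2


# Three examples with two features each (made-up numbers for illustration).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mu, sigma2 = estimate_gaussian(X)
```

For this toy data the means are 2 and 20, and the variances are 2/3 and 200/3.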
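So we have a training set:

$$\left\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\right\}$$

We assume that each feature of the training set is normally distributed:

$$x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2), \quad x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2), \quad \dots, \quad x_n \sim \mathcal{N}(\mu_n, \sigma_n^2)$$

Then, treating the features as independent, the density of an example $x$ is the product of the per-feature densities:

$$p(x) = p(x_1; \mu_1, \sigma_1^2) \, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$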
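A minimal sketch of this density estimate, assuming the features are treated as independent so that the density of an example is the product of univariate Gaussian densities. The function name `density` and the parameter values are made up for illustration.

```python
import math


def density(x, mu, sigma2):
    """Product of per-feature Gaussian densities for one example x.

    x, mu and sigma2 are equal-length lists: feature values, means, variances.
    """
    p = 1.0
    for x_i, mu_i, s2_i in zip(x, mu, sigma2):
        p *= math.exp(-((x_i - mu_i) ** 2) / (2.0 * s2_i)) / math.sqrt(2.0 * math.pi * s2_i)
    return p


# Hypothetical parameters for two features.
mu = [0.0, 5.0]
sigma2 = [1.0, 4.0]

p_typical = density([0.0, 5.0], mu, sigma2)    # example sitting exactly at the means
p_unusual = density([4.0, 15.0], mu, sigma2)   # example far from the means
```

An example at the feature means gets the highest possible density, while an example far from the means in any feature gets a density close to zero.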
An example $x$ is flagged as an anomaly if:

$$p(x) < \varepsilon$$

- $\varepsilon$ - probability threshold.
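The thresholding rule can be sketched like this, assuming the density of every example has already been computed. The helper name `find_anomalies`, the sample densities, and the threshold value are all illustrative assumptions.

```python
def find_anomalies(probabilities, epsilon):
    """Return indices of examples whose density is below the threshold epsilon."""
    return [j for j, p in enumerate(probabilities) if p < epsilon]


# Made-up densities for five examples; two of them are extremely unlikely.
probabilities = [0.08, 0.05, 1e-6, 0.03, 2e-9]
anomalies = find_anomalies(probabilities, epsilon=1e-4)
```

With these values the examples at indices 2 and 4 fall below the threshold and are reported as anomalies.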
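The algorithm may be evaluated using the F1 score.
The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

Where:

$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}$$

tp - number of true positives.
fp - number of false positives.
fn - number of false negatives.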
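As a rough sketch, the F1 score can be computed from predicted and actual anomaly labels as below. The function name `f1_score` and the sample labels are assumptions for illustration; returning 0 when there are no true positives is a convention chosen here to avoid division by zero.

```python
def f1_score(predicted, actual):
    """F1 score for boolean anomaly labels (True means "anomaly")."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)        # true positives
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)    # false positives
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)    # false negatives
    if tp == 0:
        # Precision and recall are both zero or undefined; report the worst score.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
score = f1_score(predicted, actual)
```

For these labels precision and recall are both 2/3, so the F1 score is also 2/3.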