Model Interpretability & Explainability
Feature Importance
Our anomaly detection models are trained primarily on the Transaction Amount feature. While Location is recorded, it is currently retained as metadata only and does not feed into the models. The core assumption is that normal transactions follow a Gaussian distribution, and significant deviations from that distribution are flagged as anomalies (see the sketch after the feature list below).
- Amount: The primary driver. Normal range is ~$80-$120. Anomalies are >$500.
- Time: Implicitly used in time-series visualization, but models are currently point-in-time detectors.
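The Gaussian assumption can be made concrete with a simple z-score check on the Amount feature. This is a minimal sketch on synthetic values; the amounts, the fitted mean/standard deviation, and the 3-sigma cutoff are illustrative assumptions, not production parameters.

```python
import numpy as np

# Illustrative only: synthetic "normal" amounts matching the documented
# ~$80-$120 range, used to fit the Gaussian.
normal_amounts = np.array([95.0, 110.0, 82.0, 118.0, 101.0, 88.0, 104.0, 97.0])
mu, sigma = normal_amounts.mean(), normal_amounts.std()

# New transactions to score; 730.0 stands in for a >$500 anomaly.
new_amounts = np.array([99.0, 730.0, 115.0])
z_scores = np.abs(new_amounts - mu) / sigma

# Assumed cutoff: more than 3 standard deviations from the mean is flagged.
for amount, z in zip(new_amounts, z_scores):
    print(f"amount={amount:.2f}  z={z:.1f}  anomaly={z > 3.0}")
```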
Model Logic Explained
Isolation Forest
Isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of that feature. Because anomalies are few and different, they are easier to isolate and end up with shorter average path lengths in the random trees.
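A minimal scikit-learn sketch of this idea on a single Amount column; the data, contamination rate, and random_state are assumed example values, not our production configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative data: mostly normal amounts (~$80-$120) plus one large outlier.
X = np.array([[95.0], [110.0], [82.0], [118.0], [101.0], [97.0], [104.0], [730.0]])

# contamination and random_state are assumed values for the example.
model = IsolationForest(contamination=0.1, random_state=42)
labels = model.fit_predict(X)    # 1 = inlier, -1 = anomaly
scores = model.score_samples(X)  # lower score => shorter average path length => more anomalous

for amount, label, score in zip(X.ravel(), labels, scores):
    print(f"amount={amount:.2f}  label={label}  score={score:.3f}")
```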
Local Outlier Factor (LOF)
Measures the local density deviation of a given data point with respect to its neighbors. The score is local in the sense that it depends on how isolated the point is relative to its surrounding neighborhood.
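A minimal sketch with scikit-learn's LocalOutlierFactor; n_neighbors and contamination are assumed example values chosen for this tiny synthetic sample.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Illustrative data: a dense cluster of normal amounts plus one isolated point.
X = np.array([[95.0], [110.0], [82.0], [118.0], [101.0], [97.0], [104.0], [730.0]])

# n_neighbors and contamination are assumed values for the example.
lof = LocalOutlierFactor(n_neighbors=3, contamination=0.1)
labels = lof.fit_predict(X)                 # 1 = inlier, -1 = anomaly
lof_scores = -lof.negative_outlier_factor_  # LOF >> 1 means much lower local density than neighbors

for amount, label, score in zip(X.ravel(), labels, lof_scores):
    print(f"amount={amount:.2f}  label={label}  LOF={score:.2f}")
```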
One-Class SVM
Learns a decision function for novelty detection: classifying new data as similar to or different from the training set. It maps the input data into a high-dimensional feature space and finds the maximal-margin hyperplane that best separates the training data from the origin.
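A minimal sketch of the novelty-detection workflow with scikit-learn's OneClassSVM: fit on amounts assumed to be normal, then score unseen transactions. The kernel, nu, and gamma settings are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Train only on "normal" amounts; unseen data is scored afterwards.
X_train = np.array([[95.0], [110.0], [82.0], [118.0], [101.0], [88.0], [104.0], [97.0]])
X_new = np.array([[99.0], [730.0]])

# kernel, nu, and gamma are assumed values for the example; the RBF kernel
# supplies the implicit high-dimensional feature space mapping.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(X_train)

labels = ocsvm.predict(X_new)               # 1 = similar to training data, -1 = novel
distances = ocsvm.decision_function(X_new)  # signed distance to the separating hyperplane

for amount, label, dist in zip(X_new.ravel(), labels, distances):
    print(f"amount={amount:.2f}  label={label}  decision={dist:.3f}")
```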
Robust Covariance (Elliptic Envelope)
Assumes the data is Gaussian-distributed and fits a robust covariance estimate to it, ignoring outliers when estimating the central mode. It defines an ellipse around the central data points; anything falling outside is treated as an anomaly.
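A minimal sketch with scikit-learn's EllipticEnvelope; the data, contamination rate, and random_state are assumed example values. With only the single Amount feature, the fitted "ellipse" reduces to a robust interval around the central amounts.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Illustrative data: normal amounts (~$80-$120) plus one large outlier.
X = np.array([[95.0], [110.0], [82.0], [118.0], [101.0], [97.0], [104.0], [730.0]])

# contamination and random_state are assumed values for the example.
envelope = EllipticEnvelope(contamination=0.1, random_state=42)
labels = envelope.fit_predict(X)     # 1 = inside the envelope, -1 = anomaly
distances = envelope.mahalanobis(X)  # robust squared Mahalanobis distance from the fitted center

for amount, label, dist in zip(X.ravel(), labels, distances):
    print(f"amount={amount:.2f}  label={label}  mahalanobis={dist:.1f}")
```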