Machine learning observability is the practice of monitoring, troubleshooting, and understanding machine learning models as they move from research to production. An effective ML observability tool should automatically surface issues, identify the root cause, and act as a guardrail for models in production.
Some key goals of ML observability are:
- Reduce the time to detect performance issues or regressions
- Guide the model owner to the root cause of the issue
- Facilitate timely resolution of the problem
- Track model performance across different slices of data
- Identify data quality issues that can impact model performance
The four pillars of ML observability are:
- Performance analysis – Surface worst performing slices and detect changes in accuracy, recall, F1 score, etc.
- Drift monitoring – Detect changes in the input data distribution over time that can lead to model degradation.
- Data quality – Check for issues like missing data, out-of-range values, and cardinality shifts that can negatively impact the model.
- Explainability – Understand feature importance and why a model made a particular prediction to build confidence and improve the model.
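As an illustration of the drift-monitoring pillar, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a current sample of one numeric feature. PSI is one common drift score, not the only option, and nothing in this article prescribes it. The function name `psi`, the equal-width binning, and the `eps` floor for empty bins are assumptions made for this example.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` relative to `baseline`."""
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range are clamped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
print(psi(baseline, baseline))        # 0.0 for identical samples
print(psi(baseline, shifted) > 0.2)   # True: the distribution has clearly moved
```

A frequently quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, but the right threshold depends on the feature and the bin count.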
An effective way to achieve ML observability is through the use of an evaluation store. An evaluation store tracks:
- Model performance on different slices of data
- Input features and output predictions
- Ground truth or proxy metrics that correlate with performance
It then uses this data to:
- Validate models during training
- Detect performance issues or data drift after deployment
- Surface the root cause of any issues
- Identify opportunities to improve the model
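The idea of an evaluation store can be sketched as a small in-memory class that records features and predictions, attaches ground truth when it arrives later, and reports accuracy overall or for a slice. This is a toy under stated assumptions: `EvaluationStore`, `Record`, and their methods are hypothetical names rather than the API of any real product, and a production store would persist to a database and support many more metrics.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Record:
    features: Dict[str, Any]
    prediction: Any
    actual: Optional[Any] = None  # ground truth often arrives after the prediction


class EvaluationStore:
    """Hypothetical in-memory evaluation store keyed by prediction ID."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def log_prediction(self, prediction_id: str,
                       features: Dict[str, Any], prediction: Any) -> None:
        self._records[prediction_id] = Record(features, prediction)

    def log_actual(self, prediction_id: str, actual: Any) -> None:
        # Join delayed ground truth back to the original prediction.
        self._records[prediction_id].actual = actual

    def accuracy(self, where: Optional[Callable[[Dict[str, Any]], bool]] = None
                 ) -> Optional[float]:
        """Accuracy over records that have ground truth, optionally limited to a slice."""
        scored = [r for r in self._records.values()
                  if r.actual is not None and (where is None or where(r.features))]
        if not scored:
            return None  # nothing to score yet
        return sum(r.prediction == r.actual for r in scored) / len(scored)
```

Passing a predicate such as `lambda f: f["region"] == "us"` to `accuracy` restricts the metric to one slice, and predictions that are still waiting for ground truth are left out of the score.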
What benefits does ML observability provide during model training?
Benefits of ML Observability During Model Training
ML observability provides several key benefits during the model training stage:
- Faster issue detection: An ML observability solution can monitor key performance metrics during training and flag problems early, so model builders can iterate more quickly and improve their models.
- Guidance to the root cause: When an issue is detected, the observability solution can guide the model builder to the root cause, whether it be a data distribution problem, feature transformation issue, or model overfitting.
- Slice-based analysis: ML observability tools can analyze model performance on different slices of the training data and surface the slices where the model is performing poorly. This provides insights into how to improve the model.
- Prevent training-serving skew: The observability tool can detect if the training data distribution differs significantly from the production data distribution. This helps avoid the training-serving skew problem after model deployment.
- Model selection: The observability solution can track the predictions that multiple candidate models make on the same held-out validation data. This allows data scientists to compare candidates and promote the best one to production based on validation performance rather than training performance, which overstates how well an overfit model generalizes.
- Data quality checks: Issues like missing data, out-of-range values, and cardinality shifts in the training data can be detected to ensure a clean dataset for training the model.
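The slice-based analysis described above can be sketched as a function that computes F1 per slice for a binary classifier and ranks slices from worst to best. The function name `f1_by_slice` and the `(slice_value, y_true, y_pred)` row layout are assumptions chosen for this example.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def f1_by_slice(rows: Iterable[Tuple[Hashable, int, int]]
                ) -> List[Tuple[Hashable, float]]:
    """rows are (slice_value, y_true, y_pred) with binary 0/1 labels.

    Returns (slice_value, F1) pairs sorted with the worst slice first.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for slice_value, y_true, y_pred in rows:
        c = counts[slice_value]
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1
        # True negatives do not enter the F1 formula.

    scores = []
    for slice_value, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        # F1 = 2TP / (2TP + FP + FN); report 0.0 when the slice has no positives at all.
        scores.append((slice_value, 2 * c["tp"] / denom if denom else 0.0))
    return sorted(scores, key=lambda item: item[1])


rows = [
    ("mobile", 1, 1), ("mobile", 1, 0), ("mobile", 0, 1),
    ("desktop", 1, 1), ("desktop", 0, 0), ("desktop", 1, 1),
]
print(f1_by_slice(rows))  # [('mobile', 0.5), ('desktop', 1.0)]
```

Sorting worst first puts the slices that most need attention at the top of the report. Slices with very few rows give noisy scores, so a minimum-support filter is usually worth adding.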
What are some examples of issues that ML observability can detect in the training data?
Issues ML Observability Detects in Training Data
ML observability tools can detect a number of issues in training data that can negatively impact model performance:
- Data drift – Changes in the distribution of the training data over time. This can lead to model degradation after deployment. ML observability tools can detect feature drift between the baseline training distribution and the current distribution.
- Missing or incomplete data – ML observability can identify features with a high percentage of missing values or incomplete records. This can reduce the amount of training data available.
- Out-of-range values – Values in the training data that fall outside of an expected range. This can cause models to make incorrect predictions on similar out-of-range data in production.
- Label noise – Inaccurate or incorrect labels in the training data. This can reduce the model’s ability to learn the correct patterns from the data.
- Data biases – Biases in the training data along dimensions like gender, race, age, etc. This can cause models to make unfair predictions when deployed.
- Cardinality shifts – Changes in the number of unique values of a categorical feature over time. This can be a sign of data drift or of a change in an upstream data pipeline.
- Other data quality issues – More general problems like duplicate records, inconsistent formatting, or corrupted values. Dirty data of this kind can degrade model performance.
By tracking key data quality metrics and comparing them to baselines, ML observability tools can detect these issues during model training. This allows data scientists to fix any problems, clean the data, and improve their models before deployment.
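Comparing data quality metrics against a baseline can be sketched as follows. The function names `quality_report` and `flag_changes`, the choice of three metrics, and the absolute-difference thresholds are all assumptions for this example; real tools track more metrics and often use statistical tests instead of fixed thresholds.

```python
from typing import Any, Dict, List, Optional, Tuple


def quality_report(rows: List[Dict[str, Any]], feature: str,
                   valid_range: Optional[Tuple[float, float]] = None
                   ) -> Dict[str, float]:
    """Missing rate, out-of-range rate, and cardinality for one feature."""
    values = [row.get(feature) for row in rows]
    present = [v for v in values if v is not None]  # None counts as missing
    out_of_range = 0
    if valid_range is not None:
        low, high = valid_range
        out_of_range = sum(1 for v in present if not low <= v <= high)
    return {
        "missing_rate": (len(values) - len(present)) / len(values) if values else 0.0,
        "out_of_range_rate": out_of_range / len(present) if present else 0.0,
        "cardinality": len(set(present)),
    }


def flag_changes(baseline: Dict[str, float], current: Dict[str, float],
                 thresholds: Dict[str, float]) -> List[str]:
    """Names of metrics whose absolute change from baseline exceeds its threshold."""
    return [name for name, base in baseline.items()
            if abs(current[name] - base) > thresholds[name]]


baseline_rows = [{"age": 30}, {"age": 41}, {"age": 25}, {"age": 52}]
current_rows = [{"age": 30}, {"age": None}, {"age": 250}, {"age": 30}]

baseline = quality_report(baseline_rows, "age", valid_range=(0, 120))
current = quality_report(current_rows, "age", valid_range=(0, 120))
thresholds = {"missing_rate": 0.05, "out_of_range_rate": 0.01, "cardinality": 1}
print(flag_changes(baseline, current, thresholds))
# ['missing_rate', 'out_of_range_rate', 'cardinality']
```

In this toy dataset the current batch has one missing value, one impossible age, and fewer distinct values than the baseline, so all three metrics are flagged.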
Once models are deployed, ML observability continues to monitor for data drift, label noise, and other issues that can cause models to degrade over time. This allows data scientists to retrain their models when needed to maintain high performance in production.