Continuous Delivery of Safe AI#
The above diagram illustrates a simplified ML development pipeline. A typical pipeline consists of multiple branches with iterative processes and involves multiple contributors, and different types of bias can be introduced at each step.
Defining fairness-aware CI/CD flows that automate the building of multiple artefacts for different teams can help organisations build a common understanding of bias mitigation. In this article, we demonstrate potential approaches to monitoring ML pipelines throughout the development lifecycle.
Prerequisites#
Structuring an ML project requires similar skills to any data-centric research project. We recommend reviewing the following tips to support your design and management process. The key requirement for monitoring potential fairness (or security, privacy, etc.) issues is maintaining a clean and maintainable codebase. There are many great books about maintaining a healthy codebase, so I will not go into details here.
You can check NCSC’s guidance on secure development and deployment of software systems.
The Turing Way project provides a set of useful reading materials to organise your codebase for reproducibility: https://book.the-turing-way.org/reproducible-research/reproducible-research
These resources are free. If you would like to read some books, see the list below.
See also
Recommended books:
Structure and Interpretation of Computer Programs (SICP) by Abelson, Sussman, and Sussman
The Pragmatic Programmer by David Thomas, Andrew Hunt
Architecture Patterns with Python: Enabling Test-Driven Development, Domain-Driven Design, and Event-Driven Microservices by Bob Gregory and Harry Percival
CI/CD: Continuous Integration and Delivery#
Before starting to automate parts of your codebase (e.g. testing your code against adversarial scenarios, building releases), I suggest checking the prerequisites of this article and structuring a clean and reproducible codebase.
CI/CD stands for Continuous Integration and Continuous Delivery (or Continuous Deployment). It is a software development practice where code changes are automatically tested and integrated (CI), and then automatically delivered or deployed to production (CD), ensuring faster and more reliable updates. Based on your needs, the integration and delivery steps can be automated with different libraries, and you can maintain the overall pipeline through tools provided by GitHub. A minimal sketch of a fairness check that such a pipeline could run is given after the examples below.
Check these examples:
A Good Example of a Complete RAG Application: octodemo/contoso-chat-dhanachavan
Example Actions: microsoft/security-devops-action
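As a minimal illustration of wiring a fairness gate into CI, the sketch below shows a pytest-style test that a CI job (for example, a GitHub Actions step running pytest) could execute on every push. The metrics file name, metric key, and threshold are illustrative assumptions, not part of any specific project.
# test_fairness_gate.py -- hypothetical CI check; names and thresholds are illustrative
import json

MAX_DEMOGRAPHIC_PARITY_DIFFERENCE = 0.1  # threshold agreed with the fairness team

def test_demographic_parity_within_threshold():
    # Assumes the training job wrote its evaluation metrics to this file
    with open("artifacts/fairness_metrics.json") as f:
        metrics = json.load(f)
    assert metrics["demographic_parity_difference"] <= MAX_DEMOGRAPHIC_PARITY_DIFFERENCE
Failing this test fails the build, so a regression in the fairness metric blocks the release in the same way a failing unit test would.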
Using Tools for Experiment Tracking#
You can use tools like wandb, Neptune, and mlflow to track and maintain records of your ML experiments. These experiment-tracking platforms allow you to register models and experiments for easy maintainability.
While these libraries are useful for tracking progress and sharing insights with your team, they aren’t optimized for fairness-focused processes. In such cases, it’s important to foster interdisciplinary conversations and track fairness metrics, discrimination cases, and related outputs.
We developed a set of useful functions, available in the FAID repository on GitHub, to easily integrate metadata management for fairness and safety monitoring with these tools. Developers can also add the scripts directly to their codebase for customized tracking formats (see FAID’s Guide on this issue).
If you’re already using these libraries, FAID’s logging behavior will feel familiar.
import random
import wandb
import mlflow
from faid import logging as faidlog
Initialise#
You first initialise the project with the name and config details. These functions basically create the metadata for the experiment.
# Example project name and config (placeholder values)
project = "bias-monitoring-demo"
config = {"learning_rate": 0.01, "epochs": 10}
experiment_name = "fairness-experiment"

# Init Weights & Biases
run = wandb.init(project=project, config=config)
# Init mlflow
mlflow.set_experiment(project)
# Init the FAID log files --> this will create model, data, fairness-experiment, risk and transparency logs
faidlog.init()
# Create (or get) the fairness experiment context
ctx = faidlog.ExperimentContext(name=experiment_name)
Run Experiments and Record Logs#
After initialising and creating a manageable workflow, you can log any metric, outcome, or other variable you want to store and monitor using these libraries.
# Simulate a training run and log metrics at each epoch
epochs = 10
offset = random.random() / 5

for epoch in range(2, epochs):
    acc = 1 - 2 ** -epoch - random.random() / epoch - offset
    loss = 2 ** -epoch + random.random() / epoch + offset

    # Log metrics to Weights & Biases and MLflow
    metrics = {"acc": acc, "loss": loss}
    wandb.log(metrics)
    mlflow.log_metrics(metrics, step=epoch)

    # Record the same metrics in the FAID fairness experiment context
    ctx.add_model_entry(key="metrics", entry=metrics)
The FAID fairness recording format has four main entries: Context, Model, Data, and Metrics. This categorisation is designed to ease the transfer of experiment-level metadata to the final transparency artefacts: Use Case Cards, Model Cards, and Data Cards.
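As an illustration, the snippet below sketches how experiment-level metadata might be grouped under these four entries before being mapped onto a card. The field names and values are assumptions for demonstration, not FAID’s canonical schema.
# Hypothetical grouping of experiment metadata under the four FAID entry types.
# Field names and values are illustrative only, not FAID's canonical schema.
experiment_record = {
    "context": {"name": "fairness-experiment", "owner": "ml-team"},
    "model": {"architecture": "logistic_regression", "metrics": {"acc": 0.91, "loss": 0.23}},
    "data": {"dataset": "applications-2024", "protected_attributes": ["sex", "age_band"]},
    "metrics": {"demographic_parity_difference": 0.04, "equalised_odds_difference": 0.06},
}
Keeping the entries separated in this way means, for example, that the model block can feed a Model Card and the data block a Data Card without re-collecting information at release time.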
Why Do You Need a Separate Logging Library for Fairness?#
When experimenting with data and models, you generate a large amount of information. However, not all of this information is relevant to fairness. Fairness researchers require a specific set of data points and metrics to assess and ensure fairness throughout the machine learning pipeline. Extracting and providing this specific information can become an additional burden for ML engineers, who may already be managing numerous tasks.
A dedicated logging library for fairness can streamline this process. By proactively defining fairness requirements and automatically monitoring the parameters related to these requirements (see the sketch after this list), we can:
Reduce Workload: Automate the extraction and logging of relevant fairness metrics, freeing ML engineers from the manual task of selecting and providing this information.
Minimize Errors: Decrease the likelihood of mistakes that can occur with manual data handling and reporting.
Ensure Consistency: Maintain a standardized approach to logging fairness-related data, which can enhance the reliability and comparability of fairness assessments.
Enhance Focus: Allow fairness researchers to concentrate on analyzing the relevant data without the distraction of unrelated information.
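A minimal sketch of what proactively defined, automatically monitored fairness requirements could look like is given below. The requirement names, thresholds, and helper function are illustrative assumptions and not part of FAID’s API.
# Illustrative sketch (not FAID's API): declare fairness requirements up front,
# then check the metrics logged during an experiment against them automatically.
FAIRNESS_REQUIREMENTS = {
    "demographic_parity_difference": 0.1,  # maximum tolerated gap between groups
    "equalised_odds_difference": 0.1,
}

def check_fairness_requirements(metrics: dict, requirements: dict) -> list:
    """Return human-readable violations for any metric above its threshold."""
    violations = []
    for name, threshold in requirements.items():
        value = metrics.get(name)
        if value is not None and value > threshold:
            violations.append(f"{name}={value:.3f} exceeds threshold {threshold}")
    return violations

# Example usage with placeholder metric values
observed = {"demographic_parity_difference": 0.15, "equalised_odds_difference": 0.05}
for violation in check_fairness_requirements(observed, FAIRNESS_REQUIREMENTS):
    print("FAIRNESS WARNING:", violation)
A check like this can run after every experiment, so engineers only need to act when a requirement is actually violated.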
Using FAID with CMF and DVC#
The design principles of our metadata management share similar design decisions with CMF and DVC, allowing quick and easy integration with these data- and model-driven version control systems. While developing our tool, we also checked its compatibility with CMF and DVC so that version control can be maintained throughout the fairness research lifecycle.
Note
#TODO: Add VCS workflows
Opening Data and Models#
Creating an open-source ML repository brings additional requirements for responsibly releasing its outputs. The Model Openness Framework defines three layers for assessing the openness and completeness of AI development artifacts.
The Open Model layer includes the model architecture, model parameters, technical report, evaluation results, model card, data card, and sample model outputs.
The Open Tooling layer includes the code for training, inference, and evaluation, the data used for evaluation, supporting libraries and tools, and all components of the Open Model layer.
The Open Science layer includes the research paper, all datasets, data pre-processing code, model parameters, model metadata, and all components of the Open Tooling layer.
Delivering all of these artifacts continuously and transparently requires a carefully designed, comprehensive orchestration flow.
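One way to make that orchestration concrete is to encode the framework’s layers as a release checklist that a CI step verifies before publishing. The sketch below is a minimal, assumed example; the artifact paths are placeholders rather than prescribed file names.
# Hypothetical release checklist derived from the Model Openness Framework layers.
# The artifact paths are placeholders; adapt them to your repository layout.
from pathlib import Path

OPENNESS_CHECKLIST = {
    "open_model": ["model/weights.bin", "reports/technical_report.md", "cards/model_card.md", "cards/data_card.md"],
    "open_tooling": ["src/train.py", "src/inference.py", "src/evaluate.py", "data/evaluation/"],
    "open_science": ["paper/preprint.pdf", "data/", "src/preprocessing.py"],
}

def missing_artifacts(checklist: dict) -> dict:
    """Return, per layer, the artifacts that are not present in the repository."""
    return {
        layer: [path for path in paths if not Path(path).exists()]
        for layer, paths in checklist.items()
    }

# A CI step could fail (or just warn) when required artifacts are missing
for layer, missing in missing_artifacts(OPENNESS_CHECKLIST).items():
    if missing:
        print(f"{layer}: missing {missing}")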