Benchmarking Performance

State of the Art Automated Machine Learning in 2023

Finding the Best AutoML Platforms

The promise of AutoML is to make machine learning accessible to everyday business users who need to make predictions, but the reality is that each system comes with its own level of complexity and cost. That high barrier to entry has resulted in a distinct lack of performance benchmarking.

We tested Google Cloud AI, Microsoft Azure AutoML, Amazon SageMaker Autopilot, and Akkio on a range of open, real-world datasets. Models were benchmarked on achieved accuracy and F1 score, as well as training time and cost.
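To make the comparison concrete, the sketch below shows the shape of the evaluation loop: hold out a validation split, time the training run, and score accuracy and F1 on the held-out data. The model and dataset here are stand-ins (scikit-learn's LogisticRegression on synthetic data), not any vendor's AutoML API.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Stand-in data; the real benchmark used the open datasets listed below.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# Hold back 20% of the data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Time the training run; on hosted platforms this is also what you pay for.
start = time.perf_counter()
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_seconds = time.perf_counter() - start

# Score the held-out validation set on the benchmark's metrics.
preds = model.predict(X_val)
print(f"training time: {train_seconds:.2f}s")
print(f"accuracy:      {accuracy_score(y_val, preds):.3f}")
print(f"F1 score:      {f1_score(y_val, preds):.3f}")
```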

Model Accuracy

Accuracy is the global measure of how often a model's predictions are correct on the validation set: a random selection of the training data (typically 20%) that is held back and scored against the trained model. Accuracy is a good headline metric but can be misleading, especially on imbalanced datasets, where a model that always predicts the majority class can still score highly. We found accuracy to be broadly similar across the benchmarked platforms.
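As a quick illustration of how accuracy can mislead, consider a hypothetical validation set with a 95/5 class split (not one of the benchmark datasets): a model that always predicts the majority class scores 95% accuracy while being useless in practice.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced validation set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))             # 0.95 -- looks great
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0  -- catches the problem
```

This gap is why the benchmark reports F1 score alongside accuracy, as described in the next section.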

F1 Scores

F1 score is the harmonic mean of precision and recall. Precision is the fraction of positive predictions that are actually correct (True Positives / Predicted Positives). Recall is the fraction of actual positives the model correctly identifies (True Positives / Actual Positives). F1 score ranges from a low of 0 to a high of 1 and is useful for evaluating relative performance between models. Like accuracy, F1 scores were similar across the board.
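The definitions above reduce to a few lines of arithmetic. Here is a minimal sketch using made-up confusion-matrix counts (the tp/fp/fn values are illustrative, not benchmark results):

```python
# Illustrative confusion-matrix counts for a binary classifier.
tp, fp, fn = 80, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # True Positives / Predicted Positives
recall = tp / (tp + fn)     # True Positives / Actual Positives

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision: {precision:.3f}")  # 0.889
print(f"recall:    {recall:.3f}")     # 0.800
print(f"F1:        {f1:.3f}")         # 0.842
```

Because F1 is a harmonic mean, it is dragged down by whichever of precision or recall is weaker, which is exactly what makes it a useful complement to accuracy.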

Training Time and Cost

Training cost is important because you commit to it up front, before you know whether the model you build will even work. It's also very likely that you will want to make adjustments to your data and retrain (for example, if you accidentally include a variable that leaks the outcome). Fast training lets you iterate rapidly and avoid spending a lot of time down unproductive paths. Here Akkio is the clear winner, with training times around 100x faster and no cost to train models.

Dataset Library

We selected a wide range of open datasets for the benchmark. The datasets range in size from under 1K records to around 300K records, and they cover a variety of data types, including categories, text, dates, and numbers. Several of the datasets have anonymized features (where the underlying information is private and has been obfuscated with PCA; see the sketch below). The model targets include both binary and multi-class classification. You can download the datasets here:
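For context on the PCA-anonymized features, here is a sketch of the standard obfuscation approach (our assumption, not necessarily the dataset publishers' exact pipeline): the raw columns are replaced by their principal components, which preserves most of the predictive signal while hiding the original field meanings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Stand-in for a table of sensitive raw features.
X_raw, y = make_classification(n_samples=1_000, n_features=10, random_state=0)

# Replace raw columns with principal components; the transformed columns
# carry the variance of the originals but expose no field names or units.
X_anon = PCA(n_components=10).fit_transform(X_raw)

print(X_anon.shape)  # (1000, 10) -- same rows, obfuscated columns
```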

Summary of Results

Overall, model performance in terms of both accuracy and F1 score was reasonably similar between Akkio, Microsoft Azure, Google Cloud, and Amazon SageMaker. Akkio is the only no-code solution in the benchmark set, and it pulls ahead on both training time (1 minute per model) and cost (free), which unlocks the ability for a new class of non-technical business users to apply machine learning to their workflows.