AI & ML Academy - Azure Machine Learning Platform
Welcome to the AI & ML Academy (AIA) - ML Platform!
Overview
Azure Machine Learning is a platform, not just a notebook for running ML models. The AML service gives enterprise users the ability to train, test and deploy their models across a host of environments that support their machine learning applications. The goal is to run machine learning models in an evergreen production environment as applications, not only as one-time experiments.
The purpose of this module is to outline the different run-time environments. We’ve set up a decision matrix that plots the Azure services applicable to Artificial Intelligence and Machine Learning. The matrix has two dimensions (scenarios): Compute Type and ML Lifecycle. The cells of the matrix define the options we have across the Azure platform to deploy and run machine learning models. This module will define where each ML service resides and, more importantly, the use cases, tools, personas, frameworks and languages that support it.
The first scenario of Azure Machine Learning is the Compute Type. AML is a cross-platform environment that spans Windows, Linux and Azure. These run-time environments can be classified into two categories: Hybrid or Cloud. A Hybrid environment is a physical or virtual environment that runs on a workstation or a set of servers in a data center or with a cloud hosting provider. The Cloud for this discussion is Azure running as a PaaS or SaaS service, though the concept is not limited to Azure.
The second scenario of Azure Machine Learning is the ML Lifecycle. We have different resource requirements depending on where the model resides in the lifecycle. The two stages of the ML lifecycle are Training/Testing and Inference. (We’ve simplified the lifecycle to reduce complexity for this discussion.)
Here is the decision matrix outlining the scenarios and the resulting run-time environments. There are four options in this matrix, and the environments are named Developer, Hosted, Unmanaged and Managed. The main purpose of this matrix is to serve as a learning tool and table of contents for this learning resource.
| ML Lifecycle | Compute Type: Hybrid | Compute Type: Cloud |
| --- | --- | --- |
| Training/Testing | Developer Environment | Hosted Environment |
| Inference | Unmanaged Environment | Managed Environment |
Here are the definitions for each environment in the matrix. (For simplicity we’ve defined four; we realize there are other combinations, like multi-cloud, but we are focusing on the most prevalent environments.)
- Developer Environment – This is an environment for a pro-developer Data Scientist to train and test their models. The models can run on your local machine, on physical or virtual servers, or in an IaaS environment in the cloud. The key criterion is that the Data Scientist builds out the environment from scratch (OS, language, framework and IDE). This provides versatility but requires a high degree of maintenance to reproduce your model. The mindset of “It runs on my computer” won’t be acceptable, so an audit trail will be required at release time.
- Unmanaged Environment – This environment is an extension of the Developer Environment, migrated to a production environment that is fully managed by the end user. This gives the end user all the versatility of the open-source ecosystem and supports a best-of-breed approach. A typical end user is an ML Engineer who has a published model and a set of artifacts to deploy to production. They will ensure it runs in an evergreen environment across its lifetime to maintain accuracy and availability.
- Hosted Environment – A hosted environment utilizes a common runtime environment in terms of OS, languages, frameworks and IDEs. The hosted environment is set up and supported by Azure. The most common environments are configured to preserve versatility while reducing setup time for these users. A typical user for this environment is a Data Scientist who may be an expert in model development but needs to hand off their code to experts who deploy it into production. A great example is Azure ML Notebooks.
- Managed Environment – This environment is similar to a PaaS or SaaS Azure service where the developer isn’t concerned with the setup of their environment or the availability of the service. Their main concern is the scalability of the environment to serve the traffic that scores the data. Managed environments start to transition from a Data Scientist toward a low-code option that streamlines adoption by citizen developers. Managed environments are not typically leveraged by pro-developers, since they prefer to build their own through a software development lifecycle.
The recommended migration path across lifecycle stages follows the columns of the matrix. A Hybrid environment will typically start with the Developer Environment for training/testing and then deploy to an Unmanaged Environment. This unmanaged environment can be hosted in Azure, but always in a virtualized environment such as AKS. There is some crossover, but the software versions you need to replicate from the Developer Environment will not likely match those in the Managed Environment. The same holds true on the Cloud side with the migration from Hosted to Managed. This is a bigger gap, since it involves matching not only software versions but also the skill sets of the ML team.
The rest of this module will outline the list of ML services that run in each of these environments. We will provide a set of learning resources for your readiness and adoption. We hope this overview provides the context you require to utilize these environments based on your machine learning ecosystem.
Developer Environment
This is an environment for a pro-developer Data Scientist to train and test their models. The models can run on your local machine, on physical or virtual servers, or in an IaaS environment in the cloud. The key criterion is that the Data Scientist builds out the environment from scratch (OS, language, framework and IDE). This provides versatility but requires a high degree of maintenance to reproduce your model. The mindset of “It runs on my computer” won’t be acceptable, so an audit trail will be required at release time.
For example, a data scientist who wants to run an experiment on their local PC in WSL can set up a conda environment with the necessary requirements file for their machine learning experiment. They can leverage VS Code to execute this code on an ad-hoc basis for training and exploratory data analysis. If their local machine doesn’t have enough horsepower, they can migrate their environment to an Azure Data Science VM, where the additional GPUs will reduce training runtime. Likewise, they can leverage a Linux VM directly and set up the environment from scratch to match their local machine configuration. The typical use case is physical, virtual or IaaS compute. This compute environment is attached to the Azure Machine Learning service through the Python SDK to track, monitor and register models and experiments as required.
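To make that last step concrete, here is a minimal sketch of how a run from such a self-built environment can be tracked and its model registered in the workspace. It assumes the v1 `azureml-core` Python SDK is installed and a config.json downloaded from the workspace sits in the working directory; the experiment name, metric value and model.pkl artifact are placeholders.

```python
# Minimal sketch: track a local (or Data Science VM) training run in Azure ML
# with the v1 Python SDK (azureml-core). Assumes config.json from the workspace
# is in the working directory; the names and files below are placeholders.
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()                       # connect to the AML workspace
exp = Experiment(workspace=ws, name="local-dev-experiment")

run = exp.start_logging()                          # interactive (non-submitted) run
run.log("accuracy", 0.91)                          # log any metric computed locally
run.upload_file(name="outputs/model.pkl",          # ship the trained artifact to the run
                path_or_stream="model.pkl")
run.register_model(model_name="dev-model",         # register it for later deployment
                   model_path="outputs/model.pkl")
run.complete()
```

With the run registered this way, the “it runs on my computer” audit trail lives in the workspace alongside the model artifact.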
Here are the Azure services that support AI & ML workloads:
- Local
- Remote Azure VM (VM, ACI, AKS)
- ML.NET
The learning resources will not be a deep dive into how to support and develop in these environments; the intent is to show how to deploy AI & ML workloads into these compute environments. The learning resources are curated to help you reach a level 300 understanding.
| Resource | Level | Training Assets URL |
| --- | --- | --- |
| Local | 100 | Fundamentals of machine learning in the cloud |
| Local | 200 | Enhance your Azure Machine Learning experience with the VS Code extension |
| Local | 300 | Train an image classification model with Azure Machine Learning |
| Remote Azure VM | 100 | Supercharge Your Azure ML Development Through Visual Studio Code |
| Remote Azure VM | 200 | AzureML in a Day |
| Remote Azure VM | 300 | Machine Learning Cheat Sheet |
| ML.NET | 100 | ML.NET Comparison Sheet |
| ML.NET | 200 | On .NET Live - Adding Machine Learning to your .NET Apps with ML.NET |
| ML.NET | 300 | ML.NET tutorials |
Unmanaged Environment
This environment is an extension of the Developer Environment, migrated to a production environment that is fully managed by the end user. This gives the end user all the versatility of the open-source ecosystem and supports a best-of-breed approach. A typical end user is an ML Engineer who has a published model and a set of artifacts to deploy to production. They will ensure it runs in an evergreen environment across its lifetime to maintain accuracy and availability.
After model selection, the ML Engineer leverages the Azure CLI to deploy the model and its environment to a Linux VM in Azure. This Azure VM runs as the scoring engine (inference) for production applications, or as an MVP app to monitor model performance. The hardware profile for the inference instance will be different from that of the training instance; the ML Engineer will need to right-size the environment for the volume of scoring traffic rather than the size of the train/test data sets.
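As an illustration of what that self-managed scoring engine might look like on the VM, here is a minimal sketch of a web service placed in front of the model. Flask and joblib are illustrative choices, not anything prescribed by Azure ML, and the model.pkl file and input schema are placeholders for your own registered artifact.

```python
# Minimal self-managed scoring service for an Unmanaged Environment (e.g. a
# Linux VM you operate). Flask/joblib are illustrative; model.pkl and the
# request schema are placeholders.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.pkl")          # artifact exported from the Developer Environment

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()          # expects {"data": [[...feature vector...], ...]}
    predictions = model.predict(payload["data"])
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # In production you would front this with gunicorn/nginx, secure it, and
    # monitor it yourself, since the environment is end-user managed.
    app.run(host="0.0.0.0", port=5001)
```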
Here are the Azure services that support AI & ML workloads:
- Local Web Service (Docker Image)
- Remote VM (Azure VM, Kubernetes, AKS)
- Windows ML
The learning resources will not be a deep dive into how to support and develop in these environments; the intent is to show how to deploy AI & ML workloads into these compute environments. The learning resources are curated to help you reach a level 300 understanding.
Hosted Environment
A hosted environment utilizes a common runtime environment in terms of OS, languages, frameworks and IDEs. The hosted environment is set up and supported by Azure. The most common environments are configured to preserve versatility while reducing setup time for these users. A typical user for this environment is a Data Scientist who may be an expert in model development but needs to hand off their code to experts who deploy it into production. A great example is Azure ML Notebooks.
These environments come with a preconfigured set of ML frameworks, languages, packages and tools, which keeps setup time minimal for the end user. This environment is typically set up for a Data Scientist who wants to build machine learning models but is not an expert in configuration and infrastructure. A Data Scientist will train/test new models, conduct exploratory data analysis, or run ad-hoc experiments. The simplicity of the environment is a big benefit, but the flexibility to build a multitude of models for the highest accuracy might be constrained depending on how these environments are configured. It is highly recommended to review the configuration so you fully understand the options.
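For example, from an Azure ML notebook a Data Scientist can submit a training script to hosted compute using one of the preconfigured (curated) environments. The sketch below uses the v1 Python SDK; the compute cluster name, curated environment name and train.py script are placeholders for ones that exist in your workspace.

```python
# Sketch: from an Azure ML notebook, submit train.py to a hosted compute
# cluster with a curated environment (v1 azureml-core SDK). The cluster and
# environment names below are placeholders; substitute ones in your workspace.
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()
env = Environment.get(workspace=ws, name="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu")

config = ScriptRunConfig(
    source_directory=".",                 # folder containing train.py
    script="train.py",
    compute_target="cpu-cluster",         # an existing AmlCompute cluster
    environment=env,                      # curated environment: no setup to maintain
)

run = Experiment(ws, "hosted-training").submit(config)
run.wait_for_completion(show_output=True)
```

Because the OS, frameworks and packages come from the curated environment, the Data Scientist only maintains the training script itself.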
Here are the Azure services that support AI & ML workloads:
- DS VM
- Synapse Analytics Spark Pools
- Synapse Analytics Workspace
- Azure Machine Learning Notebooks
- Azure Machine Learning Attached Compute
- Azure Databricks
- HDInsight
The learning resources will not be a deep dive into how to support and develop in these environments; the intent is to show how to deploy AI & ML workloads into these compute environments. The learning resources are curated to help you reach a level 300 understanding.
| Resource | Level | Training Assets URL |
| --- | --- | --- |
| DSVM | 100 | Data Science Virtual Machine Overview (DSVM) |
| DSVM | 200 | Data Science Tools on DSVM |
| DSVM | 300 | Samples on the DSVM |
| Synapse Analytics Workspace | 100 | Machine Learning Experiences in Azure Synapse |
| Synapse Analytics Workspace | 200 | Cognitive Services in Azure Synapse Analytics |
| Synapse Analytics Workspace | 300 | Train ML model without code |
| Synapse Analytics Spark Pools | 100 | Synapse Analytics Spark Pools Overview |
| Synapse Analytics Spark Pools | 200 | Apache Spark MLlib running on Synapse Analytics |
| Synapse Analytics Spark Pools | 300 | SynapseML running on Synapse Analytics Spark Pools |
| AML Notebooks | 100 | Azure Machine Learning Studio Notebooks |
| AML Notebooks | 200 | How to run Jupyter notebooks on AML |
| AML Notebooks | 300 | Image Classification Using Notebooks in AML |
| AML Compute | 100 | Learn how to utilize the right compute for AML training jobs |
| AML Compute | 200 | What is an Azure Machine Learning compute instance |
| AML Compute | 300 | AML Training Compute Targets |
| Databricks | 100 | Azure Databricks Overview |
| Databricks | 200 | Deploy models for inference and prediction |
| Databricks | 300 | Model training examples |
Managed Environment
This environment is similar to a PaaS or SaaS Azure service where the developer isn’t concerned with the setup of their environment or the availability of the service. Their main concern is the scalability of the environment to serve the traffic that scores the data. Managed environments start to transition from a Data Scientist toward a low-code option that streamlines adoption by citizen developers. Managed environments are not typically leveraged by pro-developers, since they prefer to build their own through a software development lifecycle.
These are pre-built environments with varying degrees of model portability. The purpose is to reduce machine learning model development and deployment time. A software engineer might not have extensive data science skills, but with Cognitive Services they can reuse prebuilt models to score their data for sentiment analysis. Likewise, a database developer might want to run a batch inference script (T-SQL) to score new data within the database platform. Lastly, a data scientist who needs to wear multiple hats can leverage the Azure ML service as an end-to-end platform for training and scoring their data.
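As a sketch of the Cognitive Services scenario above, a prebuilt sentiment model can be called over REST with no model training at all. The endpoint, key and v3.1 API path below are placeholders and assumptions; check the current API version offered by your resource.

```python
# Sketch: reuse a prebuilt Cognitive Services sentiment model over REST.
# ENDPOINT, KEY and the v3.1 path are placeholders/assumptions for your resource.
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-cognitive-services-key>"

def score_sentiment(texts):
    """Send a batch of strings to the hosted sentiment model and return its JSON reply."""
    documents = [{"id": str(i), "language": "en", "text": t} for i, t in enumerate(texts)]
    response = requests.post(
        f"{ENDPOINT}/text/analytics/v3.1/sentiment",
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json={"documents": documents},
    )
    response.raise_for_status()
    return response.json()

print(score_sentiment(["The managed endpoint handled our traffic spike without any tuning."]))
```

The scaling, patching and availability of the model behind this endpoint are Azure’s concern, which is the defining trait of the Managed Environment.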
Here are the Azure services that support AI & ML workloads:
- Applied AI
- Cognitive Services
- SQL Server Machine Learning Service
- Azure SQL Managed Instance
- AML Inference Cluster
- AML Attached Compute
- AML Managed Endpoints
- Azure Batch
- SQL Edge
- IoT Edge
The learning resources will not be a deep dive into how to support and develop in these environments; the intent is to show how to deploy AI & ML workloads into these compute environments. The learning resources are curated to help you reach a level 300 understanding.