Magic: The Gathering (MTG) Artist Recognition
Skills Used: PyTorch, Docker, MLflow, Azure ML, GitHub, Poetry.
Check out a web app using the model.
Summary
This project trains a fully connected head on top of a ResNet convolutional neural network (CNN) to identify which artist created the art for a given Magic: The Gathering (MTG) card. I chose to train it to recognize some of the more well-known artists in MTG history, which you can see in the parameters for the data pipeline. Models are trained automatically through a combination of Azure DevOps pipelines and Azure ML training jobs; check out below for more details. The model is 82% accurate at determining which of the six artists a given piece of art is from.
This project is deployed as a Dockerized Streamlit web app as indicated at the top of this page.
Feel free to check out the code for the model training. This is how I structured it:
root (config files, linting files, project requirements, etc.)
├── cicd (cicd pipelines and scripts for uploading training jobs)
├── mtg_artist_classifier (code for model building and data pipeline scripts)
│ ├── classifier (code needed for model such as custom fully connected layer definition, training script, etc.)
│ └── data (code needed for data pipelines)
You can also check out the code for the Streamlit app, which contains modules to download some images, download the model, and dockerize the app for deployment as a web app.
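To give a rough idea of what the app's inference page could look like, here is a minimal Streamlit sketch. The model path, label list, and preprocessing are assumptions for illustration rather than the app's actual code, and it assumes a recent Streamlit release that provides `st.cache_resource`.

```python
# Minimal Streamlit inference sketch (file paths, labels, and the load_model
# helper are hypothetical, not the repo's actual modules).
import streamlit as st
import torch
from PIL import Image
from torchvision import transforms

ARTISTS = ["Artist 1", "Artist 2", "Artist 3", "Artist 4", "Artist 5", "Artist 6"]  # placeholder labels

@st.cache_resource
def load_model(path: str = "model/artist_classifier.pt"):
    # Assumes the downloaded model was saved as a pickled nn.Module.
    model = torch.load(path, map_location="cpu")
    model.eval()
    return model

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

st.title("MTG Artist Recognition")
uploaded = st.file_uploader("Upload a piece of card art", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded art")
    with torch.no_grad():
        logits = load_model()(preprocess(image).unsqueeze(0))
        probs = torch.softmax(logits, dim=1).squeeze(0)
    best = int(probs.argmax())
    st.write(f"Predicted artist: {ARTISTS[best]} ({probs[best].item():.1%} confidence)")
```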
Finally, here is the code for the data pipelines that pull all the images and push them to Azure blob storage for the model to train on.
Details
Data: I used Data Version Control (DVC) to build a data pipeline that pulls all the art for a given set of artists. The pipeline is defined in the dvc.yaml file, and the scripts run by the pipeline are located here. The pipeline:
- Downloads all the art for a collection of artists defined in dvc_params.yaml. The download script makes use of the Magic: The Gathering Python SDK and the Scryfall REST API to fetch card images for a given list of artists (a rough sketch of this step follows the list).
- Splits the downloaded images into training and validation sets and copies them according to their assignment. Note that you won't see the downloaded images in the repo, as DVC tracks them rather than uploading them all to GitHub.
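As a hypothetical sketch of the download step, here is what pulling art crops per artist from the Scryfall search API can look like. The function name, output paths, and example artist are placeholders, and the repo's real script (driven by dvc_params.yaml) may differ.

```python
# Hypothetical sketch of the per-artist download step using the Scryfall
# search API; the repo's actual script and parameter names may differ.
from pathlib import Path

import requests

def download_art(artist: str, out_dir: Path) -> None:
    """Fetch the art crop for every card credited to `artist` from Scryfall."""
    out_dir.mkdir(parents=True, exist_ok=True)
    url = "https://api.scryfall.com/cards/search"
    params = {"q": f'artist:"{artist}"', "unique": "art"}
    while url:
        page = requests.get(url, params=params, timeout=30).json()
        for card in page.get("data", []):
            image_uris = card.get("image_uris")
            if not image_uris:  # skip multi-faced cards for simplicity
                continue
            art = requests.get(image_uris["art_crop"], timeout=30).content
            (out_dir / f"{card['id']}.jpg").write_bytes(art)
        # Scryfall paginates results; follow next_page until exhausted.
        url = page.get("next_page") if page.get("has_more") else None
        params = None  # next_page already carries the query string

if __name__ == "__main__":
    artists = ["Rebecca Guay"]  # example artist; the real list lives in dvc_params.yaml
    for name in artists:
        download_art(name, Path("data/raw") / name.replace(" ", "_"))
```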
Model Building: I used PyTorch as my deep learning framework for this project. All of the files required for training the model are located in the classifier folder. I created a fully connected layer that I attached on top of a ResNet; a rough sketch of that setup is shown below.
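For illustration, here is a minimal sketch of attaching a small fully connected head to a torchvision ResNet. The backbone choice (resnet18), layer sizes, dropout, and freezing behavior are assumptions, not necessarily what the classifier module actually does.

```python
# Rough sketch of attaching a custom fully connected head to a torchvision
# ResNet (resnet18, layer sizes, and dropout are assumptions for illustration).
import torch.nn as nn
from torchvision import models

NUM_ARTISTS = 6

def build_model(freeze_backbone: bool = True) -> nn.Module:
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    if freeze_backbone:
        for param in backbone.parameters():
            param.requires_grad = False
    # Replace the stock ImageNet classifier with a small fully connected head.
    backbone.fc = nn.Sequential(
        nn.Linear(backbone.fc.in_features, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, NUM_ARTISTS),
    )
    return backbone
```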
For training, any change to the main branch kicks off an Azure DevOps pipeline that submits a training job to my Azure ML workspace. The training script runs on an Azure cluster in Azure ML, MLflow logs the relevant training parameters and metrics, and a new version of the model is created from the training job. From there I can decide whether to deploy the new model.
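Below is a hedged sketch of what submitting such a job with the Azure ML Python SDK (azure-ai-ml) can look like. The workspace identifiers, compute name, environment, and script path are placeholders, not the values used in my cicd scripts.

```python
# Sketch of submitting a command job to Azure ML; all identifiers are placeholders.
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

job = command(
    code="./mtg_artist_classifier",                       # folder uploaded with the job
    command="python classifier/train.py --epochs ${{inputs.epochs}}",
    inputs={"epochs": 20},
    environment="<training-environment>@latest",          # placeholder environment name
    compute="<gpu-cluster>",                               # placeholder compute target
    experiment_name="mtg-artist-classifier",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)  # link to the run in Azure ML studio
```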
Model Deployment: After a training run, I can review the job details and the resulting model. The web app lives in a separate repository, and if I want it to use the new model, I update the model name/version in the model deployment config. Scripts then download the specified model and build the Docker container for the Streamlit app, which is deployed.
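As an illustration of that download step, here is a sketch using the azure-ai-ml client to pull a registered model; the model name, version, and paths are placeholders rather than the real deployment config values.

```python
# Hypothetical sketch of downloading a registered model for the web app;
# name, version, and download_path are placeholders.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Pull the model version named in the deployment config into the app folder,
# where the Dockerfile can copy it into the Streamlit image.
ml_client.models.download(
    name="mtg-artist-classifier",
    version="3",
    download_path="app/model",
)
```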
Questions? Comments? Let me know what you think! Reach out to me on LinkedIn.