Continuous Deployment of Python eggs with VSTS on Azure
This blog shows how to create a basic continuous deployment (CD) pipeline for Python code with Visual Studio Team Services (VSTS) on Azure. Building a full CI/CD pipeline on VSTS is a bit of a challenge because Python is not a first-class citizen on the Azure stack (yet), so we'll focus on building a Python egg from a repository and putting that egg on a file share.
Our use case looks like this:
Our Python code is hosted on a Git repository in VSTS and is released to multiple machines in the Azure Cloud.
New commits on the master branch in the Git repository trigger a process that builds a Python egg and releases it to a file share. A Python egg is a zipped distribution of a Python package; since our code is pure Python, the egg is platform independent.
The file share is mounted on a machine with JupyterHub, used for development, and on one with Airflow, used for our jobs. New eggs get installed in the virtual environment on both machines so that everyone has access to our latest and greatest code.
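On the JupyterHub and Airflow machines the new egg still has to reach the Python environment. A pure-Python egg is a zip archive that Python can import from directly, so one minimal approach is to put the newest egg on sys.path. The sketch below assumes the share is mounted at a hypothetical /mnt/eggs; latest_egg and activate_latest_egg are illustrative names, not part of any tool used in this post.

```python
import glob
import os
import sys


def latest_egg(share_dir):
    """Hypothetical helper: return the most recently modified .egg in share_dir, or None."""
    eggs = glob.glob(os.path.join(share_dir, "*.egg"))
    if not eggs:
        return None
    return max(eggs, key=os.path.getmtime)


def activate_latest_egg(share_dir):
    """Hypothetical helper: put the newest egg at the front of sys.path.

    Pure-Python eggs are zip archives, which Python imports from directly
    (zipimport), so this sketch needs no installation step.
    """
    egg = latest_egg(share_dir)
    if egg is not None and egg not in sys.path:
        sys.path.insert(0, egg)
    return egg


if __name__ == "__main__":
    # Hypothetical mount point of the Azure file share; adjust to your machines.
    print(activate_latest_egg("/mnt/eggs"))
```

This only affects the current process. To keep the egg installed in the virtual environment, installing it with easy_install (the usual tool for eggs at the time) is the more durable option.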
We'll start with a repository called example_project with the following structure:
```
example_project/
├── exampleproject/      <- Python package with source code.
│   ├── __init__.py      <- Make the folder a package.
│   └── process.py       <- Example module.
├── tests/               <- Tests for your Python package.
│   └── test_process.py  <- Tests for process.py.
├── README.md            <- README with info of the project.
├── setup.py             <- Install and distribute your module.
└── vsts_build.bat       <- Windows build script.
```
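To recreate this layout locally, a small script can scaffold it. Only the paths come from the tree above; the file contents are placeholders we made up, and scaffold and LAYOUT are hypothetical names.

```python
from pathlib import Path

# Hypothetical scaffold: paths follow the example_project tree, contents are placeholders.
LAYOUT = {
    "exampleproject/__init__.py": "",
    "exampleproject/process.py": '"""Example module."""\n',
    "tests/test_process.py": '"""Tests for process.py."""\n',
    "README.md": "# example_project\n",
    "setup.py": "",
    "vsts_build.bat": "",
}


def scaffold(root):
    """Create the example_project layout under root and return all file paths."""
    root = Path(root)
    paths = []
    for rel, content in LAYOUT.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.write_text(content)
        paths.append(path)
    return paths
```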
setup.py specifies how to install your Python package and build an egg from it:
```python
# setup.py
import os
from setuptools import setup, find_packages


def read(fname):
    """Return the contents of a file next to setup.py (used for long_description)."""
    with open(os.path.join(os.path.dirname(__file__), fname)) as f:
        return f.read()


setup(
    name="exampleproject",
    description="Example project.",
    author="Henk Griffioen",
    long_description=read('README.md'),
    packages=find_packages(),
)
```
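The egg's file name matters later, when the old egg is removed from the share by its path. This setup.py sets no version, so setuptools falls back to 0.0.0. The helper below approximates the naming rule that setuptools of that era applies to a pure-Python egg; expected_egg_name is a hypothetical helper, not a setuptools function, and it ignores version normalisation and platform tags.

```python
import re


def expected_egg_name(name, version="0.0.0", py_version="2.7"):
    """Hypothetical helper approximating the file name of `setup.py bdist_egg`.

    Runs of characters other than letters, digits and '.' become '-', and
    '-' becomes '_' in the file name. Version normalisation and platform tags
    (only added for C extensions) are left out of this sketch.
    """
    safe = re.sub(r"[^A-Za-z0-9.]+", "-", name).replace("-", "_")
    return "{}-{}-py{}.egg".format(safe, version, py_version)
```

For this project, expected_egg_name("exampleproject") gives exampleproject-0.0.0-py2.7.egg.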
Normally, you can specify which folders to include or exclude with find_packages(), but the Hosted VS2017 agent has a version of setuptools that seems to have problems with submodules when doing that.
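To see how exclude patterns interact with submodules, here is a simplified stand-in for find_packages using only the standard library. It is our own illustration of setuptools' documented behaviour (a folder is a package if it contains __init__.py, and exclude takes glob patterns), not setuptools' code; find_packages_sketch is a hypothetical name.

```python
import os
from fnmatch import fnmatchcase


def find_packages_sketch(where=".", exclude=()):
    """Hypothetical, simplified illustration of setuptools.find_packages.

    A directory is a package when it contains __init__.py; discovery only
    descends into package directories; packages matching an exclude glob
    are skipped, but their subpackages are still searched.
    """
    packages = []
    for root, dirs, _files in os.walk(where):
        subdirs = dirs[:]
        dirs[:] = []  # decide below which subdirectories os.walk visits
        for name in subdirs:
            full = os.path.join(root, name)
            if "." in name or not os.path.isfile(os.path.join(full, "__init__.py")):
                continue  # not a package: do not descend into it
            package = os.path.relpath(full, where).replace(os.sep, ".")
            if not any(fnmatchcase(package, pat) for pat in exclude):
                packages.append(package)
            dirs.append(name)  # keep searching for subpackages
    return sorted(packages)
```

Excluding a package does not exclude its subpackages: you need both "exampleproject" and "exampleproject.*". The real find_packages behaves the same way, which is why its documentation pairs patterns such as "tests" and "tests.*".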
vsts_build.bat builds the egg and is rather simple:
```bat
:: vsts_build.bat
@echo off

echo Running build on "%AGENT_NAME%" with ID: %AGENT_ID%.

:: Uncomment the following lines to get some info on the agent:
:: @dir %AGENT_WORKFOLDER%
:: @dir %AGENT_BUILDDIRECTORY%
:: @dir %BUILD_SOURCESDIRECTORY%
:: @dir C:\Python27
:: @dir C:\

:: Build the egg.
C:\Python27\python.exe -W ignore setup.py bdist_egg
```
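The same build step can be written in Python, which avoids hard-coding C:\Python27 and makes switching interpreters easier. This is a sketch, not what the pipeline in this post runs; agent_banner, build_command and main are hypothetical names, while AGENT_NAME and AGENT_ID are the agent variables the batch script already echoes.

```python
import os
import subprocess
import sys


def agent_banner(env):
    """Hypothetical helper mirroring the echo line of vsts_build.bat."""
    return 'Running build on "{}" with ID: {}.'.format(
        env.get("AGENT_NAME", "<unknown>"), env.get("AGENT_ID", "<unknown>"))


def build_command(python_exe=sys.executable):
    """Hypothetical helper: the bdist_egg command, for any interpreter."""
    return [python_exe, "-W", "ignore", "setup.py", "bdist_egg"]


def main():
    """Run from the repository root, where setup.py lives."""
    print(agent_banner(os.environ))
    # check=True raises on a non-zero exit code, so a failed build fails the step.
    subprocess.run(build_command(), check=True)
```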
This simple batch script uses Python 2.7 (🔔🔔🔔 shame) to build the egg; Python 3.6 is also available on the Hosted VS2017 agent.
Make sure that you can build the egg with python setup.py bdist_egg on your own machine.
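After a local build, bdist_egg writes the egg to the dist folder. An egg is a zip archive with its metadata in EGG-INFO/PKG-INFO, so a quick sanity check could look like this. find_built_egg and egg_metadata are hypothetical helpers, and the check assumes a clean dist folder holding a single egg.

```python
import glob
import os
import zipfile


def find_built_egg(dist_dir="dist"):
    """Hypothetical helper: return the single .egg that bdist_egg wrote to dist_dir."""
    eggs = glob.glob(os.path.join(dist_dir, "*.egg"))
    if len(eggs) != 1:
        raise RuntimeError("expected one egg in {!r}, found {}".format(dist_dir, len(eggs)))
    return eggs[0]


def egg_metadata(egg_path):
    """Hypothetical helper: read Name and Version from the egg's EGG-INFO/PKG-INFO."""
    with zipfile.ZipFile(egg_path) as egg:
        pkg_info = egg.read("EGG-INFO/PKG-INFO").decode("utf-8")
    fields = {}
    for line in pkg_info.splitlines():
        if not line:
            break  # headers end at the first blank line; a long description may follow
        key, sep, value = line.partition(": ")
        if sep and key in ("Name", "Version"):
            fields[key] = value
    return fields
```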
Build & release
Now that our repository is set up, we can create the build & release pipeline. We'll assume you already have an agent queue with the Hosted VS2017 agent. Instead of separate Build and Release steps, we'll build the egg and put it on the file share in a single process.
In VSTS go to 'Build & Release' -> 'Builds' -> '+ New'. Start with an empty process, give it a nice name and choose the queue with the Hosted VS2017 agent.
The first task is to get the sources. Select your repository under 'This project' and the master branch.
The next task builds the egg by calling our vsts_build.bat. Add a 'Batch Script' task and point its 'Path' to vsts_build.bat in the repository.
Add the task 'Run Inline Azure Powershell' (you may need to search for it) to delete previously deployed eggs. Under 'Script to run' add:
```powershell
Param(
    [string]$SecretKey
)

$context = New-AzureStorageContext -StorageAccountName "<STORAGE-ACCOUNT-NAME>" -StorageAccountKey "$SecretKey"
Remove-AzureStorageFile -ShareName "<SHARE-NAME>" -Path "<FOLDER>/exampleproject-0.0.0-py2.7.egg" -Context $context
```
The key of the Storage Account is a parameter for this script, so set the following under 'Argument':
Under 'Control Options' enable 'Continue on Error' so that your pipeline doesn't error out when there's no egg on the file share to remove.
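When the share is also mounted on the machine doing the cleanup (for instance a self-hosted agent; the Hosted VS2017 agent does not have it mounted), the same step can be done with plain file operations. This sketch ignores only a missing egg, the case 'Continue on Error' is meant to cover, and still raises on other problems such as missing permissions. remove_old_egg is a hypothetical helper.

```python
import os


def remove_old_egg(share_dir, egg_name):
    """Hypothetical helper: delete egg_name from a mounted share, tolerating a missing file.

    Returns True when a file was removed and False when there was nothing to remove.
    Any other OSError (e.g. permission denied) still propagates.
    """
    try:
        os.remove(os.path.join(share_dir, egg_name))
    except FileNotFoundError:
        return False
    return True
```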
Now that we have built our artifact, we can publish it. Select the 'Publish Build Artifacts' task and configure it.
We'll cheat a bit and put the uploading of the egg in our Build process (best practice is to create a separate Release step). Create a new task of type 'Azure Storage Upload'. This task is not available in VSTS by default, but you can install it for free through the VSTS Marketplace. Once it is installed, configure the following:
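On a machine where the share is mounted, uploading the egg amounts to copying it into the share folder. This sketch copies under a temporary name and then renames, so readers of the share are unlikely to pick up a half-written egg; how atomic the rename is depends on the file system, and SMB mounts may behave differently from local disks. publish_egg is a hypothetical helper, not part of the Azure Storage Upload task.

```python
import os
import shutil


def publish_egg(egg_path, share_dir):
    """Hypothetical helper: copy a built egg onto a mounted share and return its new path."""
    os.makedirs(share_dir, exist_ok=True)
    target = os.path.join(share_dir, os.path.basename(egg_path))
    tmp = target + ".partial"  # temporary name that does not end in .egg
    shutil.copy2(egg_path, tmp)
    os.replace(tmp, target)  # overwrites an existing egg with the same name
    return target
```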
The final build process should look something like:
Go to the 'Variables' tab and add a variable called SecretKey containing your Storage Account key.
This key grants full access to the storage account, so it's better to generate a short-lived SAS token.
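A SAS token is a query string whose se (signed expiry) field holds the expiry time in UTC. Short-lived tokens expire by design, so a pipeline may want to warn before that happens. The sketch reads se using only the standard library; sas_expiry and expires_within are hypothetical helpers, and any token you test it with here should be a made-up one.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def sas_expiry(sas_token):
    """Hypothetical helper: return the expiry stored in a SAS token's 'se' field."""
    expiry = parse_qs(sas_token.lstrip("?"))["se"][0]
    # ISO 8601 UTC forms accepted for 'se': with seconds, with minutes, or date only.
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", "%Y-%m-%d"):
        try:
            return datetime.strptime(expiry, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    raise ValueError("unrecognised expiry: " + expiry)


def expires_within(sas_token, delta, now=None):
    """Hypothetical helper: True if the token expires within delta of now (default: current UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return sas_expiry(sas_token) - now <= delta
```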
In the 'Triggers' tab you can define when the build should start.
Enable Continuous Integration so that new code on the master branch triggers a build.
Click 'Save & queue' to save your pipeline. Each time a new git commit is pushed to master, a new egg will be placed on the file share!
We've built a basic pipeline that deploys new Python code to a file share as an egg distribution of the Python module. If you want to build a more sophisticated pipeline, you'll need resources that VSTS and Azure don't provide by default: for instance, you could host your own agent or use your own Jenkins server.
Continuous Deployment of Python eggs with VSTS on Azure
July 28, 2017