Add Git support for Visual Studio Team Services for Azure Databricks
Most people on Azure are likely to also use VSTS as their Git repository. VSTS supports authorization tokens and Git operations just like any other Git host, so I would think it should be an easy addition.
We have now made VSTS integration for Azure Databricks generally available (GA). Find out more about it on this documentation page:
It would be nice if you also added Git support for Visual Studio Team Services for AWS Databricks!
Has this been done?
works now in my Azure Databricks subscription, just tested it
Ali Ghodsi (Admin, Databricks) commented
VSTS integration is around the corner in Q1'19. Please stay tuned. We are excited to integrate Microsoft Azure Databricks with Visual Studio Team Services. // Ali Ghodsi, CEO, Co-Founder of Databricks
I struggle to understand how a project in Azure cannot integrate with VSTS, especially one from a company with the expertise to create and deliver Spark clusters on demand at the click of a button.
Ankur Jain commented
Need to have VSTS in place along with CI/CD.
Me personally, I just want to check in my notebooks to VSTS. CI/CD might come later.
Need this to manage and back up code. And I hope Databricks can be accessed through Visual Studio.
Murray Foxcroft commented
We need Git integration for versioning of notebooks and cluster setup so that we can deploy a version of a notebook on top of a version of a cluster, all built through a CI/CD pipeline. Whether that pipeline is VSTS, TeamCity, Jenkins, etc., it should be generic. This will allow us to compare versions, deploy, roll back, etc. Other source control systems may be helpful (e.g. SVN), but Git seems to have won out here. The key would be to support any Git provider (GitHub, VSTS, Bitbucket, etc.).
Josh Fennessy commented
Would like to see support for connection to VSTS Git repository as well as integration with CI/CD and automated testing pipelines with VSTS. I think the Azure Data Factory integration is a good model to investigate as you review this feature.
Currently, we are working through this using a process of exporting/importing notebooks from our VSTS Git repository.
We also have a solution that integrates with Data Factory for the beginning of automated testing, but ideally we'd be able to create unit tests directly from a notebook.
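For anyone else using the export/import workaround in the meantime, here is a rough sketch of how it could be scripted in Python. It assumes the Databricks Workspace REST API 2.0 (`/api/2.0/workspace/export` and `/api/2.0/workspace/import`) and a personal access token; the host, token, and paths are hypothetical placeholders and the helper names are my own, so check it against the current API documentation before relying on it.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(host, token, notebook_path):
    """Build a GET request that asks the workspace for a notebook's source."""
    query = urllib.parse.urlencode({"path": notebook_path, "format": "SOURCE"})
    return urllib.request.Request(
        f"{host}/api/2.0/workspace/export?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def decode_export_response(body):
    """The export response is JSON whose 'content' field is base64-encoded."""
    return base64.b64decode(json.loads(body)["content"])


def build_import_request(host, token, notebook_path, source, language="PYTHON"):
    """Build a POST request that uploads (and overwrites) a notebook."""
    payload = {
        "path": notebook_path,
        "format": "SOURCE",
        "language": language,
        "overwrite": True,
        "content": base64.b64encode(source).decode("ascii"),
    }
    return urllib.request.Request(
        f"{host}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def export_notebook_to_file(host, token, notebook_path, local_path):
    """Download one notebook into a local file inside a Git working copy."""
    request = build_export_request(host, token, notebook_path)
    with urllib.request.urlopen(request) as response:
        source = decode_export_response(response.read())
    with open(local_path, "wb") as handle:
        handle.write(source)


def import_notebook_from_file(host, token, local_path, notebook_path):
    """Upload a local file from the Git working copy back into the workspace."""
    with open(local_path, "rb") as handle:
        source = handle.read()
    request = build_import_request(host, token, notebook_path, source)
    with urllib.request.urlopen(request) as response:
        response.read()
```

After `export_notebook_to_file` writes the file into a local clone of the VSTS repository, the usual `git add`, `git commit`, and `git push` take care of versioning; `import_notebook_from_file` goes the other way after a pull.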
For me, it's synchronization with VSTS.
We want to keep all our solution code on VSTS, not have it spread across two different repos.
Databricks should model itself after Azure Data Factory’s Continuous Delivery AND Environment Promotion capability.
Having VSTS Git support will make our data scientists' work more efficient. Integration with VSTS Git is therefore a must.
Chris G commented
Any integration with VSTS would be much appreciated.
Jamie Stewart commented
Yes please! Azure integration with VSTS Git is a must.
Looking for the same feature. Is there any alternative for syncing my notebooks with VSTS? How are people currently saving their notebook changes to VSTS?
As a notebook author
I want to be able to access my VSTS repository from multiple Databricks workspaces
So that I can synchronize my notebooks in a Databricks workspace user directory with VSTS and be able to synchronize with other Databricks workspaces via VSTS
That’s a story stating what I need to accomplish.
Some acceptance criteria:
Given a new or existing Databricks workspace and a user directory
I can securely clone any of my VSTS repositories to my user directory and synchronize the clone with VSTS, including changes in both directions, viewing differences, and resolving merge issues.
Given a scheduled notebook
When I synchronize the notebook with VSTS
Then the schedule is also persisted such that synchronizing with VSTS from any Databricks workspace would also synchronize the schedule
And it is acceptable if schedules must be added by a deployment process triggered externally since there are cluster dependencies
So the CI/CD flows could be externally controlled as long as there were API endpoints that enable cloning and synchronization, all with Azure Active Directory security and authentication.
Is there a preview version of this functionality? If not, will this at least be marked as under consideration? Thanks.
P.S. Totally agree it changes the game, and I hope it integrates with rather than seeks to control VSTS. Continuous delivery works best when code can be validated prior to deployment, and ADBricks’ ability to spawn on-demand clusters with production configuration is a great way to do this!
Ryan Chynoweth commented
Using VSTS Git and CI/CD would be a game changer for us. Microsoft did a great job with the Azure Data Factory v2 integration with VSTS. Ideally, it would sync notebooks when I want to publish/commit code to the repository. Then I could have an automated build/release to deploy and test. It would make for a more seamless and automated transition between development, test, and production environments.