Git integration should work at the folder/repo level, not just for single notebooks
The current git integration only handles single notebooks. But with notebook flows and other features, we have a whole folder structure of notebooks with interdependencies. So instead of linking a single notebook to a file in git, you should be able to connect a Databricks folder to a git repo and branch, pushing/pulling the whole thing in one go (just like a normal git repo).
I know this can be accomplished using the Workspace API and CLI, but it would be great to be able to do this without involving the desktop at all.
I agree. Being able to branch an entire workspace and deploy across instances of Databricks would let us use notebooks in a way that is more like a typical SDLC flow. I also agree that all artifacts of a workspace (clusters, notebooks, libraries) should be bound to source control. Code in Spark isn't just code; the physical cluster configurations matter as well and should be controlled like code.
My organization is currently transitioning to the Databricks platform from a homegrown solution, and up to this point we've been fairly heavy git/GitHub users. Without a better git integration story, moving existing projects to Databricks is going to be rather painful.
I second that this would be a huge improvement. Apart from being more convenient, it would make the process less error-prone and more in sync with regular software development conventions.
@evenv you say this is possible through the workspace api and CLI. Could you share how you accomplish this? I would really like to use this feature.
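For anyone else asking the same question, here is a rough sketch of the Workspace API route, not necessarily what @evenv does. It assumes the `list` and `export` endpoints under `/api/2.0/workspace/` and a personal access token. The helper names (`api_get`, `export_folder`) and the extension table are made up for illustration. The idea is to walk a workspace folder recursively and write each notebook as a source file into a local git clone.

```python
import base64
import json
import os
import urllib.parse
import urllib.request

# Assumed file extensions for SOURCE-format exports, keyed by notebook language.
EXTENSIONS = {"PYTHON": ".py", "SCALA": ".scala", "SQL": ".sql", "R": ".r"}


def api_get(host, token, endpoint, params):
    """Call a Workspace API GET endpoint and return the decoded JSON."""
    query = urllib.parse.urlencode(params)
    url = f"{host}/api/2.0/workspace/{endpoint}?{query}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def export_folder(get, workspace_dir, local_dir):
    """Recursively export notebooks under workspace_dir into local_dir.

    `get(endpoint, params)` is any callable returning parsed JSON, so the
    walk can be tried out without a live workspace.
    """
    written = []
    os.makedirs(local_dir, exist_ok=True)
    # An empty folder comes back without an "objects" key.
    for obj in get("list", {"path": workspace_dir}).get("objects", []):
        name = obj["path"].rsplit("/", 1)[-1]
        if obj["object_type"] == "DIRECTORY":
            written += export_folder(get, obj["path"], os.path.join(local_dir, name))
        elif obj["object_type"] == "NOTEBOOK":
            body = get("export", {"path": obj["path"], "format": "SOURCE"})
            ext = EXTENSIONS.get(obj.get("language"), ".txt")
            target = os.path.join(local_dir, name + ext)
            with open(target, "wb") as fh:
                # The export endpoint returns the notebook base64-encoded.
                fh.write(base64.b64decode(body["content"]))
            written.append(target)
    return written
```

You would call it as `export_folder(lambda e, p: api_get(HOST, TOKEN, e, p), "/Shared/project", "./project")`, with your own host, token and paths, and then commit and push from the local clone as usual. The legacy CLI's `databricks workspace export_dir` does roughly the same thing in one command. Either way it still runs outside Databricks, which is exactly the limitation this request is about.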
This would be a big quality-of-life improvement, and it would also give our developers more control over their feature branches while staying within the Databricks UI.
Chris Klein commented
I second this, except I think Git should be tied to the workspace and not just a folder. What our team needs is a reliable way to migrate code through DEV/INT/QA to PROD environments. Workspaces work well as an analog of an environment, but there isn't a good way to migrate notebooks between them. The API is unreliable at best for this purpose. What I'd love is a way to pin an entire notebook to a Git branch.
Bonus points if job and cluster configurations could be stored in Git too, since they are a part of the workspace.
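Until something like that exists, one possible stopgap is to dump cluster definitions to JSON files and commit them next to the notebooks. The sketch below assumes you have already fetched the cluster list (for example from the Clusters API `list` endpoint) as a list of dicts. The function name `write_configs` and the set of "volatile" keys to strip are assumptions for illustration, not an official list.

```python
import json
import os

# Assumed runtime-only fields; dropping them keeps git diffs meaningful.
VOLATILE_KEYS = {
    "state", "state_message", "start_time",
    "terminated_time", "last_activity_time",
}


def write_configs(clusters, out_dir):
    """Write one deterministic JSON file per cluster definition."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for cluster in clusters:
        stable = {k: v for k, v in cluster.items() if k not in VOLATILE_KEYS}
        path = os.path.join(out_dir, f"{cluster['cluster_name']}.json")
        with open(path, "w") as fh:
            # Sorted keys and fixed indentation give stable, reviewable diffs.
            json.dump(stable, fh, indent=2, sort_keys=True)
            fh.write("\n")
        paths.append(path)
    return paths
```

Job definitions could be handled the same way. This only covers getting configs into Git; applying them back to another workspace is a separate step.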