Goran, we updated the user guide with instructions for installing TF 1.14 and TF 2.0 beta on Runtime 5.4 ML:
* Azure: https://docs.azuredatabricks.net/applications/deep-learning/single-node-training/tensorflow.html#install-tensorflow-1-14-and-2-0-beta-on-dbr-5-4-ml
* AWS: https://docs.databricks.com/applications/deep-learning/single-node-training/tensorflow.html#install-tensorflow-1-14-and-2-0-beta-on-dbr-5-4-ml
Thanks a lot for your input!
Glad to see that you found a solution! Runtime ML uses Conda to manage packages, and there are some known compatibility issues between conda and pip, e.g., the "distutils installed" error message you got. A workaround, as you found, is to first upgrade the required dependencies in conda and then use pip to install the package you want.
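To make the ordering concrete, here is a minimal sketch of the workaround sequence. The package name "mypackage" and the dependency "mydep" are placeholders, not the actual packages from this thread:

```python
import shlex

# Hedged sketch of the workaround order: upgrade the conda-managed
# dependency first, then pip-install the target package, so pip never
# tries to uninstall a distutils-installed file that conda owns.
steps = [
    "conda install -y mydep",   # 1) upgrade the dependency via conda
    "pip install mypackage",    # 2) then install the package via pip
]
for step in steps:
    # in a Databricks notebook you would run each of these with %sh
    print(shlex.split(step))
```

The order matters: running pip first is what triggers the "distutils installed" error, because pip cannot cleanly remove files that conda installed.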
We will update our user guide soon with those instructions. Thanks for letting us know your solution!
Trevor, sorry for the late reply. You can install PyMC and attach it to a Runtime ML cluster. See the instructions here: https://docs.databricks.com/user-guide/libraries.html#workspace-libraries. We are not aware of any issues with running PyMC on Databricks, so please let us know if you hit one.
Goran, we will provide instructions for upgrading TF to 1.13 or 2.0 alpha in the Runtime ML 5.4 release notes, so that you can try out TFP 0.6.0.
Since you are on Azure Databricks, could you try the example notebook here:
The difference is the location we use to save checkpoints. In Azure Databricks Runtime 5.3 ML, we introduced an optimized FUSE mount at file:/dbfs/ml. See https://docs.azuredatabricks.net/applications/deep-learning/data-prep/ddl-storage.html#prepare-storage-for-data-loading-and-model-checkpointing
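As a small illustration of the checkpoint location, the sketch below builds a path under the /dbfs/ml mount; the "checkpoints" subdirectory name is a placeholder of our choosing, not something prescribed by the docs:

```python
import os

# /dbfs/ml is the local-path view of the optimized FUSE mount available
# on Runtime 5.3 ML and later; pick a subdirectory for your run.
CHECKPOINT_ROOT = "/dbfs/ml"
checkpoint_dir = os.path.join(CHECKPOINT_ROOT, "checkpoints")

# APIs that expect a URI rather than a local path take the file: prefix.
checkpoint_uri = "file:" + checkpoint_dir
print(checkpoint_uri)  # file:/dbfs/ml/checkpoints
```

Local file APIs (e.g., Keras callbacks) can write to checkpoint_dir directly, while the file: form is for APIs that take URIs.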
For the Horovod failure, it might be caused by a race condition when two Horovod processes try to create the /root/.keras folder at the same time. If you hit the error, re-running the cell should work.
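If re-running is too fragile, one hedged workaround for the suspected race is to create the directory up front before launching Horovod, so the concurrent processes find it already present:

```python
import os

# Pre-create the Keras config directory before starting Horovod training.
# Resolves to /root/.keras when running as root, as in the error above.
keras_dir = os.path.expanduser("~/.keras")
os.makedirs(keras_dir, exist_ok=True)  # idempotent: no error if it exists
```

os.makedirs with exist_ok=True is itself race-safe, so it is harmless even if several processes run it concurrently.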
Let me know if you still experience errors.
This is supported via the SparkSubmitTask in the Jobs API: https://docs.databricks.com/api/latest/jobs.html#jobssparksubmittask. You can find an example here: https://docs.databricks.com/api/latest/examples.html#spark-submit-job-api-examples.
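For reference, here is a hedged sketch of what a spark_submit_task payload for the Runs Submit endpoint (POST /api/2.0/jobs/runs/submit) can look like; the jar path is a placeholder, and SparkPi is the stock class from the Spark examples jar:

```python
import json

# Assumed payload shape for a one-time spark-submit run; adjust the
# cluster spec and parameters to your workspace.
payload = {
    "run_name": "spark-submit-example",
    "new_cluster": {
        "spark_version": "5.3.x-scala2.11",
        "node_type_id": "r3.xlarge",
        "num_workers": 2,
    },
    "spark_submit_task": {
        # parameters are passed through much like a spark-submit CLI line
        "parameters": [
            "--class", "org.apache.spark.examples.SparkPi",
            "dbfs:/path/to/examples.jar",  # placeholder jar location
            "10",
        ]
    },
}
print(json.dumps(payload, indent=2))
```

You would POST this JSON body to https://&lt;your-workspace-host&gt;/api/2.0/jobs/runs/submit with your personal access token in the Authorization header, e.g. via curl or the requests library.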