Xiangrui Meng

My feedback

  1. 3 votes
    Xiangrui Meng commented  · 

    Glad to see that you found the solution! Runtime ML uses Conda to manage packages, and there are some known compatibility issues between Conda and pip, e.g., the "distutils installed" error message you got. A workaround, as you found, is to upgrade the required dependencies with Conda first and then use pip to install the package you want.

    We will update our user guide soon to include these instructions. Thanks for letting us know your solution!
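
As a rough illustration of the workaround above (Conda first, then pip), here is a minimal Python sketch. The package names `some-dependency` and `some-package` are hypothetical placeholders, since the thread does not name the actual packages, and `build_install_commands` and `run_commands` are helper names made up for this example. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_install_commands(conda_deps, pip_package):
    """Return the two commands in order: Conda dependencies first, then pip."""
    # --yes skips Conda's interactive confirmation prompt.
    conda_cmd = ["conda", "install", "--yes"] + list(conda_deps)
    # Call pip through the current interpreter so it targets the same environment.
    pip_cmd = [sys.executable, "-m", "pip", "install", pip_package]
    return [conda_cmd, pip_cmd]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical package names, for illustration only.
commands = build_install_commands(["some-dependency"], "some-package")
run_commands(commands, dry_run=True)
```

Passing `dry_run=False` would actually run both commands in order and stop at the first failure, because `check=True` raises on a non-zero exit code.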

  2. 1 vote
    Xiangrui Meng commented  · 

    Trevor, sorry for the late reply. You can install PyMC and attach it to a Runtime ML cluster. See the instructions at https://docs.databricks.com/user-guide/libraries.html#workspace-libraries. We are not aware of any issues with running PyMC on Databricks, so please let us know if you hit any.

    Goran, we will provide instructions for upgrading TensorFlow to 1.13 or 2.0 alpha in the Runtime ML 5.4 release notes, so that you can try out TFP 0.6.0.

  3. 1 vote
    1 comment  ·  Product Feedback » Notebooks
    Xiangrui Meng commented  · 

    Since you are on Azure Databricks, could you try the example notebook here:

    https://docs.azuredatabricks.net/applications/deep-learning/distributed-training/mnist-tensorflow.html

    The difference is the location we use to save checkpoints. In Azure Databricks Runtime 5.3 ML, we introduced an optimized FUSE mount at file:/dbfs/ml. See https://docs.azuredatabricks.net/applications/deep-learning/data-prep/ddl-storage.html#prepare-storage-for-data-loading-and-model-checkpointing
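
To make the checkpoint location concrete, here is a minimal sketch that creates a checkpoint folder under the FUSE mount, which is visible to local file APIs as `/dbfs/ml`. The `checkpoints` subfolder, the run name `mnist-example`, and the helper `checkpoint_dir` are assumptions for illustration and are not taken from the linked notebook.

```python
import os
import tempfile


def checkpoint_dir(run_name, base_dir="/dbfs/ml"):
    """Create and return a checkpoint directory for `run_name` under `base_dir`."""
    path = os.path.join(base_dir, "checkpoints", run_name)
    # exist_ok=True makes the call safe to repeat across notebook re-runs.
    os.makedirs(path, exist_ok=True)
    return path


# Off-cluster, fall back to a temporary folder so the sketch still runs.
base = "/dbfs/ml" if os.path.isdir("/dbfs/ml") else tempfile.mkdtemp()
ckpt_dir = checkpoint_dir("mnist-example", base_dir=base)
print(ckpt_dir)
```

The returned path can then be handed to whatever checkpointing mechanism the training code already uses.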

    For the Horovod failure, it might be caused by a race condition when two Horovod processes try to create the /root/.keras folder at the same time. If you hit the error, re-running the cell should work.
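
For readers unfamiliar with this kind of race, the sketch below shows the general pattern of creating a folder in a way that tolerates another process creating it first. It is not a confirmed fix for the Horovod failure described above; `ensure_dir` is a hypothetical helper, and the demo uses a temporary folder rather than /root/.keras.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` if needed; succeed quietly if it already exists."""
    # exist_ok=True avoids the FileExistsError that a plain os.makedirs(path)
    # raises when another process has created the folder first.
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrated on a temporary folder rather than /root/.keras.
demo = os.path.join(tempfile.mkdtemp(), ".keras")
ensure_dir(demo)
ensure_dir(demo)  # second call stands in for the process that lost the race
print(os.path.isdir(demo))
```

A "check if it exists, then create" sequence is what typically fails here, because a second process can create the folder between the check and the create.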

    Let me know if you still experience errors.

  4. 1 vote
    3 comments  ·  Product Feedback
    Xiangrui Meng commented  · 

Feedback and Knowledge Base