Product Feedback

  1. Ability to connect a Git repo to Azure DevOps Server; currently this only works for Azure DevOps Services

    1 vote  ·  0 comments  ·  Data import / export
  2. Add Azure Files Storage to supported mount list

    dbutils.fs.mount should be able to mount Azure Files storage.
    This is the service Azure offers for storing code, files, and outputs, and it should be accessible from Databricks.

    18 votes  ·  0 comments  ·  Data import / export
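
    A hypothetical sketch of what such a call could look like, modeled on the existing dbutils.fs.mount signature (Azure Files is not a supported mount source today, and the account, share, and secret names below are made-up placeholders):

        # Hypothetical only: an Azure Files source is NOT currently supported by
        # dbutils.fs.mount; account/share/secret names are placeholders.
        dbutils.fs.mount(
            source="https://mystorageacct.file.core.windows.net/myshare",
            mount_point="/mnt/myshare",
            extra_configs={
                "fs.azure.account.key.mystorageacct.file.core.windows.net":
                    dbutils.secrets.get(scope="my-scope", key="storage-account-key")
            }
        )
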
  3. Add an S3 PutObjectAcl function to dbutils.fs

    Our clusters write to cross-account S3 buckets. I have already configured the BucketOwnerFullControl ACL in the Spark configuration, but this output data also needs to be accessible from additional account roles for auditing and similar purposes.

    Please extend dbutils.fs (or add small helper functions) to support the S3 PutObjectAcl operation.

    1 vote  ·  1 comment  ·  Data import / export
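
    Until something like this is built in, one possible workaround sketch is to set the ACL with boto3 from the notebook after the write (the bucket name and prefix below are illustrative placeholders, and credentials are assumed to come from the cluster's instance profile):

        import boto3

        s3 = boto3.client("s3")
        bucket, prefix = "cross-account-bucket", "output/daily/"   # placeholders

        # Grant the bucket owner full control on every object under the prefix.
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                s3.put_object_acl(Bucket=bucket, Key=obj["Key"],
                                  ACL="bucket-owner-full-control")
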
  4. Ability to download pickled files created in a notebook to a local machine

    I need a simple way to download pickled model files created by notebooks on the cluster driver. This doesn't appear to be possible when no public IP is set.

    1 vote  ·  0 comments  ·  Data import / export
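
    One route that may work today, sketched under the assumption that the model object and paths below are placeholders: write the pickle under /dbfs/ (the driver-side FUSE mount of DBFS), then pull it down from the local machine with the Databricks CLI:

        import pickle

        model = {"weights": [0.1, 0.2]}   # placeholder for the trained model object

        # /dbfs/... on the driver maps to dbfs:/... in DBFS.
        with open("/dbfs/tmp/models/model.pkl", "wb") as f:
            pickle.dump(model, f)

        # Then, from a local machine:
        #   databricks fs cp dbfs:/tmp/models/model.pkl ./model.pkl
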
  5. dbutils.fs operations should support wildcards

    Support some sort of wildcard or partial filename matching for the 'from' argument to cp(), rm(), and mv().

    48 votes  ·  0 comments  ·  Data import / export
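
    A workaround sketch until this is supported natively (source directory, pattern, and destination are placeholders): list the directory with dbutils.fs.ls and filter with fnmatch before copying:

        from fnmatch import fnmatch

        src_dir, pattern, dst_dir = "dbfs:/mnt/raw/", "part-*.csv", "dbfs:/mnt/staged/"

        for fi in dbutils.fs.ls(src_dir):
            if fnmatch(fi.name, pattern):
                dbutils.fs.cp(fi.path, dst_dir + fi.name)   # same idea works for rm()/mv()
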
  6. Writing streams back to a mounted Azure Data Lake Store should be fully distributed (like reading)

    The fact that you can mount ADLS and read a huge stream directly into a Spark DataFrame is great.
    But writing data back to ADLS doesn't really work: the Spark API saves the output as multiple chunks (one per partition, HDFS-style), not as a single object in ADLS but as an HDFS-style directory layout on top of ADLS. So it is not one distributed stream but many local substreams. Could this be fixed?
    Because right now I have to either:
    a) collect all data to the driver - not scalable
    b) repartition into 1 partition and save it - slow, and the file name still needs to be cleaned up (sketched below)
    c)…

    2 votes  ·  0 comments  ·  Data import / export
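
    Sketch of workaround (b) from the list above, assuming df is the DataFrame being written and the paths are placeholders; this is exactly the slow, single-writer path the idea asks to avoid:

        tmp_dir = "dbfs:/mnt/adls/out/_tmp_single"
        final_path = "dbfs:/mnt/adls/out/result.csv"

        # Force a single output partition, then promote the part file.
        df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

        part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
        dbutils.fs.mv(part_file, final_path)
        dbutils.fs.rm(tmp_dir, True)   # clean up the temporary directory
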
  7. Don't require the same column order when importing into the DWH using the sqldwh connector

    At the moment the sqldwh connector expects the columns in the DataFrame to be in the same order as in the DWH table, but this is not always the case. For example, in the DWH all dimension tables have a surrogate ID as the first column; because it is auto-generated, it is not present in the DataFrames. For the sqldwh connector to work, we had to use the workaround of putting the surrogate ID as the last column of the DWH table.

    3 votes  ·  0 comments  ·  Data import / export
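
    The workaround sketched below (table, column, JDBC, and tempDir values are placeholders) simply selects the DataFrame columns in the order the DWH table expects before handing it to the connector; an option to match columns by name would make this unnecessary:

        dwh_jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydwh"   # placeholder
        dim_cols = ["CustomerName", "Country", "CreatedDate"]   # target order, surrogate ID excluded

        (df.select(*dim_cols)
           .write.format("com.databricks.spark.sqldw")
           .option("url", dwh_jdbc_url)
           .option("tempDir", "wasbs://tmp@mystorageacct.blob.core.windows.net/sqldw")
           .option("forwardSparkAzureStorageCredentials", "true")
           .option("dbTable", "dbo.DimCustomer")
           .mode("append")
           .save())
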
  8. Workspace API should sync cell headers

    When syncing the workspace to Python files using the Databricks CLI, the cell headers are not included. We're using this feature for Git integration, and since the headers are not included in the sync, they can't be used.

    3 votes  ·  0 comments  ·  Data import / export
  9. Save Dataset to CSV: port the formats supported by the preview's top 1000 rows to "download full results" and Dataset.write.csv

    There are many data types, such as array<string> and list<string>, that display well in the preview -- and download well to CSV from the preview -- but throw UnsupportedOperationException when attempting to download the full results or when saving to CSV directly. Can the preview's capability be ported to the "download full results" option and to Dataset.write.csv?

    Originally posted to forums: https://forums.databricks.com/questions/12195/csv-data-source-does-not-support-array-data-type-b.html

    7 votes  ·  2 comments  ·  Data import / export
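
    Until then, a possible workaround sketch (assuming df is the Dataset in question and the output path is a placeholder) is to serialise complex columns to JSON strings before writing CSV:

        from pyspark.sql import functions as F
        from pyspark.sql.types import ArrayType, MapType, StructType

        complex_cols = [f.name for f in df.schema.fields
                        if isinstance(f.dataType, (ArrayType, MapType, StructType))]

        flat = df.select(*[F.to_json(F.col(c)).alias(c) if c in complex_cols else F.col(c)
                           for c in df.columns])
        flat.write.mode("overwrite").option("header", "true").csv("dbfs:/mnt/exports/results")
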
  10. Create Table from CSV: Add support for timestamp formats

    With files obtained from an external source, timestamps sometimes come in a different field order. Being able to specify the format, e.g. with a Joda-style pattern, would remove friction in getting started.

    2 votes  ·  0 comments  ·  Data import / export
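
    The underlying Spark CSV reader already accepts a timestamp pattern, which is roughly what the Create Table UI could expose; a sketch (file path and pattern are placeholders, and the pattern syntax is SimpleDateFormat/DateTimeFormatter-style rather than Joda):

        df = (spark.read
                .option("header", "true")
                .option("inferSchema", "true")
                .option("timestampFormat", "dd/MM/yyyy HH:mm:ss")
                .csv("dbfs:/FileStore/tables/imported.csv"))

        df.createOrReplaceTempView("imported_csv")   # or df.write.saveAsTable(...) to persist it
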
  11. Ability to perform Hive metadata commands without needing a cluster

    We have jobs that load data into S3 every day (Parquet), and we create external tables on top of them to be able to run SQL with Databricks. It would be nice if a job didn't need to spin up a cluster just to run the CREATE TABLE, because I believe all that is happening is Hive metadata operations.

    Thanks!

    3 votes  ·  1 comment  ·  Data import / export
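
    For reference, the metadata-only operation in question is essentially a statement like the one below (database, table, and S3 path are placeholders); today it still needs a running cluster to execute:

        spark.sql("""
            CREATE TABLE IF NOT EXISTS analytics.daily_events
            USING PARQUET
            LOCATION 's3a://my-bucket/daily_events/'
        """)
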
  12. Azure Table Storage Connector

    Enable the connection from Databricks to Azure Table Storage.

    6 votes  ·  0 comments  ·  Data import / export
  13. Matrix of data storage format/source conversions

    It would be helpful, coming from a pandas/sklearn workflow, to have an image/table showing the "from" and "to" options for moving data between formats and sources within Databricks, much like Odo's conversion graph (https://odo.readthedocs.io/en/latest/_images/conversions.png).

    1 vote  ·  0 comments  ·  Data import / export
  14. Create Table based on Cosmos DB

    Add the ability to create a table based on a Cosmos DB collection or a specific query.

    3 votes  ·  0 comments  ·  Data import / export
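
    Today this can be done in a notebook with the azure-cosmosdb-spark connector attached to the cluster; a sketch of that (all connection values are placeholders, and the option names depend on the connector version) illustrates what a Create Table flow could wrap:

        cosmos_config = {
            "Endpoint":     "https://myaccount.documents.azure.com:443/",     # placeholder
            "Masterkey":    dbutils.secrets.get(scope="my-scope", key="cosmos-key"),
            "Database":     "mydb",
            "Collection":   "mycollection",
            "query_custom": "SELECT c.id, c.value FROM c"                     # optional specific query
        }

        df = (spark.read.format("com.microsoft.azure.cosmosdb.spark")
                .options(**cosmos_config)
                .load())
        df.createOrReplaceTempView("cosmos_collection")
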
  15. Mounted data should show up in the Storage tab of the Spark UI

    I mounted an S3 bucket and didn't realize the mount persisted after detaching my cluster, so I was being charged for extra SSD storage on AWS when I could have been unmounting the data after use each day.

    1 vote  ·  2 comments  ·  Data import / export
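
    In the meantime, mounts can be inspected and removed from a notebook; a short sketch (the mount point name is a placeholder):

        # List mounts, which persist across cluster detach/terminate.
        for m in dbutils.fs.mounts():
            print(m.mountPoint, "->", m.source)

        # Unmount a bucket that is no longer needed.
        dbutils.fs.unmount("/mnt/my-s3-bucket")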