When a file is saved to object storage (or HDFS, etc.) with Spark, Spark creates a directory named after the file and writes a number of "part" files underneath it. Consequently, when you click the data asset filename and select download, you are actually downloading a directory, which arrives as a 0 MB file that has the correct name but is empty when opened. Ideally, the system should recognize that this filename is a directory, fetch the corresponding part files, join them, and return them as one file. The workaround is to use the "coalesce" function so that the output is written as a single part file, then navigate to object storage, select the part file under your directory, and save it. That file has a long name (e.g. part-00000-c6e283b2-770a-4a6a-9fa2-1856eb402d51.csv-attempt_20180309172944_0217_m_000000_0) that carries neither the original file name nor the correct extension. This matters because users will currently either assume they didn't save correctly, conclude they can't access their data, or have to suffer the long workaround, if they can even figure out how to navigate to object storage. In addition, my object storage page does not open in Firefox, so I have to switch to Chrome.
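To make the requested behaviour concrete, here is a minimal sketch of the "fetch the part files, join them, return one file" step. It is only an illustration: `merge_part_files` is a hypothetical helper, not an existing Spark or platform function, and it assumes the Spark output directory has already been copied to a local filesystem and that the parts are uncompressed text (such as CSV) with each part ending in a newline.

```python
import shutil
from pathlib import Path


def merge_part_files(spark_output_dir, merged_file, has_header=False):
    """Concatenate Spark part-* files into one file (hypothetical helper).

    Assumptions: the directory is on a local filesystem, the parts are
    uncompressed text, and each part ends with a newline.
    """
    # Keep only the data parts; this skips markers such as _SUCCESS
    # and hidden checksum files such as .part-00000-....crc.
    parts = sorted(
        p for p in Path(spark_output_dir).iterdir()
        if p.is_file() and p.name.startswith("part-")
    )
    if not parts:
        raise FileNotFoundError(f"no part files found in {spark_output_dir}")

    with open(merged_file, "wb") as out:
        for index, part in enumerate(parts):
            with open(part, "rb") as src:
                if has_header and index > 0:
                    # Every part repeats the header; keep only the first one.
                    src.readline()
                shutil.copyfileobj(src, out)
    return Path(merged_file)
```

For example, `merge_part_files("results.csv", "results_merged.csv", has_header=True)` would turn a local copy of the `results.csv` directory into a single CSV that keeps a meaningful name and extension. Sorting by file name relies on the zero-padded `part-00000`, `part-00001`, ... prefix to preserve the part order.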
Why is it useful?
|Who would benefit from this IDEA?|Everyone who wants to download an output file|
How should it work?