As we move into deep learning and neural-network-based modeling, projects and models need to be able to benefit from GPU-based compute. Today, when we provision DSx or WML, we have to pre-select a hardware configuration. All projects are then built on that same instance, so everyone ends up clumped on the same hardware profile.
A model that does simple classification gets the same performance capability and guarantee as a model that runs ARN. This becomes even more complex when one provisions GPU-enabled hardware and deploys DSx on it: the platform has no visibility into the type of hardware acceleration available, so the modeler has no tuning knobs.
There should be a way to allocate CPU/GPU per model, or to select low-cost compute when needed. For example, say we are training on images, which requires tuning many hyperparameters, and the data is very rich but dirty. I might be better off training on low-cost infrastructure first, and then switching to a GPU-enabled one once cleaner data makes that economically efficient.
This would make DSx and WML deployments for large enterprises much more energy efficient. It would work almost like model- and deployment-based workload management: a performance guarantee is provided to the models that need it, perhaps for training or for critical online streaming situations. Allocation could even be segregated by user profile, so that departments doing light Auto ML / Model Flows type of work do not strain those running elaborate programs on the same DSx instance.
Why is it useful?
Who would benefit from this IDEA? Developers, Scientists
How should it work?
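One way to picture the request is the rough sketch below. It is only an illustration of per-model (rather than per-instance) compute selection, not an existing DSx or WML API. The names `ComputeProfile`, `LOW_COST`, `CPU_RESERVED`, `GPU_RESERVED`, and `choose_profile` are all hypothetical, as are the three inputs used to make the decision.

```python
from dataclasses import dataclass
from enum import Enum


class Accelerator(Enum):
    CPU = "cpu"
    GPU = "gpu"


@dataclass(frozen=True)
class ComputeProfile:
    """Hypothetical description of the compute a single model runs on."""
    name: str
    accelerator: Accelerator
    guaranteed: bool  # True if performance is reserved for this workload


# Illustrative profiles only; a real platform would define its own.
LOW_COST = ComputeProfile("low-cost", Accelerator.CPU, guaranteed=False)
CPU_RESERVED = ComputeProfile("cpu-reserved", Accelerator.CPU, guaranteed=True)
GPU_RESERVED = ComputeProfile("gpu-reserved", Accelerator.GPU, guaranteed=True)


def choose_profile(needs_gpu: bool, data_is_clean: bool, critical: bool) -> ComputeProfile:
    """Pick a compute profile per model instead of per instance.

    Assumed policy (not from any product):
    - critical workloads (e.g. online streaming) always get reserved capacity;
    - GPU-hungry training moves to GPU only once the data is clean;
    - everything else stays on low-cost compute.
    """
    if critical:
        return GPU_RESERVED if needs_gpu else CPU_RESERVED
    if needs_gpu and data_is_clean:
        return GPU_RESERVED
    return LOW_COST


if __name__ == "__main__":
    # Image model with rich but dirty data: start on low-cost compute.
    print(choose_profile(needs_gpu=True, data_is_clean=False, critical=False).name)
    # Same model after the data has been cleaned: switch to GPU.
    print(choose_profile(needs_gpu=True, data_is_clean=True, critical=False).name)
```

A user-profile or department dimension could be added as another input to the same function, so that light Auto ML users default to `LOW_COST` without affecting reserved workloads.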