Given that a model can be affected by the particular set of data used to train it, it would be nice to have better control over how the data is split up.
A fairly easy improvement is to let the user set a seed that controls the split, so the same train/test split can be reproduced in future runs (see the sketch below).
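A minimal sketch of what that could look like, assuming a scikit-learn-style workflow; the `seed` variable is a hypothetical user-facing knob, not part of any existing API here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data standing in for the user's dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# A user-supplied seed makes the split reproducible: rerunning with
# seed=42 yields exactly the same train/test partition.
seed = 42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=seed
)
```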
A more complex change is to let the user specify how many different variations of the train/test split to generate and run models against, either combining the results into an ensemble or selecting the best of the variations as the "final" result (see the sketch after this paragraph).
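One way the multi-split variant could work, again as a sketch with hypothetical parameter names (`n_splits`, `base_seed`): derive a distinct but reproducible seed for each variation, score a model on each split, and then either keep the best model or average predictions across all of them:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

n_splits = 5    # hypothetical user-facing knob: number of split variations
base_seed = 42  # user-supplied seed keeps the whole run reproducible

models, scores = [], []
for i in range(n_splits):
    # Each variation gets its own deterministic seed derived from base_seed.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=base_seed + i
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    models.append(model)
    scores.append(model.score(X_te, y_te))

# Option 1: "best of" -- keep the model from the strongest split.
best_model = models[int(np.argmax(scores))]

# Option 2: ensemble -- average predicted probabilities across all splits.
def ensemble_predict(X_new):
    probs = np.mean([m.predict_proba(X_new) for m in models], axis=0)
    return probs.argmax(axis=1)
```

One design trade-off worth noting: picking the "best" variation by its own test score can be optimistic, since that split's test set also did the selecting, whereas the ensemble averages that split-to-split variance away instead.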
Why is it useful?
Who would benefit from this IDEA?
How should it work?