When working with PySpark, it is common to need real-time access to stdout and log4j output from the JVM side of Spark. When running PySpark from a terminal, Python stdout, JVM stdout, and log4j logs sent to the console all appear in that same terminal. However, this only happens because the Python process and the JVM process each write independently to the same terminal. When working with notebooks, only the Python stdout is shown. It would be ideal to be able to view JVM stdout and log4j console logs in real time even while working within a PySpark notebook.
Why is it useful?
Who would benefit from this IDEA?
How should it work?