June 1st, 2021

Dataproc - June 1st, 2021 [Change, Deprecate, Fix]

## Change

New [sub-minor versions](https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions) of Dataproc images: 1.3.91-debian10, 1.3.91-ubuntu18, 1.4.62-debian10, 1.4.62-ubuntu18, 1.5.37-centos8, 1.5.37-debian10, 1.5.37-ubuntu18, 2.0.11-centos8, 2.0.11-debian10, and 2.0.11-ubuntu18.

* **Rollback Notice:** See the rollback notice in the [June 29, 2021 release note](https://cloud.google.com/dataproc/docs/release-notes#June%5F29%5F2021).

## Change

**Image 1.3 - 2.0**

* All jobs now share a single `JobthreadPool`.
* The number of Job threads in the Agent is configurable with the `dataproc:agent.process.threads.job.min` and `dataproc:agent.process.threads.job.max` [cluster properties](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties#dataproc%5Fservice%5Fproperties%5Ftable), which default to 10 and 100, respectively. Previously, the Agent always used 10 Job threads.

## Change

**Image 2.0**

* Added the snappy-jar dependency to Hadoop.
* Upgraded versions of Python packages: `nbdime 2.1` -> `3.0`, `pyarrow 2.0` -> `3.0`, `spyder 4.2` -> `5.0`, `spyder-kernels 1.10` -> `2.0`, `regex 2020.11` -> `2021.4`.

## Deprecate

**Image 1.5 and 2.0**

* Agents no longer publish a `/has_run_before` sentinel file. If you use a fork of the [connectors initialization-actions](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors), sync it from head.

## Fix

**Image 1.3 - 2.0**

* [SPARK-35227](https://issues.apache.org/jira/browse/SPARK-35227): Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit.

## Fix

**Image 2.0**

* Fixed an issue where the `PATH` environment variable was not set in YARN containers.
* [SPARK-34731](https://issues.apache.org/jira/browse/SPARK-34731): ConcurrentModificationException in EventLoggingListener when redacting properties.
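
The Job thread limits described in this note are ordinary cluster properties, so they can be supplied at cluster creation through the `--properties` flag of `gcloud dataproc clusters create`. The following is a minimal sketch, not an official tool: the helper names, the cluster name, the region, the image version string, and the validation rules (minimum of at least 1, maximum not below the minimum) are all assumptions made for illustration. Only the two property keys and their defaults of 10 and 100 come from this release note.

```python
import shlex

# Defaults stated in the release note for the two Agent properties.
DEFAULT_MIN_JOB_THREADS = 10
DEFAULT_MAX_JOB_THREADS = 100


def job_thread_properties(min_threads=DEFAULT_MIN_JOB_THREADS,
                          max_threads=DEFAULT_MAX_JOB_THREADS):
    """Build the value for the --properties flag (hypothetical helper).

    The range checks are assumptions for this sketch; the release note
    does not say how the Agent validates these values.
    """
    if min_threads < 1:
        raise ValueError("min_threads must be at least 1")
    if max_threads < min_threads:
        raise ValueError("max_threads must be >= min_threads")
    props = {
        "dataproc:agent.process.threads.job.min": min_threads,
        "dataproc:agent.process.threads.job.max": max_threads,
    }
    # gcloud expects comma-separated prefix:property=value pairs.
    return ",".join(f"{key}={value}" for key, value in props.items())


def create_cluster_command(cluster, region, image_version, **thread_limits):
    """Return a shell-safe gcloud command string (hypothetical helper)."""
    args = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        f"--image-version={image_version}",
        f"--properties={job_thread_properties(**thread_limits)}",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Placeholder cluster name, region, and image version.
    print(create_cluster_command(
        "example-cluster", "us-central1", "2.0-debian10",
        min_threads=20, max_threads=200,
    ))
```

The script only prints the command, so it can be reviewed before anything is run against a project. Checking the range locally is a design choice for this sketch: it surfaces an inverted minimum and maximum before a cluster is requested.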