Cloud TPUs now support the PyTorch 2.0 release, via PyTorch/XLA integration
## Change
Cloud TPUs now support the [PyTorch 2.0 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. On top of the underlying improvements and bug fixes in PyTorch's 2.0 release, this release introduces several new features and PyTorch/XLA-specific bug fixes.
### **Beta Features**
#### PJRT runtime
* Check out our newest [document](https://github.com/pytorch/xla/blob/r2.0/docs/pjrt.md); PJRT is the default runtime in 2.0 (see the sketch after this list).
* New implementation of `xm.rendezvous` using XLA collective communication, which scales better ([#4181](https://github.com/pytorch/xla/pull/4181))
* New PJRT TPU backend through the C-API ([#4077](https://github.com/pytorch/xla/pull/4077))
* Use PJRT by default if no runtime is configured ([#4599](https://github.com/pytorch/xla/pull/4599))
* Experimental support for `torch.distributed` and DDP on TPU v2 and v3 ([#4520](https://github.com/pytorch/xla/pull/4520))
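
A minimal sketch of running under the PJRT runtime, based on the linked PJRT document. The script name and tensor shapes are illustrative; on a TPU VM the runtime is typically selected with the `PJRT_DEVICE` environment variable (e.g. `PJRT_DEVICE=TPU python script.py`).

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()               # XLA device backed by the PJRT runtime
    t = torch.randn(2, 2, device=device)
    xm.rendezvous('init')                  # rendezvous now built on XLA collectives
    print(index, t.cpu())


if __name__ == '__main__':
    # Under PJRT, spawn launches one process per local device by default.
    xmp.spawn(_mp_fn)
```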
#### FSDP
* Add `auto_wrap_policy` into XLA FSDP for automatic wrapping ([#4318](https://github.com/pytorch/xla/pull/4318))
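
A minimal sketch of automatic wrapping with XLA FSDP, assuming `size_based_auto_wrap_policy` is importable from `torch_xla.distributed.fsdp.wrap` as added alongside the PR above; the model and the parameter threshold are illustrative only.

```python
import functools

import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP
from torch_xla.distributed.fsdp.wrap import size_based_auto_wrap_policy

device = xm.xla_device()
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).to(device)

# Automatically wrap submodules once they exceed ~1M parameters (illustrative threshold).
auto_wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e6))
fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```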
### **Stable Features**
#### Lazy Tensor Core Migration
* Migration is complete; check out this [dev discussion](https://dev-discuss.pytorch.org/t/pytorch-xla-2022-q4-dev-update/961) for more detail.
* Naively inherits LazyTensor ([#4271](https://github.com/pytorch/xla/pull/4271))
* Adopt even more LazyTensor interfaces ([#4317](https://github.com/pytorch/xla/pull/4317))
* Introduce XLAGraphExecutor ([#4270](https://github.com/pytorch/xla/pull/4270))
* Inherits LazyGraphExecutor ([#4296](https://github.com/pytorch/xla/pull/4296))
* Adopt more LazyGraphExecutor virtual interfaces ([#4314](https://github.com/pytorch/xla/pull/4314))
* Rollback to use `xla::Shape` instead of `torch::lazy::Shape` ([#4111](https://github.com/pytorch/xla/pull/4111))
* Use `TORCH_LAZY_COUNTER/METRIC` ([#4208](https://github.com/pytorch/xla/pull/4208))
#### Improvements & Additions
* Add an option to increase the worker thread efficiency for data loading ([#4727](https://github.com/pytorch/xla/pull/4727))
* Improve numerical stability of torch.sigmoid ([#4311](https://github.com/pytorch/xla/pull/4311))
* Add an API to clear counters and metrics ([#4109](https://github.com/pytorch/xla/pull/4109))
* Add `met.short_metrics_report` to display a more concise metrics report ([#4148](https://github.com/pytorch/xla/pull/4148)); a short sketch follows this list
* Document environment variables ([#4273](https://github.com/pytorch/xla/pull/4273))
* Op Lowering
* `_linalg_svd` ([#4537](https://github.com/pytorch/xla/pull/4537))
* `upsample_bilinear2d` with scale ([#4464](https://github.com/pytorch/xla/pull/4464))
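
A minimal sketch of the metrics helpers referenced above. `met.short_metrics_report` is named in the PR list; the exact names of the clearing helpers (`clear_counters`, `clear_metrics`) are assumptions based on the clearing API PR.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
y = torch.ones(4, device=device) * 2
xm.mark_step()                      # force execution so metrics are populated

print(met.short_metrics_report())   # concise report (#4148)
met.clear_counters()                # reset counters (assumed helper from #4109)
met.clear_metrics()                 # reset metrics (assumed helper from #4109)
```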
### **Experimental Features**
#### TorchDynamo (`torch.compile`) support
* Check out our newest [doc](https://github.com/pytorch/xla/blob/r2.0/docs/dynamo.md); a minimal usage sketch follows this list.
* Dynamo bridge python binding ([#4119](https://github.com/pytorch/xla/pull/4119))
* Dynamo bridge backend implementation ([#4523](https://github.com/pytorch/xla/pull/4523))
* Training optimization: make execution async ([#4425](https://github.com/pytorch/xla/pull/4425))
* Training optimization: reduce graph execution per step ([#4523](https://github.com/pytorch/xla/pull/4523))
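
A minimal sketch of `torch.compile` with the XLA Dynamo bridge; the backend name `torchxla_trace_once` follows the linked doc and is intended for inference, and the model and shapes are illustrative only.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(128, 10).to(device)

# Compile once through the Dynamo -> XLA bridge; later calls reuse the traced graph.
compiled_model = torch.compile(model, backend='torchxla_trace_once')

x = torch.randn(8, 128, device=device)
out = compiled_model(x)
print(out.cpu().shape)
```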
#### PyTorch/XLA GSPMD on single host
* Preserve parameter sharding with sharded data placeholder ([#4721](https://github.com/pytorch/xla/pull/4721))
* Transfer shards from server to host ([#4508](https://github.com/pytorch/xla/pull/4508))
* Store the sharding annotation within XLATensor ([#4390](https://github.com/pytorch/xla/pull/4390))
* Use d2d replication for more efficient input sharding ([#4336](https://github.com/pytorch/xla/pull/4336))
* Mesh to support custom device order ([#4162](https://github.com/pytorch/xla/pull/4162))
* Introduce virtual SPMD device to avoid unpartitioned data transfer ([#4091](https://github.com/pytorch/xla/pull/4091))
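
A minimal sketch of the single-host GSPMD work listed above, assuming the experimental `torch_xla.experimental.xla_sharding` module; the device count and mesh layout are hypothetical, and details such as how SPMD mode is enabled may differ from the linked PRs.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.experimental.xla_sharding as xs

# Hypothetical 4-chip host; adjust to the actual number of local devices.
num_devices = 4
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1))

t = torch.randn(16, 128, device=xm.xla_device())
# Shard dim 0 across the first mesh axis; the second axis has size 1, so dim 1 stays replicated.
xs.mark_sharding(t, mesh, (0, 1))
```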
### **Ongoing development**
* Ongoing Dynamic Shape implementation
* Implement missing `XLASymNodeImpl::Sub` ([#4551](https://github.com/pytorch/xla/pull/4551))
* Make `empty_symint` support dynamism ([#4550](https://github.com/pytorch/xla/pull/4550))
* Add dynamic shape support to `SigmoidBackward` ([#4322](https://github.com/pytorch/xla/pull/4322))
* Add a forward pass NN model with dynamism test ([#4256](https://github.com/pytorch/xla/pull/4256))
* Ongoing SPMD multi host execution ([#4573](https://github.com/pytorch/xla/pull/4573))
### **Bug fixes & improvements**
* Support int as index type ([#4602](https://github.com/pytorch/xla/pull/4602))
* Only alias inputs and outputs when `force_ltc_sync == True` ([#4575](https://github.com/pytorch/xla/pull/4575))
* Fix race condition between execution and buffer tear down on GPU when using `bfc_allocator` ([#4542](https://github.com/pytorch/xla/pull/4542))
* Release the GIL during TransferFromServer ([#4504](https://github.com/pytorch/xla/pull/4504))
* Fix type annotations in FSDP ([#4371](https://github.com/pytorch/xla/pull/4371))