Cloud TPUs now support the PyTorch 2.0 release, via PyTorch/XLA integration
## Change
Cloud TPUs now support the [PyTorch 2.0 release](https://github.com/pytorch/pytorch/releases), via PyTorch/XLA integration. On top of the underlying improvements and bug fixes in PyTorch's 2.0 release, this release introduces several new features and PyTorch/XLA-specific bug fixes.
### **Beta Features**
#### PJRT runtime
* Check out our newest [document](https://github.com/pytorch/xla/blob/r2.0/docs/pjrt.md); PJRT is the default runtime in 2.0 (see the sketch after this list).
* New implementation of `xm.rendezvous` using XLA collective communication, which scales better ([#4181](https://github.com/pytorch/xla/pull/4181))
* New PJRT TPU backend through the C-API ([#4077](https://github.com/pytorch/xla/pull/4077))
* Use PJRT by default if no runtime is configured ([#4599](https://github.com/pytorch/xla/pull/4599))
* Experimental support for `torch.distributed` and DDP on TPU v2 and v3 ([#4520](https://github.com/pytorch/xla/pull/4520))
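
A minimal sketch of running under the PJRT runtime, based on the linked PJRT document. The script name and tensor shapes are illustrative; on a TPU VM the runtime is typically selected with the `PJRT_DEVICE` environment variable (e.g. `PJRT_DEVICE=TPU python script.py`).

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp


def _mp_fn(index):
    device = xm.xla_device()               # XLA device backed by the PJRT runtime
    t = torch.randn(2, 2, device=device)
    xm.rendezvous('init')                  # rendezvous now built on XLA collectives
    print(index, t.cpu())


if __name__ == '__main__':
    # Under PJRT, spawn launches one process per local device by default.
    xmp.spawn(_mp_fn)
```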
#### FSDP
* Add `auto_wrap_policy` into XLA FSDP for automatic wrapping ([#4318](https://github.com/pytorch/xla/pull/4318))
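
A minimal sketch of automatic wrapping with XLA FSDP, assuming `size_based_auto_wrap_policy` is importable from `torch_xla.distributed.fsdp.wrap` as added alongside the PR above; the model and the parameter threshold are illustrative only.

```python
import functools

import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP
from torch_xla.distributed.fsdp.wrap import size_based_auto_wrap_policy

device = xm.xla_device()
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).to(device)

# Automatically wrap submodules once they exceed ~1M parameters (illustrative threshold).
auto_wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e6))
fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy)
```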
### **Stable Features**
#### Lazy Tensor Core Migration
* Migration is complete; check out this [dev discussion](https://dev-discuss.pytorch.org/t/pytorch-xla-2022-q4-dev-update/961) for more detail.
* Naively inherits LazyTensor ([#4271](https://github.com/pytorch/xla/pull/4271))
* Adopt even more LazyTensor interfaces ([#4317](https://github.com/pytorch/xla/pull/4317))
* Introduce XLAGraphExecutor ([#4270](https://github.com/pytorch/xla/pull/4270))
* Inherits LazyGraphExecutor ([#4296](https://github.com/pytorch/xla/pull/4296))
* Adopt more LazyGraphExecutor virtual interfaces ([#4314](https://github.com/pytorch/xla/pull/4314))
* Rollback to use `xla::Shape` instead of `torch::lazy::Shape` ([#4111](https://github.com/pytorch/xla/pull/4111))
* Use `TORCH_LAZY_COUNTER/METRIC` ([#4208](https://github.com/pytorch/xla/pull/4208))
#### Improvements & Additions
* Add an option to increase the worker thread efficiency for data loading ([#4727](https://github.com/pytorch/xla/pull/4727))
* Improve numerical stability of torch.sigmoid ([#4311](https://github.com/pytorch/xla/pull/4311))
* Add an API to clear counters and metrics ([#4109](https://github.com/pytorch/xla/pull/4109))
* Add `met.short_metrics_report` to display a more concise metrics report ([#4148](https://github.com/pytorch/xla/pull/4148)); a short sketch follows this list
* Document environment variables ([#4273](https://github.com/pytorch/xla/pull/4273))
* Op Lowering
* `_linalg_svd` ([#4537](https://github.com/pytorch/xla/pull/4537))
* `upsample_bilinear2d` with scale ([#4464](https://github.com/pytorch/xla/pull/4464))
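
A minimal sketch of the metrics helpers referenced above. `met.short_metrics_report` is named in the PR list; the exact names of the clearing helpers (`clear_counters`, `clear_metrics`) are assumptions based on the clearing API PR.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
y = torch.ones(4, device=device) * 2
xm.mark_step()                      # force execution so metrics are populated

print(met.short_metrics_report())   # concise report (#4148)
met.clear_counters()                # reset counters (assumed helper from #4109)
met.clear_metrics()                 # reset metrics (assumed helper from #4109)
```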
### **Experimental Features**
#### TorchDynamo (`torch.compile`) support
* Check out our newest [doc](https://github.com/pytorch/xla/blob/r2.0/docs/dynamo.md); a minimal usage sketch follows this list.
* Dynamo bridge python binding ([#4119](https://github.com/pytorch/xla/pull/4119))
* Dynamo bridge backend implementation ([#4523](https://github.com/pytorch/xla/pull/4523))
* Training optimization: make execution async ([#4425](https://github.com/pytorch/xla/pull/4425))
* Training optimization: reduce graph execution per step ([#4523](https://github.com/pytorch/xla/pull/4523))
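
A minimal sketch of `torch.compile` with the XLA Dynamo bridge; the backend name `torchxla_trace_once` follows the linked doc and is intended for inference, and the model and shapes are illustrative only.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(128, 10).to(device)

# Compile once through the Dynamo -> XLA bridge; later calls reuse the traced graph.
compiled_model = torch.compile(model, backend='torchxla_trace_once')

x = torch.randn(8, 128, device=device)
out = compiled_model(x)
print(out.cpu().shape)
```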
#### PyTorch/XLA GSPMD on single host
* Preserve parameter sharding with sharded data placeholder ([#4721](https://github.com/pytorch/xla/pull/4721))
* Transfer shards from server to host ([#4508](https://github.com/pytorch/xla/pull/4508))
* Store the sharding annotation within XLATensor ([#4390](https://github.com/pytorch/xla/pull/4390))
* Use d2d replication for more efficient input sharding ([#4336](https://github.com/pytorch/xla/pull/4336))
* Mesh to support custom device order ([#4162](https://github.com/pytorch/xla/pull/4162))
* Introduce virtual SPMD device to avoid unpartitioned data transfer ([#4091](https://github.com/pytorch/xla/pull/4091))
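
A minimal sketch of the single-host GSPMD work listed above, assuming the experimental `torch_xla.experimental.xla_sharding` module; the device count and mesh layout are hypothetical, and details such as how SPMD mode is enabled may differ from the linked PRs.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.experimental.xla_sharding as xs

# Hypothetical 4-chip host; adjust to the actual number of local devices.
num_devices = 4
mesh = xs.Mesh(np.arange(num_devices), (num_devices, 1))

t = torch.randn(16, 128, device=xm.xla_device())
# Shard dim 0 across the first mesh axis; the second axis has size 1, so dim 1 stays replicated.
xs.mark_sharding(t, mesh, (0, 1))
```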
### **Ongoing development**
* Ongoing Dynamic Shape implementation
* Implement missing `XLASymNodeImpl::Sub` ([#4551](https://github.com/pytorch/xla/pull/4551))
* Make `empty_symint` support dynamism ([#4550](https://github.com/pytorch/xla/pull/4550))
* Add dynamic shape support to `SigmoidBackward` ([#4322](https://github.com/pytorch/xla/pull/4322))
* Add a forward pass NN model with dynamism test ([#4256](https://github.com/pytorch/xla/pull/4256))
* Ongoing SPMD multi host execution ([#4573](https://github.com/pytorch/xla/pull/4573))
### **Bug fixes & improvements**
* Support int as index type ([#4602](https://github.com/pytorch/xla/pull/4602))
* Only alias inputs and outputs when `force_ltc_sync == True` ([#4575](https://github.com/pytorch/xla/pull/4575))
* Fix race condition between execution and buffer tear down on GPU when using `bfc_allocator` ([#4542](https://github.com/pytorch/xla/pull/4542))
* Release the GIL during TransferFromServer ([#4504](https://github.com/pytorch/xla/pull/4504))
* Fix type annotations in FSDP ([#4371](https://github.com/pytorch/xla/pull/4371))