## Feature
To monitor the efficiency of the GKE training JobSet, the following two GKE system metrics are available in Preview:
* `kubernetes.io/jobset/scheduling_goodput`: the fraction of time that all the resources required to run the training JobSet are available.
* `kubernetes.io/jobset/proxy_runtime_goodput`: the fraction of time that all required accelerators are productive. This metric provides an estimate of the real runtime goodput.
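Both metrics are defined as a fraction of time. The following is a minimal sketch of that arithmetic only, assuming you already have a list of intervals during which the condition held (for example, all resources available). It does not reproduce how GKE computes the metrics; the `Interval` type and `goodput_fraction` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A span of 'good' time, in seconds from the start of the window (hypothetical type)."""
    start: float
    end: float


def goodput_fraction(window_seconds: float, good_intervals: list[Interval]) -> float:
    """Return the fraction of the observation window covered by good intervals.

    Illustrative only: overlapping intervals are counted once, and time
    outside [0, window_seconds] is ignored.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    covered = 0.0
    covered_until = 0.0  # right edge of the time already counted
    for interval in sorted(good_intervals, key=lambda i: i.start):
        start = max(interval.start, covered_until, 0.0)
        end = min(interval.end, window_seconds)
        if end > start:
            covered += end - start
            covered_until = end
    return covered / window_seconds


if __name__ == "__main__":
    # One hour window; two overlapping good spans covering 0-2700 s in total.
    spans = [Interval(0, 1800), Interval(1500, 2700)]
    print(goodput_fraction(3600, spans))
```

A value of 1.0 would mean the condition held for the whole window; lower values indicate time lost to unavailable resources or unproductive accelerators.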
For details about GKE metrics, see [Kubernetes metrics](https://docs.cloud.google.com/monitoring/api/metrics%5Fkubernetes#kubernetes-kubernetes). For details about goodput metrics that are used to measure efficiency, see [Monitor goodput with the ML Goodput Measurement library](https://docs.cloud.google.com/tpu/docs/goodput#jobset-dashboard).
You can also view these new GKE metrics in the [JobSet monitoring dashboard](https://docs.cloud.google.com/kubernetes-engine/docs/tutorials/tpu-multislice-kueue#monitor%5Fthe%5Fworkloads).
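If you prefer to query the metrics yourself rather than use the dashboard, Cloud Monitoring selects time series with a filter on the metric type. The sketch below only builds that filter string for the two metric types named above; the `METRIC_TYPES` mapping and `monitoring_filter` helper are hypothetical, and any resource labels you might add to narrow the query are not shown because this note does not list them.

```python
# Metric types as named in this release note.
METRIC_TYPES = {
    "scheduling": "kubernetes.io/jobset/scheduling_goodput",
    "proxy_runtime": "kubernetes.io/jobset/proxy_runtime_goodput",
}


def monitoring_filter(metric_key: str) -> str:
    """Build a Cloud Monitoring filter that matches one goodput metric type.

    Hypothetical helper: it returns only the metric.type clause and does
    not add resource or label restrictions.
    """
    try:
        metric_type = METRIC_TYPES[metric_key]
    except KeyError:
        raise ValueError(f"unknown metric key: {metric_key!r}") from None
    return f'metric.type = "{metric_type}"'


if __name__ == "__main__":
    for key in METRIC_TYPES:
        print(monitoring_filter(key))
```

The resulting string could be passed as the `filter` argument of a Cloud Monitoring time series list request, together with a time interval.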