Docs - Operational
Frida
Login - Operational
Storage - Operational
Compute - Operational
GPU0 on node ixh has been successfully replaced and the node is back in production. Please benchmark your runs against earlier ones and report any discrepancies. Thank you for your patience.
Node ixh is down due to overheating of GPU0. We are working on a resolution with support.
Thank you for your patience.
We are reinstating the node and we'll monitor the status.
Maintenance has completed successfully. We performed the full set of updates and upgrades on all cluster nodes. The storage cluster has also undergone the full set of updates and upgrades. Most of the cluster is operational and ready to accept jobs. One node is currently kept in maintenance mode due to hardware issues that require physical maintenance; it should resume operation in the next few days.
The cluster cannot be accessed at the moment. This incident was created by an automated monitoring service.
The RDC cooling has been partially fixed and we are bringing the cluster back into production. We will keep monitoring the status. The remaining RDC cooling issues will be resolved during the week. We appreciate your patience.
The RDC cooling is malfunctioning. As a preventative measure we are forced to shut down the cluster; all jobs will be canceled. We appreciate your patience.
The RDC cooling has been fixed. We are currently bringing the cluster back online, performing the scheduled FRIDA maintenance, and monitoring the status.
The RDC cooling is malfunctioning. As a preventative measure we are forced to shut down the cluster; all jobs will be canceled. While working to fix the issue, we will also perform the scheduled FRIDA maintenance to keep downtime to a minimum. We appreciate your patience.
We are currently investigating the incident.
Mar 2025 to May 2025