rdc@fri - Major malfunction of one of the nodes – Incident details
Major malfunction of one of the nodes
Resolved
Degraded performance
Started 5 months ago
Lasted 13 days
Affected
Frida
Degraded performance from 12:21 PM to 10:40 AM
Compute
Degraded performance from 12:21 PM to 10:40 AM
Updates
Resolved
The GPUs on node ixh have been successfully replaced and the node is back in production. Please benchmark your runs against earlier ones and report any discrepancies.
Thank you for your patience.
Update
Node ixh is still down: during the replacement of GPU6, issues with GPU0 were detected. We are coordinating a resolution with support.
Thank you for your patience.
Identified
Node ixh is down due to overheating of GPU6. We are working on a resolution with support.