rdc@fri - Major malfunction of one of the nodes – Incident details

Storage experiencing degraded performance

Major malfunction of one of the nodes

Resolved
Degraded performance
Started 5 months ago
Lasted 13 days

Affected

Frida

Degraded performance from 12:21 PM to 10:40 AM

Compute

Degraded performance from 12:21 PM to 10:40 AM

Updates
  • Resolved

    The GPUs on node ixh have been successfully replaced, and the node is back in production. Please benchmark your runs against earlier ones and report any discrepancies.

    Thank you for your patience.

  • Update

    Node ixh is down because issues with GPU0 were detected during the replacement of GPU6. We are coordinating a resolution with support.

    Thank you for your patience.

  • Identified

    Node ixh is down due to overheating of GPU6. We are working on a resolution with support.

    Thank you for your patience.

  • Investigating
    We are currently investigating this incident.