Outage with CopperEgg App
Postmortem

Cause - Nestdb B cluster’s master server stopped due to AWS maintenance.

Prevention - To prevent data loss for sites using the Nestdb B cluster, we promoted the slave to master on Admin and stopped the push replicator service on the master. We removed the entry for nestdb-b-1, the server that had stopped. Later, we restarted all the services on that server and manually replicated to it all the data that the running master (nestdb-b-4) had collected while the other server was being repaired, so that both servers had the same data files and were in sync with each other. We then started the push replicator service on the master and waited for the Journal Queue to drop to 0. After that, we added nestdb-b-1 back to Admin as a slave.
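
The "wait for the Journal Queue to drop to 0" step can be sketched as a simple polling loop. This is only an illustration under stated assumptions: the function name wait_for_queue_drain and the get_depth callable are hypothetical and are not part of Nestdb or Admin; get_depth stands in for whatever command or API reports the current Journal Queue depth.

```python
import time
from typing import Callable


def wait_for_queue_drain(
    get_depth: Callable[[], int],   # hypothetical: returns the current Journal Queue depth
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the queue depth reaches 0.

    Returns True once the queue is empty, or False if the timeout
    elapses first, so the caller can decide whether it is safe to
    re-add the repaired server as a slave.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_depth() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

Returning a boolean rather than raising on timeout leaves the decision with the operator: a queue that never drains usually means replication is still behind, and the slave should not be added yet.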

Data Loss - Some data was lost before we switched nestdb-b-4 to master on Admin and removed nestdb-b-1. The nestdb-b-1 entry remained on Admin as master for one hour, so some data was lost from 2:55 p.m. IST to 3:55 p.m. IST.
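
The data-loss window is given in IST, while the timestamps on this page are in CST. The sketch below converts between the two using fixed offsets (IST is UTC+5:30, CST is UTC-6). The calendar date of the window is not stated in the postmortem, so Feb 16, 2021, the date of the incident updates, is an assumption here.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: IST is UTC+5:30, CST (standard time) is UTC-6.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
CST = timezone(timedelta(hours=-6), "CST")


def ist_to_cst(moment: datetime) -> datetime:
    """Convert a timezone-aware IST datetime to CST."""
    return moment.astimezone(CST)


# Assumption: the window fell on Feb 16, 2021 (the date is not given in the postmortem).
window_start = datetime(2021, 2, 16, 14, 55, tzinfo=IST)
window_end = datetime(2021, 2, 16, 15, 55, tzinfo=IST)

for label, moment in (("start", window_start), ("end", window_end)):
    print(label, ist_to_cst(moment).strftime("%Y-%m-%d %H:%M %Z"))
```

Under that date assumption, the window corresponds to 03:25 to 04:25 CST.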

Posted Feb 24, 2021 - 00:04 CST

Resolved
This incident has been resolved.
Posted Feb 16, 2021 - 09:56 CST
Update
We are continuing to investigate this issue.
Posted Feb 16, 2021 - 05:11 CST
Investigating
NGINX has crashed on one of our data store servers due to an underlying hardware issue. Engineering has been notified and is investigating.
Posted Feb 16, 2021 - 05:10 CST
This incident affected: Web Application Interface and Alerting.