Zingle App Loading Issues
Incident Report for Medallia Concierge
Postmortem

Incident Report - Zingle App Issues

Customer Impact

From August 19th to September 1st, 2022, the Zingle platform experienced degraded service in which customers were unable to send or receive messages. Zingle engineers were immediately alerted to the outage and began working to resolve the issues.

Cause

On August 19th, 2022, a database (DB) swap and a Redis server split took place during a scheduled maintenance window. Once the DB swap and Redis split were complete, two issues became apparent:

  1. The memory available to our DB and our web servers was inadequate for the average web traffic we receive.
  2. The new connection strings were not configured correctly across all of our hardware, so not every machine was running correctly and connected to our data store.

Together, these two issues caused a cascading failure in which the web servers aggressively tried to open more connections to the DB than the DB was configured to allow, causing sporadic but recurring data connection issues across the entire platform. As a result, message sending and receiving were impacted, as were data importing and Inbox stability. Zings were also prevented from executing correctly.
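The failure mode described above, where the web tier asks for more connections than the DB allows, can be illustrated with a small capacity check. This is only a sketch under assumed numbers: the `WebTier` class, its fields, and every figure below are hypothetical and do not reflect Zingle's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebTier:
    """Hypothetical description of a web tier; all fields are illustrative."""
    servers: int
    workers_per_server: int
    pool_size_per_worker: int

    def peak_connections(self) -> int:
        # Worst case: every worker on every server fills its pool at once.
        return self.servers * self.workers_per_server * self.pool_size_per_worker


def connection_headroom(tier: WebTier, db_max_connections: int, reserved: int = 0) -> int:
    """Return the spare DB connections at peak load.

    A negative result means the web tier can request more connections
    than the DB is configured to accept.
    """
    return db_max_connections - reserved - tier.peak_connections()


if __name__ == "__main__":
    # Assumed example values, not taken from the incident.
    tier = WebTier(servers=10, workers_per_server=8, pool_size_per_worker=5)
    print(connection_headroom(tier, db_max_connections=300, reserved=10))  # -110
```

A negative headroom before a change goes live would suggest either raising the DB connection limit or lowering per-worker pool sizes.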

Resolution Steps

After extended monitoring and observation to understand where the issues were, work began to upgrade all hardware that communicates with the DB, as well as the DB hardware itself. This required an emergency maintenance window of approximately 30 minutes on September 1st. We also increased the RAM used per web server thread to take advantage of the available memory.
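As a rough illustration of sizing memory per web server thread after a hardware upgrade, the arithmetic might look like the sketch below. The function name, the reserved-memory figure, and the example values are assumptions for illustration only; the actual figures used in the upgrade are not stated in this report.

```python
def memory_per_thread_mb(total_ram_mb: int, reserved_mb: int, threads: int) -> int:
    """Split usable RAM evenly across web server threads (whole MB).

    reserved_mb is memory held back for the OS and other processes;
    it is an assumed parameter, not a documented Zingle setting.
    """
    if threads <= 0:
        raise ValueError("threads must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("reserved memory leaves nothing for the web server")
    return usable_mb // threads


if __name__ == "__main__":
    # Assumed example: a 16 GB host, 2 GB reserved, 16 threads.
    print(memory_per_thread_mb(total_ram_mb=16384, reserved_mb=2048, threads=16))  # 896
```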

The Zingle engineering team has taken the following corrective actions to prevent this scenario from occurring again in the future:

  1. Any change to our DB hardware that requires a physical move of a DB will now go through a more stringent process to understand the impacts it will have.
  2. We will write out a plan of action for ensuring all hardware is updated correctly whenever we carry out such a move.
  3. We have refactored certain code that used the DB inefficiently.
  4. We have put a plan in place to improve the performance of our DB schema, along with an investigation into where else we can improve DB performance.
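One way to check that all hardware has been updated after a DB move is to compare each host's configured connection string against the expected one. The sketch below assumes URL-style connection strings and a hypothetical inventory mapping of host name to connection string; the host names, the `db://` scheme, and the function names are illustrative and are not Zingle's actual tooling.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

Endpoint = Tuple[str, Optional[str], Optional[int], str]


def endpoint(conn_str: str) -> Endpoint:
    """Reduce a URL-style connection string to (scheme, host, port, path).

    Credentials are deliberately ignored so they never appear in a report.
    """
    parts = urlsplit(conn_str)
    return (parts.scheme, parts.hostname, parts.port, parts.path)


def find_stale_hosts(expected: str, configured: Dict[str, str]) -> List[str]:
    """Return the hosts whose connection string points somewhere other
    than the expected DB endpoint, sorted by host name."""
    want = endpoint(expected)
    return sorted(host for host, conn in configured.items() if endpoint(conn) != want)


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from config management.
    expected = "db://app:secret@db-new.example.internal:5432/zingle"
    configured = {
        "web-01": expected,
        "web-02": "db://app:secret@db-old.example.internal:5432/zingle",
        "worker-01": "db://app:secret@db-new.example.internal:5433/zingle",
    }
    print(find_stale_hosts(expected, configured))  # ['web-02', 'worker-01']
```

Running a check like this before closing a maintenance window would flag any machine still pointed at the old data store.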

We sincerely apologize for the inconvenience this caused, as we know Zingle is mission-critical to many teams and businesses. If you have any questions or concerns, please do not hesitate to reach out to your account manager or our Support team.

Posted Sep 13, 2022 - 12:55 EDT

Resolved
This incident has been resolved.
Posted Aug 22, 2022 - 23:07 EDT
Monitoring
We have identified the cause of this issue and implemented the necessary updates to resolve it.

The web app should be loading normally now for most users, but some users may need to refresh the app. Outbound messages should also be working normally now.

We will continue to monitor this issue to ensure full system recovery.
Posted Aug 22, 2022 - 22:01 EDT
Investigating
Our engineering teams are continuing to investigate issues affecting the Zingle App for services in the Production environment, and implementing the necessary mitigating steps. Updates will continue to be provided as they are available.
Posted Aug 22, 2022 - 21:05 EDT
Update
We are continuing to monitor for any further issues.
Posted Aug 22, 2022 - 20:42 EDT
Update
We thank you for your patience as our teams continue to work on the fix for Zingle Services. We have identified the cause of this issue and have implemented a configuration update to correct it.

We will continue to monitor this issue and ask that you subscribe to updates here - https://status.zingle.me/
Posted Aug 22, 2022 - 19:52 EDT
Monitoring
We have identified the cause of this issue and have implemented a configuration update to correct it.

This issue should be resolved for most users, and the web app should be loading normally now. Affected users may need to refresh the app.

We will continue to monitor this issue. Thanks for your patience.
Posted Aug 22, 2022 - 13:26 EDT
Update
We are continuing to work on a fix for this issue. Some users may continue to see errors (404 page) when accessing the Web App or when sending outbound messages or replies.
Posted Aug 22, 2022 - 12:21 EDT
Identified
We've identified the source of the elevated access errors and are taking measures to resolve them. Affected users may receive a 404 error page upon loading the web app, or see errors when sending outbound messages.

Users should attempt to reload the Zingle app as we continue to make changes to improve system performance and ensure full recovery for all affected users.
Posted Aug 22, 2022 - 11:39 EDT
Investigating
We have identified an issue impacting the Zingle App, where users are unable to access their Zingle Inbox in the Production environment. Our team is fully engaged and actively working on resolving these issues as soon as possible.

We'll continue to post updates as we have them. We appreciate your patience.
Posted Aug 22, 2022 - 11:10 EDT
This incident affected: Web App - [app.zingle.me] (Web Inbox, SMS Messaging, Zings, API, Hospitality Integration System).