We use caching broadly throughout our application. Caching lets us temporarily store data that would otherwise need to be retrieved over and over again. For example, we cache whether an account is active, so we do not have to ask Stripe, our payment provider, every time we need to know. Among other things, this temporary storage helps keep the application quick and responsive.
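To make the idea concrete, here is a minimal sketch of caching an account's active status for a limited time. It is an illustration only, not our production code: `TTLCache`, `fetch_account_active`, the account ID, and the 300-second lifetime are all hypothetical, and the provider lookup is a stand-in rather than a real Stripe call.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time, cached value).
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], bool]) -> bool:
        """Return the cached value if still fresh, otherwise fetch and store it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Records every simulated trip to the payment provider.
provider_calls: List[str] = []


def fetch_account_active(account_id: str) -> bool:
    # Hypothetical stand-in for a request to the payment provider.
    provider_calls.append(account_id)
    return True


cache = TTLCache(ttl_seconds=300)  # assumed lifetime, chosen for illustration
cache.get_or_fetch("acct_123", fetch_account_active)  # miss: asks the provider
cache.get_or_fetch("acct_123", fetch_account_active)  # hit: served from the cache
```

In this sketch the second lookup never reaches the provider, which is where the speed-up comes from. The trade-off is that a cached answer can be out of date until its entry expires.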
Caching requires a server for temporary data storage, and we currently use the same server for both background job processing and UI/UX speed-ups. Sharing that server is the root cause of the errors in the web application.
A script we ran to fix a problem with our background jobs took our caching server down. Because the web app depends on that same shared server, and the server was entirely unavailable, the web app experienced outages as well.
The app came back online after 12 minutes, once we were able to unblock our job processor infrastructure.
Our Engineering team will be working on separating the caching that’s used for background jobs from the caching that’s used for UI/UX.
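As a rough sketch of what that separation buys us, the example below models two independent stores and simulates an outage in only one of them. Everything here is hypothetical (`CacheStore`, `CacheUnavailable`, the store names, and the key format), and the in-memory dictionaries stand in for real cache servers. It is not a description of the final design.

```python
from typing import Any, Dict


class CacheUnavailable(Exception):
    """Raised when a store is down (simulated)."""


class CacheStore:
    """Stand-in for one cache server; a dict plays the role of its storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._data: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        if not self.available:
            raise CacheUnavailable(self.name)
        self._data[key] = value

    def get(self, key: str) -> Any:
        if not self.available:
            raise CacheUnavailable(self.name)
        return self._data.get(key)


# Two independent stores instead of one shared server (hypothetical names).
job_store = CacheStore("background-jobs")
ui_store = CacheStore("ui-cache")

ui_store.set("account:acct_123:active", True)

# Simulate the job store going down, for example during a maintenance script.
job_store.available = False

# The UI cache is unaffected because it no longer shares a server with jobs.
ui_value = ui_store.get("account:acct_123:active")
```

With a single shared server, the same simulated failure would have made the UI lookup fail too, which matches what happened during this incident.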