What happens when you rely on a third-party service and it goes down? And how do you even know it’s down until your own product stops working or hangs?
In this Spotlight on Cloud, learn how the United States Digital Service (USDS) made its systems more fault tolerant. Aaron Wieczorek, site reliability engineer at USDS, analyzes the challenges of proactively monitoring third-party services. He also details USDS's black-box monitoring solution, which uses modern open-source tools like Prometheus and Grafana to provide monitoring, incident response, and root cause analysis for events outside of the team's control.
Recorded on September 5, 2019. See the original event page for further learning resources, or watch recordings of other past events.
O’Reilly Spotlight explores emerging business and technology topics and ideas through a series of one-hour interactive events. In live conversations, participants share their questions and ideas while hearing the experts’ unique perspectives, insights, fears, and predictions for the future.
In every edition of Spotlight on Cloud, you'll learn about the complex, ever-evolving world of the cloud. You'll discover how successful companies have embraced this massive network of shared infrastructure and services, and how you can follow their lead to transform your organization and prepare for the Next Economy.