Skip to main content

New Cloud Security Podcast by Google is here

Applications fail. Containers crash. It’s a fact of life that SRE and DevOps teams know all too well. To help navigate life’s hiccups, we’ve previously shared how to debug applications running on Google Kubernetes Engine (GKE). We’ve also updated the GKE dashboard with new easier-to-use troubleshooting flows. Today, we go one step further and show you how you can use these flows to quickly find and resolve issues in your applications and infrastructure. 

In this blog, we'll walk through deploying a sample app to your cluster and configuring an alerting policy that will notify you if there are any container restarts observed. From there, we'll trigger the alert and explore how the new GKE dashboard makes it easy to identify the issue and determine exactly what's going on with your workload or infrastructure that may be causing it.

Setting up

Deploy the app
This example uses a demo app that exposes two endpoints: an endpoint at /, which is just a "hello world", and a /crashme endpoint, which uses Go's os.Exit(1) to terminate the process. To deploy the app in your own cluster, create a container image using Cloud Build and deploy it to GKE. Then, expose the service with a load balancer. 

Once the service is deployed, check the running pods:

  ✗ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
restarting-deployment-54c8678f79-gjh2v   1/1     Running   0          6m38s
restarting-deployment-54c8678f79-l8tsm   1/1     Running   0          6m38s
restarting-deployment-54c8678f79-qjrcb   1/1     Running   0 
 Notice that RESTARTS is initially at zero for each pod. Use a browser or a command line tool like curl to access the /crashme endpoint. At this point, you should see a restart:

  ✗ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
restarting-deployment-54c8678f79-gjh2v   1/1     Running   1          9m28s
restarting-deployment-54c8678f79-l8tsm   1/1     Running   0          9m28s
restarting-deployment-54c8678f79-qjrcb   1/1     Running   0          9m28s
Each request to that endpoint will result in a restart. However, be careful to not do this more often than every 30 seconds or so, otherwise, the containers will go into CrashLoopBackOff, and it will take time for the service to be available again. You can use this simple shell script to trigger restarts when as needed:

  while true;
curl http://$IP_ADDRESS:8080/crashme; 
sleep 45; 

where $IP_ADDRESS is the IP address of the load balancer you've already created. 

Why do container restarts matter? Well, restarts, to a certain degree, are an expected part of a container’s typical lifecycle in Kubernetes. Too many container restarts, however, could affect the availability of your service, especially when expanded over a larger number of replicas for a given Pod. Not only do excessive restarts degrade the service in question, but they also risks affecting other services downstream that use it as a dependency.

In real life,the culprit for a large number of restarts could be a poorly designed liveness probe, issues like deadlocks in the application itself, or misconfigured memory requests that result in OOMkilled errors. So, it is important for you to proactively alert on container restarts to preempt potential degradation that can cascade across multiple services. 

Configure the alert

Now, you're ready to configure the alert that will notify you when restarts are detected. Here's how to set up your alerting policy:

1 Configure the alert.jpg

You can use the metric, filtered to the specific container name (as specified in the deployment yaml file). Configure the alert to trigger if any timeseries exceeded zero—meaning if any container restarts are observed. 

With the setup done, you are ready to test and see what happens!

Testing the alert

When you’re ready, start the looped script to hit the /crashme endpoint every 45 seconds. The restart_count metric is sampled every 60 seconds, so it shouldn't take very long for an alert to show up on the dashboard:

2 Testing the alert.jpg

You can mouse-over the incident to get more information about it:

3 incident.jpg
Then click on "View Incident". This takes you to the Incident details screen, where you can see the specific resources that triggered it—in this case, the incident is generated by the container.

4 View Incident.jpg

Next, you can click on View Logs to see the logs (in the new Logs Viewer!)—it's immediately apparently that the alert is triggered by the containers restarting:

5 View Logs.jpg

This is all very nicely tied together and makes troubleshooting during an incident much easier!

In summary….
The latest GKE dashboard includes many improvements over previous iterations. The new alerts timeline is intuitive, and incidents are clearly marked so that you can interact with them to get the full details of exactly what happened, all the way down to the container logs that tell you the actual problem.

As an oncall SRE or DevOps engineer for a service running on GKE, the GKE dashboard makes it easier for you to respond to incidents. You're now able to go from an incident all the way to debug logs quickly and easily and reduce the time it takes to triage and mitigate incidents. For a short overview on how to troubleshoot services on GKE, check out this video:


Popular posts from this blog

Use Vault for Gmail Confidential Messages and Jamboard Files

Google vault will be supporting two new formats in the future, Gmail confidential mode emails & Jamboard files stored in Google Drive. Google Vault gives you a chance to retain, hold, search, and export data to support your organization’s retention and eDiscovery needs. This dispatch includes support for new information types with the goal that you can thoroughly oversee your association's information. What happens when individuals in your association sends confidential messages? Vault can hold, retain, search, and export all confidential mode messages sent by users in your association. Messages are constantly accessible to Vault, notwithstanding when the sender sets a termination date or denies access to private messages. Here’s an example of what will see in Vault when they search for and preview this email sent by . But It’ll not work vise versa. Admins can hold, retain, search and export message headers and s

Zoom’s Work Transformation Summit on Jan. 19: Fresh Approaches for Moving Forward

These past two years have undoubtedly reshaped work. More specifically, these past two years — shuffling between remote, in-person, and hybrid work scenarios — reshaped what employees expect out of their jobs, how they want to work, and what the office means to them.  Organizations are challenged with making big decisions to meet those expectations, and those decisions will dramatically alter how they hire, manage their facilities, buy technology, and maintain productivity. Simply adjusting policies and retooling previous work models won’t do. It takes a comprehensive reimagining. To help organizations navigate this next phase of work, Zoom is hosting our  Work Transformation Summit  on Jan. 19, a free, half-day virtual event designed to provide you and your organization with meaningful strategies, creative approaches, and innovative solutions for redefining work.  Summit attendees will have the opportunity to hear from peers and industry experts on the importance of embracing technolo

Access well-known educational technology tools straight from Google Classroom.

  We're making it simpler for instructors to use popular EdTech products that are most effective for their class right in Google Classroom with a new seamless integration of single sign-on, assigning, and grading. With the help of this feature, teachers can find, assign, and grade interesting content for their classes, and both teachers and students can access their EdTech tools without needing to navigate to other websites or apps or go through a cumbersome login process that requires remembering numerous usernames and passwords. This offers a more simplified experience when using technology to affect learning, in addition to saving instructors and students time. We partnered with 15+ EdTech companies to build custom add-ons, including Kahoot!, Pear Deck, IXL, and Nearpod.  Admins :  In order for educators to use add-ons, district administrators must provide access to them. For further information on how to install the add-ons functionality and specific add-ons for a domain, OU, o