Effective Monitoring Systems for Cloud Infrastructure | Sela

Monitoring Alerts in GCP by integrating Cloud Operations with Notification Channels

An integral part of keeping applications running smoothly in the cloud is keeping the underlying infrastructure healthy, and one way to do this is to anticipate problems and handle them before they escalate into downtime. A simple and effective way to achieve this is to set up monitoring systems that track the health of the infrastructure through important low-level metrics such as CPU usage, disk usage, and memory usage. When these metrics go out of bounds, they alert us to potential problems so that preventive action can be taken.

Yogesh Golande, Head of Engineering - Sela

How can these monitoring systems alert us?

In this blog, we look at Google Cloud’s Cloud Operations suite (formerly Stackdriver) and the options it offers for receiving monitoring alerts.

Cloud Monitoring, part of Cloud Operations, measures the health of cloud resources and applications by providing visibility into signals such as CPU usage, disk I/O, memory, network traffic, and uptime. These measurements, known as metrics, are taken repeatedly over time and form a time series. Once systems are in place to collect this data, the next step is to determine which values are safe, which indicate a “warning”, and which are a “danger” sign; this is a matter of tuning and review. The step after that is deciding how the SRE team is notified whenever a metric enters a warning or danger state.
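To make the idea of safe, warning, and danger values concrete, here is a minimal sketch in Python. It is not how Cloud Monitoring evaluates alerting policies; the `Thresholds` class, the function names, and the 70/90 percent limits are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical bounds for one metric, e.g. CPU utilization in percent."""
    warning: float  # at or above this value the metric is in "warning"
    danger: float   # at or above this value the metric is in "danger"


def classify(value: float, limits: Thresholds) -> str:
    """Map a single measurement to "safe", "warning" or "danger"."""
    if value >= limits.danger:
        return "danger"
    if value >= limits.warning:
        return "warning"
    return "safe"


def classify_series(
    series: List[Tuple[str, float]], limits: Thresholds
) -> List[Tuple[str, str]]:
    """Classify every (timestamp, value) point of a time series."""
    return [(ts, classify(value, limits)) for ts, value in series]


# Assumed example values, not recommendations.
cpu_limits = Thresholds(warning=70.0, danger=90.0)
cpu_series = [("10:00", 42.0), ("10:01", 75.5), ("10:02", 93.1)]
states = classify_series(cpu_series, cpu_limits)
```

In practice the limits come out of the tuning and review described above, and the evaluation itself is done by the alerting policy rather than by custom code.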

There are various channels available for these notifications: some work out of the box, and some can be integrated into applications and dashboards. We take a look at some of these and explain how to set them up. Which notification channels to use is a matter of company policy, and different channels can be used for different types of notifications, based on component, severity, and so on.
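One way to picture such a policy is as a small routing table. The sketch below is only an illustration in Python: the component names, severities, and channel assignments are hypothetical, and in Cloud Monitoring the routing is expressed by attaching channels to alerting policies rather than by code like this.

```python
from typing import Dict, List, Tuple

# Hypothetical routing policy: (component, severity) -> channel types.
ROUTING: Dict[Tuple[str, str], List[str]] = {
    ("database", "danger"): ["pagerduty", "slack"],
    ("database", "warning"): ["slack"],
    ("frontend", "danger"): ["slack", "email"],
}

# Fallback used when no specific rule matches (also an assumption).
DEFAULT_CHANNELS: List[str] = ["email"]


def channels_for(component: str, severity: str) -> List[str]:
    """Return the channels that should receive this kind of notification."""
    return list(ROUTING.get((component, severity), DEFAULT_CHANNELS))
```

For example, `channels_for("database", "danger")` returns both the paging channel and Slack, while an unlisted combination falls back to email.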

Some notification channel integrations let us create automated, programmatic workflows in response to alerts. For example, Pub/Sub is a publish/subscribe queue that lets us build loosely coupled applications. Using Pub/Sub as our notification channel makes it easy to integrate with third-party providers for alerting and incident response workflows. Pub/Sub can be configured as a notification channel through the API or the Google Cloud Console.

Different types of Notification Alerts and Creating Channels

Google Cloud provides several types of notification channels, including:

  1. Email
  2. MobileApp
  3. PagerDuty
  4. SMS
  5. Slack
  6. Webhook
  7. Pub/Sub

In this blog, we will focus on Slack, Webhook, and Pub/Sub. Slack is a popular tool for internal and external communication used by many organizations. Webhooks allow custom handling of notifications and can be integrated with existing applications and dashboards. Pub/Sub also allows custom handling, where one notification can be handled in several different ways.

Let us dive in and start setting up the notification channel.

Setting up the notification channel

To configure a notification channel, you must have one of the following Identity and Access Management roles on the Workspace’s host project:

- Monitoring NotificationChannel Editor
- Monitoring Editor
- Monitoring Admin
- Project Editor
- Project Owner

When you create an alerting policy, you can select any configured notification channel and add it to the policy. You can configure your notification channels in advance, or you can configure them as part of creating the alerting policy. To create a notification channel, follow the steps below:

  1. In the Cloud Console, select Monitoring, or search for “Monitoring” in the search bar.
  2. This takes you to the Monitoring section.
  3. In the Monitoring navigation pane, verify that the correct Workspace is selected.
  4. In the Monitoring navigation pane, click “Alerting”.
  5. At the top of the Alerting section, click Edit notification channels.

To add a new notification channel, locate the channel type, such as “SMS”, “SLACK”, or “PUB/SUB”, click Add new, and then follow the channel-specific instructions described in the sections below.

Notification via Slack

The Cloud Monitoring Slack integration allows your alerting policies to post to a Slack channel when a new incident is created. Note that the free version of Slack is limited to 10 app or service integrations.

To set up Slack notifications, do the following:

  1. In Slack, create a Slack channel at the Slack site and record the channel URL.
  2. In the Cloud Console, select Monitoring and go to Monitoring.
  3. Select Alerting and then select Edit notification channels.
  4. In the Slack section, click Add new, which brings you to the Slack sign-in page.
  5. Enter your Slack workspace URL.
  6. Click Allow to give Google Cloud Monitoring access to your Slack workspace.
  7. When you create an alerting policy, select Slack in the Notifications section and choose your Slack configuration.

Notification via Webhook

Webhooks are aimed at explicit invocation, where the publisher (the client, here Cloud Monitoring) retains full control of when the webhook is executed. This means the receiving server is responsible for accepting and processing the alert message at the moment it is delivered. Moreover, although a webhook does retry deliveries a few times upon failure, if the target endpoint is unavailable for too long, notifications are dropped entirely. A few example use cases for webhooks:

  1. Upon completion of an audit, the webhook triggers a script that pulls the page details report and emails it to the addresses in the notifications field of the recently completed audit.
  2. Upon completion of an audit, the webhook triggers a script that pulls a report from the recently completed audit and displays it in a dashboard.
  3. Integrate the Slack messaging app with your webhooks. Upon completion of the audit, your script is triggered to pull data such as rule failures or console errors. Slack also offers a webhook, which is triggered as soon as the data is pulled, and a message is sent to the relevant Slack channel.

 

To configure Webhooks notifications, do the following:

 

  1. Identify the webhook handler, that is, the endpoint URL that will receive webhook data from Monitoring.
  2. In the Webhooks section, click Add new, follow the instructions, and click Save.
  3. When you create an alerting policy, select Webhook in the Notifications section and choose your webhook configuration.
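The webhook handler from the first step can be as small as an HTTP endpoint that accepts a POST and reads the JSON body. The following is a minimal sketch using only the Python standard library, not a production handler. The field names `incident`, `policy_name`, `state`, and `summary` are assumptions about the notification payload, and port 8080 is arbitrary; inspect a real notification before relying on any of them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_incident(body: bytes) -> str:
    """Turn a webhook request body into a one-line summary.

    The "incident", "policy_name", "state" and "summary" keys are assumed
    here; verify them against a real notification.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    incident = payload.get("incident") or {}
    policy = incident.get("policy_name", "unknown policy")
    state = incident.get("state", "unknown state")
    summary = incident.get("summary", "")
    return f"[{state}] {policy}: {summary}".strip()


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alert notifications and prints a short summary."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            line = summarize_incident(body)
        except ValueError:  # includes json.JSONDecodeError
            self.send_response(400)
            self.end_headers()
            return
        print(line)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the handler until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Because notifications are dropped if the endpoint stays unavailable for too long, a real handler would normally run behind something that keeps it highly available.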

Basic authentication

Webhook requests sent by Cloud Monitoring can additionally be protected with basic authentication, which uses the username and password scheme defined in the HTTP specification. For this to work, Cloud Monitoring requires your server to return a 401 response with the proper WWW-Authenticate header.
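On the server side, this comes down to checking the `Authorization` header and answering 401 with a `WWW-Authenticate` header when the credentials are missing or wrong. The sketch below shows that check in plain Python; the realm name and the helper functions are hypothetical, and how the result is wired into a web framework is left open.

```python
import base64
import hmac
from typing import Dict, Optional, Tuple

REALM = "alerts"  # hypothetical realm name


def parse_basic_auth(header: Optional[str]) -> Optional[Tuple[str, str]]:
    """Extract (username, password) from an Authorization header, if valid."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None


def authorize(
    header: Optional[str], expected_user: str, expected_password: str
) -> Tuple[int, Dict[str, str]]:
    """Return (status code, extra response headers) for a request."""
    creds = parse_basic_auth(header)
    if creds is not None:
        # compare_digest avoids leaking information through timing.
        user_ok = hmac.compare_digest(creds[0].encode(), expected_user.encode())
        pass_ok = hmac.compare_digest(creds[1].encode(), expected_password.encode())
        if user_ok and pass_ok:
            return 200, {}
    return 401, {"WWW-Authenticate": f'Basic realm="{REALM}"'}
```

A request without credentials gets the 401 challenge, and a request carrying the expected username and password is accepted.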

Notification via Pub/Sub

 

Pub/Sub is a publish/subscribe queue that lets us build loosely coupled applications. Using Pub/Sub as our notification channel makes it easy to integrate with third-party providers for alerting and incident response workflows. Pub/Sub can be set up as a notification channel through the API or the Cloud Console.

When to use Pub/Sub notifications: Pub/Sub supports both explicit (push) and implicit (pull) invocation. In pull mode, the subscriber controls when to pull a message from the queue and how to process the alert message. Pub/Sub provides a durable queue in which messages wait as long as necessary until the subscriber pulls them.

Pub/Sub is an access-controlled queue, with access managed by Cloud Identity and Access Management, meaning that the subscriber needs to be able to authenticate using a user or service account. Messages delivered to Pub/Sub never leave the Google network.

An example of where to use Pub/Sub is when an alert message needs to be transformed before it is processed. Consider a scenario in which an uptime check that you configured to check the health of your load balancers is failing. As a result, an alert is fired and a message is published to your Pub/Sub channel. A Cloud Function is triggered as soon as the new message hits the Pub/Sub topic. The function reads the message and identifies the failing load balancer. The function then sends a command to change the DNS record to point to a failover load balancer.
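The scenario above could be sketched roughly as follows, in the style of a Pub/Sub-triggered background function where the message arrives base64-encoded in `event["data"]`. The `incident`, `state`, and `resource_name` fields, the `FAILOVER` mapping, and the `update_dns` placeholder are all assumptions for illustration; the actual DNS change is deliberately left out.

```python
import base64
import json
from typing import Optional

# Hypothetical mapping from a load balancer to its failover target.
FAILOVER = {"lb-primary": "lb-failover"}


def failing_resource(event: dict) -> Optional[str]:
    """Return the resource named in an open incident, else None.

    Assumes the alert JSON has an "incident" object with "state" and
    "resource_name" fields; verify against a real message.
    """
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    incident = message.get("incident") or {}
    if incident.get("state") != "open":
        return None
    return incident.get("resource_name")


def update_dns(target: str) -> None:
    """Placeholder for the DNS change; a real function would call a DNS API."""
    print(f"would point DNS at {target}")


def on_alert(event: dict, context=None) -> Optional[str]:
    """Entry point: decode the alert and fail over if a mapping exists."""
    resource = failing_resource(event)
    target = FAILOVER.get(resource) if resource else None
    if target:
        update_dns(target)
    return target
```

Returning the chosen target is only there to make the sketch easy to try locally; the runtime ignores the return value of a background function.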

 

 

Integration with Pub/Sub can be configured through the Cloud Console or from the command line. Here we mostly show the command-line approach, so that you become familiar with the commands.

To send notifications to a Pub/Sub topic, do the following:

  1. Enable the Pub/Sub API for your project:

- Open the Pub/Sub API page in the Cloud Console.
- Ensure that the correct Google Cloud project is selected.
- If an Enable button is displayed, click it.
- If API Enabled is displayed, the API is already enabled.

  2. Create a Pub/Sub topic, as described in creating a topic, if you don’t already have one. The topic must exist before you can use it as a notification channel. The following command creates a topic called notificationTopic.

gcloud pubsub topics create notificationTopic

  3. Create a notification channel that uses the topic. You can use the Monitoring API, the gcloud command-line tool, or the Cloud Console.

To use the Cloud Console to create the notification channel, go to the Edit notification channels window and then do the following:

  1. In the Cloud Pub/Sub section, click Add new.
  2. Enter a display name for your channel, enter the Pub/Sub topic name, and then click Add channel.

To use the Monitoring API or the gcloud command-line tool to create the notification channel, see creating channels for information and examples.
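As a rough illustration of the API route, the sketch below builds the request for creating a Pub/Sub notification channel over the Monitoring REST API without sending it. The channel type `pubsub` and the `topic` label key reflect our understanding of the channel descriptor and should be checked against the API reference; the function name and the example values are hypothetical, and authentication is omitted.

```python
import json


def pubsub_channel_request(project_id: str, topic: str, display_name: str) -> dict:
    """Build the URL and JSON body for a notificationChannels create call."""
    return {
        "method": "POST",
        "url": (
            "https://monitoring.googleapis.com/v3/"
            f"projects/{project_id}/notificationChannels"
        ),
        "body": {
            "type": "pubsub",  # assumed channel type name
            "displayName": display_name,
            # Assumed label key; the value is the full topic resource name.
            "labels": {"topic": f"projects/{project_id}/topics/{topic}"},
        },
    }


request = pubsub_channel_request("my-test-project", "notificationTopic", "Alerts topic")
payload = json.dumps(request["body"], indent=2)
```

The resulting `payload` would be sent as the request body with an OAuth access token; the official client libraries wrap the same call.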

  4. Authorize the notifications service account to publish to each Pub/Sub topic that you are using as a notification channel.

When you create the first Pub/Sub channel, Cloud Monitoring creates a service account for the Monitoring Notification Service Agent, for the project in which the channel was created. This service account manages the sending of notifications to Pub/Sub-based notification channels in this project.

This service account has an ID with the following structure:

service-[PROJECT_NUMBER]@gcp-sa-monitoring-notification.iam.gserviceaccount.com

You can see this account on the IAM page, not the Service accounts page, of the Cloud Console.

To authorize this account to publish to a topic, you must give the service account the pubsub.publisher IAM role for the topic. The following command does this for the notificationTopic topic:

 

gcloud pubsub topics add-iam-policy-binding \
projects/[PROJECT_NUMBER]/topics/notificationTopic --role=roles/pubsub.publisher \
--member=serviceAccount:service-[PROJECT_NUMBER]@gcp-sa-monitoring-notification.iam.gserviceaccount.com

 

If the command succeeds, it returns output like the following:

Updated IAM policy for topic [notificationTopic].
bindings:
- members:
  - serviceAccount:service-[PROJECT_NUMBER]@gcp-sa-monitoring-notification.iam.gserviceaccount.com
  role: roles/pubsub.publisher
etag: BwWcDOIw1Pc=
version: 1
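Because the service account ID follows a fixed pattern, it can be derived from the project number. The short Python sketch below does this and assembles the same role binding shown in the output above; the function names are ours, and the project number 123456789012 is a made-up example.

```python
def notification_service_account(project_number: str) -> str:
    """Build the Monitoring notification service account ID for a project."""
    if not project_number.isdigit():
        # Project numbers are numeric; a project ID like "my-test-project" is not valid here.
        raise ValueError("expected a numeric project number, not a project ID")
    return (
        f"service-{project_number}"
        "@gcp-sa-monitoring-notification.iam.gserviceaccount.com"
    )


def publisher_binding(project_number: str) -> dict:
    """Return the IAM binding that lets the account publish to a topic."""
    account = notification_service_account(project_number)
    return {
        "role": "roles/pubsub.publisher",
        "members": [f"serviceAccount:{account}"],
    }


binding = publisher_binding("123456789012")  # made-up project number
```

Passing a project ID instead of a project number raises an error, which guards against the mix-up between the two identifiers.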

 

Note that the project number is not the same as the project ID. Project IDs are typically strings that reflect the project name, like my-test-project. Project numbers are unique numerical identifiers. You can find the project name, ID, and number on the project’s landing page in the Cloud Console, or you can retrieve the project number with the following command:

gcloud projects describe [PROJECT_ID] --format="value(project_number)"

 

  5. Add the Pub/Sub channel to an alerting policy by selecting Pub/Sub as the channel type and a named topic as the notification channel.

 

Editing or deleting the channel: To edit or delete a notification channel by using the Cloud Console, select Monitoring and go to the “Alerting” section of the Monitoring navigation pane. From there, open Edit notification channels to edit a channel or to delete one that you no longer need.

 

Once the integration is complete, notifications are sent to the configured channel. Below is an example from Slack.

Alert notifications give timely awareness of problems in your cloud applications so you can resolve the problems quickly. In the landscape of managing infrastructure, this is a small but important part and hopefully will help speed up your journey to effective and efficient infrastructure management.