Using Epsagon for tracing and monitoring in AWS

Updated: Oct 7, 2022

Problem statement

A modern microservices architecture splits a large application into smaller independent parts, each with its own functionality and responsibility. To serve a single user request, a microservices-based application may call many internal microservices to compose its response. The advantage is that different teams and developers can work on their own microservices and use the languages and frameworks that best suit each case, but end-to-end visibility in daily operations can suffer. Many tools can help with monitoring, logging, and tracing, such as Amazon CloudWatch for metrics and logs, CloudWatch Container Insights, and AWS X-Ray. In this post, we take a look at Epsagon, a microservices-native observability platform for container and serverless environments.


Solution overview

Epsagon is a solution that allows you to monitor and troubleshoot issues in microservice environments. It's designed to make Dev and Ops teams more efficient by identifying problems, correlating data, and finding root causes. It was acquired by Cisco in 2021.

Epsagon makes it easy to monitor your cloud services, container orchestrators (Kubernetes and Amazon ECS), and serverless functions. You can monitor the CPU and memory utilization of your containers, as well as the duration and cold starts of your serverless functions.

Epsagon has several integrations that can be easily and securely applied to your AWS environment.

Getting started

The first step is to sign up for Epsagon and add the AWS integration.

Epsagon provides a CloudFormation template that will deploy the IAM role and other required AWS resources in the account.

The stack creates its own CloudTrail trail for "write-only" actions, an S3 bucket for the trail, a CloudWatch Logs log group, and an IAM role for cross-account access.

The IAM role contains only the policies that are required for Epsagon monitoring.

Epsagon also needs to add subscription filters to log groups, create event rules, and add Lambda layers.
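
To make the subscription-filter step more concrete, here is a minimal sketch of the arguments that the CloudWatch Logs PutSubscriptionFilter operation takes. The log group name, filter name, and destination ARN below are hypothetical placeholders; the values Epsagon actually uses are not shown in this post.

```python
def subscription_filter_params(log_group, destination_arn,
                               filter_name="example-forwarder",
                               filter_pattern=""):
    """Build keyword arguments for CloudWatch Logs PutSubscriptionFilter.

    An empty filter pattern matches every log event in the group.
    All default values here are illustrative placeholders.
    """
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": filter_pattern,
        "destinationArn": destination_arn,
    }


# Hypothetical log group and destination, for illustration only.
params = subscription_filter_params(
    "/aws/lambda/my-function",
    "arn:aws:lambda:us-east-1:123456789012:function:example-log-forwarder",
)
print(params)
```

With boto3 installed and credentials configured, a dictionary like this could be passed to `boto3.client("logs").put_subscription_filter(**params)`. In the setup described here, Epsagon performs this step itself through the IAM role.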

Only the Epsagon account can assume this role, and the trust relationship is additionally protected by an ExternalId.
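
For readers unfamiliar with this pattern, the sketch below builds a trust policy of the general shape used for cross-account roles protected by an external ID. The account ID and external ID are hypothetical placeholders, not Epsagon's real values, and the policy in the actual CloudFormation template may differ in detail.

```python
import json


def build_trust_policy(trusted_account_id, external_id):
    """Return an IAM trust policy that lets one AWS account assume a role,
    but only when the caller supplies the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


# Placeholder values, for illustration only.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The external ID condition guards against the "confused deputy" problem: even if another party learns the role ARN, an AssumeRole call without the matching external ID is rejected.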

ECS cluster monitoring


ECS does not require any additional configuration: Epsagon uses the IAM role to get all information from AWS. The Clusters tab shows the status and utilization of each cluster and the number of running services.

EC2 and Fargate clusters are supported. The Services tab shows utilization, number of running tasks, task definition, and other details.

The Instances tab shows details about every node.

The Tasks tab also provides container details and logs.

Tracing requires extra development work, but there are many frameworks and libraries for different programming languages.
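
To give a feel for what instrumenting code for tracing involves, here is a minimal, library-independent sketch. It does not use Epsagon's libraries; the `traced` decorator and the `SPANS` list are hypothetical names introduced only to show how spans with a shared trace ID and parent links can be recorded.

```python
import functools
import time
import uuid

SPANS = []    # finished spans; a real tracer would export these to a backend
_open = []    # stack of currently open spans (single-threaded sketch)


def traced(name):
    """Record a span (name, IDs, duration, error flag) around a function call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = _open[-1] if _open else None
            span = {
                "name": name,
                "span_id": uuid.uuid4().hex[:8],
                # Child spans inherit the trace ID of their parent.
                "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
                "parent_id": parent["span_id"] if parent else None,
                "error": False,
            }
            _open.append(span)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                span["error"] = True
                raise
            finally:
                span["duration_ms"] = (time.perf_counter() - start) * 1000
                _open.pop()
                SPANS.append(span)
        return wrapper
    return decorator


@traced("load_user")
def load_user(user_id):
    return {"id": user_id}


@traced("handle_request")
def handle_request(user_id):
    return load_user(user_id)


handle_request(42)
for span in SPANS:
    print(span["name"], span["parent_id"], round(span["duration_ms"], 3))
```

Real tracing libraries do considerably more (context propagation across services, automatic instrumentation of HTTP and database clients, export to a backend), but the recorded data has roughly this shape.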

EKS cluster monitoring

Kubernetes integration requires a Helm chart installation.

Epsagon provides the command for the Helm installation.


Once the Helm chart is installed, you will see your Kubernetes cluster in the list.

Kubernetes control plane metrics are shown in the relevant tab.

You can see all nodes and their status, the configuration of every node, and metrics for every node.

You can see controllers such as Deployments and DaemonSets, along with the manifest, metrics, and events of every controller.

Information about all pods is also available, including the manifest, metrics, and events of every pod.

Every container can also be inspected: we can see ports, volumes, and other information about containers, as well as their metrics.

Metrics are useful for dashboards and alerting, both of which are demonstrated below.


Graphs and tracing

You can use the Epsagon frameworks and libraries to add tracing to your containers or functions. Graphs show the total number of requests and errors for a given period, as well as latency.
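
As a rough illustration of the numbers behind such graphs, the sketch below computes total requests, errors, error rate, and latency figures from a list of request records. The record fields (`status`, `latency_ms`) and the choice to count HTTP 5xx responses as errors are assumptions made for this example, not Epsagon's definitions.

```python
import math
import statistics


def summarize(requests):
    """Summarize request records into totals, error rate, and latency stats."""
    total = len(requests)
    if total == 0:
        return {"total": 0, "errors": 0, "error_rate": 0.0,
                "avg_ms": 0.0, "p95_ms": 0.0}
    # Assumption: a 5xx status code counts as an error.
    errors = sum(1 for r in requests if r["status"] >= 500)
    latencies = sorted(r["latency_ms"] for r in requests)
    # Nearest-rank 95th percentile.
    p95_index = math.ceil(0.95 * total) - 1
    return {
        "total": total,
        "errors": errors,
        "error_rate": errors / total,
        "avg_ms": statistics.mean(latencies),
        "p95_ms": latencies[p95_index],
    }


sample = [
    {"status": 200, "latency_ms": 20},
    {"status": 200, "latency_ms": 35},
    {"status": 500, "latency_ms": 900},
    {"status": 200, "latency_ms": 40},
    {"status": 503, "latency_ms": 120},
]
summary = summarize(sample)
print(summary)
```

With only five samples the 95th percentile is simply the slowest request; on real traffic the percentile is far more informative than the average, which a single slow request can skew.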

Every trace provides a graph, timeline, and sequence of requests.


Dashboards

First of all, we can build a high-level dashboard that shows, for example, the top 5 errors by application, the throughput and latency of API Gateway endpoints, and error codes, invocations, and exceptions.
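
The "top 5 errors by application" view boils down to a simple aggregation, which can be sketched as follows. The event fields (`app`, `is_error`) and the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter


def top_errors_by_app(events, n=5):
    """Return the n applications with the most error events, highest first."""
    counts = Counter(e["app"] for e in events if e["is_error"])
    return counts.most_common(n)


# Hypothetical events, for illustration only.
events = (
    [{"app": "checkout", "is_error": True}] * 7
    + [{"app": "search", "is_error": True}] * 3
    + [{"app": "search", "is_error": False}] * 10
    + [{"app": "auth", "is_error": True}] * 1
)
top = top_errors_by_app(events)
print(top)
```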

There are also various predefined dashboards for serverless and container workloads.

The Kubernetes dashboard shows overall resource utilization and can be filtered by namespace.

Every pod and container can be monitored.

Lambda monitoring

We can see a lot of useful information about Lambda functions in one place, such as the number of invocations and errors, duration, and estimated cost.
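
The post does not describe how Epsagon derives the estimated cost, but a rough estimate can be made from invocations, average duration, and configured memory. The sketch below assumes on-demand x86 pricing of $0.20 per million requests and $0.0000166667 per GB-second and ignores the free tier; both prices are assumptions that should be checked against current AWS pricing for your region.

```python
# Assumed prices (x86, on-demand); verify against current AWS Lambda pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def estimate_lambda_cost(invocations, avg_duration_ms, memory_mb):
    """Estimate Lambda cost in USD as request charges plus compute charges."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# Example: one million invocations, 120 ms average duration, 512 MB memory.
cost = estimate_lambda_cost(1_000_000, 120, 512)
print(f"Estimated cost: ${cost:.2f}")
```

In this example the compute portion is 60,000 GB-seconds, so most of the cost comes from duration and memory rather than from the request count.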

Clicking on any function shows more details, with graphs and logs. There is also a direct link to the resource in the AWS console if you need to investigate further.

The Lambda logs are the same ones you can find in the CloudWatch log group.

Service map

Epsagon automatically builds a service map.

You can see details about every connection and identify a problem.

Every component of the map provides graphs with the success/error rate and duration for different types of requests.
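
Conceptually, a service map is a graph whose edges are aggregated from individual calls between components. The sketch below shows one way such edges, with call counts, error rates, and average durations, could be derived from call records. The record fields and service names are hypothetical; this is not a description of how Epsagon builds its map internally.

```python
from collections import defaultdict


def build_service_map(calls):
    """Aggregate call records into edges keyed by (source, target)."""
    raw = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})
    for call in calls:
        edge = raw[(call["source"], call["target"])]
        edge["calls"] += 1
        edge["errors"] += 1 if call["error"] else 0
        edge["total_ms"] += call["duration_ms"]
    return {
        key: {
            "calls": e["calls"],
            "error_rate": e["errors"] / e["calls"],
            "avg_ms": e["total_ms"] / e["calls"],
        }
        for key, e in raw.items()
    }


# Hypothetical call records, for illustration only.
calls = [
    {"source": "api-gateway", "target": "orders-fn", "duration_ms": 40, "error": False},
    {"source": "api-gateway", "target": "orders-fn", "duration_ms": 60, "error": True},
    {"source": "orders-fn", "target": "dynamodb", "duration_ms": 10, "error": False},
]
service_map = build_service_map(calls)
for (source, target), stats in service_map.items():
    print(f"{source} -> {target}: {stats}")
```

An edge with a high error rate or an unusually long average duration is the kind of connection that stands out when looking for a problem on the map.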

Kubernetes applications require extra effort to add a tracing mechanism. After that, you can visualize a service map for them as well.

Incidents and alerts


Epsagon can be integrated with Slack, PagerDuty, and other popular services.

Thresholds can be configured for any available metric.
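
The idea of a threshold-based alert can be sketched in a few lines. The `Threshold` class, the metric names, and the values below are hypothetical and do not reflect Epsagon's alert configuration format.

```python
import operator
from dataclasses import dataclass

COMPARISONS = {">": operator.gt, "<": operator.lt}


@dataclass
class Threshold:
    metric: str       # name of the metric to watch
    comparison: str   # ">" or "<"
    value: float      # limit that triggers the alert


def breached(thresholds, metrics):
    """Return the thresholds that the current metric values violate."""
    return [
        t for t in thresholds
        if t.metric in metrics
        and COMPARISONS[t.comparison](metrics[t.metric], t.value)
    ]


# Hypothetical thresholds and current readings, for illustration only.
thresholds = [
    Threshold("cpu_utilization_percent", ">", 80),
    Threshold("error_rate", ">", 0.05),
    Threshold("running_tasks", "<", 2),
]
current = {"cpu_utilization_percent": 91.5, "error_rate": 0.01, "running_tasks": 3}
alerts = breached(thresholds, current)
print([t.metric for t in alerts])
```

In a real setup each breached threshold would be routed to a notification channel such as Slack or PagerDuty instead of being printed.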

The Alerts page shows issues, notification channels, and assignees, and lets you mute an alarm.

Conclusion

Epsagon is an interesting tool that lets you visualize and analyze metrics, logs, and traces. It works with AWS services such as ECS, EKS, Lambda, Kinesis, API Gateway, S3, DynamoDB, and Step Functions, and also supports custom data collection. Epsagon offers a 14-day free trial that lets you test all features, start onboarding your team, and monitor your applications.
