Wednesday, August 5, 2020

ELK Integration

In this post, I will summarize ELK. Before going further, let me explain why you need a log aggregation solution. Cloud-native applications run multiple microservices, each with its own lifecycle and logging mechanism, so debugging a production issue without a centralized log aggregation solution is a nightmare.

ELK consists of three open-source products:

E - Elasticsearch: a NoSQL database (search and analytics engine) that stores and indexes the logs
L - Logstash: a log pipeline tool that ingests logs from different sources and sends them to Elasticsearch
K - Kibana: a visualization tool for viewing the aggregated logs



From an application developer's perspective, ELK is pretty straightforward to use: all you need are a few configuration changes that point your application at the Elastic endpoint and port.
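To make "point your application at the endpoint" concrete, here is a minimal sketch for a Python service. It assumes (this post does not specify it) that the Logstash pipeline exposes a TCP input with a JSON-lines codec, so the application only has to emit one JSON object per line over a socket. The host name, port, and JSON field names below are hypothetical placeholders, not values from any real setup.

```python
import json
import logging
import socket
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def __init__(self, service_name: str):
        super().__init__()
        # Hypothetical field used to tell microservices apart in Kibana.
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "service": self.service_name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)  # json.dumps escapes newlines, so output stays on one line


class TcpJsonLinesHandler(logging.Handler):
    """Send each formatted record over TCP, terminated by a newline."""

    def __init__(self, host: str, port: int, timeout: float = 2.0):
        super().__init__()
        self.address = (host, port)
        self.timeout = timeout
        self.sock = None

    def emit(self, record: logging.LogRecord) -> None:
        try:
            if self.sock is None:
                self.sock = socket.create_connection(self.address, timeout=self.timeout)
            line = self.format(record) + "\n"
            self.sock.sendall(line.encode("utf-8"))
        except OSError:
            # Drop the connection so the next record reconnects; never crash the app over logging.
            if self.sock is not None:
                self.sock.close()
            self.sock = None
            self.handleError(record)

    def close(self) -> None:
        if self.sock is not None:
            self.sock.close()
            self.sock = None
        super().close()


def configure_logging(service_name: str,
                      host: str = "logstash.example.internal",  # hypothetical endpoint
                      port: int = 5000) -> logging.Logger:      # hypothetical port
    """Attach the JSON-over-TCP handler to the service's logger."""
    handler = TcpJsonLinesHandler(host, port)
    handler.setFormatter(JsonLineFormatter(service_name))
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Usage would be along the lines of `log = configure_logging("orders")` followed by `log.info("order %s paid", order_id)`. Structured JSON is chosen over plain text so that fields such as `service` and `level` arrive in Elasticsearch already separated and can be filtered in Kibana without extra parsing rules.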

Sunday, March 8, 2020

Building high availability systems

High availability is one of the fundamental concepts of system design, and it can hurt you badly if the design is flawed or if you choose an infrastructure that doesn't support it. A highly available infrastructure (uptime target of 99.995%) has the following traits:
  • Hardware redundancy
  • Software and application redundancy
  • Data redundancy
  • Elimination of single points of failure
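To put the 99.995% target in perspective, the downtime budget is (1 - A) multiplied by the period: for a 365-day year that is 0.00005 × 525,600 minutes, or roughly 26.3 minutes per year. The sketch below also shows why redundancy is on the list. Assuming failures are independent (an idealized assumption that real systems only approximate), n redundant copies are all down at once with probability (1 - a)^n, so their combined availability is 1 - (1 - a)^n, while components that must all be up (in series) multiply their availabilities. The 99.9% and 99.99% figures in the demo are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability_pct: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum downtime allowed per period for a given availability target (in percent)."""
    return (1 - availability_pct / 100) * period_minutes


def parallel(availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies, assuming independent failures.
    The group is down only when every copy is down at the same time."""
    return 1 - (1 - availability) ** replicas


def series(*availabilities: float) -> float:
    """Availability of a chain of components that must all be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


if __name__ == "__main__":
    print(f"99.995% allows {downtime_budget_minutes(99.995):.2f} min of downtime per year")
    # Hypothetical numbers: two 99.9% VMs behind a single 99.99% load balancer.
    vms = parallel(0.999, 2)
    print(f"two 99.9% VMs in parallel: {vms:.6f}")
    print(f"with one 99.99% load balancer in front: {series(0.9999, vms):.6f}")
```

The last line illustrates the single-point-of-failure problem: the redundant pair of VMs is very available on its own, but the non-redundant load balancer in front of it caps the whole chain below its own 99.99%.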
Here are some best practices that can help you build highly available systems:
1. Automatically detect outages - Your system should be smart enough to detect possible outages by monitoring metrics and the health of applications, VMs, and nodes.

2. Eliminate single points of failure through redundancy - IT infrastructures must have backup components that can replace a failed system.

3. Invest in high-end VMs (plenty of RAM, CPU, and storage) for your failover instance, so that it can handle real-time traffic for a short duration.

4. Ensure the failover instance is not co-located in the same network as your primary instance.

5. Ensure the primary (active) and secondary (passive) instances stay in sync. For example, if the primary instance interacts with other systems through configuration, the same configuration should be replicated to the secondary; the state of both should remain the same, even after upgrades.

6. Invest in a highly available and reliable Domain Name System (DNS) service that supports health checks and monitoring, automatic DNS failover, and DNS load balancing.
Ref: http://blog.cloudharmony.com/2012/08/comparison-and-analysis-of-managed-dns.html

7. Invest in a reliable load balancer that can detect unhealthy targets, stop sending traffic to them, and spread the load across the remaining healthy targets. Again, avoid single points of failure: by implementing redundancy for the load balancer itself, you eliminate it as a single point of failure.
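The practices above (automatically detecting outages, failing over from an active instance to a passive one, and spreading load only across healthy targets) can be sketched as a tiny in-process round-robin balancer. This is a minimal illustration under stated assumptions, not how any particular load balancer or DNS service works internally: the `probe` function, the failure and recovery thresholds, and the target names are all hypothetical. A target is marked unhealthy only after several consecutive failed probes and healthy again only after several consecutive successes, which avoids flapping on a single blip; passive (standby) targets receive traffic only when no active target is healthy.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List


@dataclass
class Target:
    name: str
    passive: bool = False  # standby instance, used only when no active target is healthy
    healthy: bool = True
    fails: int = 0         # consecutive failed probes
    passes: int = 0        # consecutive successful probes


class HealthCheckedBalancer:
    def __init__(self, targets: List[Target], probe: Callable[[str], bool],
                 unhealthy_threshold: int = 3, healthy_threshold: int = 2):
        self.targets = targets
        # Hypothetical probe: returns True if the named target answers its health check.
        self.probe = probe
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self._counter = count()  # drives round-robin selection

    def run_health_checks(self) -> None:
        """Probe every target once and update its health state."""
        for t in self.targets:
            if self.probe(t.name):
                t.fails, t.passes = 0, t.passes + 1
                if not t.healthy and t.passes >= self.healthy_threshold:
                    t.healthy = True   # recovered: resume routing here
            else:
                t.passes, t.fails = 0, t.fails + 1
                if t.healthy and t.fails >= self.unhealthy_threshold:
                    t.healthy = False  # outage detected: stop routing here

    def pick(self) -> str:
        """Round-robin over healthy active targets, falling back to healthy passive ones."""
        pool = [t for t in self.targets if t.healthy and not t.passive]
        if not pool:
            pool = [t for t in self.targets if t.healthy and t.passive]
        if not pool:
            raise RuntimeError("no healthy targets available")
        return pool[next(self._counter) % len(pool)].name
```

In a real deployment `run_health_checks` would run on a timer, and the balancer itself would have to be redundant (for example, a pair of balancers or a managed service); otherwise this component becomes exactly the single point of failure the post warns about.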