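Content provided by Tharun Shiv. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Tharun Shiv or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://ko.player.fm/legal.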

How a Site Reliability Engineer monitors the infrastructure? SRE | Tharun Shiv

7:45
 

Site reliability engineering

Site Reliability Engineering, popularly referred to as SRE, is a role in computer science and engineering whose main purpose is to provision, maintain, monitor, and manage infrastructure in order to maximize application uptime and reliability. SRE is an emerging role, but the tasks an SRE performs have existed ever since the first application was developed. The software developers' scope ends with writing the code for the application. Everything beyond that is the duty of an SRE: setting up the infrastructure and the services that run on it, providing the required network connectivity, providing a platform for the application to run on, and making sure every part of the application is up and running reliably 24x7. In fact, we can consider Site Reliability Engineers the strong bridge between the users and a reliable application.

Now, to explain the different responsibilities of an SRE, I have divided them into four categories. I have always seen SRE this way, and definitely not as some ad-hoc process. The four categories into which I would classify the tasks of a Site Reliability Engineer are:

  1. Create
  2. Monitor
  3. Manage
  4. Destroy

Let's dive deep into each one of them.

Create

1. Provision virtual machines / PXE Baremetals

SREs are responsible for provisioning virtual machines with the requested resources: CPU, memory, disks, network configuration, and operating system. They are also responsible for being rack-aware during provisioning. Example operating systems include Linux distributions such as Ubuntu and CentOS, as well as Windows.
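
As a rough illustration of rack awareness, the sketch below spreads the replicas of a service across racks so that losing one rack does not take down all of them. The `Host` type, the `place_replicas` function, and the host inventory are hypothetical names invented for this example; they are not part of any particular provisioning tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    rack: str
    free_cpus: int


def place_replicas(hosts, replicas, cpus_per_vm):
    """Pick one host per replica, preferring racks that hold fewer replicas so far."""
    used_per_rack = defaultdict(int)
    remaining = {h.name: h.free_cpus for h in hosts}
    placement = []
    for _ in range(replicas):
        candidates = [h for h in hosts if remaining[h.name] >= cpus_per_vm]
        if not candidates:
            raise RuntimeError("not enough capacity for the requested VMs")
        # Least-used rack first, then the host with the most free CPUs.
        best = min(candidates, key=lambda h: (used_per_rack[h.rack], -remaining[h.name]))
        placement.append(best)
        used_per_rack[best.rack] += 1
        remaining[best.name] -= cpus_per_vm
    return placement


# Hypothetical inventory: two hosts in rack-a, one in rack-b.
hosts = [
    Host("node-a1", "rack-a", 8),
    Host("node-a2", "rack-a", 16),
    Host("node-b1", "rack-b", 8),
]
chosen = place_replicas(hosts, replicas=2, cpus_per_vm=4)
print([(h.name, h.rack) for h in chosen])
```

With this inventory the two replicas land on different racks, even though `node-a2` alone has enough CPUs for both.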

2. Set up services

Example technologies include NGINX, Apache, RabbitMQ, Kafka, Hadoop, Traefik, MySQL, PostgreSQL, Aerospike, MongoDB, Redis, MinIO, Kubernetes, Apache Mesos, Marathon, MariaDB, and Galera.

3. Optimize the infrastructure

Since several components and services are used in the infrastructure, there is scope for improvement in performance, efficiency, and security. The SRE optimizes the components by keeping them up to date, choosing the right service for the right job, and patching the servers.
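
One small piece of keeping components up to date is knowing which ones have fallen behind. The sketch below compares installed versions against the latest known versions. The component names and version numbers are made up for illustration, and it assumes plain dotted numeric versions; in practice the data would come from a package manager or an inventory system.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.24.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def outdated(installed, latest):
    """Return the components whose installed version is older than the latest known one."""
    return sorted(
        name
        for name, version in installed.items()
        if name in latest and parse_version(version) < parse_version(latest[name])
    )


# Hypothetical inventory used only for this example.
installed = {"web-proxy": "1.18.0", "cache": "7.2.4", "queue": "3.9.10"}
latest = {"web-proxy": "1.24.0", "cache": "7.2.4", "queue": "3.12.1"}
print(outdated(installed, latest))
```

Versions are compared as tuples of integers rather than as strings, because a plain string comparison would wrongly rank "3.9.10" above "3.12.1".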

4. Write monitoring scripts

When SREs maintain an infrastructure of any size, they never underestimate any component of it, and they write monitoring scripts to track the components and metrics of each and every one. This provides real-time alerts when any component malfunctions, and also a better view of the infrastructure. SREs use programming languages like Bash, Python, Golang, and Perl, along with tools like daemon processes, Riemann, InfluxDB, OpenTSDB, Kafka, Grafana, Prometheus, and APIs, to monitor the infrastructure.
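
A monitoring script can be as small as the sketch below, which measures disk usage and classifies it against warning and critical thresholds. The function names and the 80/90 thresholds are assumptions chosen for this example. A real setup would typically run a check like this on a schedule and ship the result to one of the tools listed above instead of printing it.

```python
import shutil


def disk_used_percent(path):
    """Return the percentage of disk space in use on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_threshold(metric, value, warn, crit):
    """Classify a metric value as OK, WARNING, or CRITICAL."""
    if value >= crit:
        return f"CRITICAL {metric}={value:.1f}"
    if value >= warn:
        return f"WARNING {metric}={value:.1f}"
    return f"OK {metric}={value:.1f}"


# Example thresholds; tune them per component.
print(check_threshold("disk_used_percent", disk_used_percent("/"), warn=80, crit=90))
```

Keeping the measurement (`disk_used_percent`) separate from the classification (`check_threshold`) lets the same threshold logic be reused for other metrics such as memory or queue depth.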

5. Write automation scripts

If a task involves more than 10 steps and is likely to be performed more than once, the SRE never hesitates to automate it. This saves time and also helps prevent human error. SREs use programming languages like Bash, Python, Golang, and Perl, and tools like Ansible, to automate the tasks.
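
To show the idea of turning a manual checklist into a script, here is a minimal sketch of a step runner that executes named steps in order and stops at the first failure. The `run_steps` function and the three steps are hypothetical; the lambdas only update a dictionary and stand in for real commands.

```python
def run_steps(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    done = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step '{name}' failed: {exc}")
            break
        done.append(name)
        print(f"step '{name}' ok")
    return done


# Hypothetical steps standing in for real commands.
state = {}
steps = [
    ("create directory", lambda: state.update(directory=True)),
    ("write config", lambda: state.update(config="written")),
    ("restart service", lambda: state.update(restarted=True)),
]
completed = run_steps(steps)
```

Stopping at the first failed step avoids running later steps against a half-configured machine, and the returned list shows how far the run got.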

6. Manage users on the machines


50 episodes
