Work closely with solution architects and the application development team to ensure that design and coding adhere to best practices based on SRE principles.
Assist the development team and other relevant teams in tuning applications and configurations for critical systems so that they comply with the non-functional requirements (NFRs) before going live in production, and ensure that performance recommendations are part of the change request process.
Improve application stability & operational efficiency by developing scripts to automate tasks.
Ensure appropriate governance of framework usage across multiple delivery streams, and enhance the framework's capabilities to meet upcoming requirements.
Participate in and contribute to resiliency validation exercises, with proper reporting of the results.
Define critical performance KPIs, set alert rules, and roll out monitoring dashboards for production, with timely reporting to stakeholders.
Assist the Prod-Ops team in investigating critical production incidents, produce root cause analyses, and ensure permanent closure of the incidents.
Analyse patterns in production incidents and set up appropriate alerting and monitoring mechanisms to catch issues beforehand.
Qualifications
Bachelor's degree in Computer Science with equivalent work experience of 8 years.
Minimum 3 years of hands-on experience in Java/J2EE, Spring Boot, JavaScript, Ajax, and SQL.
Minimum 2 years of hands-on experience in container technologies such as Red Hat OpenShift, Docker, and Kubernetes, and DevOps tools such as Jenkins, Bitbucket, and JIRA.
Minimum 2 years of hands-on experience in data storage technologies such as PostgreSQL, MongoDB, SQL Server, and Oracle 11g.
Minimum 2 years of hands-on experience in application monitoring with CA Wily, Grafana, Kibana, and Prometheus.
3 years of experience in production support and issue management is a plus.