Site Reliability Engineer
Location: Pune, India
Workplace Type: Onsite
Shift: US Shift
About the Role
We are seeking an experienced Site Reliability Engineer to join our team in Pune. In this role, you will be instrumental in managing our multi-cloud infrastructure, with a focus on AWS and Azure. You will set up and maintain the infrastructure that supports our cloud migration and future division expansion. This position offers the opportunity to work in a global environment, collaborate with Automotive and corporate IT teams, learn new skills, and shape the future direction of our infrastructure. The ideal candidate will have a strong background in cloud computing, infrastructure as code, and automation, with a proactive approach to problem-solving and performance optimization. You will be part of the Tech Ops / SRE Team, which works in a culture of sharing and learning to keep our products continuously available.
Key Responsibilities
- Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation.
- Balance feature development speed against reliability using well-defined service-level objectives (SLOs).
- Manage day-to-day operations of AWS/Azure Infrastructure.
- Build and document automation processes for Infrastructure as a Service (IaaS) and Infrastructure as Code (IaC).
- Manage backup and patch management processes.
- Support architecture planning, migration, and installation for new projects.
- Lead the structural/architectural design of platforms, middleware, databases, and backups according to system requirements.
- Conduct technology capacity planning by reviewing current and future requirements.
- Design and implement disaster recovery plans, including backup and recovery procedures.
- Manage day-to-day operations by troubleshooting issues, conducting root cause analysis (RCA), and developing fixes.
- Plan for and manage upgrades, migrations, maintenance, backups, installations, and configurations.
- Review technical performance and implement changes that improve efficiency and fine-tune performance.
- Develop shift rosters to ensure uninterrupted coverage in the tower.
- Create and update SOPs, Data Responsibility Matrices, operations manuals, and daily test plans.
- Provide weekly status reports to client leadership and internal stakeholders.
- Use technology and automation to develop Service Improvement Plans (SIPs).
Required Skills & Qualifications
- Bachelor’s degree (or equivalent) in computer science or a related discipline, and at least 7 years of experience.
- Strong understanding of, and hands-on experience with, Amazon EKS, including configuring, deploying, maintaining, troubleshooting, upgrading, and monitoring clusters.
- Hands-on experience with CI/CD pipelines and DevOps tooling, including Git-based version control (GitLab preferred), pipeline design and maintenance, automated builds, testing, and deployments for cloud-native and containerized workloads.
- Hands-on experience with Linux servers, Active Directory (AD), LDAP, DNS, network storage, and AWS services (EC2, FSx, Managed AD, Route 53, etc.).
- Ability to script and automate using languages and tools such as PowerShell, Python, Bash, Ansible, and Terraform.
- Familiarity with ITSM processes such as Incident, Problem, and Change Management (ServiceNow preferred).
- Proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
- Strong interpersonal skills, analytical and problem-solving ability, along with strong written and verbal communication.
- Ability to communicate ideas in both technical and non-technical ways.
- A strong capacity for teamwork and a sense of ownership, with the ability to work independently and be self-driven.
- Experience in cloud infrastructure consulting.