Site Reliability Architect
Company: Incedo
Hybrid
Austin, TX, USA

Position: Senior Consultant - Site Reliability Architect

Location: Austin, TX

Type: Full-Time

About the company:

Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale.

As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together.

With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.

Job Overview

We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems.

In this role, you will act as a thought leader and architect, driving end-to-end design, architecture, and implementation of scalable, resilient, and secure cloud-native platforms on AWS. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions.

This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.

Required Qualifications

10+ years of experience in Site Reliability, Observability, Production Support, Cloud Architecture or related roles, with a strong focus on architecture and strategy
Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, and Jaeger
Strong understanding of microservices architecture, APIs, and distributed systems
Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration

Strong hands-on experience with AWS services, including:

Compute & Networking: VPC, EC2, ECS/EKS, Lambda
Databases: RDS, Aurora, DynamoDB
Storage & CDN: S3, CloudFront
Security: IAM, KMS, Security Groups, NACLs

Proven experience designing multi-account, multi-region AWS architectures

Deep understanding of:

Cloud networking and distributed systems
Security and compliance best practices
Scalability, resiliency, and fault-tolerant design patterns

Hands-on expertise with Terraform (or similar IaC tools)

Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, etc.)

Strong experience with DevSecOps principles and CI/CD pipelines

Excellent problem-solving and analytical skills
Demonstrated ability to lead cross-functional initiatives and influence technical direction

Preferred Qualifications

AWS Certifications (e.g., Solutions Architect – Associate or Professional)
Experience working in financial services, banking, or regulated environments
Background in Site Reliability Engineering (SRE) practices and production support models

Key Responsibilities

Design, architect, and build cloud-native infrastructure and application services on AWS
Lead end-to-end infrastructure design for application platforms, microservices, and shared services
Implement and manage Infrastructure as Code (IaC) using Terraform
Design and maintain highly available, scalable, secure, and cost-optimized AWS architectures
Troubleshoot and resolve complex infrastructure and application service issues
Provide architectural guidance and technical leadership across engineering teams
Drive adoption of DevSecOps best practices across the SDLC
Establish and enhance monitoring, observability, and alerting frameworks