Position: Senior Consultant - Site Reliability Architect
Location: Austin, TX
Type: Full-Time
About the company:
Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale.
As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together.
With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.
Job Overview
We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems.
In this role, you will act as a thought leader and architect, driving end-to-end design, architecture, and implementation of scalable, resilient, and secure cloud-native platforms on AWS. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions.
This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.
Required Qualifications
- 10+ years of experience in Site Reliability, Observability, Production Support, Cloud Architecture or related roles, with a strong focus on architecture and strategy
- Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger
- Strong understanding of microservices architecture, APIs, and distributed systems
- Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
- Strong hands-on experience with AWS services, including:
  - Compute & Networking: VPC, EC2, ECS/EKS, Lambda
  - Databases: RDS, Aurora, DynamoDB
  - Storage & CDN: S3, CloudFront
  - Security: IAM, KMS, Security Groups, NACLs
- Proven experience designing multi-account, multi-region AWS architectures
- Deep understanding of:
  - Cloud networking and distributed systems
  - Security and compliance best practices
  - Scalability, resiliency, and fault-tolerant design patterns
- Hands-on expertise with Terraform (or similar IaC tools)
- Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, etc.)
- Strong experience with DevSecOps principles and CI/CD pipelines
- Excellent problem-solving and analytical skills
- Demonstrated ability to lead cross-functional initiatives and influence technical direction
Preferred Qualifications
- AWS Certifications (e.g., Solutions Architect – Associate or Professional)
- Experience working in financial services, banking, or regulated environments
- Background in Site Reliability Engineering (SRE) practices and production support models
Key Responsibilities
- Design, architect, and build cloud-native infrastructure and application services on AWS
- Lead end-to-end infrastructure design for application platforms, microservices, and shared services
- Implement and manage Infrastructure as Code (IaC) using Terraform
- Design and maintain highly available, scalable, secure, and cost-optimized AWS architectures
- Troubleshoot and resolve complex infrastructure and application service issues
- Provide architectural guidance and technical leadership across engineering teams
- Drive adoption of DevSecOps best practices across the SDLC
- Establish and enhance monitoring, observability, and alerting frameworks