Site Reliability Architect
Company: Incedo
Work Arrangement: Hybrid
Position: Senior Consultant - Site Reliability Architect
Location: Austin, TX
Type: Full-Time

About the company:
Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale.

As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together.

With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.

Job Overview
We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems.

In this role, you will act as a thought leader and architect, driving the end-to-end design, architecture, and implementation of scalable, resilient, and secure cloud-native platforms on AWS. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions.

This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.

Required Qualifications
  • 10+ years of experience in Site Reliability, Observability, Production Support, Cloud Architecture or related roles, with a strong focus on architecture and strategy
  • Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger
  • Strong understanding of microservices architecture, APIs, and distributed systems
  • Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
  • Strong hands-on experience with AWS services, including:
      • Compute & Networking: VPC, EC2, ECS/EKS, Lambda
      • Databases: RDS, Aurora, DynamoDB
      • Storage & CDN: S3, CloudFront
      • Security: IAM, KMS, Security Groups, NACLs
  • Proven experience designing multi-account, multi-region AWS architectures
  • Deep understanding of:
      • Cloud networking and distributed systems
      • Security and compliance best practices
      • Scalability, resiliency, and fault-tolerant design patterns
  • Hands-on expertise with Terraform (or similar IaC tools)
  • Experience with monitoring and observability tools (CloudWatch, Prometheus, Grafana, etc.)
  • Strong experience with DevSecOps principles and CI/CD pipelines
  • Excellent problem-solving and analytical skills
  • Demonstrated ability to lead cross-functional initiatives and influence technical direction

Preferred Qualifications
  • AWS Certifications (e.g., Solutions Architect – Associate or Professional)
  • Experience working in financial services, banking, or regulated environments
  • Background in Site Reliability Engineering (SRE) practices and production support models

Key Responsibilities
  • Design, architect, and build cloud-native infrastructure and application services on AWS
  • Lead end-to-end infrastructure design for application platforms, microservices, and shared services
  • Implement and manage Infrastructure as Code (IaC) using Terraform
  • Design and maintain highly available, scalable, secure, and cost-optimized AWS architectures
  • Troubleshoot and resolve complex infrastructure and application service issues
  • Provide architectural guidance and technical leadership across engineering teams
  • Drive adoption of DevSecOps best practices across the SDLC
  • Establish and enhance monitoring, observability, and alerting frameworks