1,755 Site Reliability Engineer jobs in the United States
Associate Site Reliability Engineer/Site Reliability Engineer (Redwood City)
Posted 12 days ago
Job Description
C3 AI (NYSE: AI) is the Enterprise AI application software company. C3 AI delivers a family of fully integrated products including the C3 Agentic AI Platform, an end-to-end platform for developing, deploying, and operating enterprise AI applications; C3 AI applications, a portfolio of industry-specific SaaS enterprise AI applications that enable the digital transformation of organizations globally; and C3 Generative AI, a suite of domain-specific generative AI offerings for the enterprise. Learn more at: C3 AI
We are looking for an Associate Site Reliability Engineer/Site Reliability Engineer to join our team at our HQ in Redwood City, CA.
Responsibilities:
- Maximize system uptime and availability, ensuring functional and performance SLAs.
- Establish end-to-end monitoring and alerting on all critical aspects.
- Solve complex problems for critical services and build automation to prevent problem recurrence.
- Influence and create new designs, architectures, standards, and methods for supporting the platform.
- Initiate and lead scripting and automation to streamline system updates and upgrades.
- Set up critical infrastructure, tools, and frameworks to streamline the deployment cycle.
- Work cross-functionally with Services and Engineering teams.
Qualifications:
- BS or MS in Computer Science, related field, or equivalent professional experience.
- Demonstrated experience in deploying, managing, and operating scalable and fault-tolerant Linux/Kubernetes/JVM-based infrastructure in AWS, GCP, and other public clouds.
- Expertise in Linux Operating Systems, Networking, and Database concepts.
- Experience deploying, upgrading, and troubleshooting Kubernetes clusters and workloads.
- Experience with Cassandra (or another NoSQL alternative).
- Expertise in cloud providers, such as Amazon Web Services, Azure, and GCP.
- Experience with configuration management systems such as Puppet.
- Experience in Bash or Python to automate and monitor systems.
- Experience with IaC tools like Ansible or Terraform.
- Excellent problem-solving, critical thinking, and communication skills.
- Experience as a DevOps engineer or system administrator supporting commercial SaaS solutions.
C3 AI provides excellent benefits, a competitive compensation package and generous equity plan.
California Base Pay Range
$116,000 - $168,000 USD
C3 AI is proud to be an Equal Opportunity and Affirmative Action Employer. We do not discriminate on the basis of any legally protected characteristics, including disabled and veteran status.
Site Reliability Engineer

Posted today
Job Description
**Remote Only**
Contract
$50/hr - $100/hr
You'll closely collaborate with fellow cloud architects and engineers specializing in AWS to design, define, develop, test, and debug cloud solution components. You'll have the chance to work within a GitOps-based framework to create and manage container apps and use products like Kubernetes to further the mission. Use Python to automate across our ecosystem, and work to stabilize and monitor various types of cloud infrastructure following SRE guidelines and principles. Work to resolve immediate issues as a part of an incident response team.
**You Have:**
+ 2+ years of experience with basic SRE principles and guidelines.
+ Experience with troubleshooting, investigating, and resolving issues across many types of cloud infrastructure. Comfortable working during an emergency/incident.
+ 3+ years of experience with the development of tools and processes to drive DevSecOps maturity by automating builds, regression testing, monitoring, and pushing releases across environments
+ 3+ years of experience with troubleshooting, triaging, and resolving issues in CI/CD pipeline failures or latency
+ 3+ years of experience in working with Linux and troubleshooting and upgrading network configuration in RedHat, Ubuntu, and AL flavors
+ Experience with developing enterprise cloud-native platforms using Kubernetes, Docker, or CI/CD tools, including GitHub Actions or GitLab CI/CD
+ Experience with employing an Infrastructure as Code (IaC) approach to managing cloud environments, specifically using Terraform, Terragrunt, and CloudFormation
+ 2+ years of experience with Python automation and frameworks.
+ Ability to obtain a security clearance
+ Bachelor's degree
**Nice If You Have:**
+ Experience in working with GitOps tools (Flux, ArgoCD)
+ CKAD or CKA Certification
+ AWS Certification, including Solutions Architect, DevOps Engineer, Networking, or Security
+ Experience with Python frameworks like FastAPI and Typer, and other Python tools like pandas and boto3.
**You will receive the following benefits:**
+ Medical Insurance - Four medical plans to choose from for you and your family
+ Dental & Orthodontia Benefits
+ Vision Benefits
+ Health Savings Account (HSA)
+ Health and Dependent Care Flexible Spending Accounts
+ Voluntary Life Insurance, Long-Term & Short-Term Disability Insurance
+ Hospital Indemnity Insurance
+ 401(k) including match with pre and post-tax options
+ Paid Sick Time Leave
+ Legal and Identity Protection Plans
+ Pre-tax Commuter Benefit
+ 529 College Saver Plan
TG Federal is an Equal Opportunity Employer. All applicants must be currently authorized to work on a full-time basis in the country for which they are applying, and no sponsorship is currently available. Employment is subject to the successful completion of a pre-employment screening. Accommodation will be provided in all parts of the hiring process as required under MRP's Employment Accommodation policy. Applicants need to make their needs known in advance.
Site Reliability Engineer

Posted today
Job Description
Insight Global is looking for a seasoned SRE to join one of our largest technology clients' multifaceted and fast-paced Infrastructure, Planning and Processes organization where you will be working as a Senior SRE Engineer. The position will be part of a fast-paced crew that develops and maintains sophisticated internal cloud provisioning products. The team works with various other business units such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars to cater to their infrastructure & systems needs.
As an SRE, you'll also work with various teams, such as software engineering, to deploy these new products and manage our infrastructure and its associated processes and systems. Keen attention to detail, problem-solving abilities, and a solid knowledge base are essential.
This role pays between $60-$65/hour depending on skillset and years of experience.
We are a company committed to creating inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity employer that believes everyone matters. Qualified candidates will receive consideration for employment opportunities without regard to race, religion, sex, age, marital status, national origin, sexual orientation, citizenship status, disability, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request via the Human Resources Request Form. The EEOC "Know Your Rights" poster is also available.
To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy.
Skills and Requirements
- 4+ years of proven experience.
- Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent experience.
- System admin and Windows admin experience in an on-prem infrastructure environment.
- Proficient with Kubernetes, Docker, and virtualization.
- OpenStack and Ansible experience.
- Background in databases like SQL (MySQL) and time-series DBs like Prometheus.
- Experience with data analytics/visualization tools like Kibana, Grafana, Splunk, etc.
- Strong knowledge of networking principles and protocols, including TCP/IP, DNS, DHCP, and VLANs.
- Proficient using source code management and binary repository systems like GitLab, GitHub, Artifactory, Perforce, etc.
- Knowledge of monitoring systems such as Zabbix, Prometheus, PagerDuty, and/or similar systems.
- Advanced knowledge of standard methodologies related to security.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal employment opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment without regard to race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to
Site Reliability Engineer

Posted today
Job Description
**Your role and responsibilities**
Site Reliability Engineer, IBM Corporation, San Jose, CA:
* Ensure the reliability, scalability, and performance of the data analysis product.
* Design, develop, and optimize scalable data collection and visualization pipelines to enable efficient analysis and insights.
* Build and refine advanced forecasting and anomaly detection models to drive data-driven decision-making and improve system performance.
* Design and implement real-time dashboards using React and Node.js to provide critical performance insights.
* Respond to and manage incidents as part of an on-call rotation, troubleshoot system issues, and implement resolutions to enhance reliability.
* Automate infrastructure provisioning, configuration, and deployment through scripting and CI/CD pipelines, while establishing robust monitoring and alerting systems to proactively address anomalies.
* Employ Infrastructure as Code (IaC) principles to provision and maintain servers, databases, networking, and cloud resources, while planning for capacity and scalability to meet growing demands.
* Ensure security and compliance by implementing best practices, managing updates, and participating in audits.
* Collaborate with software developers, data analysts, and other stakeholders to optimize system performance, while maintaining accurate documentation and creating runbooks for operational excellence.
* Drive continuous learning and innovation by identifying opportunities for optimization, leveraging emerging technologies, and implementing solutions to improve system reliability and efficiency.
* Utilize: Data Analytics, Python, PL/SQL, Data Warehousing, ETL (Extract, Transform and Load), GIT, NumPy, Pandas, Scikit-learn.
Required: Master's degree or equivalent in Computer Science, Engineering or related (employer will accept a Bachelor's degree plus five (5) years of progressive experience in lieu of a Master's degree) and one (1) year of experience as a Software Developer or related. One (1) year of experience must include utilizing Data Analytics, Python, PL/SQL, Data Warehousing, ETL (Extract, Transform and Load), GIT, NumPy, Pandas, Scikit-learn. $226,158 to $267,100 per year. Full time. AV161.
**Required technical and professional expertise**
Master's degree or equivalent in Computer Science, Engineering or related (employer will accept a Bachelor's degree plus five (5) years of progressive experience in lieu of a Master's degree) and one (1) year of experience as a Software Developer or related. One (1) year of experience must include utilizing Data Analytics, Python, PL/SQL, Data Warehousing, ETL (Extract, Transform and Load), GIT, NumPy, Pandas, Scikit-learn.
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
Site Reliability Engineer

Posted today
Job Description
At American Express, our culture is built on a 175-year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. As part of Team Amex, you'll experience this powerful backing with comprehensive support for your holistic well-being and many opportunities to learn new skills, develop as a leader, and grow your career.
Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express.
**How will you make an impact in this role?**
Most of our software development focuses on delivering new features while optimizing existing systems, building infrastructure, and eliminating work through automation. As part of the SRE team, you'll have the opportunity to manage the complex challenges at scale which are unique to American Express, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving, and willingness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big and try new things in a blameless environment. We promote self-direction to work on relevant projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
Our Software Engineers not only understand how technology works, but how that technology intersects with the people who count on it every single day. Today, creative ideas, insight and new points of view are at the core of how we craft a more powerful, personal and fulfilling experience for all our customers. You will be a member of a Site Reliability Engineering team, reporting to a Senior Engineer, Senior Engineering Manager or Engineering Director. We are looking for someone who is passionate about good design and excellent code; someone who can find solutions to hard technical and functional challenges and can work well within and across teams.
**Responsibilities:**
+ Serve as a member of an agile development team that drives discovery, build and implementation of non-functional requirements.
+ Actively participate in architecture and engineering discussions.
+ Participate in code reviews and ad-hoc pair programming; contribute to iterative improvement of tools, automation and practices used by the team.
+ Provide operational support by building platform monitoring tools/dashboards and ad hoc reports.
+ Build, implement and advise on recovery tooling to adhere to enterprise standards and/or frameworks.
+ Take responsibility for availability, proactive monitoring/alerting, capacity planning, and performance (reducing latency and increasing efficiency), including testing, for technical platforms.
+ Ensure application data flows are accurate and up to date, with the objective of increasing the knowledge base of all support teams and driving reliability.
+ Facilitate the resolution of non-application issues (3rd party upstream issues, infrastructure issues, storage, database, network, file transfer, etc.)
**Qualifications:**
+ Experience in REST API design and implementation, microservices, event-based architecture, and stream processing/queues (Solace, Kafka).
+ Experience with databases: Couchbase (or a different document DB/NoSQL), PostgreSQL.
+ Familiar with developer tools like Git, Jenkins, IntelliJ IDEA, Jira, Confluence.
+ Have at least 5 years of experience with Java backend (J2EE).
+ 2 years of experience in Reactive Programming (asynchronous programming paradigm), ideally with Vert.x and RxJava.
+ Proven understanding of cloud technologies (e.g., Docker, Kubernetes, Jaeger, OpenTracing, Prometheus).
+ Demonstrated experience in using modern software engineering tools: Git workflows, Gradle, load testing tools, mock frameworks.
+ A BS or MS degree in Computer Science, Computer Engineering or similar discipline, or equivalent work experience
+ Working in an environment which includes modern web frameworks and complex transaction processing systems leveraging a broad set of technology stacks
+ Write clean code, perform peer code reviews and architecture reviews.
+ Good communication skills - able to explain concepts to product managers and business partners in ways that are relevant to them
+ High levels of energy, engagement, and ownership. Positive attitude is always welcome.
+ Curiosity to learn new technologies and code them into working prototypes
+ Excellent communication skills; a fast learner and adaptable.
+ Attention to detail with strong thought leadership and analytical abilities
**Qualifications**
Salary Range: $85,000.00 to $150,000.00 annually, plus bonus and benefits
The above represents the expected salary range for this job requisition. Ultimately, in determining your pay, we'll consider your location, experience, and other job-related factors.
We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:
+ Competitive base salaries
+ Bonus incentives
+ 6% Company Match on retirement savings plan
+ Free financial coaching and financial well-being support
+ Comprehensive medical, dental, vision, life insurance, and disability benefits
+ Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
+ 20 weeks paid parental leave for all parents, regardless of gender, offered for pregnancy, adoption or surrogacy
+ Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
+ Free and confidential counseling support through our Healthy Minds program
+ Career development and training opportunities
For a full list of Team Amex benefits, visit our Colleague Benefits Site.
American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law. American Express will consider for employment all qualified applicants, including those with arrest or conviction records, in accordance with the requirements of applicable state and local laws, including, but not limited to, the California Fair Chance Act, the Los Angeles County Fair Chance Ordinance for Employers, and the City of Los Angeles' Fair Chance Initiative for Hiring Ordinance. For positions covered by federal and/or state banking regulations, American Express will comply with such regulations as it relates to the consideration of applicants with criminal convictions.
We back our colleagues with the support they need to thrive, professionally and personally. That's why we have Amex Flex, our enterprise working model that provides greater flexibility to colleagues while ensuring we preserve the important aspects of our unique in-person culture. Depending on role and business needs, colleagues will either work onsite, in a hybrid model (combination of in-office and virtual days) or fully virtually.
US Job Seekers - Click to view the "Know Your Rights" poster. Eligibility to work with American Express in the U.S. is required, as the company will not pursue visa sponsorship for these positions.
**Job:** Technologies
**Primary Location:** US-Florida-Sunrise
**Schedule:** Full-time
**Req ID:** 25014483