617 Observability jobs in the United States
AWS Observability Engineer
Posted today
Job Description
Summary:
We are seeking an experienced AWS Observability Engineer to lead the design, deployment, and maintenance of scalable AWS infrastructure using Infrastructure-as-Code tools like CloudFormation. In this role, you'll develop and manage comprehensive monitoring, alerting, and dashboarding solutions for microservices, applications, and infrastructure, while implementing centralized logging through CloudWatch and CloudTrail Logs.
Duties:
* AWS Observability Engineer will integrate observability into CI/CD pipelines to enable early detection and remediation, ensure high availability and disaster recovery of critical services, and automate cloud resource provisioning
* Collaborating closely with engineering teams, you will define SLOs/SLIs, and drive system reliability improvements based on telemetry data
* Participate in incident response and postmortem analysis to support long-term operational excellence
Qualifications:
* 3+ years of hands-on experience with AWS services (EC2, ECS/EKS, RDS, S3, CloudWatch, etc.)
* Experience with monitoring tools (Datadog, New Relic, CloudWatch)
* Experience with Infrastructure-as-Code (CloudFormation, Terraform)
* Experience with SAST, DAST, API Security, and WAF tools
* Strong understanding of networking, security best practices, and incident response
* Strong understanding of AWS architecture and standards
* Deep knowledge of observability principles: metrics, logs, traces, and events
* Familiarity with logging pipelines and aggregation tools (ELK, CloudWatch Logs Insights)
* Solid grasp of secure coding best practices
* Proficiency in scripting or coding (Python, Bash, or similar)
Preferred Qualifications:
* AWS Certifications (e.g., Solutions Architect, DevOps Engineer)
* Experience with OpenTelemetry or similar observability standards
* Experience integrating observability into CI/CD tools (GitHub Actions, Jenkins, Azure DevOps)
* Exposure to APM tools and synthetic monitoring
Observability Engineering Manager
Posted today
Job Description
Join to apply for the Observability Engineering Manager role at Canonical
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and industry leaders in many sectors. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in 75+ countries and very few office-based roles. Teams meet two to four times yearly in person, in interesting locations around the world, to align on strategy and execution.
The company is founder-led, profitable, and growing.
We are hiring an Observability Engineering Manager who will lead the development of the distributed tracing or service mesh products as part of our Observability group.
Engineering managers at Canonical are always coders who are able to review and lead both architecture and code. They should also be astute judges of character, and comfortable setting expectations and holding colleagues accountable to them.
We are building an observability stack that is easy to deploy and operate on Kubernetes. This is part of a broader initiative to deliver the world's best suite of open source tools, where we provide deep integration and automation for best-of-breed open source offerings that cover metrics, logging, telemetry, alerting, tracing and profiling. Our goal is to make observability tools integral and pervasive across software practices.
Our implementation of Kubernetes operators is opinionated, resilient, and scalable, providing deep insights out of the box. The user experience is polished and seamless for the end-users, and its administrators will enjoy smooth, lightweight Day 1 and Day 2 operations. We are excited to be improving the state of the art for open-source observability.
This is an exciting opportunity for a software engineer passionate about open source software, Linux, Kubernetes, and Observability. Build a rewarding, meaningful career working with the best and brightest people in technology at Canonical, the growing international software company behind Ubuntu.
As an Engineering Manager at Canonical, you must be technically strong, but your main responsibility is to run an effective team and develop the colleagues you manage. You will develop and review code as a leader, but know that the best way to improve the product is to ensure that the whole team is focused, productive and unblocked. You are expected to help them grow as engineers, do meaningful work, do it outstandingly well, find professional and personal satisfaction, and work well with colleagues and the community. You will also be expected to be a positive influence on culture, facilitate technical delivery, and regularly reflect with your team on strategy and execution. You will collaborate closely with other Engineering Managers, product managers, and architects, producing an engineering roadmap with ambitious and achievable goals.
We expect Engineering Managers to be fluent in the programming language, architecture, and components that their team uses. Code reviews and architectural leadership are part of the job. The commitment to healthy engineering practices, documentation, quality and performance optimisation is as important, as is the requirement for fair and clear management, and the obligation to ensure a high-performing team.
Location: This role can be home based in the EMEA or Americas regions.
The role entails
- Manage a distributed team of engineers and its observability portfolio
- Organize and lead the team's processes in order to help it achieve its objectives
- Conduct one-on-one meetings with team members
- Identify and measure team health indicators
- Interact with a vibrant community
- Review code produced by other engineers
- Attend conferences to represent Canonical and its Observability Stack
What we are looking for in you
- An exceptional academic track record from both high school and university
- A proven track record of professional experience of software delivery
- Professional software development experience, preferably with a track record in open source
- Willingness to travel up to 4 times a year for internal events
- Professional written and spoken English
- Experience with Linux (Debian or Ubuntu preferred)
- Excellent interpersonal skills, curiosity, flexibility, and accountability
- Passion, thoughtfulness, and self-motivation
- Excellent communication and presentation skills
- Result-oriented, with a personal drive to meet commitments
- Experience as an engineering manager, with a track record of building great, high performance teams
- Professional Python development experience
- A working knowledge of Go
- Open source contribution experience
- Interest and experience with container technologies
- A proven understanding of the importance of observability and monitoring for keeping software running smoothly
- Experience designing and implementing observability solutions
We consider geographical location, experience, and performance in shaping compensation worldwide. We revisit compensation annually (and more often for graduates and associates) to ensure we recognize outstanding performance. In addition to base pay, we offer a performance-driven annual bonus or commission. We provide all team members with additional benefits which reflect our values and ideals. We balance our programs to meet local needs and ensure fairness globally.
- Distributed work environment with twice-yearly team sprints in person
- Personal learning and development budget of USD 2,000 per year
- Annual compensation review
- Recognition rewards
- Annual holiday leave
- Maternity and paternity leave
- Team Member Assistance Program & Wellness Platform
- Opportunity to travel to new locations to meet colleagues
- Priority Pass and travel upgrades for long-haul company events
Canonical is a pioneering tech firm at the forefront of the global move to open source. As the company that publishes Ubuntu, one of the most important open-source projects and the platform for AI, IoT, and the cloud, we are changing the world of software. We recruit on a global basis and set a very high standard for people joining the company. We expect excellence; in order to succeed, we need to be the best at what we do. Most colleagues at Canonical have worked from home since our inception in 2004. Working here is a step into the future and will challenge you to think differently, work smarter, learn new skills, and raise your game.
Canonical is an equal opportunity employer
We are proud to foster a workplace free from discrimination. Diversity of experience, perspectives, and background create a better work environment and better products. Whatever your identity, we will give your application fair consideration.
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Software Development
Observability Platform Engineer
Posted 1 day ago
Job Description
We are in need of 3 Platform Engineers to build and operationalize observability capabilities across the SIEM ecosystem. These resources will lead efforts in designing integrated monitoring solutions for tools like Cribl, Vector, Splunk, Snowflake, ADX, and Log Analytics. Their work will ensure continuous visibility into system health, enabling proactive fault detection and performance management. These resources will use Grafana, PowerBI, or both for dashboarding.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy:
Skills and Requirements
• Lead the architecture and implementation of comprehensive observability solutions across the SIEM stack, including fault detection and system health metrics.
• Create advanced, scalable dashboards and alerting systems tailored for complex, multi-source telemetry data environments.
• Apply deep expertise in time-series or SQL-based query languages to monitor ingestion rates, error rates, and latency across all technologies.
Software Engineer, Observability
Posted 2 days ago
Job Description
Seattle, Washington, United States
Software and Services
Summary
Posted: Aug 21, 2025
Role Number: 200600978-3337
The Apple Services Engineering (ASE) team is building the next generation of foundational tools that empower software developers at Apple to build products that our customers love. The Observability team within ASE is a fast-moving, highly skilled team that is designing and building a suite of platforms and services that help Apple engineers observe and get insights into their systems.
If the thought of working with petabytes of data interests you, this is the place to be. Our systems must scale globally, stay highly available, and “just work”, while supporting some of the largest services in the world. That’s a tall order, and we’re looking to add more talented and passionate engineers who love challenges. If you’d love to join this amazing team, we’d love to hear from you.
Description
Your responsibilities will include:
- Requirement gathering across cross-functional teams.
- Developing practical, fault-tolerant, high-performance distributed systems.
- Leading and participating in technical design discussions across cross-functional teams. Gain in-depth understanding of the domain and come up with creative ideas in the domain.
- Willingness to lead independent research in the field of work.
- Mentor other engineers in the team.
You will have the courage and experience to be frank and ambitious but humble enough to listen to others. We want your thoughts on how we can move faster, be more creative, and deliver tools and ideas to empower developers around the world. We expect you to challenge the status quo, to care about the details, the end user, and how it all comes together.
We are looking for enthusiastic developers to join as a member of this collaborative and friendly team. You should be someone with ideas and passion for software delivered as a service to maximize reuse, efficiency, and simplicity. Your work will impact millions of Apple users and is necessary to the success of some of the most visible current and future features.
Minimum Qualifications
- BS or MS in CS or equivalent
- 5+ years of industry experience
- Experience with Java
- Experience with designing, implementing and supporting highly scalable infrastructure services
- Deep understanding and work experience in distributed systems
- Deep understanding of core CS concepts including data structures, algorithms and concurrent programming
- Strong attention to detail and excellent analytical capabilities
- Great communication skills
Preferred Qualifications
- Experience with Observability solutions using OpenTelemetry, Prometheus, Grafana
- Experience building Observability platforms is preferred
- Experience designing and using columnar storage
- Familiarity with time series database internals
- Passion for developing and testing clear, robust code
- Ability to learn and apply new technologies and frameworks.
Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $139,500 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including: Comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and for formal education related to advancing your career at Apple, reimbursement for certain educational expenses — including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.
Apple will not discriminate or retaliate against applicants who inquire about, disclose, or discuss their compensation.
Apple participates in the E-Verify program in certain locations as required by law.
Apple is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.
Apple is a drug-free workplace.
Apple will consider for employment all qualified applicants with criminal histories in a manner consistent with applicable law. If you’re applying for a position in San Francisco, review the San Francisco Fair Chance Ordinance guidelines applicable in your area.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Engineer - Network Observability
Posted 6 days ago
Job Description
Location: 7000 Target Pkwy N, NCD-0375, Brooklyn Park, MN 55445
Time type: Full time
Posted: Yesterday
Job requisition ID: R000399266
The pay range is $73,200.00 - $131,700.00. Pay is based on several factors which vary based on position. These include labor markets and in some instances may include education, work experience and certifications. In addition to your pay, Target cares about and invests in you as a team member, so that you can take care of yourself and your family. Target offers eligible team members and their dependents comprehensive health benefits and programs, which may include medical, vision, dental, life insurance and more, to help you and your family take care of your whole selves. Other benefits for eligible team members include 401(k), employee discount, short term disability, long term disability, paid sick leave, paid national holidays, and paid vacation.
About us:
Working at Target means helping all families discover the joy of everyday life. We bring that vision to life through our values and culture.
As an Engineer, you serve as a technical specialist delivering the engineering that powers the product. You develop keen insight into the technical architecture and design to deliver robust and scalable software components. You constantly demonstrate the depth of your expertise by solving engineering problems. You are passionate about the quality of software and balance between speed of delivering new features and robustness of the software components you implement. You can handle operational issues with little or no oversight. You actively review code to ensure the software quality and functional accuracy is maintained across the team. You are keen to learn the design and architecture of the product and participate in ceremonies that can influence both.
As a member of the Network Observability team, you will design and manage telemetry pipelines, build authoritative data systems, and develop data models and APIs that power analytics and automation. Your work ensures the team has real-time visibility into network performance through platforms like Grafana, ClickHouse, and DOMO/GreenField. You create reliable sources of truth for topology and configuration, seamlessly integrating with CMDBs and automation tools. By engineering robust ETL pipelines and enabling closed-loop automation, you empower the team to make data-driven decisions, respond faster to incidents, and plan effectively for future capacity needs.
Use your skills, experience and talents to be a part of groundbreaking thinking and visionary goals. As an engineer, you'll take the lead as you:
Use your technology acumen to provide input to assist with evaluation of new technologies and contribute to the design, lifecycle management, and total cost of ownership of services
Contribute to research and proof-of-concept initiatives for new technologies
Assist with code review and design review, and write, organize, and maintain code based on designs
With guidance, deliver high-performance, scalable, repeatable, and secure deliverables
Participate in structured construction, automation, debugging, and implementation activities, ensuring architectural and operational requirements and best practices are met
Participate in disaster recovery planning and disaster recovery activities
Participate in functional integration and regression testing, and automate test scripts
Resolve frequently encountered technical issues and monitor systems capacity with minimal assistance
Search and understand metadata about various data sources and metrics
Adhere to change and incident management standards and expectations
Core responsibilities are described within this job description. Job duties may change at any time due to business needs.
About you:
4 year degree or equivalent experience
1+ years of software development experience
Demonstrates familiarity with current and emerging technologies in own scope of responsibility, and develops ability to apply these technologies
Understands concepts of package solutions and package specific programming language with knowledge of development objects
Demonstrates and continuously builds upon domain-specific knowledge
Demonstrates proficiency in at least one computer language
Understands the concepts of distributed programming and applies it to their domain
Knowledge of the different data structures in your chosen programming language and how to apply them.
Maintains technical knowledge within areas of expertise
Stays current with new and evolving technologies via formal training and self-directed education
Preferred Skillsets:
Demonstrates proficiency in building and maintaining observability systems
Understands network technology fundamentals
Understands hardware and operating systems
Benefits Eligibility
Please paste this url into your preferred browser to learn about benefits eligibility for this role:
Americans with Disabilities Act (ADA)
In compliance with state and federal laws, Target will make reasonable accommodations for applicants with disabilities. If a reasonable accommodation is needed to participate in the job application or interview process, please reach out to
Principal Engineer - Observability
Posted 8 days ago
Job Description
CoreWeave is the AI Hyperscaler™, delivering a cloud platform of cutting edge services powering the next wave of AI. Our technology provides enterprises and leading AI labs with the most performant, efficient and resilient solutions for accelerated computing. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe. CoreWeave was ranked as one of the TIME100 most influential companies of 2024.
As the leader in the industry, we thrive in an environment where adaptability and resilience are key. Our culture offers career-defining opportunities for those who excel amid change and challenge. If you're someone who thrives in a dynamic environment, enjoys solving complex problems, and is eager to make a significant impact, CoreWeave is the place for you. Join us, and be part of a team solving some of the most exciting challenges in the industry.
CoreWeave powers the creation and delivery of the intelligence that drives innovation.
About this Role:
We are seeking a highly experienced and strategic Principal Engineer to lead the architecture, development, and operations of our Observability product. In this role, you will help shape the vision and direction on how customers monitor, troubleshoot and run their AI workloads effectively, at scale. You will have direct access to customers and work closely with engineering stakeholders across multiple teams to advance the development of our unified Observability experience across CoreWeave products.
What You'll Do:
- Lead the strategy and implementation for Observability, ensuring alignment with business goals and performance objectives.
- Design and implement advanced solutions, including low-latency, high-scale Observability pipelines across all products.
- Build solutions that offer insights to customers for rapid troubleshooting of their AI workloads.
- Champion initiatives to improve the reliability, durability, and self-healing capabilities of Observability metrics, and assume operational responsibilities.
- Help shape customer experience by promoting unparalleled visibility into our systems' performance and reliability with customer facing metrics and dashboards.
- Analyze telemetry for production systems to identify opportunities for improvement in performance and reliability.
- Develop operational review practices for storage engineering to assess performance against targets and iterate on those targets.
- Act as a trusted advisor to senior leadership, providing insights on storage industry trends and advocating for investments in storage technologies.
- Collaborate with engineering, infrastructure, and product teams to ensure storage solutions align with evolving project requirements and technical architecture.
- Mentor and guide engineering teams on best practices in product engineering, fostering a customer-focused approach to systems design and technical excellence.
Who You Are:
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- 10+ years of experience in distributed systems, with a focus on reliability and scale.
- Proven experience leading storage product engineering projects and building products to address customer needs.
- Proficiency in one or more programming languages (e.g. Go, C, Rust).
- Good understanding of distributed observability systems such as ClickHouse for telemetry at scale.
- Strong understanding of cloud computing infrastructure using Kubernetes, scalable architectures, and automation.
- Excellent analytical and problem-solving skills, with the ability to synthesize problem statements from interactions with customers and design solutions given ambiguous requirements.
- Strong communication and interpersonal skills, able to convey storage engineering strategies and practices to technical and non-technical audiences.
- Prior experience with building Observability solutions is a plus.
The base salary range for this role is $206,000 to $303,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
What We Offer
The range we've posted represents the typical compensation range for this role. To determine actual compensation, we review the market rate for each candidate which can include a variety of factors. These include qualifications, experience, interview performance, and location.
In addition to a competitive salary, we offer a variety of benefits to support your needs, including:
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption
Our Workplace
While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.
California Consumer Privacy Act - California applicants only
CoreWeave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to , , , , , , , , , veteran status, or genetic information.
As part of this commitment and consistent with the Americans with Disabilities Act (ADA), CoreWeave will ensure that qualified applicants and candidates with disabilities are provided reasonable accommodations for the hiring process, unless such accommodation would cause an undue hardship. If reasonable accommodation is needed, please contact:
Export Control Compliance
This position requires access to export controlled information. To conform to U.S. Government export regulations applicable to that information, applicant must either be (A) a U.S. person, defined as a (i) U.S. citizen or national, (ii) U.S. lawful permanent resident (green card holder), (iii) refugee under 8 U.S.C. § 1157, or (iv) asylee under 8 U.S.C. § 1158, (B) eligible to access the export controlled information without a required export authorization, or (C) eligible and reasonably likely to obtain the required export authorization from the applicable U.S. government agency. CoreWeave may, for legitimate business reasons, decline to pursue any export licensing process.
Software Engineer, Observability
Posted 9 days ago
Job Description
Nearly every company in the world runs on custom software: Gartner estimates that up to 50% of all code is written for internal use. This is the operational software for refunding orders, underwriting loans, onboarding employees, analyzing transactions, and providing customer support. But most companies don't have adequate resources to properly invest in these tools, leading to a lot of old and clunky internal software or, even worse, users still stuck in manual and spreadsheet flows.
At Retool, we're on a mission to bring good software to everyone. We're building a new type of development platform that combines the benefits of traditional software development with a drag-and-drop UI editor and AI, making it dramatically faster to build internal tools. We believe that the future of software development lies in abstracting away the tedious and repetitive tasks developers waste time on, while creating reusable components that act as a force multiplier for future developers and projects. The result is not just productivity, but good software by default. And that's a mission worth striving for.
Today, our customers span from small startups building their first operational tools to Fortune 500 companies building mission-critical apps for thousands of users across their business. Interested in joining us? Let us know!
WHAT YOU'LL DO:
In this role, you will build, integrate, and evangelize observability platforms and solutions for our products and internal systems. You will drive adoption of these solutions and ensure they drive value for the company.
Your core responsibility in this role is to build and deploy observability solutions that make our products highly available, scalable, reliable, observable and delight our customers.
IN THIS ROLE, YOU WILL:
- Help build a great product that improves productivity of engineers across the globe by several orders of magnitude
- Design and build observability solutions for collection, delivery, analysis, and visualization of metrics, logs, and traces
- Work with engineers, designers, product managers and customer support to instrument and implement observability into our products and internal apps
- Build orchestration and automation tooling around off-the-shelf solutions (e.g. Datadog, Grafana), as well as build custom solutions that meet our unique needs
- Be involved in the development of scalable, distributed software systems that support a globally distributed customer base
- Coach and mentor other software engineers; provide leadership in iteratively defining and refining development processes as the team grows
- 3+ years of related professional experience, 2+ years working on a mission critical platform with high-availability requirements
- Experience with containerization (e.g. Docker, Kubernetes), infrastructure as code (e.g. Terraform) and observability (e.g. Datadog, Stackdriver, Wavefront, Grafana) stacks
- A strong understanding of system availability, resiliency, and recoverability
- Strong organizational skills, high attention to detail, and the ability to work independently with minimal supervision
- Ability to thrive in a high-energy, high-growth, fast-paced, entrepreneurial environment. Willing to learn new skills and implement new technologies
- Familiarity with TypeScript and Node.js backend development
- Familiarity with React frontend web development
- Experience with observability platforms and tools like Datadog, Grafana, etc.
For candidates based in San Francisco, the pay range for this role is listed below and represents the base salary range for non-commissionable roles or on-target earnings (OTE) for commissionable roles. This salary range may span several career levels at Retool and will be narrowed during the interview process based on a number of factors such as (but not limited to) scope and responsibilities, the candidate's experience and qualifications, and location.
Additional compensation in the form of equity and/or commission/bonuses depends on the position offered. Retool provides a comprehensive benefit plan, including medical, dental, vision, and 401(k). Pay and benefits are subject to change at any time, consistent with the terms of any applicable compensation or benefit plans.
San Francisco
$164,600-$222,600 USD
Retool offers generous benefits to all employees and a hybrid work location. For more information, please visit the benefits and perks section of our careers page!
Retool is currently set up to employ all roles in the US and specific roles in the UK. To find roles that can be employed in the UK, please refer to our careers page and review the indicated locations.
Engineer - Network Observability
Posted 11 days ago
Job Description
The pay range is $73,200.00 - $131,700.00
Pay is based on several factors which vary based on position. These include labor markets and in some instances may include education, work experience and certifications. In addition to your pay, Target cares about and invests in you as a team member, so that you can take care of yourself and your family. Target offers eligible team members and their dependents comprehensive health benefits and programs, which may include medical, vision, dental, life insurance and more, to help you and your family take care of your whole selves. Other benefits for eligible team members include 401(k), employee discount, short term disability, long term disability, paid sick leave, paid national holidays, and paid vacation.
About us:
Working at Target means helping all families discover the joy of everyday life. We bring that vision to life through our values and culture.
As an Engineer, you serve as a technical specialist delivering the engineering that powers the product. You develop keen insight into the technical architecture and design to deliver robust and scalable software components. You constantly demonstrate the depth of your expertise by solving engineering problems. You are passionate about the quality of software and balance between speed of delivering new features and robustness of the software components you implement. You can handle operational issues with little or no oversight. You actively review code to ensure the software quality and functional accuracy is maintained across the team. You are keen to learn the design and architecture of the product and participate in ceremonies that can influence both.
As a member of the Network Observability team, you will design and manage telemetry pipelines, build authoritative data systems, and develop data models and APIs that power analytics and automation. Your work ensures the team has real-time visibility into network performance through platforms like Grafana, ClickHouse, and DOMO/GreenField. You create reliable sources of truth for topology and configuration, seamlessly integrating with CMDBs and automation tools. By engineering robust ETL pipelines and enabling closed-loop automation, you empower the team to make data-driven decisions, respond faster to incidents, and plan effectively for future capacity needs.
Use your skills, experience and talents to be a part of groundbreaking thinking and visionary goals. As an engineer, you’ll take the lead as you…
Use your technology acumen to help evaluate new technologies and contribute to the design, lifecycle management, and total cost of ownership of services. Contribute to research and proof-of-concept initiatives for new technologies, assist with code and design reviews, and write, organize, and maintain code based on designs. With guidance, deliver high-performance, scalable, repeatable, and secure deliverables. Participate in structured construction, automation, debugging, and implementation activities, ensuring architectural and operational requirements and best practices are met. Participate in disaster recovery planning and activities, take part in functional integration and regression testing, and automate test scripts. Resolve frequently encountered technical issues and monitor systems capacity with minimal assistance. Search and understand metadata about various data sources and metrics. Adhere to change and incident management standards and expectations.
Core responsibilities are described within this job description. Job duties may change at any time due to business needs.
About you:
• 4-year degree or equivalent experience
• 1+ years of software development experience
• Demonstrates familiarity with current and emerging technologies in own scope of responsibility, and develops ability to apply these technologies
• Understands concepts of package solutions and package specific programming language with knowledge of development objects
• Demonstrates and continuously builds upon domain-specific knowledge
• Demonstrates proficiency in at least one computer language
• Understands the concepts of distributed programming and applies it to their domain
• Knowledge of the different data structures in your chosen programming language and how to apply them.
• Maintains technical knowledge within areas of expertise
• Stays current with new and evolving technologies via formal training and self-directed education
Preferred Skillsets:
• Demonstrates proficiency in building and maintaining observability systems
• Understands network technology fundamentals
• Understands hardware and operating systems
This position will operate as a Hybrid/Flex for Your Day work arrangement based on Target’s needs. A Hybrid/Flex for Your Day work arrangement means the team member’s core role will need to be performed both onsite at the Target HQ MN location the role is assigned to and virtually, depending upon what your role, team and tasks require for that day. Work duties cannot be performed outside of the country of the primary work location, unless otherwise prescribed by Target.
Benefits Eligibility
Please paste this url into your preferred browser to learn about benefits eligibility for this role:
Americans with Disabilities Act (ADA)
In compliance with state and federal laws, Target will make reasonable accommodations for applicants with disabilities. If a reasonable accommodation is needed to participate in the job application or interview process, please reach out to
Software Engineer, Observability
Posted 16 days ago
Job Description
Join the engineering teams that bring OpenAI's ideas safely to the world!
The Applied Engineering team works across research, engineering, product, and design to bring OpenAI's technology to consumers and businesses. We seek to learn from deployment and distribute the benefits of AI, while ensuring that this powerful tool is used responsibly and safely. Safety is more important to us than unfettered growth.
About the Role
We're building the observability product for OpenAI, from scalable infrastructure to a rich, AI-powered UI. Our systems ingest petabytes of logs and billions of time series metrics across our fleet. We're now layering intelligence on top: think agents that summarize SEVs, auto-generate dashboards, or help engineers debug through notebook-like UIs.
We're hiring software engineers across the stack: infra, backend, and product. You'll join a small, gritty team building both foundational infra and novel internal tools to make OpenAI's production systems reliable, performant, and observable.
What You'll Do
- Own core observability infrastructure, including distributed logging, time series, and trace storage.
- Build AI-native tools that help engineers detect, understand, and resolve issues autonomously.
- Contribute to UI experiences like dashboards, notebooking, or interactive debugging.
- Collaborate closely with engineers, researchers, user ops, and other teams across the company to build the next-generation observability product.
You Might Be a Fit If You:
- Have operated large-scale distributed systems in production (especially logging systems or other time series databases).
- Thrive in ambiguous environments and roll up your sleeves to solve unscoped problems.
- Have full-stack chops or product sensibilities; you're excited to build real tools people use.
- Have strong fundamentals in systems, networking, and cloud infra (Kubernetes, AWS, etc.).
- Bonus: built or contributed to observability systems (e.g. Prometheus, OpenTelemetry).
Why This Team
- We're both an infra and product team, building a real AI application for internal use.
- Your work will directly power the reliability of GPT-based products at massive scale.
- You'll help define what "AI-powered observability" looks like at one of the world's most advanced AI labs.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
For additional information, please see OpenAI's Affirmative Action and Equal Employment Opportunity Policy Statement.
Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.
To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
Compensation Range: $255K - $405K
Software Engineer - Observability
Posted 16 days ago
Job Description
xAI's mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company's mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
About the Team
The Observability team builds and operates the core infrastructure that enables engineers to monitor, debug, and optimize the performance and reliability of their systems. We handle telemetry at massive scale - billions of time series and petabytes of logs - with strict performance and availability requirements.
About the Role
You will be part of the small, high-impact team responsible for building and maintaining X's observability platform. You'll own critical systems that power metrics, logs, tracing, and alerting, enabling engineering teams to operate services at scale, identify issues before they impact users, and drive systemic reliability improvements.
What You'll Do
- Design and implement scalable observability infrastructure for metrics, logging, and tracing.
- Build high-performance telemetry pipelines that handle massive ingestion volumes.
- Develop APIs, query engines, and UIs that allow engineers to get real-time insights into their services.
- Define and enforce best practices for instrumentation, alerting, and reliability across the company.
- Partner with infrastructure and product teams to deeply integrate observability into our internal platforms.
- Own the reliability, scalability, and performance of the observability stack end-to-end.
- Production-level proficiency in Go, Rust, Scala, or a similar language.
- Deep understanding of distributed systems and telemetry architecture.
- Experience building and operating infrastructure at scale.
- Familiarity with observability stacks such as Prometheus, Grafana, OpenTelemetry, VictoriaMetrics, or ClickHouse.
- Experience with Kafka, Redis, or large-scale time series databases.
- Experience operating observability pipelines in Kubernetes or similar orchestration environments.
- We hire engineers in Palo Alto and San Francisco. Our team usually works from the office 5 days a week but allows work-from-home days when required. Candidates who join in San Francisco must make it to Palo Alto at least twice a week.
Interview Process
After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of 2 technical interviews and 1 project deep-dive interview:
- Practical coding assessment in a language of your choice.
- Systems design hands-on: Demonstrate practical skills in a live problem-solving session.
- Project deep-dive: Present and answer questions about exceptional work that you've done.
- Meet and greet with the wider team.
Our goal is to finish the main process within one week. Final interviews will be conducted in person.
Annual Salary Range
$180,000 - $440,000 USD
Benefits
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer.
California Consumer Privacy Act (CCPA) Notice