48 High Performance Computing jobs in the United States
High-Performance Computing Infrastructure Engineer
Posted 3 days ago
Job Viewed
Job Description
As a High-Performance Computing (HPC) Infrastructure Engineer, you will be instrumental in architecting and managing our on-premises and hybrid compute clusters. You will collaborate with research teams and application owners, transforming their needs into automated and repeatable deployment solutions. Your expertise will enable thousands of cores, hundreds of GPUs, and petabytes of storage to provide consistent and high-throughput support for CAE simulations and AI/ML training.
Design, develop, and maintain Ansible playbooks, roles, and collections to provision and configure compute nodes, storage systems, and network services.
Automate OS and middleware upgrades, implement security patching, and manage routine maintenance tasks across multiple HPC clusters.
Utilize the Ansible Automation Platform (AAP) to orchestrate workflows, efficiently manage inventories, and streamline job templates.
Collaborate with Site Reliability Engineering (SRE) and DevOps teams to integrate comprehensive HPC infrastructure monitoring (e.g., Prometheus, Grafana) and alerting workflows.
Work closely with application owners to enhance cluster performance for CAE, data analytics, and AI/ML workloads.
Troubleshoot complex hardware and software issues, including network fabric anomalies and scheduler misconfigurations, and perform detailed root-cause analysis.
Participate in capacity planning, performance benchmarking, and scalability studies to inform infrastructure investment decisions.
Document system designs, develop runbooks, and create standard operating procedures while mentoring junior engineers on best practices for automation.
Support research and engineering teams during crucial project phases, including participating in on-call rotations for after-hours issue resolution.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
3+ years of hands-on experience in designing and managing large-scale HPC or Linux infrastructures.
Expert-level proficiency in Ansible, including playbook authoring, role development, Galaxy publishing, and debugging techniques.
Familiarity with Red Hat Ansible Automation Platform (AAP) for enterprise-level automation and orchestration.
Strong Linux administration skills, particularly with RHEL/CentOS, emphasizing kernel tuning, storage configuration, and network management.
Understanding of HPC workload schedulers such as Slurm or PBS.
Experience with performance monitoring tools (Prometheus, Nagios) and metrics visualization platforms (Grafana).
Solid scripting capabilities in Python, Bash, or similar languages to enhance automation and integrate APIs effectively.
Demonstrated troubleshooting abilities across diverse hardware components (InfiniBand, Ethernet) and software stacks.
Excellent communication and collaboration skills, with the ability to explain technical concepts clearly to cross-functional stakeholders.
Ability to work independently, prioritize tasks effectively in a fast-paced environment, and mentor less experienced team members.
If you feel you bring valuable skills to the table, we encourage you to apply. At our company, we offer a range of benefits and opportunities for career development, allowing you to choose the path that suits you best, whether that involves deepening your expertise or exploring new avenues. You can look forward to comprehensive medical, dental, and prescription coverage, as well as family care programs, employee resource groups, paid community service time, and much more. Join us in shaping the future!
This position falls under salary grade 7.
Visa sponsorship is not available for this role.
Candidates must be legally authorized to work in the United States, and verification will be required at the time of hire.
We are proud to offer equal opportunity in our hiring practices, ensuring a diverse workforce. All qualified applicants will receive consideration for employment regardless of race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status, or protected veteran status.
Requisition ID: 48969
High Performance Computing Systems Administrator
Posted 4 days ago
Job Description
Unlock the power of supercomputing: join BAE Systems, one of the leading service providers for HPC. Contribute to one of our longest-running programs, where we orchestrate the support and sustainment of some of the world's largest and most advanced supercomputers. We are more than just gatekeepers; we are the vanguard of support for the modern warfighter. Our dynamic and ever-evolving teams are searching for exceptional individuals who yearn for a flexible work-life environment, thrive on pushing the limits of computational possibilities, and are driven by a sense of duty to empower those who serve our nation.
We are seeking HPC Systems Administrators to:
- Support a large client-server-based IT enterprise with installation, configuration, and networking of Linux- and Windows-based platforms.
- Oversee servers running Red Hat, CentOS, Solaris x64, SUSE, and virtual appliances, with direct-attached and FC SAN storage.
- Troubleshoot the PXE and DHCP boot process.
- Perform Microsoft Exchange policy management, mailbox management, enterprise-level multi-site design and architecture.
- Support Tier 3 infrastructure services such as DNS, NIS, Active Directory, Centrify, Zenoss, SiteScope, HP OpenView, HP OneView, HPSA, HPSE, Splunk, Sendmail, Exchange, NetQoS, Infoblox, Nagios, and HP Cluster Management.
- Install, configure, tune and troubleshoot multi-vendor servers running numerous COTS, open source and in-house applications.
Required Education, Experience, & Skills:
- TS/SCI clearance with appropriate poly.
- Ability to work onsite 100% of the time.
- Candidates shall have a Bachelor's degree in Computer Science or a related field and eight (8) years of demonstrable experience in system administration and support of a large client-server-based IT enterprise.
- Alternatively, five (5) years of full-time computer science work may be substituted for the Bachelor's degree; candidates must still have eight (8) years of demonstrable experience in system administration and support of a large client-server-based IT enterprise.
- An industry-recognized professional certification may substitute for one year of experience. IAT Level II certification is required.
- Experience with the following:
- Servers running Red Hat, CentOS, Solaris x64, SUSE, and virtual appliances, with direct-attached and FC SAN storage.
- Large SPU, memory SMP systems and clusters with many cores.
- Enterprise client server configurations (Group Policy, Centrify, DNS, LDAP).
- Linux scripting (Bash, Perl, etc.).
- Linux Logical Volume Management (LVM).
- Distributed Filesystems.
- Configuration Management tools.
- Experience troubleshooting PXE and DHCP boot process.
- Multi-vendor filesystems such as XFS, GPFS, CXFS, EXT4, CIFS, and NFS.
- Clustered blade systems and associated interconnects (iSCSI, SAS, FC, TCP/IP, etc.).
- Advanced Red Hat/CentOS and Microsoft Windows Operating System (including Group Policy management).
- Microsoft Exchange policy management, mailbox management, enterprise-level multi-site design and architecture.
- Enterprise Active Directory/Centrify Zone management (creation, update, RBAC).
- Tier 3 infrastructure support for services such as DNS, NIS, Active Directory, Centrify, Zenoss, SiteScope, HP OpenView, HP OneView, HPSA, HPSE, Splunk, Sendmail, Exchange, NetQoS, Infoblox, Nagios, and HP Cluster Management.
- Thin client solutions based on Virtual Bridges and Centrix; VPN.
- NoMachine NX Virtual Desktop management.
- Enterprise virtualization products: VMware vSphere, vCenter, vRealize Automation, and vSAN.
- vSwitch, Distributed Switch, Console management.
- OpenShift, Docker, Kubernetes.
- Accepted professional certifications include a valid RHCSA or higher Red Hat certification, or MCSE.
- Installation, configuration, tuning, troubleshooting and administration of:
- Multi-vendor servers running numerous COTS, open-source, and in-house applications to accommodate HPC Division IT support requirements.
- Multi-vendor servers running Red Hat or SUSE with direct-attached FC SAN storage or SSDs.
- Distributed computing tools such as ReS, LSF, and Slurm.
- HPC farm systems, HPC MPP clustered systems, and front-end servers for Special Purpose Devices (SPDs).
- IBM or HP Blade servers with FC/SAS/Network back end.
- Multi-vendor filesystems such as XFS, GPFS and Lustre.
- Pre-Factory testing, Factory testing, System integration and Acceptance testing during the purchase process of the HPS systems MDOPS.
Pay Information:
Full-Time Salary Range: $133,802 - $234,153. Please note: This range is based on our market pay structures. However, individual salaries are determined by a variety of factors including, but not limited to: business considerations, local market conditions, and internal equity, as well as candidate qualifications, such as skills, education, and experience.
Employee Benefits: At BAE Systems, we support our employees in all aspects of their life, including their health and financial well-being. Regular employees scheduled to work 20 hours per week are offered: health, dental, and vision insurance; health savings accounts; a 401(k) savings plan; disability coverage; and life and accident insurance. We also have an employee assistance program, a legal plan, and other perks including discounts on things like home, auto, and pet insurance. Our leave programs include paid time off, paid holidays, as well as other types of leave, including paid parental, military, bereavement, and any applicable federal and state sick leave. Employees may participate in the company recognition program to receive monetary or non-monetary recognition awards. Other incentives may be available based on position level and/or job specifics.
Equal Opportunity Employer. Minorities, females, veterans, individuals with disabilities, sexual orientation, gender identity, gender expression.
High Performance Computing (HPC) Engineer
Posted 10 days ago
Job Description
Headquartered in Silicon Valley, we are a newly established start-up, where a collective of visionary scientists, engineers, and entrepreneurs are dedicated to transforming the landscape of biology and medicine through the power of Generative AI. Our team comprises leading minds and innovators in AI and Biological Science, pushing the boundaries of what is possible. We are dreamers who reimagine a new paradigm for biology and medicine.
We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBM), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our exceptionally strong R&D team and leadership in LLM and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and a branch office in Paris, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of Generative AI.
Job Description
- GPU Cluster Management: Design, deploy, and maintain high-performance GPU clusters, ensuring their stability, reliability, and scalability. Monitor and manage cluster resources to maximize utilization and efficiency.
- Distributed/Parallel Training: Implement distributed computing techniques to enable parallel training of large deep learning models across multiple GPUs and nodes. Optimize data distribution and synchronization to achieve faster convergence and reduced training times.
- Performance Optimization: Fine-tune GPU clusters and deep learning frameworks to achieve optimal performance for specific workloads. Identify and resolve performance bottlenecks through profiling and system analysis.
- Deep Learning Framework Integration: Collaborate with data scientists and machine learning engineers to integrate distributed training capabilities into GenBio AI's model development and deployment frameworks.
- Scalability and Resource Management: Ensure that the GPU clusters can scale effectively to handle increasing computational demands. Develop resource management strategies to prioritize and allocate computing resources based on project requirements.
- Troubleshooting and Support: Troubleshoot and resolve issues related to GPU clusters, distributed training, and performance anomalies. Provide technical support to users and resolve technical challenges efficiently.
- Documentation: Create and maintain documentation related to GPU cluster configuration, distributed training workflows, and best practices to ensure knowledge sharing and seamless onboarding of new team members.
- Master's or Ph.D. degree in computer science or a related field, with a focus on High-Performance Computing, Distributed Systems, or Deep Learning.
- 2+ years of proven experience managing GPU clusters, including installation, configuration, and optimization.
- Strong expertise in distributed deep learning and parallel training techniques.
- Proficiency in popular deep learning frameworks like PyTorch, Megatron-LM, DeepSpeed, etc.
- Programming skills in Python and experience with GPU-accelerated libraries (e.g., CUDA, cuDNN).
- Knowledge of performance profiling and optimization tools for HPC and deep learning.
- Familiarity with resource management and scheduling systems (e.g., Slurm, Kubernetes).
- Strong background in distributed systems, cloud computing (AWS, GCP), and containerization (Docker, Kubernetes).
Join us as we embark on this journey to redefine the future of biology and medicine.
We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
High-Performance Computing (HPC) Engineer
Posted 13 days ago
Job Description
Austin, Texas, United States
Hardware
Summary
Posted: Jul 24, 2025
Weekly Hours: 40
Role Number: 200613813-0157
As a High-Performance Computing (HPC) engineer on Apple’s Hardware Methodologies, Tools, & Solutions (HMTS) Platform team, you will serve as a vital connector between HPC infrastructure, application development, operations, and engineering teams. Your contributions will be key to maintaining an exceptional design environment for hardware engineering, supporting Apple’s commitment to leading innovation in hardware.
Description
In this role, you will be responsible for supporting, testing, and deploying the HPC infrastructure products at the core of our operations. You will help plan, code, build, test, deploy, operate, and monitor our Infrastructure-as-Code solutions for HPC server infrastructure.
Your responsibilities will include: demonstrating strong troubleshooting skills by independently identifying and resolving issues; monitoring system performance and availability, and remediating issues as necessary; developing automation for common development and operational tasks; maintaining clear, current documentation of system configurations, including detailed justifications, training materials for complex topics, status reports, and procedural guides; collaborating with application, infrastructure, network, and storage engineering teams to find balanced solutions to engineering problems; and assessing future capacity requirements and evaluating new product features or enhancements.
Minimum Qualifications
- A Bachelor’s degree in Computer Science with at least 5 years of relevant experience, or an equivalent professional background.
- Proven experience in an HPC support role in an enterprise environment with 500+ node clusters.
- Experience deploying and managing schedulers such as SLURM, LSF, and/or NC.
- Experience deploying and configuring FEA solvers to run on HPC.
- Experience with NVIDIA GPU compute.
- Strong Linux administration skills.
- Experience with InfiniBand, including IPoIB and RDMA.
Preferred Qualifications
- Experience with multiple flavors of MPI.
- Experience with machine learning and deep learning concepts, algorithms, and models.
- Background in Software Defined Networking and AI/HPC cluster networking.
- Familiarity with deep learning frameworks such as PyTorch and TensorFlow.
- Experience with automation and configuration management tools like Ansible, Cobbler, and Puppet.
- Experience developing and securing containerized applications and HPC environments (e.g., Apptainer) is beneficial.
- Experience with virtualization technologies is beneficial.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.
Apple will not discriminate or retaliate against applicants who inquire about, disclose, or discuss their compensation.
Apple participates in the E-Verify program in certain locations as required by law.
Apple is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.
Apple is a drug-free workplace.
Apple will consider for employment all qualified applicants with criminal histories in a manner consistent with applicable law. If you’re applying for a position in San Francisco, review the San Francisco Fair Chance Ordinance guidelines applicable in your area.
It is unlawful in Massachusetts to require or administer a lie detector test as a condition of employment or continued employment. An employer who violates this law shall be subject to criminal penalties and civil liability.
Software Engineer, High Performance Computing
Posted 14 days ago
Job Description
Hawthorne, CA
SpaceX was founded under the belief that a future where humanity is out exploring the stars is fundamentally more exciting than one where we are not. Today SpaceX is actively developing the technologies to make this possible, with the ultimate goal of enabling human life on Mars.
Starshield leverages SpaceX’s Starlink technology and launch capability to support national security efforts. While Starlink is designed for consumer and commercial use, Starshield is designed for government use, with an initial focus on Earth observation, communications, and hosted payloads.
The Starshield software team is building highly reliable in-space mesh networks, designing secure systems to guarantee access to space, designing next-gen communication and sensing software, and more. Aerospace experience is not required to be successful here - we want our engineers to bring fresh ideas from all areas. We look for engineers who love solving problems and seek to make an impact on an inspiring mission. As we expand this team, we're looking for versatile, motivated, and collaborative engineers with hands-on experience developing C++ software for real world systems.
Our team is involved in designing the vehicle systems at every phase of development. We build tools that enable us to work more efficiently, and that help us build software systems that are secure, reliable, and autonomous. Our software engineers are responsible for the life cycle of the software they create, including development, testing, and operational support.
RESPONSIBILITIES:
- Create highly reliable software systems that control hundreds of satellites in low Earth orbit
- Leverage software design to improve satellite constellation performance, security, and availability to meet the needs of a wide range of users
- See your software through from start to finish: from figuring out the core needs to prototyping, developing, and testing; to on-orbit rollout and beyond
- Work with interdisciplinary teams to brainstorm, design, and build the next generation of satellite capabilities, from cutting-edge sensors and inter-satellite lasers to space-based cloud compute
There are several roles within the Starshield software team with different focus areas. Applicants will interview for specific focus areas based on hiring needs and qualifications. Specific role responsibilities may include:
- Write high-quality Linux-based C++ software for common processors and microcontrollers (e.g., ARM, PowerPC, x86)
- Implement networking technologies to direct data across a variety of satellites, ground operations centers, and users
- Build automated ground-based software systems that integrate smart data processing with command and control of the satellites
- Develop models and simulations for flight-like vehicle software testing, network performance analysis, or research & development projects
- Develop tools that allow for test execution across multiple environments: virtualized hardware, real hardware-in-the-loop, and even vehicle-in-the-loop testing
- Invent new systems that enable more frequent and reliable software deployment, test execution, and data analysis as part of a continuous integration and release system
BASIC QUALIFICATIONS:
- Bachelor's degree in computer science, engineering, math, or an engineering discipline; OR 2+ years of professional experience in software development in lieu of a degree
- Development experience in C, C++, or Python, or full-stack software development experience
PREFERRED SKILLS AND EXPERIENCE:
- Experience in C++ for high-performance systems
- Experience developing and deploying software that has been used in real-world applications and projects
- Solid fundamental knowledge of computer architecture and networks
- Strong skills in debugging, performance optimization, and unit testing
- Ability to work effectively and creatively in a dynamic environment with changing needs and requirements
- Ability to work independently and in a team, take initiative, and communicate effectively
- Ability to obtain and maintain a Top Secret or Top Secret/SCI clearance
Some preferred skills and experience depend on the specific team within flight software and may include:
- Experience with networking protocols (TCP, UDP, etc.)
- Experience developing in the Linux kernel
- Experience with image data processing and machine learning
- Strong background in math and physics
ADDITIONAL REQUIREMENTS:
- Note that an active clearance may provide the opportunity for you to work on sensitive SpaceX missions; if so, you will be subject to pre-employment drug and random drug and alcohol testing
- Must be willing to work extended hours and weekends as needed
COMPENSATION AND BENEFITS:
Pay Range:
Software Engineer/Level I: $120,000.00 - $145,000.00 per year
Software Engineer/Level II: $140,000.00 - $170,000.00 per year
Your actual level and base salary will be determined on a case-by-case basis and may vary based on the following considerations: job-related knowledge and skills, education, and experience.
Base salary is just one part of your total rewards package at SpaceX. You may also be eligible for long-term incentives, in the form of company stock, stock options, or long-term cash awards, as well as potential discretionary bonuses and the ability to purchase additional stock at a discount through an Employee Stock Purchase Plan. You will also receive access to comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short & long-term disability insurance, life insurance, paid parental leave, and various other discounts and perks. You may also accrue 3 weeks of paid vacation & will be eligible for 10 or more paid holidays per year. Exempt employees are eligible for 5 days of sick leave per year.
ITAR REQUIREMENTS:
- To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.
SpaceX is an Equal Opportunity Employer; employment with SpaceX is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.
Applicants wishing to view a copy of SpaceX’s Affirmative Action Plan for veterans and individuals with disabilities, or applicants requiring reasonable accommodation to the application/interview process should reach out to
High Performance Computing System Administrator
Posted 24 days ago
Job Description
Overview:
Experience administering multi-user systems required.
Experience in any of the following technologies is preferred:
- Red Hat/Rocky/AlmaLinux
- Slurm
- GPFS
- NFSv4
- InfiniBand/RDMA
- Parallel computing
- Building and managing complex Python and/or R environments
- Scripting in Bash, Python, and/or Perl
Job Description Summary:
Manages high-performance computing systems, ensuring optimal performance, troubleshooting issues, implementing upgrades, and maintaining security protocols.
Job Description:
Essential Functions:
- Collaborates with other IT professionals to integrate new systems with existing infrastructure.
- Conducts research on and evaluates emerging HPC-relevant technologies to assess their potential utility to the organization and make appropriate recommendations.
- Documents system architecture, processes, and SOPs for future reference, training and compliance purposes.
- Mentors and supports less experienced users, working to understand their needs and provide technical assistance.
- Manages and maintains high-performance computing systems. Monitors system performance and troubleshoots issues as they arise. Implements security measures to protect the HPC system from cyber threats. Follows SOPs to meet compliance standards.
- Installs and configures software and hardware for optimal performance.
- Routinely gathers customer needs and feedback to support use cases and maximize the value of the institutional HPC environment.
Education Requirement:
- Bachelor's Degree in Computer Science or relevant field, or equivalent experience, required.
- Master's Degree, preferred.
Skills:
- Ability to participate in an effective mentoring relationship utilizing the appropriate methods.
- Exceptional interpersonal, presentation, customer service, and communication skills required to interact effectively with all hospital staff, medical staff, and external contacts.
- Knowledge of approaches, tools, techniques for recognizing, anticipating, and resolving organizational, operational or process problems.
- Ability to complete tasks with high levels of precision and identify, collect, and analyze data.
- Knowledge of effective project management strategies and tactics, with the ability to plan, organize, monitor, and control projects.
- Knowledge of technical troubleshooting approaches, tools, and techniques and the ability to anticipate, recognize, and resolve technical issues. Working knowledge of relevant programming languages and environments.
- Ability to write technical documents such as manuals, reports, guidelines or documents on standards, processes, and applications.
Experience:
- Two years of experience in system administration, required.
- High-performance computing experience or equivalent, preferred.
- Experience in hospital information systems, preferred.
Physical Requirements:
OCCASIONALLY: Bend/twist, Climb stairs/ladder, Flexing/extending of neck, Lifting / Carrying: 0-10 lbs, Pushing / Pulling: 0-25 lbs, Reaching above shoulder, Squat/kneel, Standing, Walking
FREQUENTLY: (none specified)
CONTINUOUSLY: Audible speech, Color vision, Computer skills, Decision Making, Depth perception, Hand use: grasping, gripping, turning, Hearing acuity, Interpreting Data, Peripheral vision, Problem solving, Repetitive hand/arm use, Seeing - Far/near, Sitting
"The above list of duties is intended to describe the general nature and level of work performed by individuals assigned to this classification. It is not to be construed as an exhaustive list of duties performed by the individuals so classified, nor is it intended to limit or modify the right of any supervisor to assign, direct, and control the work of employees under their supervision. EOE M/F/Disability/Vet"
Senior High Performance Computing Engineer

Posted today
Job Description
Job ID
6383
Location
SLAC - Menlo Park, CA
Full-Time
Regular
**About SLAC:**
The SLAC National Accelerator Laboratory, operated by Stanford University, is a premier national laboratory at the forefront of advancing the frontiers of scientific research and innovation.
SLAC is home to groundbreaking facilities such as the Linac Coherent Light Source - which generates incredibly brief bursts of X-rays to capture stunning movies of atomic and molecular processes in real time with rates of up to 1 million pulses per second. This capability generates an astounding amount of data, with expected data rates reaching up to 1 petabyte per week during full operations. This immense volume of data is essential for researchers to understand dynamic processes in areas such as materials science, chemistry, and biology.
Another prominent endeavor at SLAC is the Rubin Observatory, which is set to conduct the Legacy Survey of Space and Time (LSST), an unprecedented project to map the entire southern night sky every few days for over a decade. SLAC serves as the data facility for all of the Rubin data, which is projected to total about half an exabyte, providing unprecedented insights into dark matter and dark energy and enabling the discovery and cataloging of transient astronomical events like supernovae and near-Earth asteroids.
SLAC's commitment to exploring fundamental questions about our universe is embodied in its collaborative and multidisciplinary research culture, enabling scientists and engineers to delve into the interactions of light, matter, and the foundational principles governing the world we live in. Our lab equips its teams with innovative technologies and unparalleled expertise, fostering a dynamic environment where scientific inquiry can thrive.
**Given the nature of this position, SLAC is open to on-site and hybrid work options.**
**Position Overview:**
As a Senior High Performance Computing Engineer in the Scientific Computing Services Division of the Technology and Innovation Directorate (TID) at SLAC, you will play a critical role in managing and optimizing our High Performance Computing (HPC) environment in support of these groundbreaking scientific projects. You will be responsible for the advanced administration of our Slurm batch system, alongside deploying, optimizing, and debugging applications, scientific libraries and software environments. Additionally, you will contribute to the management and planning of our scientific software catalog to ensure it meets the diverse needs of our research community. This position offers the opportunity to work on challenges that push technological boundaries while mentoring junior staff and guiding the evolution of our HPC capabilities.
**Your specific responsibilities will be to:**
+ Administer, optimize and maintain Slurm for effective job scheduling and resource management in a multi-user HPC environment.
+ Provide implementation, debugging and performance tuning of parallel applications, ensuring high levels of efficiency and reliability.
+ Manage and plan a comprehensive scientific software catalog, ensuring that software tools are current, properly configured, and aligned with users' research objectives.
+ Collaborate with multidisciplinary teams to identify performance bottlenecks and software needs, devising innovative solutions to enhance computational workflows.
+ Spearhead initiatives for the design, scaling, and deployment of advanced computing infrastructure to support evolving research and operational demands.
+ Conduct performance analysis and benchmarking of HPC applications, effectively communicating results and recommendations to stakeholders.
+ Stay attuned to emerging trends and technologies in HPC, proposing strategic enhancements to maintain our competitive advantage.
**To be successful in this position you will bring:**
+ Bachelor's degree in computer science, computer engineering, or a related field and 5 years of relevant experience, or a Master's degree and 3 years of relevant experience, including:
+ Proficiency in debugging and profiling tools for high-performance parallel applications (e.g., gdb, Valgrind, Nvidia Nsight).
+ In-depth knowledge of Linux operating systems and advanced shell scripting.
+ Proven expertise in programming with C, C++, Fortran, and Python, along with deep experience in OpenMPI.
+ Strong problem-solving abilities complemented by exceptional communication skills to bridge technical concepts with non-technical stakeholders.
**Preferred Qualifications:**
+ Experience working in scientific or academic environments, collaborating closely with researchers and understanding their computational needs.
+ Familiarity with the scientific research process and the ability to translate research requirements into technical solutions.
+ Prior exposure to scientific computing applications and tools commonly used in fields such as physics, astrophysics, biophysics, and materials science.
+ Previous roles as a consultant or technical liaison between researchers and IT departments will be advantageous.
**Why Join Us?**
+ Innovative Environment: Work at the forefront of cutting-edge science and technology, contributing to revolutionary projects like LCLS and the Rubin Observatory that will redefine our understanding of the universe.
+ Collaborative Culture: Join a vibrant team of experts in a multidisciplinary environment, fostering collaboration across various scientific disciplines.
+ Professional Growth: Benefit from continuous learning and development opportunities, including access to training programs, workshops, and conferences.
+ Work-Life Balance: Enjoy a supportive work environment that values your well-being, with flexible working arrangements to support a healthy work-life balance.
+ Comprehensive Benefits: SLAC offers a competitive salary and a generous benefits package, including health, dental, and vision insurance, retirement savings plans, and tuition assistance for continued education.
**SLAC Employee Competencies:**
+ **Effective Decisions** : Uses job knowledge and solid judgment to make quality decisions in a timely manner.
+ **Self-Development** : Pursues a variety of venues and opportunities to continue learning and developing.
+ **Dependability** : Can be counted on to deliver results with a sense of personal responsibility for expected outcomes.
+ **Initiative** : Pursues work and interactions proactively with optimism, positive energy, and motivation to move things forward.
+ **Adaptability** : Flexes as needed when change occurs, maintains an open outlook while adjusting and accommodating changes.
+ **Communication** : Ensures effective information flow to various audiences and creates and delivers clear, appropriate written, spoken, presented messages.
+ **Relationships** : Builds relationships to foster trust, collaboration, and a positive climate to achieve common goals.
**Physical Requirements and Working Conditions:**
+ Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of the job. May work extended hours during peak business cycles.
**Work Standards** :
+ Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
+ Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for environment, safety and security; communicates related concerns; uses and promotes safe behaviors based on training and lessons learned. Meets the applicable roles and responsibilities as described in the ESH Manual, Chapter 1: General Policy and Responsibilities.
+ Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in the University's Administrative Guide.
Title: System Administrator 2
Grade: I
Job code: 4832
Duration: Regular Continuing
_The expected pay range for this position is $22,024 to $147,076 per annum. SLAC National Accelerator Laboratory/Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs._
SLAC National Accelerator Laboratory is an Affirmative Action / Equal Opportunity Employer and supports diversity in the workplace. All employment decisions are made without regard to race, color, religion, sex, national origin, age, disability, veteran status, marital or family status, sexual orientation, gender identity, or genetic information. All staff at SLAC National Accelerator Laboratory must be able to demonstrate the legal right to work in the United States. SLAC is an E-Verify employer.
High-Performance Computing Future Opportunities

Posted 1 day ago
Job Description
**Type of Requisition:** Regular
**Clearance Level Must Be Able to Obtain:** None
**Public Trust/Other Required:** None
**Job Family:** IT Infrastructure and Operations
**Skills:**
Computing, HPC, Supercomputing
**Experience:**
0+ years of related experience
**Job Description:**
Transform technology into opportunity with the GDIT HPC community. A career in GDIT Advanced Computing means connecting and enhancing the systems that matter most. At GDIT you'll be at the forefront of innovation and play a meaningful part in improving how agencies operate.
At GDIT, people are our differentiator. The GDIT HPC community will help ensure today is safe and tomorrow is smarter. GDIT builds supercomputers with the speed, storage, and computing power required to run complex models for climate research, scientific and medical discovery, and mission operations.
**GDIT is seeking candidates who would be interested in joining our team within the HPC community for future job opportunities.**
+ **Security clearance will depend on opportunity: none, secret, TS, TS/SCI**
+ **The range is an estimate and is not a guarantee of compensation or salary; actual compensation will vary based on geographical location, education, experience, and skill sets.**
If you are interested in being a part of building GDIT's pipeline of HPC candidates, please click **APPLY** today.
To learn more about GDIT's HPC capabilities, click the link below:
The likely salary range for this position is $43,888 - $0. This is not, however, a guarantee of compensation or salary. Rather, salary will be set based on experience, geographic location and possibly contractual requirements and could fall outside of this range.
Our benefits package for all US-based employees includes a variety of medical plan options, some with Health Savings Accounts, dental plan options, a vision plan, and a 401(k) plan offering the ability to contribute both pre and post-tax dollars up to the IRS annual limits and receive a company match. To encourage work/life balance, GDIT offers employees full flex work weeks where possible and a variety of paid time off plans, including vacation, sick and personal time, holidays, paid parental, military, bereavement and jury duty leave. To ensure our employees are able to protect their income, other offerings such as short and long-term disability benefits, life, accidental death and dismemberment, personal accident, critical illness and business travel and accident insurance are provided or available. We regularly review our Total Rewards package to ensure our offerings are competitive and reflect what our employees have told us they value most.
We are GDIT. A global technology and professional services company that delivers consulting, technology and mission services to every major agency across the U.S. government, defense and intelligence community. Our 30,000 experts extract the power of technology to create immediate value and deliver solutions at the edge of innovation. We operate across 50 countries worldwide, offering leading capabilities in digital modernization, AI/ML, Cloud, Cyber and application development. Together with our clients, we strive to create a safer, smarter world by harnessing the power of deep expertise and advanced technology.
Join our Talent Community to stay up to date on our career opportunities and events.
Equal Opportunity Employer / Individuals with Disabilities / Protected Veterans
Software Engineer, High Performance Computing
Posted 1 day ago
Job Description
**Minimum qualifications:**
+ Bachelor's degree or equivalent practical experience.
+ 2 years of experience in high performance computing (HPC) system architecture and applications.
+ 2 years of experience testing and launching software products, and experience with software design and architecture.
**Preferred qualifications:**
+ Advanced degree in physics, mathematics, life sciences engineering, computer science, engineering, or a similar technical field.
+ 4 years of experience in software development in C++, Python, Julia or similar programming languages used for technical/scientific/engineering computing.
+ 4 years of experience in scientific computing (workflows, applications, state-of-the-art) from one or more domains (health care/life science, manufacturing (CAE, EDA, energy), financial services industry).
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.
Our mission is to enable our customers to run their most demanding technical, scientific and engineering workloads on Google Cloud Platform (GCP). In this High Performance Computing (HPC) role, you will work with supercomputer-class infrastructure (CPUs, GPUs or TPUs) that interoperates with other cloud services, from storage to AI.
We offer a range of Virtual Machine (VM) families tailored for HPC use and innovative control plane constructs to build scalable systems. We enable our customers to create tailor-made HPC environments either from cloud building blocks or from use-case-focused reference architectures. We help our customers navigate the integration of AI into computational workflows and workloads.
Google Cloud accelerates every organization's ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google's cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.
The US base salary range for this full-time position is $141,000-$202,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google .
**Responsibilities:**
+ Understand our customers' computing goals and generalize them into repeatable usage patterns and appropriate cloud architectures.
+ Implement specific HPC solutions as Infrastructure-as-Code, along with the necessary deployment tooling.
+ Work closely with technical leads, product managers and partner service engineering teams to get high-quality features through the software project life-cycle.
+ Manage project schedules, identify technical risks and clearly communicate them to project stakeholders.
+ Collaborate with Program Manager (PM) and Go-to-Market (GTM) teams to develop solution collateral (guides, whitepapers, blog posts, etc.) to onboard customers and drive adoption.
Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
Lead Process Engineering, High Performance Computing
Posted 7 days ago
Job Description
Peraton is hiring a Lead Process Engineer to support the government customer's large High Performance Computing (HPC) program. This cutting-edge program spans HPC test planning and execution, architecture design and prototyping, and vendor outreach and collaboration support. Program technical areas include commercial cloud technologies, high performance computing, and enterprise architecture. The program is tactically important to the national security of the United States, and the work on its missions is frequently recognized for achieving the planned objectives of this growing, high-profile program.
The selected Process Engineer will:
* Apply process improvement (PI), engineering methodologies, and principles to effectively improve and align Enterprise-level processes.
* Facilitate project teams in accomplishing project activities and objectives. Coordinate the transfer of new processes and practices to project teams by providing group facilitation, interviewing, training and additional forms of knowledge transfer.
* Act as a key coordinator between multiple project teams to ensure Enterprise-wide integration of engineering efforts.
Qualifications
* Active TS/SCI with poly clearance required.
* A Bachelor's Degree in computer science, information systems, engineering, business, or education from an accredited college or university is required.
* Ten (10) years' experience with Process Improvement on programs and contracts of similar scope, type, and complexity is required.
* Experience is to include, within the past ten (10) years, five (5) years' experience in facilitation, training, methodology development and evaluation, process engineering across all phases of acquisition identifying best practices, change management, business management techniques, organizational development, activity and data modeling, or information system development methods and practices.
Applicants selected will be subject to a government security investigation and must meet eligibility requirements for access to classified information. Peraton offers enhanced benefits to employees working on this critical National Security program, which include heavily subsidized employee benefits coverage for you and your dependents, 25 days of PTO accrued annually up to a generous PTO cap and eligible to participate in an attractive bonus plan.
Peraton Overview
Peraton is a next-generation national security company that drives missions of consequence spanning the globe and extending to the farthest reaches of the galaxy. As the world's leading mission capability integrator and transformative enterprise IT provider, we deliver trusted, highly differentiated solutions and technologies to protect our nation and allies. Peraton operates at the critical nexus between traditional and nontraditional threats across all domains: land, sea, space, air, and cyberspace. The company serves as a valued partner to essential government agencies and supports every branch of the U.S. armed forces. Each day, our employees do the can't-be-done by solving the most daunting challenges facing our customers. Visit peraton.com to learn how we're keeping people around the world safe and secure.
Target Salary Range
$86,000 - $138,000. This represents the typical salary range for this position based on experience and other factors.
EEO
EEO: Equal opportunity employer, including disability and protected veterans, or other characteristics protected by law.