
High Performance Computing Cluster Administration

Company

NVIDIA

Address Santa Clara
Employment type FULL_TIME
Salary $144,000 - $270,250 a year
Expires 2023-09-10
Posted 9 months ago
Job Description
NVIDIA's Deep Learning Optimized Frameworks Group is looking for a deeply technical HPC cluster administrator to lead a diverse cluster of GPU-accelerated systems and provide architectural mentorship to product teams in the deep learning and scientific computing domains. As a member of the DLFW Infrastructure team, you will provide leadership in the design and implementation of a groundbreaking GPU compute cluster that runs demanding deep learning, high-performance computing, and other computationally intensive workloads. We are looking for an expert to identify architectural changes or entirely new approaches for our GPU compute cluster. In this role, you will help us with the strategic challenges we encounter, including compute, networking, and storage design for large-scale, high-performance workloads, and effective resource utilization in a heterogeneous compute environment.
What you'll be doing:
  • Coordinate storage solutions and plan for growth.
  • Automate configuration management, software updates, and maintenance and monitoring of system availability using modern DevOps tools (Ansible, GitLab, etc.).
  • Actively communicate with management about any problems with the equipment and propose resolutions.
  • Plan, build, and install or upgrade new systems that support NVIDIA DL software.
  • Administer Linux systems, ranging from powerful DGX servers to embedded systems, and from bring-up hardware to publicly available systems.
What we need to see:
  • Experience with containers (Docker, Singularity, LXC)
  • Deep understanding of operating systems, computer networks, and high-performance applications
  • A BA, BS, or MS in CS, EE, CE, or equivalent experience
  • Proven ability to script in Bash, Perl, or Python
  • Ability to work well with developers and test engineers
  • Familiarity with resource scheduling managers (Slurm preferred, LSF, etc.)
  • Hard-working dedication to providing quality support for your users
  • 5+ years of experience deploying and administering HPC clusters
Ways to stand out from the crowd:
  • Experience with mobile and embedded systems
  • Experience coding/scripting in Perl, Python, or Bash
  • Familiarity with GPU usage in compute clusters and with CUDA
  • Basic knowledge of deep learning
  • Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker
The base salary range is $144,000 - $270,250. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.