Principal Site Reliability Engineer Jobs in United States, Employment

By Regal Rexnord At Morehead, KY 40351

Experience with and understanding of DevOps, Cloud Resiliency, Performance Engineering, Release Engineering, Application Performance Management, Capacity Planning, Caching, JavaScript, and .NET

Responsible for Application Performance Monitoring tool administration and management by monitoring availability and taking a holistic view of system health

Reduce organizational ‘toil’ via automation, scripting, and implementation and management of toolsets

Understand business / technical requirements and the overall business objectives of applications

2+ years of experience in software application development or test automation

5+ years of Performance Engineering or related experience with high-traffic, large-scale distributed systems and client-server architectures, both on-prem and cloud (primarily Azure)

Site Reliability Engineer, Netflix Technology

By Netflix At Remote

Experience with incident management and response

Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks

Reads signals in aggregate to develop deeper insights into the quality of experience for our users to help inform business decisions

Experience with complex sociotechnical systems and their successful operations at scale

Experience conducting blame-aware incident reviews

Strong analytical and problem-solving skills

Site Reliability Engineer, Systems

By Anthropic At San Francisco, CA

Automate operations and infrastructure management

Have significant experience with Kubernetes and cloud-native infrastructure

Have strong communication skills to work with a range of technical and non-technical colleagues

Python and Linux SysAdmin skills

Significant experience with Kubernetes architecture and administration

Strong Linux skills and cloud infrastructure expertise

Site Reliability Engineer (L4/5) - Core

By Netflix At Los Gatos, CA

Experience in risk management and/or analysis

Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks

Read signals and metrics to develop deeper insights into our customers’ quality of experience to help inform business decisions

Strong writing and presentation skills

Development experience with Java, JavaScript/Node.js, Python, Go

Knowledge of cloud platforms (e.g., AWS, GCP) and microservices architecture

Sr. Site Reliability Engineer

By CCC At Chicago, IL

Experience preparing and presenting operational artifacts to senior management

Gain and disseminate knowledge of our complex applications

2+ years of experience working with the Azure tech stack in a production capacity

5+ years operational experience working with Microsoft technologies

Comfort and experience with an Ops environment that is growing at a rapid pace.

Knowledge of Virtualization, Cloud Infrastructure and APIs

Site Reliability Engineer Jobs

By Nike At Beaverton, OR, United States

This overview explains our hiring process for corporate roles. Note there may be different hiring steps involved for non-corporate roles

Site Reliability Engineer (SRE) - $700,000

By Thurn Partners At New York, NY, United States

3+ years' experience in a similar software engineering or site reliability engineering position

Experience with SQL database operations

Experience with Kafka, CI/CD pipelines, and virtualisation is a bonus

Beautiful office space with generous overall benefits package

Proficiency with either Python or Golang

Extremely competitive compensation including performance bonuses

Senior Site Reliability Engineer

By NVIDIA At California, United States

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience

Technical leadership beyond development that includes scoping, requirements capturing, leading and influencing multiple teams of engineers on broad development initiatives.

Experience with the ELK and Prometheus stacks as a power user and administrator.

Prior experience driving production issues and helping with on-call support.

Experience with CUDA, PyTorch, TensorRT, TensorFlow, and/or Triton.

Experience with StackStorm and similar automation platforms is a bonus.

Application Support – Site Reliability Engineer

By Morgan Stanley At New York, NY, United States

Good working knowledge of trading and risk management business concepts

Ensure efficient incident management, with accurate communication to impacted groups and timely resolution.

Familiarity with SDLC processes and management tools (Jira/Git/Stashblue)

Network diagnostic skills and experience with networks and realtime messaging technologies (multicast, TCP/IP, UDP, SNMP)

Facilitate root cause investigations and manage the implementation of corrective and preventative measures.

Manage coverage during Asian and European market hours, including weekend pre-open ready-for-business checks.

Site Reliability Engineer Jobs

By Spotify At Greater Chicago Area, United States

• 4+ years of IT experience needed

• Experience working in a Linux environment

• Good knowledge of Unix

• Basic experience in writing SQL queries

• Good verbal communication skills

• Ability to manage priorities and deadlines

Site Reliability Engineer (.Net Engineer)

By Suzy At United States

Exposure to a Configuration Management System (Puppet, Chef, Salt, etc)

Optimize: Observe and improve performance, reduce cost, and improve the experience for millions of users

3+ years of experience in Software Engineering, Site Reliability Engineering, or a Development focused DevOps role.

Experience with Kubernetes and Cloud systems

Experience with the development and operation of high-traffic backend systems

Troubleshooting skills that span applications, networking (TCP/IP), and systems

Principal Reliability Engineer Jobs

By Novartis At Cambridge, MA, United States

3+ years of experience in people leadership, project management, and collaborating across boundaries

Experience in Data Management & Systems, preferably in data security

Broadly experienced specialist managing a small unit or project. May be responsible for managing others. Leads or co-leads novel projects within the team

Experience in implementing DevOps tools and practices for product and services teams

Experience handling a large volume of data

Experience with AWS and containers

Site Reliability Engineer - USDS

By TikTok At Seattle, WA, United States

Responsibilities: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, ...

Site Reliability Engineer - All Levels

By FedEx Dataworks At United States

Experience in FinOps - Cloud cost management

Experience/knowledge in capacity planning, demand forecast based on production KPIs and provisioning.

Two (2) years equivalent work experience in information technology or engineering environment. A related advanced degree may offset the experience requirements.

Bachelor's Degree in Computer Science, Engineering, Information Systems and/or related field or equivalent formal training or work experience.

Strong SRE background, with experience in Cloud platforms, Software Development, DevOps, and Data Engineering

Strong skills in Python, SQL, Azure or other Cloud technologies

Senior Site Reliability Engineer

By Business Wire At United States

Strong experience with AWS cloud infrastructure and container orchestration (Kubernetes, Docker)

Strong experience with monitoring and alerting systems such as Prometheus, Grafana, Nagios, etc.

Strong experience with at least one programming language. Java is highly preferred but other languages such as Python will be considered

Advanced experience with Linux system administration, Java based applications, and network architecture

Ability to work remotely 100%

Excellent health benefits that begin on your first day of employment

Site Reliability Engineer Jobs

By Therapy Brands At Birmingham, AL, United States

2+ years of experience programming or scripting. C# or Python is preferred.

1+ years of experience with cloud environments: AWS and Azure

1+ years of experience with SQL: writing basic select and update statements

Primary Responsibilities Of This Position

Familiarity with networking fundamentals: TCP/IP, DNS resolution

Familiarity with tools including or similar to: Grafana, InfluxDB, OpenTelemetry

Site Reliability Engineer Jobs

By Xforia Global Talent Solutions At United States

Support system design consulting, platform management, and capacity planning

Excellent communication skills and a high degree of technical leadership skills.

As Site Reliability Engineer you will:

Support the production environment by monitoring availability and the system health.

Improve reliability, quality, and time-to-release of the changes.

Provide primary operational support and engineering for multiple large-scale distributed software applications.

Site Reliability Engineer Jobs

By Sohum Inc At San Francisco Bay Area, United States

Full time opportunity that offers excellent benefits.

• Configuration Management and IaC - Salt, Pulumi (Terraform will work)

• Bachelor’s degree in CS / other highly technical discipline, or equivalent experience

• 5+ years of experience overall, including 3+ years as a Site Reliability Engineer

• Strong networking and firewall knowledge

• Exceptional problem solving and troubleshooting skills

Site Reliability Engineer Jobs

By Insight Global At Hampton, VA, United States

● (5+) years of experience working in Software Engineering, or Site Reliability Engineering

● Experience building and maintaining Container Orchestration across hybrid-cloud infrastructure

● Experience deploying and configuring modern observability tooling for monitoring and alerting.

● Experience programming in Java, JavaScript, and SQL dialects with the Spring framework and React library

● Experience writing or troubleshooting software delivery pipelines, e.g., GitLab CI and Concourse

Active DOD security clearance or the ability to obtain an interim secret within 60 days of hire

Site Reliability Engineer Jobs

By Blue Yonder At Dallas, TX, United States

Solid understanding of large-scale applications, Cloud Observability, monitoring and fault management, and understanding of Network Architectures

Respond to technical business requirements around availability, performance, and planned maintenance activities to ensure a well-operating solution and SLA compliance.

Minimum of 5 years’ experience developing, managing, or supporting distributed systems in a

Experience working with monitoring and visualization tools such as Splunk and AppDynamics

Experience coordinating between support and development teams to ensure effective delivery of monitoring services to the end-user.

Experience implementing best practices and industry standards for operational monitoring aligned to ITIL.

Are you looking for an opportunity to join a fast-paced and innovative team as a Principal Site Reliability Engineer? We are looking for a highly motivated individual to join our team and help us build and maintain reliable, secure, and scalable systems. You will be responsible for developing and implementing strategies to ensure the availability, performance, and security of our systems. If you have a passion for technology and a drive to make a difference, this is the job for you!

Overview:

A Principal Site Reliability Engineer is responsible for ensuring the reliability, availability, and scalability of a company’s IT infrastructure. They are responsible for developing, implementing, and maintaining systems and processes that ensure the highest levels of performance and reliability. They must be able to troubleshoot and resolve complex technical issues quickly and efficiently.

Detailed Job Description:

The Principal Site Reliability Engineer designs, develops, and maintains systems and processes that ensure the highest levels of performance and reliability. The role involves troubleshooting and resolving complex technical issues quickly and efficiently, identifying potential problems, and developing solutions to prevent them from occurring. It also involves working with other teams to ensure that systems and processes are properly implemented and maintained, and providing those teams with technical guidance and support.

What Skills Are Required for a Principal Site Reliability Engineer Job?

• Strong technical knowledge of IT infrastructure, including hardware, software, and networking

• Knowledge of system and process design

• Knowledge of system and process automation

• Knowledge of system and process monitoring

• Knowledge of system and process optimization

• Knowledge of system and process security

• Knowledge of system and process scalability

• Knowledge of system and process troubleshooting

• Ability to work independently and as part of a team

• Ability to work under pressure and meet deadlines

• Excellent problem-solving and analytical skills

• Excellent communication and interpersonal skills

What Qualifications Are Required for a Principal Site Reliability Engineer Job?

• Bachelor’s degree in Computer Science, Information Technology, or related field

• 5+ years of experience in IT infrastructure and in system and process design, automation, monitoring, optimization, security, scalability, and troubleshooting

• Experience with cloud technologies such as AWS, Azure, or GCP

• Experience with scripting languages such as Python, Bash, or PowerShell

• Experience with configuration management tools such as Chef, Puppet, or Ansible

• Experience with monitoring tools such as Nagios, Zabbix, or Splunk

• Experience with container technologies such as Docker or Kubernetes

