Site Reliability Engineer, Recommendation Infrastructure (San Jose)
By TikTok At San Jose, $112,200 - $205,000 a year
Plan, manage, and optimize cloud resource utilization, ensuring the SLA of large-scale clusters
Bachelor's degree or above in Computer Science or a related field, with at least 1 year of related work experience
SRE experience deploying large-scale systems with high reliability and scalability
Familiarity with Linux system operations and networking
Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++
Experience in designing, analyzing and troubleshooting large-scale distributed systems
Site Reliability Engineer Jobs
By Zscaler At San Jose
Strong CentOS/UNIX skills; FreeBSD-specific experience is a plus.
5-7 years of experience in a rapidly growing SaaS/cloud/distributed environment.
At least 3 years of scripting experience in Python is required.
Hands-on experience with infrastructure as code and automation tools (Ansible, Chef, Puppet, Terraform).
Basic networking skills (TCP/IP, DNS, LACP, CARP) for testing and troubleshooting are required.
Competitive salary and benefits, including equity
Site Reliability Engineer (SRE)
By Agama Solutions At San Jose
5+ years of US experience in an SRE role
Good communication (and listening) skills.
Some experience administering Linux “web” servers, at scale.
Working knowledge of DNS, HTTP, TLS, web security.
Experience troubleshooting networks using tools such as tcpdump.
Well versed in *nix Operating Systems (we use CentOS and Ubuntu LTS).
Site Reliability Engineer Jobs
By eBay At San Jose, CA 95125, $168,400 - $262,900 a year
Develop automation systems for implementing eBay traffic management
Manage eBay’s traffic infrastructure including SLB, CDN, etc.
Solid programming experience in languages like Golang, Java, C/C++
Experience with Kubernetes and Docker is a must
Experience working with public cloud is a plus
Experience with software load balancers (IPVS, Envoy, Istio, Cilium, etc.) is a plus
Site Reliability Engineer Jobs
By Techfellow Limited At Chicago, IL, United States
Develop cutting-edge solutions following robust engineering principles alongside an experienced team
Proficient scripting skills in PowerShell, Python, or comparable languages
Harness Python and PowerShell scripting to streamline build, configuration, deployment, and admin tasks
Bolster communication and collaboration, serving as a nexus between business users and tech teams
Deploy, oversee, and refine Windows infrastructure
Identify and actualise system enhancements for optimal performance
Site Reliability Engineer Jobs
By IBM At San Jose, CA 95101, $123,480 - $229,320 a year
Work with others across the team (Developers, DevOps Engineers, Sys Admins and the Release Manager) during software releases
2+ years of experience with Kubernetes
1.5+ years of experience with Golang or similar
2+ years of experience with Node.js or similar
2+ years of experience with cloud platforms (e.g. IBM Cloud, AWS, GCP)
2+ years of experience using CI/CD processes

Are you looking for a challenging and rewarding role as a Remote Site Reliability Engineer? We are looking for a talented individual to join our team and help us ensure our systems are reliable and secure. You will be responsible for monitoring, troubleshooting, and resolving issues with our systems, as well as developing and implementing strategies to improve system performance. If you have a passion for technology and a desire to make a difference, this is the job for you!

Overview:

A Remote Site Reliability Engineer is responsible for ensuring the reliability, availability, and scalability of a company’s remote systems and services. This role requires a combination of technical and operational skills to ensure that the remote systems are running optimally and securely. The Remote Site Reliability Engineer will work with the development, operations, and security teams to ensure that the remote systems are reliable, secure, and available.

Detailed Job Description:

The Remote Site Reliability Engineer will be responsible for the following tasks:

• Design, implement, and maintain remote systems and services.
• Monitor and troubleshoot remote systems and services.
• Develop and maintain automation and configuration management systems.
• Develop and maintain security policies and procedures.
• Develop and maintain system and service performance metrics.
• Develop and maintain system and service availability metrics.
• Develop and maintain system and service scalability metrics.
• Develop and maintain system and service reliability metrics.
• Develop and maintain system and service security metrics.
• Develop and maintain system and service documentation.
• Develop and maintain system and service monitoring and alerting systems.
• Develop and maintain system and service backup and recovery systems.
• Develop and maintain system and service disaster recovery plans.
• Develop and maintain system and service capacity planning.
• Develop and maintain system and service performance tuning.
• Develop and maintain system and service patching and upgrades.
• Develop and maintain system and service security hardening.
• Develop and maintain system and service change management processes.
• Develop and maintain system and service incident response plans.
• Develop and maintain system and service root cause analysis processes.

What Skills Are Required for a Remote Site Reliability Engineer Job?

• Expertise in remote systems and services.
• Expertise in automation and configuration management systems.
• Expertise in security policies and procedures.
• Expertise in system and service performance metrics.
• Expertise in system and service availability metrics.
• Expertise in system and service scalability metrics.
• Expertise in system and service reliability metrics.
• Expertise in system and service security metrics.
• Expertise in system and service documentation.
• Expertise in system and service monitoring and alerting systems.
• Expertise in system and service backup and recovery systems.
• Expertise in system and service disaster recovery plans.
• Expertise in system and service capacity planning.
• Expert