Site Reliability Engineer, Recommendation Infrastructure (San Jose)
By TikTok, San Jose. $112,200 - $205,000 a year
Plan, manage, and optimize cloud resource utilization, ensuring that large-scale clusters meet their SLAs
Bachelor's degree or above in Computer Science or a related field, with at least 1 year of related work experience
SRE experience deploying large-scale systems with high reliability and scalability
Familiarity with Linux system operations and networking
Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++
Experience in designing, analyzing and troubleshooting large-scale distributed systems

Are you looking for an exciting opportunity to join a dynamic team and help build a reliable cloud infrastructure? We are looking for a Site Reliability Engineer to join our team and help ensure our cloud infrastructure is running smoothly and efficiently. You will be responsible for developing and maintaining our cloud infrastructure, monitoring performance, and troubleshooting any issues that arise. If you have a passion for technology and a desire to work in a fast-paced environment, then this is the job for you!

Overview:

A Cloud Infrastructure Site Reliability Engineer is responsible for ensuring the reliability, scalability, and performance of cloud-based infrastructure and services. They are responsible for developing and maintaining automation and monitoring systems, troubleshooting and resolving complex technical issues, and providing technical guidance and support to other teams.

Detailed Job Description:

The Cloud Infrastructure Site Reliability Engineer is responsible for designing, developing, and maintaining cloud-based infrastructure and services. This includes developing and maintaining automation and monitoring systems, troubleshooting and resolving complex technical issues, and providing technical guidance and support to other teams. The engineer will also be responsible for developing and maintaining best practices for cloud infrastructure and services, and ensuring that all systems are secure and compliant with industry standards.

What Skills Are Required for a Cloud Infrastructure Site Reliability Engineer?

• Expertise in cloud-based infrastructure and services, including AWS, Azure, and Google Cloud Platform
• Knowledge of scripting languages such as Python, Bash, and PowerShell
• Knowledge of automation and monitoring tools such as Chef, Puppet, and Ansible
• Knowledge of system and network security best practices
• Ability to troubleshoot and resolve complex technical issues
• Excellent communication and problem-solving skills

What Are the Qualifications for a Cloud Infrastructure Site Reliability Engineer?

• Bachelor’s degree in Computer Science, Information Systems, or related field
• 5+ years of experience in cloud-based infrastructure and services
• Experience with scripting languages such as Python, Bash, and PowerShell
• Experience with automation and monitoring tools such as Chef, Puppet, and Ansible
• Experience with system and network security best practices

What Knowledge Does a Cloud Infrastructure Site Reliability Engineer Need?

• Knowledge of cloud-based infrastructure and services, including AWS, Azure, and Google Cloud Platform
• Knowledge of scripting languages such as Python, Bash, and PowerShell
• Knowledge of automation and monitoring tools such as Chef, Puppet, and Ansible
• Knowledge of system and network security best practices
• Knowledge of DevOps principles and practices

What Experience Does a Cloud Infrastructure Site Reliability Engineer Need?

• 5+ years of experience in cloud-based infrastructure and services
• Experience with scripting languages such as Python, Bash, and PowerShell
• Experience with automation and monitoring tools such as Chef, Puppet, and Ansible
• Experience with system and network security best practices
• Experience with DevOps principles and practices

What Are the Responsibilities of a Cloud Infrastructure Site Reliability Engineer?

• Design, develop, and maintain cloud-based infrastructure and services
• Develop and maintain automation and monitoring systems
• Troubleshoot and resolve