Unfortunately, this job posting is expired.

Senior Site Reliability Engineer

Company Microsoft
Address Redmond, WA 98052
Employment type Full-time
Salary $112,000 - $218,400 a year
Expires 2023-07-21
Posted 10 months ago
Job Description
Microsoft’s Intelligence Platform engineering team is leading the transformation of analytics in the world of data with products like Power BI, Synapse Analytics, Azure Data Factory, Azure Data Explorer. We will bring the world’s data to the Microsoft Cloud, power a new class of data first applications, and empower everyone on the planet to make better decisions with data.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our ideas are better, our products are better, and our customers are better served.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Technical Knowledge and Domain-Specific Expertise
  • Experience working with all service aspects of high throughput and multi-tenant services, ability to design components carefully, properly handle errors, write clean and well-factored code with good tests and good maintainability.
  • Researches and maintains deep knowledge of industry trends as well as advances in large-scale distributed systems and cloud technologies; identifies opportunities to create, implement, and/or optimally utilize new tools, technologies, and/or processes to solve ambiguous problems and improve product availability, reliability, efficiency, observability, and/or performance. Drives the adoption of new solutions across engineering teams working with related products within an organization and provides guidance and coaching to others on relevant topics.
Contributions to Development and Design
  • Engages with product engineering teams within an organization by driving code/design reviews, hosting regular meetings, and participating in on-call rotations and incident responses throughout product development and operations cycles; leverages end-to-end technical expertise on underlying systems/platforms and insights from engagements with product engineering teams and telemetry analyses to propose scalable improvements in code and designs with attention to customer/business objectives and incident prevention.
Driving Operational Excellence
  • Develops code, scripts, systems, or platforms that automate moderately complex but repetitive operations processes (e.g., monitoring, alerting, deploying products and updates, debugging) at scale; reviews existing automation code and scripts to evaluate reusability, extendibility, and scalability within an organization.
  • Shares insights and best practices that can be applied to improve development and operations across related sets of systems, platforms, and/or products. Continues to develop their understanding of insights and best practices through interactions with more experienced Site Reliability Engineers (SREs) and members of product engineering teams. Mentors and coaches less experienced engineers to help them identify and propose relevant solutions.
  • Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale. Contributes to the development of new tooling and/or predictive models to identify and test potential improvements in product development and/or operations, and monitors the impact of changes on operations metrics (e.g., Time-to-X) within an organization.
  • Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting complex issues, and deploying appropriate fixes to resolve root cause(s); alerts product teams, owners, and leadership to issues with major customer/business impact and escalates resolution of the highly complex, ambiguous, and impactful issues to include other engineering teams and/or subject matter experts as needed. Shares details related to incidents and their resolution through post-mortem reports and during regular review meetings.
  • Serves as a point of contact and trusted advisor, interacting with customers and other external stakeholders as a spokesperson on customer confidence or escalation calls, and supports the incident management process, including quality control of Root Cause Analyses (RCAs).
Other
  • Embody our culture and values

Qualifications

Required/Minimum Qualifications
  • 6+ years of experience in a site reliability engineering role with large-scale, distributed infrastructures
  • 5+ years’ experience with scripting languages such as PowerShell or Python
  • 6+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check:
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Additional or Preferred Qualifications
  • 6+ years’ experience troubleshooting, investigating, and fixing production issues in large scale cloud and/or hosted environments
  • 4+ years experience with building infrastructure using Microsoft Azure technology
  • 5+ years experience writing product code for a major cloud service (C++, C#, or Node.js), including experience with algorithms, data structures, and software design
Site Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $112,000 - $218,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $145,800 - $238,600 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
