TIBCO Software Senior Site Reliability Engineer in North Sydney, Australia
Senior Site Reliability Engineer
AU-New South Wales-North Sydney, Australia
Headquartered in Palo Alto, Calif., TIBCO Software empowers businesses to reach their digital destinations by interconnecting everything in real time and providing augmented intelligence for everyone, from business users to data scientists.
With more than 10,000 customers and 3,500 employees located in over 30 countries, TIBCO has retained the speed and agility of a start-up. We value and encourage new ideas, direct communication, out-of-the-box thinking, risk-taking, and creative problem solving.
We're looking for people who want to make a difference doing a job they love – dynamic individuals willing to take the risks necessary to make big ideas come to life and who are comfortable collaborating in our creative, new-idea-driven environment. We value hard work and provide new opportunities to grow, learn, and excel.
As part of TIBCO’s Cloud First initiative, we are investing in people, processes and technology to build and run enterprise Software as a Service. Join the Cloud Infrastructure Engineering & Operations team, part of the Product and Technology organization, and help us develop and operate industry-leading cloud products.
As a Site Reliability Production Engineer, you will be responsible for the availability, automation, performance, efficiency, scaling, monitoring, and emergency response of TIBCO’s cloud operating system. You will use your deep understanding of TIBCO’s platform, architecture, people, systems, and processes to continuously improve uptime, performance, deployment, monitoring, and troubleshooting. You will also be expected to provide Tier 3 support for service incidents as escalated by TIBCO’s Tier 2 Operations staff and Support teams.
Collaborate with developers and other internal groups to identify, prioritize and develop service reliability and manageability improvements.
Develop tools that improve our ability to rapidly deploy and effectively monitor production services in a dynamic, large-scale Linux environment.
Mentor and guide SREs and Systems Administrators on effective methods to deliver enterprise-class services.
Continually evolve deployment methods and processes to effectively deliver products in various environments including Production, Staging, QA and Development.
Design and document systems, including writing and reviewing code, to automate routine issues and minimize manual tasks.
Participate in a 24x7 on-call rotation for third-tier escalations, working with NOC, Engineering and other Ops teams to troubleshoot service issues.
Manage assigned project activities to achieve stated project goals, objectives, and schedules.
Collaborate deeply with a cross-functional team of Software Architects and Engineers, Quality Engineers, Engineering management, and other SREs.
Experience with infrastructure automation, infrastructure as code, and automated application deployment
Experience with automation/configuration management tools: Puppet, Chef, Ansible, or Salt
3 years operating a production SaaS offering on at least one of the following cloud platforms: AWS, Azure, OpenStack, Cloud Foundry, or other
Experience with Docker, Mesos, and/or Kubernetes in production workloads
Experience with Continuous Integration tools such as Jenkins, Bamboo, etc.
Excellent written and verbal communication skills
Ability to identify and understand complex issues and develop effective solutions
Ability to handle multiple tasks, prioritize and meet deadlines
Passion for customer success, with a deep desire to exceed expectations and Service Level Agreements