Talented Operations Technician with Linux/EC2 administration experience in a large data center environment. The ideal NOC Technician is an IT professional who understands the rigorous demands of a 24x7x365 real-time online operational environment. This is not a HelpDesk position. The DPC's Operations Center proactively monitors the DPC's production infrastructure around the globe. This position demands innovative thinking, a desire to learn and expand one's skill set, a proactive outlook, and a strong desire to think outside the box. The ideal candidate would have many of the following:
- Working knowledge of Unix/Linux OS and the LAMP stack. Java application server experience is a plus.
- Working knowledge of virtualization platforms (VMware, EC2)
- Working knowledge of standard networking, systems and database principles
- Experience in managing a large-scale enterprise server farm
- Strong understanding of company processes, policies, service offerings, and products
- Strong understanding of company resources such as databases, software applications, and organizational structure
- Experience in configuration management/control
- Provide quick and accurate feedback to engineering teams on issues detected in running systems
- Work closely and in coordination with other Operators to ensure proper turnover takes place between shifts/holidays
- Must be able to demonstrate full operational knowledge of the LAMP stack.
- Must be able to demonstrate full operational knowledge of Linux commands and execute them with minimal supervision. This includes identifying components that can be attached to a Linux-based box; using the Linux UI, menus, and command interfaces to perform system functions; starting up and shutting down Linux/virtual boxes; locating, creating, changing, and deleting objects; initiating, monitoring, and controlling jobs; sending and displaying messages; and backing up and restoring objects.
- Identify, analyze, and initiate corrective actions for system, application server, maintenance job, application, and security/authority-related problems.
Essential Functions
- Install, deploy, configure, test, monitor, troubleshoot, and fix production systems and applications
- Day-to-day management oversight of NOC incidents, including monitoring and tracking tier 1 and 2 network, systems, and database-related incidents, and direct ownership of tracking the resolution of tier 3 (or other escalated) incidents.
- Technician activities related to the operations and monitoring of the DPC's networks, systems and websites.
- Using monitoring tools, this position watches the DPC's environment, performs triage via documented processes, and reacts to situations as they arise.
- Report and escalate incidents that impact the DPC's environment to onsite senior team members. This is imperative to improve and expand the level of service that the Global Operations Center is chartered to provide.
- Trend analysis of operational metrics (performance, failures) to proactively identify system problems and opportunities for improvement to products and processes
- Articulate the results of quarterly emergency management and disaster recovery drills as defined by agreed incident escalation and disaster recovery policies.
- Create, update and enhance written Standard Operating Procedures and documentation using guidelines focused on ease of understanding and completeness.