To provide expert-level Tier 3 infrastructure platform services for the most critical operations and to ensure reliable operation of the production environment. This includes platform leadership and troubleshooting, diagnosing, and resolving the most complex hardware and/or software issues.
Duties and Responsibilities:
- Provide expert-level Linux platform and operations services, including troubleshooting, diagnosing, and resolving the most complex hardware and/or software issues and outages escalated from the Support Center and Technical Services groups. Maintain and troubleshoot hardware and/or software systems (e.g., applications, proxy servers, and firewalls) on multiple platforms. Diagnose and troubleshoot availability interruptions and production issues.
- Assist in the planning and implementation of new technologies and major releases for the most critical enterprise-wide IT infrastructure projects.
- Coordinate with GTO build and engineering groups to identify systems at risk and communicate findings to the appropriate groups.
- Mentor, coach, and develop junior staff on processes and technologies. Troubleshoot and resolve the most complex issues escalated from staff. Assess IT risks, dependencies, and conflicts.
- Monitor and resolve automated system alerts and alarms that indicate system functions were not completed within established service levels. Recommend automated monitoring enhancements.
- Coordinate resolution activities with IT groups across systems and platforms (e.g., the availability team) so that they occur in sync. Communicate issue status.
- Troubleshoot, configure, and tune systems. Install software, patches, upgrades, applications, and/or hardware. Test and evaluate IT vendor products. Review IT vendor literature to identify product deficiencies, workarounds, and scheduled patches and updates.
- Write documentation, including policies and procedures. Create graphics, including IT problem notification flows. Administer the platform documentation repository, ensuring freshness and completeness. Add, update, and close records within ServiceNow, and manage the timely update and resolution of issues. Review related records to identify requests that may have an impact on systems.
- Respond to platform inquiries and resolve the most complex issues. Review shift logs and turnover reports, and suggest improvements to facilitate the transition to incoming teams and resources. Monitor ticket hygiene and ensure all incident-related documentation is in good order.
- Administer system activities (e.g., internet availability). Identify performance issues and trends. Identify opportunities to improve system and application performance, including automating manual system tasks, and shepherd the implementation of these efforts. Participate in change management and related meetings to identify system issues. Present project initiatives and metrics. Create ad hoc reports of significant system trends, including problem occurrences and responses. Persuade and influence peers and management on the benefits of improvement efforts.
- Thoroughly understand and comply with IT policies and procedures, especially those for quality and productivity standards that enable the team to meet established client service levels. Thoroughly understand and comply with Information Security policies and procedures.
- Provide strategic platform leadership to ensure the sustainability and health of the platform. Identify brittle or fragile infrastructure and work with the engineering organization to remediate it by providing suggestions for the development and design of the infrastructure. Provide leadership to analysts in the organization and enable them to identify trends and opportunities to improve the operation. Serve as a consultant to SI partners, ensuring that only the most robust solutions are introduced into the environment.
- Lead and participate in special projects, and perform other related duties as assigned.
Qualifications:
- Undergraduate degree in a related field or the equivalent combination of training and experience.
- Eight years or more of experience in a technical specialty.
- Excellent written and oral communication skills, including presentation and negotiation skills.
- Excellent analytical and problem-solving skills.
- Expert knowledge of the following operations practices and concepts: printing systems, system utilities, software installation and configuration, IT service level agreements, the full product life cycle, networks, technical standards and deliverables, troubleshooting techniques, log files, and network protocols (e.g., TCP/IP, DLC, and ASYNC).
- Expert knowledge of Unix/Linux IT platforms. Knowledge in other areas is a plus: client/server, IBM mainframe, Microsoft Windows NT, OS/390, DB2, the z/OS operating system, CICS, and MQ.
- Expert knowledge of one or more IT products: VMware, MQ, AWS, CRS, Active Directory.
Operations are 24x7. Remote and off-hours support may be required.
Vanguard is not offering visa sponsorship for this position.
Location/Region: Charlotte, NC