Strength Through Diversity
Groundbreaking science. Advancing medicine. Healing made personal.
Responsible for the architecture, design and deployment of High Performance Computing (HPC) clusters, including the management and maintenance of data and workflow systems for researchers and scientists. Participates in the integration of HPC resources with laboratory equipment (e.g. genomic sequencers) and with clinical and research data resources and systems. Resolves technical issues. Must be a customer-focused team player.
- Architects, designs and deploys ISMMS HPC clusters of ~20,000 cores with high-bandwidth, low-latency interconnects, GPUs, large shared-memory nodes, databases, scientific workflows and several tens of petabytes of storage in production.
- Maintains, tunes and manages computational, data and cloud technologies and workflow systems for ISMMS researchers, scientists and their external collaborators. Defines and deploys a comprehensive computational and data vision. Identifies and communicates system advantages, disadvantages and tradeoffs.
- Researches, develops and deploys capabilities and solutions.
- Troubleshoots, isolates and resolves application, system and other technical problems (hardware, software and network). Actively monitors the systems.
- Designs and develops scripts for system administration, monitoring and usage reporting.
- Researches, deploys and optimizes resource management and scheduling software and policies. Develops and implements storage policies.
- Participates in the integration of HPC resources with laboratory equipment such as sequencers. Incorporates and links data and compute resources.
- Designs, tunes, manages and upgrades parallel file systems, storage and data-oriented resources.
- Researches, deploys and manages security infrastructure, including development of policies and procedures.
- Writes, publishes and presents papers at national and international conferences.
- Assists in developing and writing proposals. Creates and provides clear documentation on all items.
- Works well within a team and across teams, including with IT and with researchers. Communicates clearly with all.
Requirements:
- Bachelor’s degree or equivalent in Computer Engineering, Science or a related discipline; Master’s degree or equivalent in a domain science
- 6 years of experience, preferably in Red Hat/CentOS Linux administration
- Batch HPC cluster environment
- Experience with GPFS/Spectrum Scale parallel file systems and storage
- Configuration management systems such as xCAT, Puppet and/or Ansible
- InfiniBand and Ethernet required
- Scripting and programming
- Database and web experience
- Compliance experience (HIPAA, GDPR, FISMA)
- TSM or HPSS and tape archival storage systems
- Singularity and/or Docker containers
- Academic and/or healthcare research setting
- Nagios
- Cloud technologies