HPC cluster System Administrator

Computer and IT - Leuven | More than two weeks ago

Join our team to enable future HPC systems research.

What you will do

Can your Unix system administration skills support out-of-the-box thinkers who are designing a high-performance computing system, from the programming model and language, through the runtime system, all the way down to the hardware architecture and the process technology? This is the context of imec’s new Compute System Architecture (CSA) group.
To support this advanced hardware/software co-design, CSA uses a high-performance computing cluster for designing and simulating new hardware and for running benchmarks and various other system simulations and experiments. CSA’s current HPC/AI cluster has multiple racks with the latest CPU and GPU servers and a dedicated distributed storage cluster, all connected via an InfiniBand network.
We are seeking an experienced Unix System Administrator who is eager to use and grow their technological skills on a worldwide stage. In this role, you build, maintain and manage the HPC/AI cluster, work out solutions to support the research, provide training, and assist in the development of an overall research IT strategy. From early on, you help empower our breakthrough innovations. You can expect challenging assignments, such as following up on and researching state-of-the-art cluster management techniques, and you will take ownership of and responsibility for the HPC/AI cluster.
Objectives of this Role
  • Linux HPC sysadmin
  • Hardware maintenance of the HPC system
      • Maintain complex setups involving InfiniBand, Lustre, and various hardware accelerators in servers and workstations (GPU, FPGA, …).
      • Monitor datacenter health using preexisting management tools and respond to hardware issues as they arise; help build, test, and maintain new systems as needed.
      • Help install and maintain servers and components.
  • Software maintenance of the HPC system
      • Install and maintain complex and optimized software stacks across a variety of machines.
      • Install and maintain scientific software packages through configurable module systems (e.g. EasyBuild).
      • Install and maintain container and virtualization systems (e.g. Docker, Singularity), and support users in creating and running containers.
      • Install and maintain job scheduling system(s) for varied workloads (e.g. Slurm).
  • Perform server administration tasks, including user/group administration, security permissions, group policies, print services, researching event log warnings and errors, and resource monitoring, ensuring system architecture components work together seamlessly.
  • Interact closely with imec’s central ICT management.
  • Perform routine/scheduled audits of the systems, including all backups.
  • Proactively follow up on and experiment with the newest trends in high-performance computing systems (tools, hardware and software, …) to continuously support and improve the research.

What we do for you

We offer you the opportunity to join one of the world’s premier research centers in nanotechnology and digital technologies at its headquarters in Leuven, Belgium, as part of a rapidly growing, multidisciplinary team. With your talent, passion and expertise, you’ll become part of a team that makes the impossible possible. Together, we shape the technology that will define the society of tomorrow.
We are committed to being an inclusive employer (http://www.imec-int.com/en/careers#diversity) and are proud of our open, multicultural, and informal working environment with ample possibilities to take initiative and show responsibility. In everything we do, your future colleagues are guided by the imec values of passion, excellence, connectedness and integrity.
We commit to supporting and guiding you in this process, not only with words but also with tangible actions. Through imec.academy, 'our corporate university', we actively invest in your development to further your technical and personal growth. We are aware that your valuable contribution makes imec a top player in its field. Your energy and commitment are therefore rewarded with a market-appropriate salary and many fringe benefits.

Who you are

  • You are an expert in Linux system administration of compute clusters.
  • You preferably possess a Bachelor's, Master's, or PhD degree in Computer Science, Physics or Engineering, with 2-5 years of experience in related areas.
  • You have proven work experience in IT.
  • You have experience with programming languages (Python) and operating systems (Linux: Ubuntu, Red Hat), as well as with current equipment and technologies, system performance-monitoring tools, containers, and virtualization.
  • You have expertise in creating, analyzing, and supporting large-scale distributed systems.
  • Practical experience with Lustre, EasyBuild, NVIDIA compute software stacks, or MPI is considered a plus.
  • You have a passion for following up on the state of the art in the field of HPC systems and for experimenting with new and upcoming technologies to improve efficiency at all levels.