AI Inference Engineer - Large Language Models (f/m/d)

Aleph Alpha

Software Engineering, Data Science
Heidelberg, Germany
Posted on Thursday, July 4, 2024

Overview:

You will join our product team in a position at the intersection of artificial intelligence research and real-world solutions. We foster a highly collaborative work culture: you can expect to work closely with your teammates, and teams communicate frequently through practices such as pair or mob programming.

Your responsibilities:

  • Model Inference: Focus on inference optimization to ensure rapid response times and efficient resource utilization during real-time model interactions.

  • Hardware Optimization: Run models on various hardware platforms, from high-performance GPUs to edge devices, ensuring optimal compatibility and performance.

  • Experimentation and Testing: Regularly run experiments, analyze outcomes, and refine the strategies to achieve peak performance in varying deployment scenarios.

  • Stay up to date with the current literature on MLSys.

Your profile:

  • You care about making something people want. You want to ship something that brings value to our users. You want to deliver AI solutions end-to-end rather than stop at a prototype.

  • Bachelor's degree or higher in computer science or a related field.

  • You understand how multimodal transformers work.

  • You understand the characteristics of LLM inference (KV caching, flash attention, and model parallelization).

  • You have experience in system design and optimization, particularly within AI or deep learning contexts.

  • You are proficient in Python and have a deep understanding of deep learning frameworks such as PyTorch.

  • You have a deep understanding of the challenges associated with scaling AI models for large user bases.

Nice if you have:

  • Previous experience in a high-growth tech environment or a role focused on scaling AI solutions.

  • Hands-on experience with large language models or other complex AI architectures.

  • Expertise with CUDA and Triton programming and GPU optimization for neural network inference.

  • Experience with Rust.

  • Experience in adapting AI models to suit a range of hardware, including different accelerators.

  • Experience in model quantization, pruning, and other neural network optimization methodologies.

  • A track record of contributions to open-source projects (please provide links).

  • Some Twitter presence discussing MLSys topics.

What you can expect from us:

  • Become part of an AI revolution

  • 30 days of paid vacation

  • Flexible working hours

  • Join a dynamic start-up and a rapidly growing team

  • Work with international industry and science experts

  • Take on responsibility and shape our company and technology

  • Regular team events