PhD Fellowship (LLM Architecture Optimization) (f/m/d)
Aleph Alpha
Overview:
Aleph Alpha Research’s mission is to deliver category-defining AI innovation that enables open, accessible, and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and next-generation methods that make it easy and affordable for Aleph Alpha’s customers to increase productivity in development, engineering, logistics, and manufacturing processes.
We are growing our academic partnership “Lab1141” with TU Darmstadt and our group of GenAI PhD students supervised by Prof. Dr. Kersting. We are looking for a researcher at heart who is passionate about improving foundational, multi-modal NLP models and aims to obtain a PhD degree in a three-year program. On average, you will spend half of your time at Aleph Alpha Research in Heidelberg and the other half at the Technical University of Darmstadt, which is within easy travel distance.
As a PhD fellow at Aleph Alpha Research, you will develop new approaches to improve foundational model architectures and applications. You will work in a unique research environment with ample compute and both industrial and academic supervisors to conduct and publish your research. Ideally, you outline your dream research topic in your application letter, aligned with one of our research teams.
For the LLM Architecture topic, you will work with our Foundational Models team at Aleph Alpha Research, where you create powerful, state-of-the-art, multi-modal foundational models, research and share novel approaches to pre-training, fine-tuning, and helpfulness, and enable cost-efficient inference on a variety of accelerators.
Topic:
Introduction
Foundation models are central to many of the most innovative applications in deep learning and predominantly rely on self-supervised learning, autoregressive generation, and the transformer architecture. However, this learning paradigm and architecture come with several challenges. To address these limitations and improve both accuracy and efficiency in generation and downstream tasks, it is essential to consider adjustments to the core paradigms: the sourcing and composition of training data, the design choices of the training itself, and the underlying model architecture. Further, extensions of the system, such as Retrieval-Augmented Generation (RAG), and changes to foundational components like tokenizers should be considered.
Related Work
The training data of LLMs is at the core of a model’s downstream capabilities. Consequently, recent works focus on extracting high-quality data from large corpora (Llama 3, OLMo 1.7). Additionally, the order and structure in which the data is presented to the model have a large influence on model performance, as demonstrated by curriculum learning approaches (OLMo 1.7, Ormazabal et al., Mukherjee et al.) and more sophisticated data packing algorithms (Staniszewski et al., Shi et al.).
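To illustrate the basic idea of data packing, a minimal sketch is shown below; the referenced papers use more sophisticated, similarity-aware algorithms, and the function here is a hypothetical example, not their method.

    def pack_sequences(token_lists, context_length=4096):
        # Greedily pack tokenized documents into fixed-length training contexts,
        # keeping each document contiguous and minimizing padding.
        packs, current = [], []
        for tokens in token_lists:
            tokens = tokens[:context_length]  # truncate documents longer than one context
            if len(current) + len(tokens) > context_length:
                packs.append(current)
                current = []
            current += tokens
        if current:
            packs.append(current)
        return packs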
Similarly, adjustments to the training procedure itself have shown promising results. For example, Ibrahim et al. discuss “infinite” learning rate schedules that allow for more flexibility in adjusting the number of training steps and facilitate continual pre-training.
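As a rough illustration of such a schedule (a sketch under assumed hyperparameters, not the exact schedule from Ibrahim et al.): a warmup phase is followed by a constant phase that can be extended indefinitely, with a short cooldown applied only when a checkpoint is finalized.

    def infinite_lr(step, peak_lr=3e-4, min_lr=3e-5, warmup=2000,
                    cooldown_start=None, cooldown_len=10000):
        # Hypothetical "infinite" schedule: warmup -> constant -> optional cooldown.
        if step < warmup:                                    # linear warmup
            return peak_lr * step / warmup
        if cooldown_start is None or step < cooldown_start:  # constant phase, extendable at will
            return peak_lr
        frac = min((step - cooldown_start) / cooldown_len, 1.0)
        return peak_lr + frac * (min_lr - peak_lr)           # linear cooldown to min_lr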
Further, the LLM architecture and its components leave room for improvement. Ainslie et al. introduce grouped-query attention (GQA), which increases the efficiency of the transformer’s attention component. Liu et al. modify the rotary position embeddings to improve long-context understanding.
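A minimal PyTorch sketch of grouped-query attention follows (shapes and head counts are illustrative assumptions, not a specific production implementation): several query heads share one key/value head, which shrinks the KV cache at inference time.

    import torch
    import torch.nn.functional as F

    def grouped_query_attention(q, k, v):
        # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
        group = q.shape[1] // k.shape[1]
        k = k.repeat_interleave(group, dim=1)  # broadcast each KV head to its query group
        v = v.repeat_interleave(group, dim=1)
        return F.scaled_dot_product_attention(q, k, v)

    q = torch.randn(1, 8, 16, 64)              # 8 query heads
    k = torch.randn(1, 2, 16, 64)              # 2 shared key/value heads
    v = torch.randn(1, 2, 16, 64)
    out = grouped_query_attention(q, k, v)     # -> (1, 8, 16, 64)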
Recently, structured state-space sequence models (SSMs) (Gu et al., Poli et al.) and hybrid architectures have emerged as a promising class of architectures for sequence modeling.
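At their core, such models replace attention with a discretized linear state-space recurrence; the toy scan below sketches that idea only, omitting the structured parameterizations and parallel scans used in practice.

    import torch

    def ssm_scan(x, A, B, C):
        # x: (seq, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)
        h = torch.zeros(A.shape[0])
        ys = []
        for x_t in x:            # h_t = A h_{t-1} + B x_t ;  y_t = C h_t
            h = A @ h + B @ x_t
            ys.append(C @ h)
        return torch.stack(ys)   # (seq, d_out)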
Lastly, the model itself can be embedded in a larger system such as a RAG pipeline. For example, in-context learning via RAG enhances the accuracy and credibility of generation (Gao et al.), particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and the integration of domain-specific information.
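Schematically, such a retrieval-augmented loop looks roughly as follows; `retriever` and `llm` are hypothetical placeholder interfaces, not a specific Aleph Alpha API. Retrieved passages are prepended to the prompt so the model can ground its answer in them.

    def rag_answer(question, retriever, llm, k=5):
        passages = retriever.search(question, top_k=k)  # hypothetical retriever interface
        context = "\n\n".join(p.text for p in passages)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return llm.generate(prompt)                     # hypothetical generation call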
Goals
This project aims to explore novel LLM-system architectures, data, and training paradigms that could either replace or augment traditional autoregressive generation and transformer components, as well as enhance auxiliary elements such as retrievers and tokenizers.
Your responsibilities:
Research and development of novel approaches and algorithms that improve training, inference, interpretability, or applications of foundational models
Analysis and benchmarking of state-of-the-art as well as new approaches
Collaborating with scientists and engineers at Aleph Alpha and Aleph Alpha Research, as well as selected external industrial and academic partners
In particular, engaging in fruitful interactions with our group of GenAI PhD students and fostering exchange between Aleph Alpha Research and your university
Publishing your own and collaborative work at machine learning venues, and making code and models source-available for use by the broader research community
Your profile:
Master's degree in Computer Science, Mathematics, or a similar field
Solid understanding of DL/ML techniques, algorithms, and tools for training and inference
Experience with Python and at least one common deep-learning framework, preferably PyTorch
Readiness to relocate to the Heidelberg/Darmstadt region, Germany
Interest in bridging the gap between addressing practical industry challenges and contributing to academic research
Ambition to obtain a PhD in generative machine learning within a three-year program
What you can expect from us:
Become part of an AI revolution and contribute to Aleph Alpha’s mission to provide technological sovereignty
Join a dynamic startup and a rapidly growing team
Work with international industry and academic experts
Share parts of your work via publications and source-available code
Take on responsibility and shape our company and technology
Flexible working hours and attractive compensation package
An inspiring working environment with short lines of communication, horizontal organization, and great team spirit