Senior AI Architect (Custom Language Model Development)
Location: Jerusalem / Hybrid
Our client is an early-stage deep-tech startup at the forefront of digital health, developing a groundbreaking Large Language Model for decoding 100% of the human genome, with the potential to analyze whole-genome sequencing data in real time.
We're seeking an experienced AI architect to spearhead the development of custom large-scale language models, focusing on advanced transformer architectures and efficient tokenization strategies.
Core Responsibilities:
Architect and implement custom transformer models from scratch, optimizing for scale and efficiency
Develop and refine tokenization pipelines, with a focus on BPE and its variants (see the sketch after this list)
Innovate on attention mechanisms and positional encoding techniques
Optimize input processing strategies, balancing padding, truncation, and dynamic approaches
Design and implement custom loss functions and training regimes for large-scale language modeling
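To illustrate the tokenization-pipeline responsibility above, here is a minimal sketch of training a BPE tokenizer with the open-source Hugging Face `tokenizers` library. The corpus path, vocabulary size, and special tokens are illustrative assumptions, not project specifics.

```python
# Minimal BPE tokenizer training sketch (Hugging Face `tokenizers` library).
# Corpus path, vocab size, and special tokens are placeholders.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Start from an empty BPE model with an unknown-token fallback.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Learn merge rules from raw text files (file name is hypothetical).
trainer = BpeTrainer(vocab_size=32_000,
                     special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

# Encode a sample string into subword tokens and ids.
encoding = tokenizer.encode("byte pair encoding merges frequent symbol pairs")
print(encoding.tokens)
print(encoding.ids)
```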
Technical Focus Areas:
Transformer Architecture: Innovate on multi-head attention, feed-forward networks, and layer normalization techniques. Experience with sparse attention and efficient transformer variants is highly desirable.
Tokenization and BPE: Develop and optimize tokenization strategies, with a particular emphasis on Byte Pair Encoding and its extensions. Familiarity with subword tokenization algorithms and their impact on model performance is crucial.
Input Processing: Implement efficient strategies for handling variable-length inputs, including advanced padding and truncation techniques. Experience with dynamic batching and length-adaptive processing is a plus (see the collate sketch after this list).
Scale and Efficiency: Optimize models for large-scale training and inference, with a focus on memory efficiency, computational performance, and distributed training strategies.
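As a concrete reference for the input-processing focus area, the sketch below shows a length-aware collate function in PyTorch: each batch is truncated to a maximum length and padded only to the longest sequence it contains, with an attention mask marking real tokens. PAD_ID and MAX_LEN are illustrative values, not project constants.

```python
# Dynamic padding / truncation collate sketch (PyTorch).
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

PAD_ID = 0
MAX_LEN = 512

def collate(batch):
    """batch: list of 1-D LongTensors of token ids with varying lengths."""
    # Truncate overly long sequences before padding.
    batch = [ids[:MAX_LEN] for ids in batch]
    # Pad only to the longest sequence in this batch, not a global maximum.
    input_ids = pad_sequence(batch, batch_first=True, padding_value=PAD_ID)
    # Attention mask: 1 for real tokens, 0 for padding positions.
    attention_mask = (input_ids != PAD_ID).long()
    return input_ids, attention_mask

# Usage (assumes `dataset` yields 1-D LongTensors of token ids):
# loader = DataLoader(dataset, batch_size=16, shuffle=True, collate_fn=collate)
```

Padding to the per-batch maximum rather than a global limit keeps memory roughly proportional to the actual sequence lengths, which is one common way to handle large-scale, variable-length inputs.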
The ideal candidate will have a proven track record of developing novel language modeling architectures and a deep understanding of the theoretical foundations underlying modern NLP techniques. This role requires a balance between cutting-edge research and practical implementation, with a focus on pushing the boundaries of what's possible in custom language model development.
Key Requirements:
3-5 years of hands-on experience designing and implementing transformer-based architectures
Deep understanding of tokenization techniques, including Byte Pair Encoding (BPE)
Expertise in optimizing model input processing, including padding and truncation strategies
Proficiency in Python, deep learning frameworks (PyTorch, TensorFlow), and scientific Python libraries (NumPy, Pandas)
Strong background in self-supervised learning and self-attention mechanisms
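To illustrate the self-attention requirement above, here is a minimal from-scratch sketch of multi-head self-attention with a padding mask in PyTorch; the dimensions are illustrative, and dropout, positional encodings, and the feed-forward sublayer are omitted for brevity.

```python
# Minimal multi-head self-attention with a padding mask (PyTorch sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, attention_mask=None):
        # x: (batch, seq, d_model); attention_mask: (batch, seq), 1 = real token
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq, d_head)
        q, k, v = (y.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for y in (q, k, v))
        # Scaled dot-product attention scores.
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5
        if attention_mask is not None:
            # Mask out padded key positions before the softmax.
            scores = scores.masked_fill(attention_mask[:, None, None, :] == 0,
                                        float("-inf"))
        weights = F.softmax(scores, dim=-1)
        ctx = weights @ v                     # (batch, heads, seq, d_head)
        ctx = ctx.transpose(1, 2).contiguous().view(b, t, d)
        return self.out(ctx)

# Usage with a toy batch of token embeddings:
mha = MultiHeadSelfAttention()
x = torch.randn(2, 10, 256)
mask = torch.ones(2, 10, dtype=torch.long)
out = mha(x, mask)  # -> shape (2, 10, 256)
```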