This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Research Engineer - Pre-training in Switzerland.
This role sits at the core of next-generation AI model development, focusing on advancing large-scale pre-training systems that power state-of-the-art intelligence. You will work on cutting-edge architectures spanning small, large, and multimodal models, directly influencing model capability, efficiency, and scalability. Operating in a highly research-driven and distributed engineering environment, you will help push the boundaries of what modern AI systems can achieve. The position combines deep scientific exploration with hands-on engineering on massive GPU clusters. You will design and optimize training pipelines that run across thousands of NVIDIA GPUs, ensuring performance at scale. This is an opportunity to contribute to foundational AI breakthroughs while collaborating with world-class researchers and engineers in a fast-paced, innovation-focused setting.
Accountabilities
In this role, you will lead and contribute to the development of large-scale pre-training systems and model architectures that enhance intelligence and efficiency. You will design experiments, build scalable training frameworks, and improve model performance through iterative research and engineering work.
- Conduct large-scale pre-training of AI models on distributed GPU clusters, ensuring scalability, stability, and performance
- Design, prototype, and optimize novel model architectures, including transformer and non-transformer approaches
- Run experiments, analyze results, and refine methodologies to improve training efficiency and model quality
- Identify and resolve bottlenecks in training systems, data pipelines, and model performance
- Improve distributed training infrastructure to support next-generation AI workloads
- Collaborate with researchers and engineers to translate experimental ideas into production-ready training systems
- Contribute to the evolution of high-performance AI training systems and frameworks
Requirements
The ideal candidate has deep expertise in AI research and large-scale model training, with strong technical foundations in machine learning, distributed systems, and deep learning frameworks. You should be comfortable working in highly complex, GPU-intensive environments and driving research from concept to implementation.
- PhD or strong academic/research background in Computer Science, Machine Learning, NLP, or related fields (preferred)
- Hands-on experience with large-scale LLM pre-training on distributed GPU infrastructure (thousands of GPUs)
- Strong understanding of transformer architectures and advanced model design techniques
- Experience with distributed training frameworks and large-scale AI systems
- Proficiency in PyTorch and Hugging Face ecosystem for model development and training
- Strong skills in debugging, optimizing, and improving model and system performance
- Ability to design experiments, interpret results, and iterate on research hypotheses
- Strong collaboration and communication skills in research-driven environments
Benefits
- Competitive compensation package aligned with AI research market standards
- Remote-friendly and globally distributed work environment
- Opportunity to work on frontier AI research at massive scale
- Access to high-performance computing infrastructure and large GPU clusters
- Collaborative environment with top-tier AI researchers and engineers
- High autonomy in research direction and experimentation
- Exposure to state-of-the-art AI systems and multimodal model development
- Professional growth in a fast-evolving, innovation-driven AI ecosystem
Why Apply Through Jobgether?
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.