MoniSa Enterprise

Evaluation & Rewriting Project

Location: Sweden

We are expanding our AI Training & Evaluation Team and are looking for Domain Experts, Linguists, and Multilingual QA Evaluators for AI rewriting, evaluation, and benchmarking projects.

The two scopes currently open are described below:

🔹 Scope 1: Domain Experts + Linguists (AI Evaluation & Training)

Objective:

Support LLM evaluation, training, and quality improvement across domain-specific and multilingual use cases.

Roles:

• Domain Experts

• Linguists / AI Evaluators

• Coders (for program-based evaluation tasks)

Domain Expertise Required:

• Finance (Asset Management)

• Marketing

• Retail

• Insurance

Key Responsibilities:

• Evaluate AI-generated content for accuracy, completeness, and domain relevance

• Perform linguistic QA (grammar, tone, fluency, terminology consistency)

• Identify cultural nuances, bias, or sensitive content

• Execute prompt-response evaluations and provide structured feedback

• (For coders) Evaluate program logic and code quality

Languages:

English, Arabic, German, Spanish, Hindi, Japanese, Portuguese, French, Italian, Dutch, Mandarin, Malay

Ideal Profile:

• Master's degree or PhD, or 5–10 years of domain experience

• Experience in Translation, Localization, QA, or AI evaluation

• Native or near-native fluency in the target language

• Ability to follow structured guidelines 


Please complete the form below and share your details:

https://docs.google.com/forms/d/e/1FAIpQLSejFQX-s3h9KZhwua88ZjC-YV3T0nuvDs2Ewc-cUICx-Ed4KA/viewform


🔹 Scope 2: Multilingual Quality Vetting & AI Benchmarking

Objective:

Ensure high-quality, safe, and factually accurate AI outputs across multiple languages.

Role:

Multilingual QA Evaluator

Key Responsibilities:

• Perform factual validation of AI-generated responses

• Evaluate prompt-response alignment

• Detect hallucinations, bias, or unsafe content

• Compare and benchmark outputs across AI tools (ChatGPT, Perplexity AI, Microsoft Copilot)

• Rank outputs based on accuracy, completeness, safety, and cultural relevance

Languages:

German, Italian, French, Dutch, Arabic, Spanish, Swedish, Danish, Thai, Vietnamese, Hebrew, Polish

Ideal Profile:

• Linguists, researchers, or QA professionals

• Strong analytical and research skills

• Experience in content validation, AI evaluation, or quality assurance

• Native-level language proficiency 


Please complete the form below and share your details:

https://docs.google.com/forms/d/e/1FAIpQLSfWi8cwMv4HoQXEdZpV6J9hELXRbTbKUvQi958UkzzKdOKyUg/viewform?usp=publish-editor