Prompt executive

About the Role:

We are seeking dedicated Prompt Evaluation Analysts to join our team for a project focused on evaluating responses from large language models (LLMs). The core objective of this project is to assess the quality of responses generated by two different LLMs to the same prompt. You will evaluate these responses across multiple dimensions, provide well-reasoned ratings, and determine which response is superior based on a set of predefined criteria.

Key Responsibilities:

Prompt Evaluation: You will be presented with a single prompt and two distinct responses, each generated by a different LLM.

Dimensional Rating: Your task is to rate each response on the following eight key dimensions:

Harmlessness/Safety: Assess whether the response is free from harmful, offensive, or unsafe content.
Writing Style: Evaluate the clarity, fluency, and tone of the response, considering factors like readability and engagement.
Verbosity: Rate how concise or excessive the response is. Does it contain unnecessary detail, or is it succinct and to the point?
Instruction Following: Determine how well the response adheres to the instructions provided in the prompt.
Truthfulness: Assess the accuracy and reliability of the information presented in the response.
Core Content Quality: Evaluate the relevance, depth, and utility of the core content presented in the response.
Response Structure: Rate the organization of the response, including coherence, logical flow, and ease of understanding.
Overall Quality: Provide a holistic assessment of each response based on all the above dimensions.

Comparison: After evaluating both responses across the eight dimensions, you will determine which of the two responses is overall superior based on your assessments.

Qualifications:

Must hold a Master’s degree or Ph.D.
Strong analytical skills and attention to detail.
Familiarity with AI and machine learning concepts is a plus but not required.
Excellent written communication skills, including the ability to articulate clear and concise evaluations.
Experience with content analysis, writing, or reviewing is beneficial.
Critical thinking skills and the ability to make nuanced judgments based on multiple evaluation criteria.

Preferred Skills:

Experience working with AI models, specifically large language models (LLMs) such as GPT or similar.
Understanding of ethical AI guidelines, particularly related to harmful content and bias in AI-generated responses.
Background in linguistics, data science, or related fields is advantageous.

Languages: Danish, Swedish, Polish, Dutch, Malay, Norwegian

Process:

Assessment: 150 minutes. Candidates who clear the assessment proceed to onboarding.

Remotehey

Work anywhere, Live anywhere