- Full-Time
- Remote
- Lead Data Scientist (NLP & LLM Focus)
The Mission
We are solving the "Context Loss" problem in financial data. While others provide raw PDFs, FinancialReports provides semantic understanding. We need a Lead Data Scientist to perfect our PDF-to-Markdown engines and build the next generation of RAG-ready financial datasets.
The Role
You will lead our research into unstructured data extraction. Your primary focus will be improving the accuracy of our parsing algorithms, so that complex tables in documents such as a German annual report are preserved intact for vectorization.
Key Responsibilities
- Algorithmic Extraction: Improve our proprietary models for detecting and parsing financial tables from unstructured PDFs.
- LLM Pipeline Optimization: Design pipelines that prepare our 10M+ filings for large-scale LLM training and RAG applications.
- Quality Assurance: Build automated benchmarks to verify data integrity across 30+ languages.
Requirements
- Deep NLP Background: Experience with Transformers, OCR correction, and document layout analysis (DLA).
- Research to Production: You don't just write papers; you ship models that run at scale.
- Detail Obsessed: You understand that in financial data, 99% accuracy is a failure.
What We Offer
- Data Advantage: You will have access to one of the world's largest clean corpora of financial text.
- Equity & Impact: You are building the brain of the company. Compensation includes significant equity.