We’re a stealth robotics startup in Palo Alto. We’re hiring an engineer to define and ship a canonical Tactile Tensor, plus the reference SDK and conformance suite that make tactile data reproducible, interoperable, and directly usable for robotics perception and foundation-model training.
Critical requirement: deterministic, byte-stable serialization with strict versioning, plus tokenization-ready interfaces (tensors → stable token streams) for transformer-style robotics pipelines, all without heavy dependencies.
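For a flavor of the determinism bar we have in mind, here is a minimal sketch of byte-stable encoding using only the Python standard library. All names (`TactileFrame`, `encode_frame`, the `TTEN` magic, the version scheme) are illustrative, not the actual spec:

```python
# Minimal sketch of byte-stable encoding, stdlib only. Illustrative, not the spec.
import json
import struct
from dataclasses import dataclass

MAGIC = b"TTEN"      # format identifier (assumed)
VERSION = (1, 0)     # (major, minor) schema version (assumed)

@dataclass
class TactileFrame:
    timestamp_ns: int     # monotonic capture time, nanoseconds
    sensor_id: str        # stable device identifier
    values: list[float]   # flattened taxel readings, row-major

def encode_frame(frame: TactileFrame) -> bytes:
    """Encode a frame so identical inputs yield identical bytes on any platform."""
    # Canonical JSON header: sorted keys, no whitespace, UTF-8 only.
    header = json.dumps(
        {"sensor_id": frame.sensor_id, "timestamp_ns": frame.timestamp_ns,
         "shape": [len(frame.values)]},
        sort_keys=True, separators=(",", ":"),
    ).encode("utf-8")
    # Fixed-width, explicitly little-endian payload; never platform-native order.
    payload = struct.pack(f"<{len(frame.values)}f", *frame.values)
    return (MAGIC
            + struct.pack("<BB", *VERSION)
            + struct.pack("<I", len(header)) + header
            + struct.pack("<I", len(payload)) + payload)
```

The point of the sketch: no dict-ordering, endianness, or locale dependence anywhere in the byte stream, so a conformance suite can assert exact byte equality across OSes and architectures.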
What you’ll do
- Define the Tactile Tensor: units, coordinate frames, timestamps, shapes, uncertainty, required metadata, and forward/backward compatibility rules.
- Build a lightweight reference SDK (Python and/or C++) that validates, serializes/deserializes, and produces identical outputs across platforms.
- Specify training-grade data contracts: deterministic windowing/patching, normalization/quantization, and token schemas that are stable across sensors and logging setups (see the tokenization sketch after this list).
- Ship a public-facing spec + examples + CI conformance tests so external robotics labs/OEMs can implement against it with confidence.
- Architect the tensor representation to ensure physical invariances (e.g., coordinate-frame independence, scale-invariant contact patches) so that policies trained on one robot's geometry generalize to another.
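To make the windowing/patching bullet concrete, here is a sketch of deterministic windowing plus fixed affine quantization into token ids. The window, stride, bin count, and sensor range here are assumptions for illustration, not spec values:

```python
# Illustrative sketch: deterministic windowing + quantization into token ids.
import numpy as np

WINDOW = 64                # samples per window (assumed)
STRIDE = 32                # hop between windows (assumed)
N_BINS = 256               # uniform quantization levels (assumed)
V_MIN, V_MAX = 0.0, 10.0   # calibrated sensor range, newtons (assumed)

def tokenize(signal: np.ndarray) -> np.ndarray:
    """Map a 1-D tactile signal to token ids; same input bytes -> same tokens."""
    if len(signal) < WINDOW:
        raise ValueError("signal shorter than one window")
    n = (len(signal) - WINDOW) // STRIDE + 1
    windows = np.stack([signal[i * STRIDE : i * STRIDE + WINDOW] for i in range(n)])
    # Fixed affine quantization: no data-dependent statistics, so the token
    # stream is reproducible across sensors, sessions, and logging setups.
    q = np.clip((windows - V_MIN) / (V_MAX - V_MIN), 0.0, 1.0)
    return np.round(q * (N_BINS - 1)).astype(np.uint16)
```

The design choice to note: nothing in the recipe depends on per-dataset statistics, which is what keeps tokens stable across logging setups.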
Requirements
- PhD in a relevant field (Robotics, Computer Science, Applied Mathematics, Electrical Engineering, or similar), or 3+ years of equivalent industry experience.
- Excellent software engineering fundamentals (API design, packaging, CI, testing, docs).
- Python and/or C++ proficiency (both ideal).
- Proven ability to design deterministic serialization and conformance tests (identical inputs → identical bytes across platforms).
- Experience with high-rate numeric data formats (Arrow/Parquet/Zarr/Protobuf/FlatBuffers or similar).
- Ability to design metadata + lineage for robotics datasets (device ID, calibration artifact ID, robot/config versions, provenance).
- Familiarity with ML data pipelines; ability to define tokenization/embedding conventions for transformer training without bundling full ML stacks.
- Experience designing data schemas that explicitly handle and flag physical sensor artifacts (saturation, dropout, thermal drift, and variable sampling rates) without crashing downstream model inference.
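For a sense of what "explicitly handle and flag" means to us, here is a sketch of per-sample quality flags carried alongside the tensor; the flag names and layout are illustrative:

```python
# Sketch: explicit per-sample quality flags so downstream models see sensor
# artifacts as data rather than crashes. Flag names are illustrative.
from enum import IntFlag

class SampleFlag(IntFlag):
    OK            = 0
    SATURATED     = 1 << 0   # reading pinned at sensor ceiling/floor
    DROPPED       = 1 << 1   # gap filled by hold-last-value or NaN
    THERMAL_DRIFT = 1 << 2   # reading outside calibrated thermal envelope
    RESAMPLED     = 1 << 3   # sample interpolated to the nominal rate

# Each frame would carry a parallel uint8 flag array; consumers can mask,
# down-weight, or drop flagged samples instead of failing on them.
flags = SampleFlag.SATURATED | SampleFlag.RESAMPLED
assert SampleFlag.SATURATED in flags
```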
Preferred
- Experience authoring standards/specs, file formats, or widely-used SDKs.
- HPC/embedded/performance background; strong “minimal dependency” philosophy.
- Experience with data integrity/attestation (hashing/signing, provenance chains) for tamper-evident robotics logs.
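On attestation, think along the lines of this minimal hash-chain sketch (stdlib only; the record and chain layout are assumptions, not the actual design):

```python
# Minimal hash-chain sketch for tamper-evident logs. Illustrative layout.
import hashlib

def chain_digest(prev_digest: bytes, record: bytes) -> bytes:
    """Each record's digest commits to the previous one, so editing any
    earlier record invalidates every digest after it."""
    return hashlib.sha256(prev_digest + record).digest()

digest = b"\x00" * 32   # genesis value
for record in [b"frame-0", b"frame-1", b"frame-2"]:
    digest = chain_digest(digest, record)
# Publishing or signing the final digest attests to the whole log.
```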
Key Deliverables
- PDF Spec: Tactile Tensor schema, metadata/lineage rules, determinism + versioning/migration, conformance criteria.
- Reference SDK: lightweight schema objects, validators, deterministic serializer/deserializer, minimal dependencies.
- Dataset Container Spec: reproducible storage + examples (streaming + offline parity; robotics log friendly).
- ML Interfaces: modular tokenization hooks + reference tokenization recipes (windowing/patching + quantization conventions).
- CI Suite: golden files, byte-stability, backward/forward compatibility tests, reference implementations.
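As one example of what the CI suite enforces, a golden-file byte-stability check could look like this pytest sketch; `tactile_sdk` and its `encode_frame`/`TactileFrame` are the hypothetical encoder sketched earlier, and the golden path is illustrative:

```python
# Sketch: golden-file byte-stability test. All names are hypothetical.
from pathlib import Path

# Hypothetical import of the encoder sketched earlier in this posting.
from tactile_sdk import TactileFrame, encode_frame

GOLDEN = Path("tests/golden/frame_v1_0.bin")

def test_byte_stability():
    frame = TactileFrame(timestamp_ns=1_000_000, sensor_id="dev-A",
                         values=[0.0, 1.5, 3.25])
    # Exact byte equality against a committed golden file: any encoder change
    # that alters output bytes must ship with an explicit version bump.
    assert encode_frame(frame) == GOLDEN.read_bytes()
```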
Contract-to-hire with a clear path to full-time and founding equity for the right fit.