Computer Vision Engineering Intern

Engineering

Remote

Internship

About This Role

A focused internship working on the hardest unsolved problem in our product: reading complex architectural drawings. Today our site-plan review pipeline gets us partway there with brittle heuristics for finding regions of interest on a page. You will replace that with modern computer vision and vision-language models. You work directly with the CTO and the senior engineer who owns RAG. Research-flavored at the start (try models, find what works), production-flavored by the end (ship the chosen approach into the site-plan agent).

The problem: architectural drawings break ordinary document AI. A single submitted "site plan" can be 50–100 sheets of high-resolution PDFs: site layouts, elevations, floor plans, utility plans, fire-egress diagrams. Each sheet is dense with symbols, dimensions, callouts, legends, and free-form annotations that vary by architect.

Three concrete challenges:

• Region of interest (ROI) detection: find the parts of a sheet that matter for a given question without reading every pixel.
• Tile-based inference on high-resolution pages: a sheet at 600 dpi blows past any model's context. Split into tiles, run detection per tile, stitch results back into one coherent answer (with NMS across tile seams).
• Architectural symbol recognition: doors, windows, fire-rated walls, parking stalls, plant symbols, North arrows. Connect each back to the rule in the zoning code or building code that governs it.

Our zoning agent already reads zoning code. Our building agent already reads building code. They both stop short at "what's actually on the drawing?" That's the gap you close.
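To make the tile-based inference challenge above concrete, here is a minimal sketch of the tile, stitch, and NMS steps in plain Python. It is an illustration under stated assumptions, not our production pipeline: the function names (tile_origins, stitch), the box layout (x1, y1, x2, y2, score), and the 0.5 IoU threshold are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2, score).
Box = Tuple[float, float, float, float, float]


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so overlapping tiles cover [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap  # assumes 0 <= overlap < tile
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile sits flush with the page edge
    return origins


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop overlapping duplicates."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept


def stitch(per_tile: List[Tuple[Tuple[int, int], List[Box]]],
           iou_thresh: float = 0.5) -> List[Box]:
    """Shift tile-local boxes into page coordinates, then de-duplicate."""
    page_boxes = [
        (x1 + ox, y1 + oy, x2 + ox, y2 + oy, score)
        for (ox, oy), boxes in per_tile
        for (x1, y1, x2, y2, score) in boxes
    ]
    return nms(page_boxes, iou_thresh)
```

For a 2500-pixel axis with 1000-pixel tiles and 200 pixels of overlap, tile_origins(2500, 1000, 200) returns [0, 800, 1500]. One known limitation of this sketch: a symbol cut by a seam yields partial boxes with low IoU against each other, so NMS alone will not merge them. The overlap generally needs to be at least as large as the largest symbol you expect to detect.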

About Conflation Labs

We build AI-powered software for city planning.
Our product turns stacks of PDFs — zoning ordinances, site plans, fire codes, building codes — into structured, queryable knowledge that helps planners and architects move faster.
Multi-tenant, live in production for cities including Asheville and Berkeley.
Small team, high ownership, close collaboration with the CTO and senior engineers.

Key Responsibilities

Survey and benchmark modern CV approaches on dense technical drawings: both detection and segmentation models (DETR / YOLO / SAM) and document-understanding and vision-language models (LayoutLM, Donut, Pix2Struct, Florence-2, Qwen-VL, GPT-4V / Claude Sonnet vision class).
Build a tile-and-stitch pipeline for high-resolution architectural pages with proper boundary handling (NMS across tile seams).
Fine-tune a VLM (LoRA / adapter) on a curated set of architectural drawings — build the dataset, design the eval, ship the model.
Prototype an architectural-symbol recognizer with a small starter vocabulary; grow it iteratively.
Wire the structured CV output into our existing site-plan / zoning / building agents so they can ask grounded questions of the drawing.
Document failure modes — half the value of this internship is cataloging where current models break on real architect-supplied data.

In this role, you must have

Current undergrad / master's student in CS, EE, or a related field — or a recent grad.
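Has fine-tuned a CV model end-to-end (dataset → training → eval). Can explain why train/test split design matters on domain-specific data: a split by individual image leaks information, because sheets from the same document or architect land in both train and test, so hold out by document or architect instead.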
Hands-on with vision-language models — running inference, prompting, and ideally LoRA / adapter fine-tuning. CLIP, SAM, LLaVA, Florence-2, Qwen-VL, or any closed VLM (GPT-4V, Claude Sonnet vision) count.
Comfort with PyTorch, OpenCV, HuggingFace transformers / peft / accelerate.
Comfort working with messy, large, real-world data — not just clean benchmarks.
Curiosity and written communication — willing to share what you tried and what didn't work.
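As an illustration of the split-design point above (hold out by document or architect, not by image), here is a minimal sketch of a group-level split. The sample fields ("sheet", "architect"), the function name split_by_group, and the 20% test fraction are assumptions made for this example, not a description of our dataset.

```python
import random
from typing import Dict, List, Tuple

Sample = Dict[str, str]


def split_by_group(samples: List[Sample],
                   group_key: str = "architect",
                   test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Assign whole groups to train or test so no group appears in both."""
    groups = sorted({s[group_key] for s in samples})
    random.Random(seed).shuffle(groups)  # seeded for a reproducible split
    # Always hold out at least one group so the test set is never empty.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if s[group_key] not in test_groups]
    test = [s for s in samples if s[group_key] in test_groups]
    return train, test
```

Because the fraction applies to groups rather than images, the resulting test set can be larger or smaller than 20% of the sheets when architects contribute unequal numbers of drawings. scikit-learn's GroupShuffleSplit covers the same idea if you prefer a library implementation.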

You might thrive in this role if you have

Experience with annotation tools (Label Studio, CVAT, Roboflow) or weak / semi-supervised labeling pipelines.
Inference optimization — fp16 / int8 quantization, batching, model serving (Triton, TorchServe).
Exposure to CAD / DWG / architectural file formats.
Document AI experience — OCR, layout analysis, table extraction.

What We Offer

Hands-on work on a genuinely hard CV problem — not a benchmark, not a homework assignment.
Direct collaboration with the CTO and the senior engineer who owns our RAG systems.
Production deployment — your code ships into our site-plan agent, not a demo.
A possible conversation about a full-time role for strong interns.

Ready to Apply?

Join our team and help transform municipal technology. We'd love to hear from you!

Conflation Labs is an equal opportunity employer committed to diversity and inclusion.

Key Skills

PyTorch

OpenCV

HuggingFace Transformers

PEFT / LoRA

Vision-Language Models

Object Detection

Fine-tuning

Document AI

Python

FastAPI

PostgreSQL

AWS