Anonymous project page preview
A point-based adaptive 3D semantic occupancy framework for embodied scene understanding, designed for flexible compute budgets, heterogeneous geometric inputs, and deployment-oriented perception.
Embodied agents need accurate, flexible, and semantically meaningful 3D scene representations. AdaOcc addresses this need with a point-based adaptive occupancy prediction framework that represents occupied regions as sparse semantic points. Its prediction budget can be adjusted through the number of point queries and the number of decoder layers, so the same model can trade perception quality for latency as task demands or compute budgets change.
AdaOcc combines progressive query learning, geometry-guided query initialization, compact multi-plane 3D encoding, and containment-guided point optimization. Together, these components support heterogeneous geometric cues such as depth-derived points or LiDAR scans while reducing surface-floating artifacts and improving fine-grained spatial consistency.
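To make the query-budget idea concrete, here is a minimal sketch of a point-query decoder whose query count and decoder depth can be truncated at inference time. All names, dimensions, and the transformer-based architecture are our illustrative assumptions, not the project's actual implementation:

```python
import torch
import torch.nn as nn

class PointQueryDecoder(nn.Module):
    """Illustrative sketch (not the project's architecture): learnable point
    queries are decoded into (xyz, semantic class) pairs, and both the query
    count and the decoder depth can be reduced at inference time."""

    def __init__(self, num_queries=2048, dim=128, num_layers=6, num_classes=12):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(num_layers)
        )
        self.point_head = nn.Linear(dim, 3)          # predicted point coordinates
        self.cls_head = nn.Linear(dim, num_classes)  # per-point semantic logits

    def forward(self, scene_feats, num_queries=None, num_layers=None):
        # Slice the query set and truncate the decoder to meet a compute budget.
        n = num_queries or self.queries.num_embeddings
        q = self.queries.weight[:n].unsqueeze(0).expand(scene_feats.size(0), -1, -1)
        for layer in self.layers[: (num_layers or len(self.layers))]:
            q = layer(q, scene_feats)
        return self.point_head(q), self.cls_head(q)
```

The same trained weights serve every budget: a smaller `num_queries` yields a coarser point cloud, and stopping at an earlier decoder layer cuts latency further.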
Method overview
Occupied space is modeled with point queries instead of a dense voxel-only representation, making output granularity easier to adapt and reducing redundant computation in empty space.
The model is trained to produce useful predictions across query budgets and intermediate decoder depths, allowing inference-time quality/latency trade-offs without retraining.
Visual features are paired with geometric cues from depth or LiDAR through geometry-guided query initialization and a compact multi-plane 3D encoder for stronger spatial grounding.
A containment-guided objective encourages predicted points to lie in spatially valid occupied regions, mitigating surface-floating points and improving boundary consistency.
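As a hedged illustration of the containment idea above, the sketch below scores how many predicted points fall outside observed occupied voxels; the function name, the indicator-style formulation, and all parameters are our assumptions rather than the project's actual objective:

```python
import numpy as np

def containment_penalty(points, occ_grid, voxel_size=0.2, origin=(0.0, 0.0, 0.0)):
    """Hypothetical containment check (not the project's loss).
    points:   (N, 3) predicted point coordinates in meters.
    occ_grid: boolean (X, Y, Z) grid of observed occupied voxels.
    Returns the fraction of points landing in unoccupied (invalid) space,
    i.e. 0.0 when every predicted point is contained in occupied voxels.
    """
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < occ_grid.shape), axis=1)
    valid = np.zeros(len(points), dtype=bool)
    valid[inside] = occ_grid[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    return 1.0 - valid.mean()
```

A training objective built on this idea would use a differentiable surrogate (e.g. a distance to the nearest occupied region) instead of a hard indicator, but the penalty direction is the same: points floating off surfaces raise the score.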
Highlights
A deployment-oriented 3D occupancy framework that adapts to variable compute budgets, sensing configurations, and observation views.
Progressive query learning, geometry-guided initialization, compact 3D encoding, and containment loss make point-based occupancy prediction more robust for embodied scenes.
Experiments report state-of-the-art Occ-ScanNet performance and demonstrate practical potential as an adaptive 3D perception module for downstream embodied tasks.
Adaptability
Query-number and decoder-layer controls let the model produce coarser or finer occupancy outputs depending on latency and memory constraints.
The same framework can incorporate depth-derived 3D points or LiDAR scans, supporting different embodied platform configurations.
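A deployment wrapper might expose these controls as named profiles chosen against a latency budget. The profiles, latency figures, and function below are purely illustrative placeholders we made up, not reported measurements:

```python
# Hypothetical deployment profiles: query count and decoder depth per budget.
# The latency numbers are invented for illustration, not measured results.
PROFILES = {
    "coarse": {"num_queries": 512,  "decoder_layers": 2, "approx_ms": 18},
    "medium": {"num_queries": 2048, "decoder_layers": 4, "approx_ms": 45},
    "fine":   {"num_queries": 8192, "decoder_layers": 6, "approx_ms": 110},
}

def select_profile(latency_budget_ms):
    """Pick the finest profile that still fits the latency budget,
    falling back to the coarsest one when nothing fits."""
    fitting = [p for p in PROFILES.values() if p["approx_ms"] <= latency_budget_ms]
    return max(fitting, key=lambda p: p["approx_ms"]) if fitting else PROFILES["coarse"]
```

Because the model is trained across budgets, switching profiles at runtime needs no retraining; the agent can drop to a coarse profile during fast motion and return to a fine one when stationary.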
Image overview
A compact visual tour of the method, qualitative results, adaptability studies, and demo examples.
Media placeholders
These slots intentionally contain no images or videos yet. Replace them after preparing anonymous-safe figures, qualitative comparisons, and demos.
Method figure
Suggested file: method-overview.png
Qualitative results
Suggested file: qualitative-results.png
Adaptability study
Suggested file: adaptability-grid.png
Demo video
Suggested file: demo-video.mp4
This anonymous preview is prepared for GitHub Pages publication while withholding identity-bearing resources. After the anonymous period, the disabled resource pills can be replaced with public links, final media, and complete citation information.