AdaOcc: Adaptive 3D Occupancy Prediction for Embodied Tasks

Project video

AdaOcc overview video

Abstract

Embodied agents need accurate, flexible, and semantically meaningful 3D scene representations. AdaOcc addresses this need with a point-based adaptive occupancy prediction framework that represents occupied regions as sparse semantic points. Its prediction budget can be adjusted through the number of queries and the number of decoder layers, so the same model can trade perception quality for latency when task demands or compute budgets change.

AdaOcc combines progressive query learning, geometry-guided query initialization, compact multi-plane 3D encoding, and containment-guided point optimization. Together, these components support heterogeneous geometric cues such as depth-derived points or LiDAR scans while reducing surface-floating artifacts and improving fine-grained spatial consistency.

Method overview

Adaptive sparse occupancy perception

1. Sparse semantic points

Occupied space is modeled with point queries instead of a dense voxel-only representation, making output granularity easier to adapt and reducing redundant computation in empty space.
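To make the contrast with dense voxels concrete, the sketch below converts a small dense label grid into a sparse list of semantic points, one per occupied voxel. It only illustrates the output representation and is not AdaOcc's implementation; the function name `dense_to_sparse_points`, the voxel-centre placement, and the convention that label 0 means empty are all assumptions.

```python
from typing import List, Sequence, Tuple

# (x, y, z, semantic_label); this layout is an assumption for illustration.
SemanticPoint = Tuple[float, float, float, int]


def dense_to_sparse_points(
    grid: Sequence[Sequence[Sequence[int]]],
    voxel_size: float = 1.0,
    origin: Tuple[float, float, float] = (0.0, 0.0, 0.0),
    empty_label: int = 0,
) -> List[SemanticPoint]:
    """Return one semantic point per occupied voxel, placed at the voxel centre."""
    points: List[SemanticPoint] = []
    for i, plane in enumerate(grid):
        for j, row in enumerate(plane):
            for k, label in enumerate(row):
                if label == empty_label:
                    continue  # empty space contributes nothing to the output
                points.append((
                    origin[0] + (i + 0.5) * voxel_size,
                    origin[1] + (j + 0.5) * voxel_size,
                    origin[2] + (k + 0.5) * voxel_size,
                    label,
                ))
    return points


# A 2x2x2 grid with two occupied voxels yields two points instead of eight cells.
toy_grid = [[[0, 3], [0, 0]], [[0, 0], [5, 0]]]
sparse = dense_to_sparse_points(toy_grid, voxel_size=0.5)
```

The size of the sparse output scales with the occupied volume rather than with the full grid, which is the property the point-based representation relies on.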

2. Progressive query learning

The model is trained to produce useful predictions across query budgets and intermediate decoder depths, allowing inference-time quality/latency trade-offs without retraining.
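One way to picture the inference-time side of this trade-off is to keep only a subset of the queries and stop after a chosen number of decoder layers. The sketch below assumes simple prefix truncation and treats decoder layers as plain callables; `decode_with_budget` and the toy `refine` layer are hypothetical names, and the source does not state how AdaOcc actually selects its query subset.

```python
from typing import Callable, List, Sequence

Query = List[float]
DecoderLayer = Callable[[List[Query]], List[Query]]


def decode_with_budget(
    queries: Sequence[Query],
    layers: Sequence[DecoderLayer],
    num_queries: int,
    num_layers: int,
) -> List[Query]:
    """Run the first `num_layers` layers on the first `num_queries` queries."""
    if not 1 <= num_queries <= len(queries):
        raise ValueError("num_queries must be between 1 and len(queries)")
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    # Assumption: a smaller budget uses a prefix of the full query set.
    active = [list(q) for q in queries[:num_queries]]
    # Assumption: every intermediate depth yields a usable prediction,
    # so stopping early is valid.
    for layer in layers[:num_layers]:
        active = layer(active)
    return active


def refine(queries: List[Query]) -> List[Query]:
    """Toy stand-in for a decoder layer: shifts every coordinate by 1."""
    return [[value + 1.0 for value in q] for q in queries]


all_queries = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
all_layers = [refine, refine, refine]

cheap = decode_with_budget(all_queries, all_layers, num_queries=2, num_layers=1)
full = decode_with_budget(all_queries, all_layers, num_queries=4, num_layers=3)
```

For such truncation to give useful outputs, training has to supervise the reduced configurations as well, which is the role the text assigns to progressive query learning.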

3. Geometry-aware encoding

Visual features are paired with geometric cues from depth or LiDAR through geometry-guided query initialization and a compact multi-plane 3D encoder for stronger spatial grounding.
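As a rough illustration of what geometry-guided initialization could look like, the sketch below picks initial query positions from an observed point set with farthest point sampling, so the queries start spread across the measured geometry. Farthest point sampling is a stand-in chosen for this example; the source does not say which sampling scheme AdaOcc uses, and `farthest_point_sampling` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def farthest_point_sampling(
    points: Sequence[Point3D],
    num_queries: int,
    start_index: int = 0,
) -> List[Point3D]:
    """Greedily select `num_queries` points that are mutually far apart."""
    if not 1 <= num_queries <= len(points):
        raise ValueError("num_queries must be between 1 and len(points)")
    selected = [start_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < num_queries:
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_index]))
    return [points[i] for i in selected]


observed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
initial_queries = farthest_point_sampling(observed, num_queries=3)
```

Because the selection is greedy, asking for fewer queries returns a prefix of the larger selection, which fits naturally with a variable query budget.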

4. Containment optimization

A containment-guided objective encourages predicted points to lie in spatially valid occupied regions, mitigating surface-floating points and improving boundary consistency.
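The source does not give the form of the containment-guided objective. As a minimal sketch of the idea, assume the penalty is the mean distance from each predicted point to its nearest occupied voxel, with zero cost for points already inside one. The names `distance_to_box` and `containment_penalty` are hypothetical, and a training loss would need a differentiable tensor version of the same computation.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def distance_to_box(
    point: Sequence[float],
    box_min: Sequence[float],
    box_max: Sequence[float],
) -> float:
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    return math.sqrt(sum(
        max(lo - c, 0.0, c - hi) ** 2
        for c, lo, hi in zip(point, box_min, box_max)
    ))


def containment_penalty(
    points: Sequence[Point3D],
    occupied_voxels: Sequence[VoxelIndex],
    voxel_size: float = 1.0,
) -> float:
    """Mean distance from each point to its nearest occupied voxel."""
    total = 0.0
    for p in points:
        total += min(
            distance_to_box(
                p,
                [i * voxel_size for i in idx],
                [(i + 1) * voxel_size for i in idx],
            )
            for idx in occupied_voxels
        )
    return total / len(points)


occupied = [(0, 0, 0)]
inside = (0.5, 0.5, 0.5)    # contained, contributes 0
floating = (3.0, 0.5, 0.5)  # 2 units outside the voxel along x
penalty = containment_penalty([inside, floating], occupied)  # (0 + 2) / 2 = 1.0
```

Under this assumed form, points floating off a surface are pulled back toward occupied space while contained points are left alone.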

Highlights

Key contributions

Adaptive embodied occupancy

A deployment-oriented 3D occupancy framework that adapts to variable compute budgets, sensing configurations, and observation views.

Geometry- and optimization-aware design

Progressive query learning, geometry-guided query initialization, compact multi-plane 3D encoding, and a containment-guided objective make point-based occupancy prediction more robust for embodied scenes.

Benchmark and real-world evidence

Experiments report state-of-the-art Occ-ScanNet performance and demonstrate practical potential as an adaptive 3D perception module for downstream embodied tasks.

Reported results

Accuracy, efficiency, and adaptability

65.29 IoU on Occ-ScanNet

59.67 mIoU on Occ-ScanNet

+2.46 IoU over the previous best result

+7.84 mIoU over the previous best result
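For readers unfamiliar with the two metrics, the sketch below follows the usual convention in occupancy and scene completion benchmarks: IoU scores occupied versus empty voxels regardless of class, while mIoU averages the per-class IoU over semantic classes. That Occ-ScanNet follows exactly this convention is an assumption here, as are the function names and the choice of label 0 for empty space.

```python
from typing import Sequence


def occupancy_iou(pred: Sequence[int], target: Sequence[int], empty_label: int = 0) -> float:
    """Class-agnostic IoU between predicted and ground-truth occupied voxels."""
    inter = sum(1 for p, t in zip(pred, target) if p != empty_label and t != empty_label)
    union = sum(1 for p, t in zip(pred, target) if p != empty_label or t != empty_label)
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[int], target: Sequence[int], classes: Sequence[int]) -> float:
    """Average of per-class IoU over classes that appear in pred or target."""
    ious = []
    for c in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Flattened toy voxel labels: 0 = empty, 1 and 2 = semantic classes.
pred_labels = [0, 1, 1, 2, 0, 2]
true_labels = [0, 1, 2, 2, 1, 0]

iou = occupancy_iou(pred_labels, true_labels)          # 3 / 5 = 0.6
miou = mean_iou(pred_labels, true_labels, [1, 2])      # (1/3 + 1/3) / 2
```

The reported scores appear to be percentages. Taken at face value, the stated margins imply previous best results of 62.83 IoU and 51.83 mIoU.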

Budget-adaptive inference

Query-number and decoder-layer controls let the model produce coarser or finer occupancy outputs depending on latency and memory constraints.
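In deployment, these two controls could be driven by a small lookup over profiled configurations. The sketch below picks the largest configuration that fits a latency budget. The `BudgetConfig` type, the `pick_config` helper, the preference for more queries before more layers, and every number in `profiles` are placeholders for illustration, not measurements from AdaOcc.

```python
from typing import NamedTuple, Optional, Sequence


class BudgetConfig(NamedTuple):
    num_queries: int
    num_layers: int
    latency_ms: float  # to be measured on the target device


def pick_config(
    profiles: Sequence[BudgetConfig],
    latency_budget_ms: float,
) -> Optional[BudgetConfig]:
    """Return the largest profiled configuration within the latency budget."""
    feasible = [c for c in profiles if c.latency_ms <= latency_budget_ms]
    if not feasible:
        return None  # nothing fits; the caller decides how to degrade
    # Assumption: prefer more queries first, then more decoder layers.
    return max(feasible, key=lambda c: (c.num_queries, c.num_layers))


# Placeholder profile table; real values must come from profiling.
profiles = [
    BudgetConfig(num_queries=256, num_layers=2, latency_ms=10.0),
    BudgetConfig(num_queries=512, num_layers=4, latency_ms=25.0),
    BudgetConfig(num_queries=1024, num_layers=6, latency_ms=60.0),
]

chosen = pick_config(profiles, latency_budget_ms=30.0)
```

A memory limit could be handled the same way by adding a second measured field and filtering on both.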

Heterogeneous geometric inputs

The same framework can incorporate depth-derived 3D points or LiDAR scans, supporting different embodied platform configurations.
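A depth map becomes a 3D point set through standard pinhole back-projection, which puts it in the same form as a LiDAR scan (a list of 3D points in the sensor frame). The sketch below shows that conversion. The intrinsics `fx`, `fy`, `cx`, `cy`, the helper name `backproject_depth`, and the rule that non-positive depth marks an invalid pixel are assumptions for illustration; how AdaOcc preprocesses its geometric inputs is not described in the source.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a depth map into camera-frame 3D points (pinhole model)."""
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v: pixel row
        for u, z in enumerate(row):          # u: pixel column
            if z <= 0.0:
                continue  # assumed convention: non-positive depth is invalid
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# 2x2 toy depth map with two valid pixels.
toy_depth = [[0.0, 2.0], [4.0, 0.0]]
depth_points = backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

Points from either source would still need the sensor-to-world transform before being fused across views.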

Image overview

Visual summary of AdaOcc

A compact visual tour of the method, qualitative results, adaptability studies, and demo examples.

Teaser overview: adaptive 3D occupancy prediction for embodied tasks.

Pipeline: visual and geometric cues are fused into adaptive sparse occupancy predictions.

Containment-guided point optimization reduces surface-floating predictions.

Qualitative occupancy comparison across scene geometry and semantics.

Adaptability on TartanGround: robust occupancy outputs under different embodied sensing conditions.

Open-vocabulary visualization for semantic scene understanding.

Downstream A* planning example using occupancy as a spatial representation.

Real-scene demo snapshot for embodied occupancy perception.

Media placeholders

Add review-safe assets later

These slots intentionally contain no images or videos yet. Replace them after preparing anonymous-safe figures, qualitative comparisons, and demos.

Method figure

Suggested file: method-overview.png

Qualitative results

Suggested file: qualitative-results.png

Adaptability study

Suggested file: adaptability-grid.png

Demo video

Suggested file: demo-video.mp4

Release plan

This anonymous preview is prepared for GitHub Pages publication while withholding identity-bearing resources. After the anonymous period, the disabled resource pills can be replaced with public links, final media, and complete citation information.

Review the page for accidental identity leaks before publishing.
Add only anonymous-safe media while the review period is active.
After review, replace withheld resource pills with public paper, code, and citation links.