ACFR Papers at ICRA 2026 - Robotics Research at Sydney University

Check out our latest work! ACFR researchers will be presenting the following papers and a workshop at ICRA 2026, the IEEE International Conference on Robotics and Automation and the flagship conference in robotics.

Accepted contributed papers:

Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations.

X. Dong, M. Johnson-Roberson, W. Zhi.

Learns an object-centric motion distribution from video demonstrations (via flow matching on SE(3)) and turns it into executable robot motion. It jointly optimises grasp selection and the full object trajectory under kinematic-feasibility and collision constraints.

Link: https://arxiv.org/abs/2509.20703

Cross-Modal Instructions for Robot Motion Generation.

W. Baron, X. Dong, M. Johnson-Roberson, W. Zhi.

Generates robot motions from weak, cross-modal "instruction examples" (e.g., text plus rough visual cues) rather than from teleoperated demonstrations. A VLM-driven pipeline converts these cues into multi-view 2D guidance, fuses them into 3D trajectories, and can bootstrap downstream policy learning.

Link: https://arxiv.org/abs/2509.21107

Bi-Manual Joint Camera Calibration and Scene Representation.

H. Tang, T. Zhang, M. Johnson-Roberson, W. Zhi.

A marker-free method that jointly calibrates the camera-to-end-effector extrinsics of both arms in a bimanual setup while reconstructing a shared 3D scene representation. It uses learned correspondences to produce scale-consistent geometry suitable for bimanual interaction and planning.

Link: https://arxiv.org/abs/2505.24819

Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation.

W. T. Chu, T. Zhang, M. Johnson-Roberson, W. Zhi.

Reconstructs a neural implicit surface (e.g., a signed distance function, SDF) from a single image using fast, lightweight optimisation. The resulting geometry is accurate enough for motion-generation tasks such as contact-aware motion or surface following, without requiring multi-view capture.

Link: https://arxiv.org/abs/2509.20681

Infinite Leagues Under the Sea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models.

T. Zhang, W. Zhi, J. Mangelson, M. Johnson-Roberson.

Trains a diffusion model to synthesise photorealistic underwater RGBD terrain at scale using fractal-structured latent conditioning. The generated data can be fused into large 3D maps and used to supervise photorealistic 3D scene renderers for simulation and perception.

Link: https://arxiv.org/abs/2503.06784

Multivariate Active Learning and Adaptive Sampling With Multi-Kernel Gaussian Processes.

Thien Hoang Nguyen, Nathan Wallace, Nicholas Harrison, Salah Sukkarieh.

The paper presents a framework for multivariate active transfer learning and intelligent adaptive sampling. It enables robotic systems to learn accurate models of multiple quantities of interest, together with their correlations, simultaneously and in real time. Using multi-kernel Gaussian processes, the system selects the most informative next sampling locations, supporting efficient environmental mapping and precision robotics.

Link: https://ieeexplore.ieee.org/document/11003577

DynoSAM: Open-Source Smoothing and Mapping Framework for Dynamic SLAM. (Published in T-RO)

Jesse Morris, Yiduo Wang, Mikolaj Kliniewski, Viorela Ila.

In this paper, we present DynoSAM, an open-source framework for dynamic object SLAM that enables the efficient implementation, testing, and comparison of various dynamic SLAM optimization formulations. We further propose a novel formulation that encodes a rigid-body motion model in object pose estimation, as well as an error metric that is agnostic to how the object frame is defined.

Link: https://ieeexplore.ieee.org/document/11288097

EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments.

Raghav Mishra, Ian R. Manchester.

The paper introduces a new approach to sampling-based diffusion trajectory planning that helps robots find better motion plans in tightly constrained environments, using barrier functions to guide the diffusion process toward safe solutions. This makes motion planning more effective for autonomous systems operating in complex real-world settings while keeping it computationally efficient.

Link: http://arxiv.org/abs/2510.07700

DOSE3: Diffusion-Based Unified Out-of-Distribution Detection on SE(3) Trajectories. (Transferred from RA-L)

H. Cheng, T. Zheng, Z. Ma, T. Zhang, M. Johnson-Roberson, W. Zhi.

Performs out-of-distribution (OOD) detection directly on SE(3) pose trajectories using a diffusion-based generative model. It outputs an anomaly score for out-of-distribution motions and shows strong results across multiple trajectory datasets.

Link: https://ieeexplore.ieee.org/abstract/document/11278083

Robust Bayesian Scene Reconstruction With Retrieval-Augmented Priors for Precise Grasping and Planning. (Transferred from RA-L)

H. Wright, W. Zhi, M. Matek, M. Johnson-Roberson, T. Hermans.

Builds a Bayesian posterior over scene geometry from sparse, noisy RGBD observations, explicitly modelling uncertainty under occlusion. Retrieval-augmented shape priors improve reconstruction fidelity, leading to more reliable grasping and planning.

Link: https://ieeexplore.ieee.org/abstract/document/11242019

Online Dynamic SLAM With Incremental Smoothing and Mapping. (Published in RA-L)

Jesse Morris, Yiduo Wang, Viorela Ila.

Because dynamic environments evolve constantly, we present a novel factor-graph formulation and system architecture for dynamic SLAM that inherently support incremental optimisation and online estimation. We show that our formulation yields a problem structure well suited to incremental solvers, and that our system architecture further improves performance, achieving a 5× speed-up over existing methods.

Light Field Based 6DoF Tracking of Previously Unobserved Objects.

Nikolai Goncharov, James L. Gray, Donald G. Dansereau.

In this work, we introduce an object tracking method based on light field images that does not depend on a pre-trained model, while remaining robust to complex visual effects such as reflections. We extract semantic and geometric features from light field inputs using vision foundation models and convert them into view-dependent Gaussian splats. These splats serve as a unified object representation that supports differentiable rendering and pose optimisation.

Link: https://nagonch.github.io/LiFT-6DoF/

WildCross: A Cross-Modal Large Scale Benchmark for Place Recognition and Metric Depth Estimation in Natural Environments.

Joshua Knights, Joseph Reid, Kaushik Roy, David Hall, Mark Cox, Peyman Moghadam.

In this paper, we present WildCross, a large-scale benchmark for cross-modal place recognition and metric depth estimation in natural environments. The dataset comprises over 476K sequential RGB frames with semi-dense depth and surface-normal annotations, each aligned with accurate 6DoF poses and synchronised dense lidar submaps. We conduct comprehensive experiments on visual, lidar, and cross-modal place recognition, as well as metric depth estimation, demonstrating the value of WildCross as a challenging benchmark for multi-modal robotic perception tasks.

Link: https://arxiv.org/pdf/2603.01475

Workshops:

Workshop on Open Challenges in Robotics for Asset Inspection and Management

Organisers: Yiduo Wang, Marcus Hoerger, Viorela Ila, Ian Manchester, Donald Dansereau, Rahul Shome, Thierry Peynot, Dimity Miller, Maurice Fallon, Hanna Kurniawati.

This workshop brings together advances in robotics research and industry-driven needs, addressing the gap between the challenges of real-world inspection and asset-management applications and the cutting-edge multi-modal robotic technologies being developed to meet them. Building on the success of our inaugural ICRA 2025 workshop, Open Challenges in Robotics for Asset Inspection and Management, this second installment continues the conversation on how to transition advanced robotics from research into reliable industrial deployment, a transition that still faces many challenges.

Link: https://ariamhub.com/ocraim/