OCT Imaging for Pose Estimation and Feedback Control of an
Articulated Magnetic Surgical Tool
Erik Fredin1*, Nirmal Pol2*, Anton Zaliznyi3, Dmytro Fishman3, Eric Diller1,5 and Lueder A. Kahrs2,4,5
1 Department of Mechanical and Industrial Engineering, University of Toronto, Canada
2 Institute of Biomedical Engineering, University of Toronto, Canada
3 Institute of Computer Science, University of Tartu, Estonia
4 Department of Mathematical and Computational Sciences, University of Toronto Mississauga, Canada
5 Robotics Institute, University of Toronto, Canada
* Authors contributed equally to this work.
Abstract
Magnetically driven surgical tools are a new class of millimetre-scale devices that could enable procedures such as minimally invasive neurosurgery due to their high dexterity at a small size. However, safe and effective control of these magnetic tools requires real-time observation of tool joint angles, which is challenging inside a surgical environment. Optical coherence tomography (OCT) is an emerging volumetric imaging technique offering 3D visualization of tissue and tools simultaneously, which we explore for joint angle estimation. While some previous studies have used OCT to estimate the pose of rigid instruments, those methods are specific to needle-like tools and are often slow to process. In this work, we benchmark eight deep-learning models, adapted from other 3D modalities, on OCT data showing magnetic tools in a mock surgical environment. The models are tested in the presence of other objects, occlusion, noise, and with the tool partially outside the OCT’s field of view. The best-performing model, VoxelNeXt, is adapted from 3D object detection in LiDAR scans; this is the first time a model of this kind has been used on medical data. It infers tool pose with 0.6 mm position and 5° angular errors at a 40 ms inference time. We use this model to provide feedback for controlling a multi-jointed magnetic tool, demonstrating the robustness of OCT-based feedback control.
Key Contributions
- Comprehensive Model Evaluation: We evaluate eight diverse model architectures for markerless keypoint detection, which requires no physical modification of the tool. We find that a sparse CNN, built on techniques from the LiDAR and self-driving communities, yields substantially better accuracy than dense CNNs.
- High-Speed Volumetric OCT Feedback Control: We integrate our best-performing pose estimator into a closed-loop pipeline that drives our small, articulated magnetic tool, demonstrating feedback control from high-speed volumetric OCT.
- Annotated Volumetric OCT Dataset: We provide our annotated volumetric OCT dataset for multi-jointed surgical-tool pose estimation, containing realistic imaging artifacts (e.g., speckle noise, shadowing, mirroring), occlusions, and partially out-of-view configurations, and uniquely supporting 8-DoF labels; prior OCT pose datasets are few and limited to 6-DoF for rigid needle tools.
System Overview
Our system couples high-speed volumetric OCT imaging with deep learning and magnetic actuation to estimate and control the pose of a millimetre-scale articulated surgical tool. An OptoRes OCT scanner acquires 3D volumes of the tool in a mock neurosurgical environment, which are converted into sparse voxels and processed by an adapted VoxelNeXt network to predict 3D keypoints and an 8-DoF tool pose (3D position, orientation, and two joint angles).
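The conversion from a dense OCT volume to sparse voxels can be sketched as follows. This is a minimal illustration, not the preprocessing used in the paper: the function name `volume_to_sparse_voxels`, the intensity threshold, and the isotropic 0.05 mm voxel size are all assumptions chosen for the example, and the volume is synthetic.

```python
import numpy as np

def volume_to_sparse_voxels(volume, threshold, voxel_size_mm):
    """Keep only voxels brighter than `threshold` (hypothetical helper).

    Returns (coords_mm, features): an (N, 3) array of voxel-centre
    positions in millimetres and an (N, 1) array of intensities.
    """
    volume = np.asarray(volume, dtype=np.float32)
    # Integer (z, y, x) indices of every voxel above the threshold.
    indices = np.argwhere(volume > threshold)
    # Intensity of each retained voxel, used as its single feature channel.
    features = volume[tuple(indices.T)][:, None]
    # Convert indices to metric voxel-centre coordinates.
    coords_mm = (indices + 0.5) * np.asarray(voxel_size_mm, dtype=np.float32)
    return coords_mm, features

# Synthetic stand-in for an OCT volume: dim background, one bright 4x4x4 block.
rng = np.random.default_rng(0)
volume = rng.uniform(0.0, 0.2, size=(32, 32, 32)).astype(np.float32)
volume[10:14, 10:14, 10:14] = 1.0

# Assumed threshold and voxel size; real values depend on the scanner setup.
coords_mm, features = volume_to_sparse_voxels(
    volume, threshold=0.5, voxel_size_mm=(0.05, 0.05, 0.05)
)
print(coords_mm.shape, features.shape)
```

Only the occupied voxels are passed on, which is what lets a sparse network skip the mostly empty background of the volume.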
The tool is driven by an 8-coil electromagnetic actuation system and a two-axis delivery stage that sets rotation and translation. OCT-derived joint angles are streamed to a gain-scheduled PID controller that computes coil currents via a calibrated actuation model, enabling closed-loop magnetic control at roughly 20 Hz despite imaging artifacts, occlusions, and partial field-of-view.
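A gain-scheduled PID controller of the kind described above might look like the sketch below. The class names, the gain values, the choice to schedule on the measured joint angle, and the 0.05 s time step (matching the roughly 20 Hz loop) are all assumptions for illustration; the mapping from controller output to coil currents through the calibrated actuation model is not shown.

```python
from dataclasses import dataclass

@dataclass
class Gains:
    kp: float
    ki: float
    kd: float

class GainScheduledPID:
    """PID whose gains depend on the measured joint angle (illustrative)."""

    def __init__(self, schedule, dt):
        # schedule: list of (upper_angle_bound_deg, Gains), sorted by bound.
        self.schedule = schedule
        self.dt = dt
        self.integral = 0.0
        self.prev_error = None

    def _gains_for(self, angle_deg):
        # Pick the first band whose upper bound covers the measured angle.
        for upper_bound, gains in self.schedule:
            if abs(angle_deg) <= upper_bound:
                return gains
        return self.schedule[-1][1]

    def update(self, target_deg, measured_deg):
        gains = self._gains_for(measured_deg)
        error = target_deg - measured_deg
        self.integral += error * self.dt
        # No derivative term on the first sample (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return gains.kp * error + gains.ki * self.integral + gains.kd * derivative

# Hypothetical two-band schedule: stiffer gains near zero deflection.
schedule = [(20.0, Gains(2.0, 0.4, 0.1)), (90.0, Gains(1.0, 0.2, 0.05))]
controller = GainScheduledPID(schedule, dt=0.05)

# Two consecutive OCT-derived angle measurements for a 30 degree target.
print(controller.update(target_deg=30.0, measured_deg=0.0))
print(controller.update(target_deg=30.0, measured_deg=25.0))
```

The second call falls into the higher-angle band, so it uses the softer gain set; in a real loop the output would then be converted into coil currents.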
Results
On our annotated OCT dataset, VoxelNeXt achieves mean pose errors of approximately 0.6 mm in position and 5° in joint angles, outperforming seven strong baselines based on 3D CNNs, heatmap regressors, and 2D projection methods. Accuracy degrades gracefully in the presence of challenging OCT artifacts such as occlusions, mirror artifacts at the zero-delay line, and partially out-of-view tool configurations, maintaining sub-millimetre position error in most cases.
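For intuition, position and joint-angle errors can be computed from predicted and ground-truth 3D keypoints roughly as below. This is a sketch under our own assumptions: the function names are hypothetical, the joint angle is taken as the bend between two consecutive links, and the keypoint values are made up; the exact metric definitions used in the paper may differ.

```python
import numpy as np

def joint_angle_deg(p_prev, p_joint, p_next):
    """Bend angle at p_joint; 0 degrees when the three keypoints are collinear."""
    u = np.asarray(p_joint, dtype=float) - np.asarray(p_prev, dtype=float)
    v = np.asarray(p_next, dtype=float) - np.asarray(p_joint, dtype=float)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

def mean_position_error_mm(pred, truth):
    """Mean Euclidean distance between matching keypoints, in millimetres."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.linalg.norm(pred - truth, axis=1).mean())

# Made-up keypoints (mm): ground truth has a 90 degree bend at the middle point.
truth = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 2.0, 0.0]]
pred = [[0.0, 0.0, 0.3], [2.0, 0.4, 0.0], [2.0, 2.0, 0.0]]

position_error = mean_position_error_mm(pred, truth)
angle_error = abs(joint_angle_deg(*pred) - joint_angle_deg(*truth))
print(position_error, angle_error)
```

Averaging these two quantities over a test set gives summary numbers of the same kind as the position and angular errors reported here.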
When integrated into the magnetic control loop, the OCT-based pose estimates enable smooth joint step responses with low overshoot and stable tracking. In a pick-and-drop task, closed-loop control improves pickup success from 50% (open-loop) to 94% and increases successful drop-off to 83%, while reducing overall task time. A handover experiment further shows that OCT can simultaneously distinguish fluid-filled from empty micro-tubes and guide the articulated tool to grasp and pass the correct tube to another instrument.
BibTeX Citation
@article{fredinpol-ral2025-OCT-Pose,
  author={Fredin, Erik and Pol, Nirmal and Zaliznyi, Anton and Fishman, Dmytro and Diller, Eric and Kahrs, Lueder A.},
  journal={IEEE Robotics and Automation Letters},
  title={OCT Imaging for Pose Estimation and Feedback Control of an Articulated Magnetic Surgical Tool},
  year={2025},
  volume={},
  number={},
  pages={},
  doi={}
}
Acknowledgments
This research was enabled in part by support provided by Compute Ontario and the Digital Research Alliance of Canada. The OCT system was purchased with the support of the Canada Foundation for Innovation (CFI, Advanced Fetal Diagnosis and Therapy Program). LAK received funding from NSERC (RGPIN-2020-05833). This work is supported by the Faculty of Applied Science and Engineering at the University of Toronto through the EMHSeed & XSeed program.
