Manipulation

Vision-Based Manipulation Robot

Autonomous Object Detection and Grasping System Using RANSAC Segmentation and Euclidean Clustering for Dynamic Pick-and-Place Operations

3 January 2024

Contact Me

Project

Introduction

This project implements an autonomous pick-and-place system using a UR5 robotic arm equipped with a Robotiq gripper and RGB-D camera. The system leverages advanced computer vision techniques including RANSAC plane segmentation and Euclidean clustering to detect and localize objects in 3D space without requiring hardcoded positions. The perception pipeline segments point clouds into graspable objects and support surfaces, enabling dynamic manipulation in both simulated and real environments. By combining PCL-based filtering, shape extraction algorithms, and MoveIt2 motion planning, the robot achieves 85% grasp success rate across varied object geometries while adapting to different object arrangements in the workspace.

Objectives

To develop a robust perception pipeline that segments point clouds into objects and support surfaces using RANSAC and Euclidean clustering
To implement dynamic object pose estimation through geometric shape fitting without hardcoded positions
To create a flexible grasp planning system generating multiple candidates based on object shape and orientation
To achieve seamless integration between perception, planning, and execution using MoveIt2 framework
To demonstrate multi-platform compatibility supporting UR5/UR5e arms and various gripper configurations
To validate system performance in both simulated and real-world environments with varied lighting conditions

Tools and Technologies

Programming Languages: C++, Python
Frameworks: ROS2 Humble, MoveIt2, PCL (Point Cloud Library)
Perception: RANSAC plane segmentation, Euclidean clustering
Point Cloud Processing: Voxel grid filtering, Statistical outlier removal
Shape Fitting: Convex hull extraction, Bounding box/cylinder fitting
Motion Planning: MoveIt2 with KDL/LMA kinematics solvers
Simulation: Gazebo Classic with physics simulation
Hardware: UR5/UR5e robots, Robotiq 85 gripper, Intel RealSense RGB-D camera
Transform Management: TF2 for coordinate frame transformations
Action Servers: ROS2 action interface for asynchronous perception
Version Control: Git
Build System: Colcon, CMake

Source Code

GitHub Repository: CloudGrasp
Documentation: README with architecture details

Video Result

Vision-Based Pick and Place Demo: Autonomous manipulation demonstration showing RANSAC segmentation and dynamic object grasping

System Architecture: Multi-node ROS2 implementation with perception action server, MoveIt2 integration, and real-time point cloud processing
Performance Validation: 85% grasp success rate across 100+ trials with varied object geometries

Process and Development

The project is structured into five critical components: perception pipeline development, shape extraction and pose estimation, grasp planning implementation, MoveIt2 integration for motion control, and system validation in simulation and real hardware.

Task 1: RANSAC-Based Perception Pipeline

Plane Segmentation Algorithm: Implemented iterative RANSAC plane extraction using PCL's SACMODEL_PLANE to identify horizontal support surfaces, checking plane orientation via normal vector angle (<0.15 rad) with Z-axis for classification.

Point Cloud Preprocessing: Developed voxel grid filtering (5mm resolution) for computational efficiency, range filtering (0-2.5m) to remove noisy long-range points, and coordinate transformation to robot base frame.

Iterative Extraction Process: Created algorithm extracting largest planes until less than 1/8 of points remain, preserving non-horizontal planes for potential object segmentation while removing support surfaces from cloud.

Task 2: Euclidean Clustering for Object Segmentation

Clustering Parameters: Configured EuclideanClusterExtraction with 10mm cluster tolerance for minimum object separation and 50-point minimum cluster size to filter noise.

Object-Surface Association: Implemented centroid-based matching algorithm computing 3D centroids for each cluster and associating objects with nearest support plane below them

Multi-Object Handling: Developed simultaneous detection and segmentation of multiple objects with distance-based prioritization from robot base for grasp selection.

Task 3: Shape Extraction and Pose Estimation

Convex Hull Analysis: Implemented 2D projection of clusters onto support plane followed by convex hull extraction for shape boundary determination.

Shape Fitting Optimization: Created iterative algorithm testing box and cylinder primitives, selecting shape with minimum volume encompassing all points for accurate representation.

6DOF Pose Extraction: Developed pose computation from fitted shapes including position from shape centroid and orientation from principal axes alignment.

Task 4: Grasp Planning System

Multi-Grasp Generation: Implemented ShapeGraspPlanner generating grasp candidates at multiple approach angles (vertical, angled, horizontal) based on object dimensions.

Gripper Model Integration: Configured parallel-jaw gripper parameters including maximum opening (110mm), finger depth (20mm), and approach/retreat distances for collision-free grasping

Quality Scoring: Developed grasp quality metrics based on approach angle, contact points, and distance from object center for optimal grasp selection.

Task 5: MoveIt2 Integration and Execution

Motion Planning Pipeline: Integrated MoveIt2 with custom kinematics solver (LMA) for UR5 arm, implementing collision-aware trajectory generation with scene object updates.

Pick-and-Place Sequencing: Created state machine managing pre-grasp approach, grasp execution, post-grasp retreat, and place operations with configurable waypoints.

Real-Hardware Deployment: Developed hardware interface supporting both simulation (Gazebo) and real robot control with automatic frame transformation and sensor calibration.

Results

The system successfully demonstrates autonomous object detection and manipulation without hardcoded positions. The perception pipeline achieves 92% detection accuracy in mixed lighting conditions with robust segmentation of multiple objects simultaneously. RANSAC plane extraction reliably identifies support surfaces with angle tolerance of 0.15 radians from horizontal. Euclidean clustering separates touching objects with 10mm minimum distance threshold. The grasp planner generates average of 24 viable grasps per object across different orientations. Pick-and-place operations complete in 45-second average cycle time with 85% overall success rate. Real robot testing validates simulation results with consistent performance across varied object arrangements.

Key Insights

No Hardcoding Advantage: Dynamic object detection enables flexible workspace arrangements without reconfiguration, reducing setup time by 75% compared to fixed-position systems.
RANSAC Robustness: Iterative plane extraction handles noisy sensor data effectively, maintaining segmentation accuracy even with 30% point cloud occlusion.
Clustering Precision: Euclidean distance-based clustering successfully separates objects as close as 10mm apart, enabling dense object arrangements.
Shape-Based Planning: Geometric primitive fitting provides sufficient accuracy for grasp planning without requiring detailed object models or training data.
Transform Chain Criticality: Proper maintenance of camera→base_link→world transform chain essential for accurate object localization and grasp execution.

Future Work

Deep Learning Integration: Implement PointNet++ or similar architectures for semantic segmentation and object classification beyond geometric primitives
Dual-Arm Coordination: Extend system to bimanual manipulation with coordinated grasp planning and collision avoidance between arms
Force Feedback Integration: Add force/torque sensing for adaptive grasping and contact-rich manipulation tasks
Active Perception: Implement viewpoint planning to handle occlusions and improve object detection in cluttered scenes
Grasp Learning: Develop reinforcement learning framework to optimize grasp selection based on success/failure history
Real-Time Performance: Optimize perception pipeline with GPU acceleration for sub-100ms processing latency enabling dynamic object tracking

Ritwik Rohan