Multi-modal perception and sensor fusion for human-robot collaboration