Spatial information-based multimodal integration technique