Hybrid deep learning video captioning model for gastrointestinal tract endoscopy