This workshop explores how convolutional and recurrent neural networks can be combined to generate effective descriptions of content within images and video clips.
Learn how to train a network using TensorFlow and the Microsoft Common Objects in Context (COCO) dataset to generate captions from images and video by:
- Implementing deep learning workflows like image segmentation and text generation
- Comparing and contrasting data types, workflows, and frameworks
- Combining computer vision and natural language processing
Upon completion, you’ll be able to solve deep learning problems that require multiple types of data inputs.
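The core technique paired in this workshop is a CNN encoder feeding an RNN decoder: the CNN summarizes the image as feature vectors, and the RNN generates the caption word by word. The sketch below is a minimal, illustrative TensorFlow example of that encoder-decoder pattern; the choice of InceptionV3 and a GRU, the layer sizes, and the dummy tensors are assumptions for illustration, not the workshop's actual materials or the COCO pipeline.

```python
# Illustrative CNN-encoder + RNN-decoder sketch for image captioning.
# Layer choices, sizes, and the dummy tensors are assumptions, not workshop code.
import tensorflow as tf

EMBED_DIM = 256    # embedding size (assumed)
UNITS = 512        # GRU hidden size (assumed)
VOCAB_SIZE = 5000  # caption vocabulary size (assumed)

# Encoder: a pretrained CNN turns each image into a grid of feature vectors.
cnn = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
cnn.trainable = False  # used as a fixed feature extractor
project = tf.keras.layers.Dense(EMBED_DIM, activation="relu")

def encode_images(images):
    # images: (batch, 299, 299, 3), preprocessed for InceptionV3
    features = cnn(images)                                   # (batch, 8, 8, 2048)
    features = tf.reshape(features, (tf.shape(features)[0], -1, 2048))
    return project(features)                                 # (batch, 64, EMBED_DIM)

# Decoder: an RNN generates the caption one token at a time.
embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)
gru = tf.keras.layers.GRU(UNITS, return_sequences=True, return_state=True)
to_vocab = tf.keras.layers.Dense(VOCAB_SIZE)

def decode_step(token_ids, image_features, state=None):
    # token_ids: (batch, 1) previous word ids; image_features from encode_images
    x = embedding(token_ids)                                          # (batch, 1, EMBED_DIM)
    context = tf.reduce_mean(image_features, axis=1, keepdims=True)   # mean pooling, no attention
    x = tf.concat([context, x], axis=-1)                              # (batch, 1, 2*EMBED_DIM)
    output, state = gru(x, initial_state=state)
    return to_vocab(output), state                                    # logits: (batch, 1, VOCAB_SIZE)

# Quick shape check with random tensors standing in for preprocessed COCO data.
images = tf.random.uniform((2, 299, 299, 3))
features = encode_images(images)
start = tf.constant([[1], [1]])                                       # assumed <start> token id
logits, state = decode_step(start, features)
print(logits.shape)  # (2, 1, 5000)
```

In a full captioning pipeline, the decoder is typically fed ground-truth tokens during training (teacher forcing) and its own predictions at inference time, often with an attention mechanism in place of the simple mean pooling shown here.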
Luigi Troiano (troiano@unisannio.it)
Piero Altoé (paltoe@nvidia.com)
Elena Mejuto Villa (mejutovilla@unisannio.it)
Palazzo Bosco Lucarelli
First Floor - Laboratorio Polifunzionale (Multipurpose Laboratory)
82100 Benevento (Italy)
Laboratorio Multimediale (Multimedia Laboratory)
Building B2
84084 Fisciano Campus (Italy)
Department of Physics and Earth Sciences
Via G. Saragat 1
44122 Ferrara (Italy)