Similarity search on big data in cloud

Project Title:

Similarity search on big data in cloud

Supervisors:

  • Professor Manzur Murshed (Principal)
  • Dr Jiangang Ma
  • Associate Professor Shyh Wei Teng
  • Professor Guojun Lu

Contact person and email address:

A brief description of the project:

The emergence of cloud computing has simplified the deployment of distributed software systems and facilitated business activities such as tracking customers’ purchases, product searches and website interactions. This results in very large volumes of data being exchanged between clouds. Currently, cloud systems rely on Distributed File Systems (DFS) to manage data and use a key-value model to store business data, which makes them suitable only for simple key-value insert and lookup operations. In addition, to simplify implementation, most existing applications process queries with a simple approach that scans the whole data set in parallel. Although these solutions work in some cases, they are limited to a dedicated system within a single cloud. Supporting effective data-intensive computing, search and information storage, that is, big data search in clouds, is therefore a critical challenge.
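The simple query approach mentioned above can be illustrated with a minimal sketch, assuming objects are numeric feature vectors stored as key-value pairs, split into partitions, and compared by Euclidean distance. A key-value lookup such as `partition["a"]` answers only exact-key queries; a similarity query ("which k objects are closest to this one?") has no key, so every partition must be scanned. The function names `scan_partition` and `parallel_scan_knn` are hypothetical, in-memory dictionaries stand in for DFS blocks, and threads stand in for cloud workers.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, query, k):
    """Compute the distance from the query to every object in one partition
    and keep that partition's k nearest (distance, key) pairs."""
    return heapq.nsmallest(
        k, ((math.dist(query, vec), key) for key, vec in partition.items())
    )


def parallel_scan_knn(partitions, query, k):
    """Baseline k-NN: scan all partitions concurrently (threads stand in for
    hypothetical cloud workers), then merge the local top-k lists."""
    with ThreadPoolExecutor() as pool:
        local_results = pool.map(lambda p: scan_partition(p, query, k), partitions)
        candidates = [pair for result in local_results for pair in result]
    return heapq.nsmallest(k, candidates)
```

Every object is touched on every query, so the cost grows linearly with the size of the data set regardless of how few results are requested.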

This project aims to develop a novel similarity search approach for analysing big data in the cloud. Here, data may be any abstract object that users request, such as a video on an online video website. Data can also be images and text files, or even cloud services, for example when searching for a cloud service provider. First, you will develop new similarity search techniques, including a novel model for object identification, to process queries efficiently. Second, to avoid processing all the objects (data) in a distributed software system at every time step, you will develop effective deep learning algorithms, including an indexing architecture combined with a lower-bound filtering technique, and related software tools.
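Lower-bound filtering can be illustrated with a filter-and-refine sketch. This is not the project's method: it omits the deep learning component entirely and uses classical pivot-based lower bounds, which rely on the triangle inequality. For any metric distance d and pivot p, |d(q, p) - d(o, p)| <= d(q, o). The index stores d(o, p) for every object o, so a cheap lower bound on d(q, o) is available without fetching o. Objects are visited in order of increasing lower bound, and the search stops once the bound reaches the current k-th best distance. Euclidean vectors, the pivot choice and all function names (`build_pivot_index`, `lower_bound`, `knn_with_lower_bound_filter`) are assumptions for illustration.

```python
import heapq
import math

# Hypothetical helpers: pivot-based bounds stand in for the project's
# (unspecified) indexing architecture; no deep learning is involved here.


def build_pivot_index(objects, pivots):
    """Precompute the distance from every object to every pivot."""
    return {key: [math.dist(vec, p) for p in pivots] for key, vec in objects.items()}


def lower_bound(query_to_pivots, obj_to_pivots):
    """Triangle-inequality lower bound on d(query, obj):
    for each pivot p, |d(q, p) - d(o, p)| <= d(q, o)."""
    return max(abs(a - b) for a, b in zip(query_to_pivots, obj_to_pivots))


def knn_with_lower_bound_filter(objects, pivots, index, query, k):
    """Filter-and-refine k-NN. Returns the k nearest (distance, key) pairs
    and the number of exact distance computations performed."""
    q_piv = [math.dist(query, p) for p in pivots]
    # Filter step: order candidates by their cheap lower bound.
    candidates = sorted((lower_bound(q_piv, index[key]), key) for key in objects)
    best = []  # max-heap via negated distances: (-distance, key)
    exact_computations = 0
    for lb, key in candidates:
        # Every remaining bound is >= lb, so none can beat the k-th distance.
        if len(best) == k and lb >= -best[0][0]:
            break
        # Refine step: compute the exact (expensive) distance.
        d = math.dist(query, objects[key])
        exact_computations += 1
        if len(best) < k:
            heapq.heappush(best, (-d, key))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, key))
    result = sorted((-neg_d, key) for neg_d, key in best)
    return result, exact_computations
```

The sketch still computes a lower bound for every object, which is cheap because it uses only precomputed numbers; the saving is in exact distance computations, which in a distributed setting would correspond to fetching and comparing remote objects. With the points (i, 0) for i from 0 to 99, a single pivot at the origin and the query (10.2, 0), the three nearest neighbours are found after only three exact distance computations.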