Umar Masud

Hi 👋! I am a researcher and developer in the field of AI/ML.
I am a recent graduate (MSc) of the University of Toronto 🍁 and a past intern at Samsung AI Center Toronto, where I worked with the camera technology team. Prior to that, I did my undergrad in Computer Engineering at JMI University, New Delhi.

I have a strong background in AI, with a focus on multimedia data. My work includes top-conference paper submissions (CVPR '25, under review), experience with large-scale data (millions of samples), prototyping ML models, and an understanding of the complete ML lifecycle. I have done many personal projects and academic internships in the ML domain, solving a variety of problems. I am currently exploring vision+language solutions such as Visual-RAG, Multimodal VLMs, LLMs, Agents, and Diffusion. 🚀

My interest is in making machines understand vision & language as we humans do.
I love to make and own end-to-end projects that solve real-life problems.

Additionally, I am enthusiastic about innovation, ideas, entrepreneurship, community, and the like, and I have a keen interest in the start-up ecosystem.


Education

University of Toronto

Master of Science in Applied Computing (MScAC)
AI Concentration

GPA: 4.0/4.0

Relevant Coursework: Computational Imaging, Neural Networks and Deep Learning, Software Engineering for Machine Learning, Visual and Mobile Computing Systems.

September 2023 - December 2024

Jamia Millia Islamia University

Bachelor of Technology
Computer Engineering

GPA: 9.82/10

Relevant Coursework: Fundamentals of Computing, Data Structures and Computer Programming, Computer Networking, Database Management, Engineering Mathematics (I, II, III), Computer Architecture, Digital Signal Processing.

August 2019 - May 2023

Applied Research

Samsung AI Center Toronto

ML Research Intern

Worked with the camera technology team on computational imaging problems. Researched ways to turn the heuristic, software-dominant ISP into an AI-enabled one. Developed an extremely lightweight neural implicit model (180 KB) for addressing the authenticity of images when cameras use generative AI (submitted to CVPR 2025). Additionally, conducted psychovisual user studies for other teams.

May 2024 - December 2024

Ulm University

Visiting Research Intern

Visited the lab at the Institute of Neural Information Processing under Prof. Friedhelm Schwenker, where I explored the topic of Compressed Image Super-resolution.

With a model of only 4.19M parameters, addressed JPEG compression and super-resolution simultaneously, achieving up to 27.62 PSNR and 0.771 SSIM.
To overcome additional compression artefacts, devised a lightweight CNN-based model that leverages a pre-trained feature extractor during training for information fusion. During inference, the model operates without the feature extractor, saving considerable computation.

June 2022 - August 2022

Indian Institute of Science (IISc)

Research Intern

Associated with the VCL Lab, where I worked on Domain Generalisation for the Person Re-identification task.

Applied simple supervised contrastive learning techniques to Domain Generalisation for Person Re-identification, reaching up to 53.7 mAP and 77.8 Rank-1.
Introduced novel perturbation strategies to realistically model domain variations and preserve target identities. Also contributed person attribute annotations for CUHK-03 and MSMT17 benchmark datasets.

January 2022 - June 2022

École normale supérieure - PSL

Research collaborator

Worked with members of the Computational Bioimaging and Bioinformatics team, headed by Auguste Genovesio, on a project on quality control of out-of-focus and noisy images in phenotypic screening using self- and semi-supervised learning.

For phenotypic screening, devised a data quality check method that was applied to 2.1M images and exceeded a 98% success rate.
Compared transfer learning and self-supervised learning methods to detect abnormal single-cell images, fine-tuning downstream classification with as little as 350 annotated pairs.

December 2021 - March 2022

IIIT-Allahabad

Summer Research Intern

Worked on the topic of "Automatic Detection of Image Splicing" under Prof. Anupam Agarwal in the Interactive Technologies and Multimedia Research (ITMR) Lab.

Found a 40-45% drop in the performance of image forgery detection solutions, calling into question the reliability and robustness of several overestimated results.
Implemented 5 papers from scratch and tested them across 13 different datasets in cross-evaluation and out-of-distribution training/testing environments, commenting on their generalizability across datasets.

May 2021 - July 2021

Jamia Millia Islamia

Undergraduate Researcher

Worked on various problems in computer vision under Prof. Sarfaraz Masood.


  • Designed a novel, lightweight model for facial mask detection with up to a 496x reduction in parameter count. Developed a large synthetic dataset by stitching masks at incorrect positions on faces. The dataset has 1300+ downloads on Kaggle.
  • Created a 152x lighter model for DeepFake Video detection that achieves 99.24% accuracy at 80 fps. It uses both spatial and temporal information through pre-trained CNN encoders followed by LSTMs, which reduces the training data and time required.

May 2021 - December 2022

Publications

  1. [Self-Supervised Learning] Masud, U., Cohen, E., Bendidi, I., Bollot, G., Genovesio, A. (2022). Comparison of semi-supervised learning methods for High Content Screening quality control. BioImage Computing workshop at ECCV 2022. https://doi.org/10.48550/arXiv.2208.04592
  2. [Synthetic Data Generation] [Efficient ML] Masud, U., Siddiqui, M., Sadiq, Mohd., Masood, S. (2022). SCS-Net: An efficient and practical approach towards Face Mask Detection. Procedia Computer Science Journal. ICMLDE, 2022. https://doi.org/10.1016/j.procs.2023.01.165
  3. [Domain Generalisation] [Contrastive Learning] Jambigi, C., Masud, U., Chakraborty, A. (2022). G-PReDICT: Generalizable Person Re-ID using Domain Invariant Contrastive Techniques. ICVGIP, 2022. https://doi.org/10.1145/3571600.3571655
  4. [Inverse Problem] Masud, U., Schwenker, F. (2022). Compressed Image Super-Resolution using Pre-trained Model Assistance. COMSYS, 2022. https://doi.org/10.1007/978-981-99-2680-0_5
  5. [Video Classification] Masud, U., Sadiq, Mohd., Masood, S., Ahmad, M., Abd El-Latif, A. A. (2023). LW-DeepFakeNet: A Lightweight Time Distributed CNN-LSTM network for real-time DeepFake Video Detection. Signal, Image and Video Processing. https://doi.org/10.1007/s11760-023-02633-9
  6. [Media Authenticity] Masud, U., Agarwal, A. (2021). Analysing Statistical methods for Automatic Detection of Image Forgery. arXiv. https://doi.org/10.48550/arXiv.2111.12661

Skills

AI/ML
  • Classical ML/DL
  • Computer Vision
  • Multi-Modal Learning
  • Generative AI
  • VLMs
  • LLMs
  • Agents
  • Diffusion
Programming Languages
  • Python
  • Java
  • Bash/Shell
Libraries/Frameworks
  • NumPy
  • Pandas
  • Matplotlib
  • scikit-learn
  • OpenCV
  • Keras
  • TensorFlow
  • PyTorch
  • Flask
Databases
  • MySQL
  • ChromaDB (vector)
MLOps
  • Git
  • MLflow
  • W&B
  • TensorBoard

Projects

This is a collection of all my projects, big and small. Some were part of coursework, others were for self-learning. The advanced ones are marked with 🔥.

Machine Learning
  • Generative AI
    • 🔥 Built and evaluated a Multi-Modal Visual-RAG system for intelligent gallery search that combines image and metadata search using multimodal embeddings.
  • Classical ML
    • 🔥 Reproduced 5 papers on Image Forgery Detection that use handcrafted features to classify pristine and tampered images.
    • Diabetic Retinopathy Detection using Texture Features and Ensemble Learning (paper implementation). Achieved F1-score = 0.97 and accuracy = 97.2%.
    • Fog detection in images using GLCM-based features and SVM (paper implementation). Achieved F1-score = 0.83 and test accuracy = 82.3%.
    • Phishing URL detection system based on URL features using SVM (paper implementation). Achieved F1-score = 0.99 and test accuracy = 99.2%.
  • OpenCV Projects
    • 🔥 Air-Piano, an air-based piano that lets the user play through hand (fingertip) movements.
    • Air-Drum System, an air-based drum beat generator.
    • Background Color Detection, which uses 2 techniques to detect a suitable background for the input image.
  • Deep Learning
    • 🔥 Clicking Better Images with Under Display Cameras (UDC) in Smartphones. A 7.78M-parameter model trained with knowledge distillation (KD) reaches 30.59 PSNR, and a diffusion-based approach beats the SOTA with 42.37 PSNR. (Report)
    • Integrated ML functionality (generating tags and descriptions for uploaded images) into an existing Flask-based Instagram clone web app. (Report)
    • 🔥 Different Descriptors for Squeeze and Excitation Attention Block: experimented with standard deviation, trace, largest singular value, and DC coefficient of DCT in place of the usual GlobalAvgPool2d. The SVD approach gives a 0.78% improvement but with an 80% increase in training time. (Report)
    • Image Inpainting using a U-Net model with a fused ConvMixer encoder. The feature fusion method showed a 1.34% improvement in Dice coefficient. (Report)
    • Background Remover tool for portrait images of humans, built on a U-Net model trained for semantic segmentation. The model achieved an IoU score of 0.981 on test data. Also deployed as a web app.
    • Implemented the paper "Medical image denoising using Convolutional Denoising Autoencoders (CAE)". Achieved a loss of 0.106, i.e. a Structural Similarity Index (SSIM) of 0.894.
    • Image similarity measure using a Siamese network on fashion apparel. Achieved an evaluation accuracy of 94.2%.
    • Plant Pathology Challenge, an FGVC8 workshop challenge at CVPR 2021 for multi-label classification of plant leaf diseases. Achieved 87.34% accuracy with a pre-trained model as feature extractor.
    • Human Emotion Detection, Pneumonia Prediction models.

Web Development
  • Banking System
    • A simple banking system that lets customers transact with one another. It uses HTML, CSS, Bootstrap, PHP, and MySQL, with the local server provided by XAMPP.
  • Website Template for InnerveSOC
    • As part of the InnerveSOC competition, designed a complete website template for Innerve Tech-Fest 2020, IGDTUW. I was adjudged the winner.

Awards & Honors

  • Research Week with Google 2023: Among the 250 people accepted for participation by Google Research India.
  • Online Asian Machine Learning School (OAMLS): Accepted with full scholarship as a part of ACML 2022.
  • Robotics & AI Summer School 2022: Accepted to this summer school hosted by IRI, CSIC-UPC.
  • DAAD-WISE Scholarship 2022: Financial aid for Summer Research Internship in Germany.
  • Workshop on AI for Computational Social Systems (ACSS) 2021: 3rd place in Student Paper Competition.
  • 5th Summer School of AI 2021 - IIIT Hyderabad - One of 500 participants worldwide.
  • Winner-Innerve Summer of Code Challenge 2020 - Indira Gandhi Delhi Technical University for Women.
  • INSPIRE Science Award For Top 1% - Scholarship for Higher Studies by Govt. of India.
  • Mr. Harbinder Singth Dugal Rolling Trophy - Awarded for Proficiency in Science ISC-XII
  • Mr. G W Mayer's Merit Scholarship - Awarded for excellence in Mathematics and Science
  • Shanker Sumeda Rolling Trophy - Awarded for Excellence in Academics.

Volunteering

  • ML/AI Dev

    Google Developer Student Club - JMI.

    Core team member responsible for the activities carried out to spread knowledge of ML/AI among students.

    August 2021 - August 2022
  • Organising Member

    NewInML Workshop, ICML 2022

    Supported the main team, responsible for organizing the NewInML workshop's online events.

    June 2022 - July 2022
  • Youth Ambassador

    HundrED Global Organization

    HundrED Youth Ambassadors is an active community of students from around the world who are passionate about improving education and want to work with other like-minded young changemakers.

    January 2021 - December 2021