Slidebrook Partners

Deep Learning Scientist

Boston, MA
Job Type: Direct Hire
May 15, 2018
Our client is an MIT spin-off focused on understanding human emotion. Their vision is that technology needs the ability to sense, adapt and respond to not just commands but also non-verbal signals. They're building emotional AI.
Such an ambitious vision takes a strong team with a strong desire to explore and innovate. They are growing the team to improve and expand the core technologies and help solve many unique and interesting problems focused around sensing, understanding and adapting to human emotion/behavior.

Their initial technology measures human emotion by sensing and analyzing facial expressions. This technology is already used commercially in a number of verticals and has been released to the public as SDKs, so developers around the world can use it to create a new breed of apps, websites and experiences. Since 2017, they've extended their emotion sensing technology beyond the face to leverage human speech. The goal is to build out their technology to perform emotion sensing multi-modally from speech and facial expressions when both channels are present, and unimodally when only one is available.
This position is on the Science team, which is tasked with creating and improving their emotion recognition technology. They're a team of researchers with backgrounds in computer vision, speech processing, machine learning and affective computing. The Science team does everything from initial prototyping of state-of-the-art algorithms to creating production models that can be included in their cloud and mobile products.
We are looking for a deep learning researcher to join the Science team. They will have experience solving either computer vision problems (e.g., object detection, localization and classification) or speech processing problems (e.g., speech classification, speech denoising, speaker diarization, source separation). Experience with multi-modal classification, unsupervised learning, or semi-supervised learning is a plus, as is any experience working with the human face or voice (recognition, emotion estimation, or multi-modal recognition).
Great candidates will be those who want to shape the future of this space, can execute ideas effectively and efficiently, and are passionate about emotion research.
Responsibilities
  • Run a multitude of deep learning experiments:
    • Prototype new ideas
    • Explore a variety of approaches
    • Refine promising ideas into production-ready models
  • Explore new methods to leverage a large dataset of spontaneous, real-world audio-visual data
  • Patent findings and publish them at computer vision, speech processing, and affective computing conferences
Requirements
  • At least 2 years of experience applying deep learning techniques (CNNs, RNNs/LSTMs) to multi-modal (vision, speech) tasks such as audio and video classification or action recognition
  • Experience with deep learning frameworks (e.g., Keras, TensorFlow, Theano, Caffe), including implementing custom layers
  • Passion for innovation and for pushing state-of-the-art research
  • Strong Python programming skills
  • Demonstrated experience (publications, projects) solving machine learning problems
  • Master's or PhD in computer vision or speech processing
  • Experience in one of the following fields is highly desirable:
    • Facial analysis – face detection, face recognition, expression and emotion classification, facial landmark tracking
    • Speech analysis – source separation, speaker diarization or identification, emotion classification