Strength Through Diversity
Groundbreaking science. Advancing medicine. Healing made personal.
Req # 2513058
Data Engineer Analyst I
The “Data Engineer Analyst I” will develop and maintain batch and streaming machine learning pipelines to accelerate translational research and improve clinical care. This individual demonstrates a sound understanding of software engineering, technical data manipulation, health care data, and big data analytics, and has an impact on the patient community of the Mount Sinai Health System.
Duties and Responsibilities
- Creates machine learning pipelines using big-data technology stacks
- Works with Python, Apache Spark, Kafka, MongoDB, and Machine Learning
- Develops and maintains operational machine learning software
- Develops prototypes and proofs of concept for the selected use cases, and implements complex machine learning pipelines
- Identifies necessary data, data sources and methodologies.
- Translates prototypes of machine learning solutions into operational software
- Develops production-ready software following standard practices such as software testing, documentation, and QA
- Identifies and addresses expected and unforeseen data complexities to mitigate their impact on the analytic outcome and associated business decisions.
- Works to improve data quality where possible within created machine learning models.
- Feeds data quality issues back to IT or identified data stewards to facilitate the creation of high-quality metrics.
Qualifications
- Master's degree minimum in a relevant field of study (e.g., Computer Science, Information Systems, Physics, Statistics); PhD preferred.
- Proficient in at least one programming language among Scala, Python, Java, C, and C++. Must be flexible and quick to pick up new languages.
- Basic knowledge of Big Data Technology Stacks like Apache Spark, Apache Kafka, NoSQL, and Machine Learning.
- Basic knowledge of supervised and unsupervised learning
- Knowledge of deploying and maintaining machine learning software in a production environment
- Hands-on experience with Apache Spark, Kafka, MongoDB, Apache NiFi, and other big data technology stacks and streaming tools.
- Familiarity with and the ability to leverage a wide variety of open source technologies and tools
- Knowledge of cloud architecture and implementation on Azure or AWS is a big plus
- Advanced knowledge of Machine Learning