Remote Senior Data Scientist @ Sonatype

This job has now closed and is no longer accepting applications.

Archive Job Description

Sonatype is the software supply chain management company. We’re on a mission to change how the world innovates by making software development easier. From running the world’s largest repository of Java open-source components (Maven Central), to inventing componentized software development and then software supply chain management, to creating the only solution that stops open-source malware in its tracks, we’re constantly leading the industry while helping thousands of customers manage open source every day.

Our technology is already used by 15 million developers, and we have lofty goals for it to be in the hands of every engineering team. We need you to help make that happen.

Sonatype’s mission is to enable organizations to better manage their software supply chain. We offer a series of products and services including the Sonatype Nexus Repository and Sonatype Lifecycle.

*** This position is 100% remote and candidates must currently live in Canada or the US. ***

You’ll be working with one of our research teams to help turn large amounts of data into valuable insights for our customers. We’re building our data science program, so you’ll help build out our standard processes as we grow. We have a large team of dedicated data engineers and data scientists, so you can focus on doing what you do best: building models.

What You’ll Be Doing

Interact with product management and data engineers to think through potential ways to leverage data.
Drive the direction of your own work; we expect you to be an authority in machine learning who knows best what is possible.
Ensure the quality of the models you produce and monitor them over time.
Lead the research, development, and deployment of machine learning models that detect malicious behavior, applying innovative techniques such as GANs, VAEs, and other generative AI methods.
Collaborate closely with cross-functional teams, including data engineers, software developers, and domain experts, to identify business requirements and translate them into practical data science solutions.
Explore and evaluate different generative AI approaches and algorithms to detect and predict malicious activities, anomalies, and behavioral patterns in diverse datasets.
Design and implement scalable and efficient data processing pipelines to collect, cleanse, and preprocess large-scale datasets for training and validation purposes.
Develop and implement feature engineering strategies, dimensionality reduction techniques, and data augmentation methods to improve the performance and generalization capabilities of the models.
Conduct in-depth exploratory data analysis and develop statistical models to identify patterns, correlations, and trends in data related to fraud and behavioral patterns.
Collaborate with the data governance team to ensure compliance with data privacy regulations and ethical considerations while working with critical customer data.
Stay updated on the latest research and advancements in generative AI, fraud detection, and behavioral analysis domains, and evaluate their applicability to enhance our existing models and methodologies.
Mentor and guide junior data scientists, helping them develop their technical skills and their understanding of AI capabilities.
Present findings, insights, and model performance to both technical and non-technical stakeholders, communicating complex concepts clearly and concisely.
Excellent problem-solving abilities and the capacity to develop innovative solutions to complex data science challenges.
Strong grasp of robust model validation techniques, including cross-validation and evaluation metrics suitable for assessing generalization performance.
Proven ability to implement standard data science methodologies.
Proficiency using Jupyter or Databricks notebooks.

Requirements

Strong academic credentials in computer science, statistics, data science, machine learning or a related field.
8+ years of hands-on experience as a data scientist.
Strong expertise in generative modeling and deep learning architectures.
Strong quantitative background.
Demonstrated understanding of fraud detection techniques, anomaly detection, and behavioral analysis.
Proficiency in programming languages such as Python or R, and experience with relevant libraries and frameworks (e.g., TensorFlow, Keras, PyTorch, scikit-learn).
Demonstrated experience working with large-scale datasets, data preprocessing, and feature engineering.

Preferences

Familiarity with Databricks and AWS (S3, EMR, SageMaker) would be beneficial.
Experience with Git, preferably GitHub.
Experience with PySpark, MLflow, LangChain, and Hugging Face APIs.
Our data engineers primarily use Java and Scala. We don’t expect you to write Java or Scala code, but familiarity with them may make it easier to work with the data engineers.

What We Offer

The opportunity to be part of an incredible, high-growth company, working on a team of expert colleagues
Competitive salary package
Medical/Dental/Vision benefits
Business casual dress
Flexible work schedules that ensure time for you to be you
2019 Best Places to Work Washington Post and Washingtonian
2019 Wealthfront Top Career Launch Company
EY Entrepreneur of the Year 2019
Fast Company Top 50 Companies for Innovators
Glassdoor rating of 4.9
Come see why we’ve won all of these awards.