Remote Senior Data Scientist @ Sonatype

This job has now closed and is no longer accepting applications.

Archive Job Description

Sonatype is the software supply chain management company. We’re on a mission to change how the world innovates by making software development easier. From running the world’s largest repository of Java open-source components (Maven Central), to inventing componentized software development and then software supply chain management, to creating the only solution that stops open-source malware in its tracks, we’re constantly leading the industry while helping thousands of customers manage open source every day.
Our technology is already used by 15 million developers, and our lofty goal is to put it in the hands of every engineering team. We need you to help us do that.
Sonatype’s mission is to enable organizations to better manage their software supply chain. We offer a series of products and services, including Sonatype Nexus Repository and Sonatype Lifecycle.
*** This position is 100% remote, and candidates must currently live in Canada or the US. ***
You’ll be working with one of our sophisticated research teams to help turn large amounts of data into valuable insights for our customers. We’re still building our data science program, so you’ll help establish our standard processes as we grow. We have a large team of dedicated data engineers and data scientists, so you can focus on what you do best: building models.

What You’ll Be Doing

  • Work with product management and data engineers to think through potential ways to leverage data.
  • Drive the direction of your own work. You are encouraged to be an authority in machine learning, since you know best what is possible.
  • Ensure the quality of the models you produce and monitor them over time.
  • Lead the research, development, and deployment of machine learning models for malicious behavior analysis and detection, using innovative techniques such as GANs, VAEs, and generative AI.
  • Collaborate closely with cross-functional teams, including data engineers, software developers, and domain experts, to identify business requirements and translate them into practical data science solutions.
  • Explore and evaluate different generative AI approaches and algorithms to detect and predict malicious activities, anomalies, and behavioral patterns in diverse datasets.
  • Design and implement scalable and efficient data processing pipelines to collect, cleanse, and preprocess large-scale datasets for training and validation purposes.
  • Develop and implement feature engineering strategies, dimensionality reduction techniques, and data augmentation methods to improve the performance and generalization capabilities of the models.
  • Conduct in-depth exploratory data analysis and develop statistical models to identify patterns, correlations, and trends in data related to fraud and behavioral patterns.
  • Collaborate with the data governance team to ensure compliance with data privacy regulations and ethical considerations while working with critical customer data.
  • Stay updated on the latest research and advancements in generative AI, fraud detection, and behavioral analysis domains, and evaluate their applicability to enhance our existing models and methodologies.
  • Mentor and provide guidance to junior data scientists, assisting them in developing their technical skills and understanding of AI capabilities.
  • Present findings, insights, and model performance to both technical and non-technical partners, communicating complex concepts clearly and concisely.
  • Excellent problem-solving abilities and the capacity to develop innovative solutions to complex data science challenges.
  • Strong grasp of robust model validation techniques, including cross-validation and evaluation metrics suitable for assessing generalization performance.
  • Proven ability to implement data science best practices.
  • Proficiency with Jupyter or Databricks notebooks.

Requirements

  • Strong academic credentials in computer science, statistics, data science, machine learning or a related field.
  • 8+ years of hands-on experience as a data scientist.
  • Strong expertise in generative modeling and deep learning architectures.
  • Thorough quantitative background.
  • Demonstrated understanding of fraud detection techniques, anomaly detection, and behavioral analysis.
  • Proficiency in programming languages such as Python or R, and experience with relevant libraries and frameworks (e.g., TensorFlow, Keras, PyTorch, scikit-learn).
  • Demonstrated experience working with large-scale datasets, data preprocessing, and feature engineering.

Preferences

  • Familiarity with Databricks and AWS (S3, EMR, SageMaker) would be beneficial.
  • Experience with Git, preferably on GitHub.
  • Experience with PySpark, MLflow, LangChain, and Hugging Face APIs.
  • Our data engineers primarily use Java and Scala. We don’t expect you to write Java or Scala code, but familiarity may make it easier to work with the data engineers.

What We Offer

  • The opportunity to be part of an incredible, high-growth company, working on a team of expert colleagues
  • Competitive salary package
  • Medical/Dental/Vision benefits
  • Business casual dress
  • Flexible work schedules that ensure time for you to be you
  • 2019 Best Places to Work (Washington Post and Washingtonian)
  • 2019 Wealthfront Top Career Launch Company
  • EY Entrepreneur of the Year 2019
  • Fast Company Top 50 Companies for Innovators
  • Glassdoor rating of 4.9
  • Come see why we’ve won all of these awards