DevOps for Data Scientists: Taming the Unicorn

When most data scientists start working, they are equipped with all the neat math concepts they learned from school textbooks. Pretty soon, however, they realize that the majority of data science work involves getting data into the format the model needs. Beyond that, the model being developed is typically part of an application for the end user.

The proper workflow looks something like this: the data scientist keeps the model code under version control in Git. VSTS then pulls the code from Git and wraps it in a Docker image, which is pushed to a Docker container registry. Once the image is on the registry, it is deployed and orchestrated using Kubernetes. Say all that to the average data scientist and their mind will completely shut down.

Most data scientists know how to provide a static report or a CSV file with predictions. However, how do we version control the model and add it to an app? How will people interact with our website based on the outcome? How will it scale!? All this involves confidence testing, checking that no metric falls below a set threshold, sign-off from different parties, and orchestration between different cloud servers (with all their ugly firewall rules). This is where some basic DevOps knowledge comes in handy.
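Checking that no metric falls below a set threshold is one of the easiest steps to automate, and it makes a natural gate before sign-off. Below is a minimal Python sketch, assuming the evaluation job already produces a dictionary of metrics. The metric names, the threshold values, and the functions `failing_metrics` and `gate` are hypothetical illustrations, not part of VSTS or any other tool.

```python
# Hypothetical release gate: compare evaluation metrics against minimum
# thresholds before a model moves on to sign-off and deployment.

# Assumed minimum acceptable values; in practice these are agreed with stakeholders.
THRESHOLDS = {"accuracy": 0.85, "precision": 0.80, "recall": 0.75}


def failing_metrics(metrics, thresholds):
    """Return {metric: (value, threshold)} for every metric below its threshold.

    A metric missing from `metrics` is treated as failing (value None).
    """
    failures = {}
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures[name] = (value, minimum)
    return failures


def gate(metrics, thresholds=THRESHOLDS):
    """Print each failure and return a process exit code: 0 if all pass, else 1."""
    failures = failing_metrics(metrics, thresholds)
    for name, (value, minimum) in sorted(failures.items()):
        print(f"FAIL {name}: {value} < {minimum}")
    return 0 if not failures else 1


if __name__ == "__main__":
    # Example metrics from a hypothetical evaluation run.
    run_metrics = {"accuracy": 0.88, "precision": 0.79, "recall": 0.81}
    code = gate(run_metrics)
    print("exit code:", code)
    # In a CI step you would pass this code to sys.exit() so a failure stops the pipeline.
```

Returning an exit code rather than raising keeps the check usable from any CI system: a nonzero exit is the conventional signal for a build step to fail.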

A Data Scientist is a Unicorn by Definition!

There are many definitions of data science and what a data scientist is, but no matter the source there’s general agreement that data scientists require three very important sets of skills:

  1. Math and Statistics. The end-to-end process for data science is deeply rooted in math and statistics. For example, creating understanding from often messy and disparate data requires statistical analysis. And the techniques that data scientists apply, including all of machine learning, are largely based on mathematical formulas.
  2. Domain Expertise. Knowledge about a particular industry or department, including its business challenges and respective terminology, significantly increases the likelihood that a data scientist can find (and then solve) the right business problems.
  3. DevOps. Data scientists have highly specialized needs for data, storage, compute, and more during the R&D phase of their work. Later, they need to apply important DevOps steps to put their models into production.
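One small DevOps step toward answering "how do we version control the model?" is to keep the training code in Git and treat each trained model as an immutable artifact whose version is derived from its contents. The sketch below assumes a model object that can be pickled; the function names, the file layout, and the 12-character SHA-256 tag are hypothetical choices for illustration, not a standard. Note that pickle files should only ever be loaded from trusted sources.

```python
# Hypothetical sketch: save a trained model under a content-derived version tag
# so the exact artifact that gets deployed can be traced later.
import hashlib
import json
import pickle
from pathlib import Path


def save_versioned_model(model, metadata, out_dir):
    """Pickle `model`, derive a version from its bytes, and write a JSON manifest.

    Returns the version string (first 12 hex characters of the SHA-256 digest).
    """
    payload = pickle.dumps(model)
    version = hashlib.sha256(payload).hexdigest()[:12]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"model-{version}.pkl").write_bytes(payload)
    manifest = {**metadata, "version": version}
    (out / f"model-{version}.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return version


def load_model(out_dir, version):
    """Load a previously saved model by its version tag (trusted files only)."""
    return pickle.loads((Path(out_dir) / f"model-{version}.pkl").read_bytes())


if __name__ == "__main__":
    import tempfile

    # A stand-in "model": the coefficients of a toy linear model.
    toy_model = {"intercept": 0.5, "coef": [1.2, -0.7]}
    with tempfile.TemporaryDirectory() as tmp:
        tag = save_versioned_model(toy_model, {"description": "toy linear model"}, tmp)
        print("saved model version", tag)
        print("round trip ok:", load_model(tmp, tag) == toy_model)
```

Because the tag is computed from the serialized bytes, saving the same model object twice yields the same tag and any change to the model yields a new one, so a deployment can record exactly which artifact it is serving.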