Collaboration Platforms for Data Scientists

April 10th 2019 saw the release of Google's collaborative AI Platform for data science teams, which can run on cloud or on premises. Google's platform joins Alibaba's similar platform, PAI 2.0, announced on March 29th 2017. Comprehensive information on Alibaba's platform is sparse outside Chinese, but Google's AI Platform does provide samples and tutorials. Two other platforms each announced funding earlier this year: ClusterOne, for the DevOps of data science, and DeterminedAI, for collaboration. Google's and Alibaba's platforms clearly separate team roles so that collaborators can work together at each stage of the process, and the two yet-to-be-released platforms indicate they will do the same. The concept is well worth a mention because these are collaborative frameworks that push forward the methodologies of data science, engineering and, in essence, social intelligence.

 

AI Platform announced by Google on April 10th 2019: the process pipeline of data-driven application stages.

 

You may well be familiar with dedicated, collaborative product teams in larger businesses, each working on a separate stage of the life cycle of scalable, data-driven software. This arrangement often combines several enabling teams and methodologies. The first is Site Reliability Engineering (SRE) for quantifiable operations and for managing hardware, tooling and automation, including code revision control, containerisation and orchestration tools such as Docker and Kubernetes, and automated continuous integration, testing and deployment (CI/CD). The second is (a flavour of) the Agile methodologies for project and progress management, developer task chunking and collaborative effort. The third is the set of Data Science roles.

 

The Google AI Platform encourages the first two and enables the last in the form of a few roles. The Data Engineer wrangles data sources and attributes, and develops data pipelines that carry data from producers through pre-processing, ready for downstream consumers. The Machine Learning (ML) Engineer develops ML models, maintains scalable and distributed ML solutions in production, and is in charge of the entire life cycle through deployment, monitoring and maintenance. Finally, the Data Scientist researches and designs statistical models for analysis, understands business stakeholders' needs, and communicates results and statistical concepts to business leaders. The Data Scientist also designs project frameworks for joint development efforts (which is where Google's platform really kicks in), builds custom monitoring tools, uses predictive modelling to meet key business metrics, and develops company model-evaluation systems through A/B testing and other experiments.

 

All in all, collaborative platforms look set to become the efficient new norm for developing and running statistical, data-driven applications in large businesses. Google's AI Platform follows suit, alongside similar platform concepts coming out of US research institutes at around the same time. Of course, a number of questions remain:

  • Is this a tool that automates one of the Data Scientist's roles, e.g. providing a framework for joint development efforts?
  • How will the efficiency ratio (e.g. the accelerated rate of meeting business needs versus the cost of transitioning to this platform) weigh up for businesses that already have such frameworks in place versus those that do not?
  • For which smaller businesses is such a system truly suitable? What integration plan would best minimise risk and loss?

 
