How data is gathered has changed substantially, as have storage capabilities and the size of the Internet-using population. However, these are not the most important changes happening on the data front today. The most important is the realization by organizations that data is the source of knowledge. With this realization, the position of the data scientist has changed drastically, from sitting at the tail end of the production line to being involved in the whole spectrum of data gathering and processing.
Initially, data gathering was unorganized and the resulting data was dirty. Today this is changing in most organizations, with the realization that time spent cleaning the data could instead be used to put insights from the data into production. Deciding what data to gather from users and customers should be part of the system development life cycle. This is nothing new to large organizations around the world, such as the major social media outlets. Clean and accessible data is changing the way businesses deal with the various entities in their production life cycle.
The cycle of data collection generally has the following steps: identify the data requirements, collect the data, analyze it for insights, gather more data to confirm the observed insights, and then validate the model. Organizations need staff skilled in data gathering, data cleaning, feature engineering, machine learning techniques, data mining methodologies, model evaluation, report writing, and visualization. We are at a stage where organizations must move from monolithic systems to systems that integrate microservices and Application Programming Interfaces (APIs), so that they can take advantage of insights from their data science teams.
Skills-building for the data scientist must be both rigorous and focused. The opportunities are all around the world, and they are expected to grow. Watch the discussion on these opportunities in Data Science training here.
Lawrence Nderu is a lecturer and researcher in the Department of Computing at Jomo Kenyatta University of Agriculture and Technology (JKUAT), Kenya. He is also a Senior Instructor at JENGA school. He holds a Ph.D. in Computer Science, specializing in AI and Fuzzy Logic, from the University of Paris VIII, France, and a Master’s degree in Software Engineering from JKUAT. His current research interests are Artificial Intelligence, Data Science, and Mobile Application Development.