COURSE DESCRIPTION
In this course you will learn how to use data science tools to support business decisions. Using open source tools, the course covers the concepts needed to move through the entire data science pipeline, so that you can analyse your business and make informed decisions.
COURSE LEARNING OUTCOMES (CLOs)
On completion of this course, participants are expected to be able to:
- Translate business questions into Machine Learning problems to understand what your data is telling you
- Explore and analyze data from the Web, Word Documents, Email, Twitter feeds, NoSQL stores, Relational Databases and more, for patterns and trends relevant to your business
- Build Decision Tree, Logistic Regression and Naïve Bayes classifiers to make predictions about your customers’ future behaviors as well as other business critical events
- Use K-Means and Hierarchical Clustering algorithms to more effectively segment your customer market or to discover outliers in your data
- Discover hidden customer behaviors from Association Rules and build Recommendation Engines based on behavioral patterns
- Use biologically-inspired Neural Networks to learn from observational data as humans do
- Investigate relationships and flows between people, computers and other connected entities using Social Network Analysis
Course Outline:
Introduction to R
Exploratory Data Analysis with R
- Loading, querying and manipulating data in R
- Cleaning raw data for modeling
- Reducing dimensions with Principal Component Analysis
- Extending R with user-defined packages
Facilitating good analytical thinking with data visualization
- Investigating characteristics of a data set through visualization
- Charting data distributions with boxplots, histograms and density plots
- Identifying outliers in data
Working with Unstructured Data
Mining unstructured data for business applications
- Preprocessing unstructured data in preparation for deeper analysis
- Describing a corpus of documents with a term-document matrix
- Making predictions from textual data
Predicting Outcomes with Regression Techniques
Estimating future values with linear regression
- Modeling the numeric relationship between an output variable and several input variables
- Correctly interpreting coefficients of continuous data
- Assessing regression models for ‘goodness of fit’
Categorizing Data with Classification Techniques
Automating the labeling of new data items
- Predicting target values using Decision Trees
- Constructing training and test data sets for predictive model building
- Dealing with issues of overfitting
Assessing model performance
- Evaluating classifiers with confusion matrices
- Calculating a model’s error rate
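As a rough illustration of the two items above, the sketch below counts a confusion matrix and derives an error rate from it. It is written in Python purely for illustration (the course itself works in R), and the labels, data and function names are hypothetical, not course material.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def error_rate(actual, predicted):
    """Fraction of items whose predicted label differs from the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Hypothetical test-set labels for a customer churn classifier.
actual = ["churn", "stay", "stay", "churn", "stay"]
predicted = ["churn", "stay", "churn", "stay", "stay"]

matrix = confusion_matrix(actual, predicted)
rate = error_rate(actual, predicted)

for (a, p), count in sorted(matrix.items()):
    print(f"actual={a:<5} predicted={p:<5} count={count}")
print(f"error rate: {rate:.2f}")
```

The error rate is simply the off-diagonal share of the confusion matrix, which is why the two are usually read together.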
Detecting Patterns in Complex Data with Clustering and Social Network Analysis
Identifying previously unknown groupings within a data set
- Segmenting the customer market with the K-Means algorithm
- Defining similarity with appropriate distance measures
- Constructing tree-like clusters with hierarchical clustering
- Clustering text documents and tweets to aid understanding
Discovering connections with Link Analysis
- Capturing important connections with Social Network Analysis
- Exploring how social network results are used in marketing
Leveraging Transaction Data to Yield Recommendations and Association Rules
Building and evaluating association rules
- Capturing true customer preferences in transaction data to enhance customer experience
- Calculating support, confidence and lift to distinguish “good” rules from “bad” rules
- Differentiating actionable, trivial and inexplicable rules
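For orientation, the three rule measures named above can be sketched as follows. The example uses Python for illustration only (the course itself works in R), and the transactions and function names are hypothetical. It assumes the left-hand and right-hand itemsets each occur at least once, so no division by zero arises.

```python
def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimated P(rhs | lhs): support of the whole rule over support of lhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """Confidence relative to the baseline frequency of rhs (1.0 = independent)."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
rule_lift = lift(transactions, {"bread"}, {"milk"})

print(f"support={rule_support:.2f} "
      f"confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

A lift below 1, as in this toy data, suggests that buying bread makes milk slightly less likely than its baseline rate, which is the kind of signal used to separate useful rules from trivial ones.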
Constructing recommendation engines
- Cross-selling, up-selling and substitution as motivations
- Leveraging recommendations based on collaborative filtering
Learning from Data Examples with Neural Networks
Machine learning with neural networks
- Learning the weight of a neuron
- Learning about how neural networks are being applied to object recognition, image segmentation, human motion and language modeling
- Analyzing labeled data examples to find patterns in those examples that consistently correlate with particular labels for object recognition
Implementing Analytics within Your Organization
Expanding analytic capabilities
- Breaking down Data Analytics into manageable steps
- Integrating analytics into current business processes
- Reviewing Hadoop, Spark, and Azure services for machine learning
Dissemination and Data Science policies
- Examining ethical questions of privacy in Data Science
- Disseminating results to different types of stakeholders
- Visualizing data to tell a story
Course Textbook
Statistics, Data Analysis, and Decision Modeling, 5th Edition
James R. Evans, University of Cincinnati
Link: https://www.pearson.com/us/higher-education/program/Evans-Statistics-Data-Analysis-and-Decision-Modeling-5th-Edition/PGM75296.html
Feedback Given to Participants in Response to Assessed Work
- Individual written feedback on coursework
- Feedback discussed as part of a tutorial
- Individual feedback on request
- Model answers
Developmental Feedback Generated Through Teaching Activities
- Feedback is given at presentations and during tutorial sessions
- Dialogue between participants and staff in tutorials and lectures
GRADING AND SCORING
The course grade will be based on a final project presented by the participant and graded by the instructor. Participants must achieve a passing grade of 70% or more to be awarded a certificate of completion for the course.