IST718 Master Syllabus

NOTE TO INSTRUCTORS To maintain consistency among class sections of this course, all syllabi should contain this information, cover the schedule of topics, and follow the guidelines herein.

Course Information

Catalog Description

A broad introduction to analytical processing tools and techniques for information professionals. Students will develop a portfolio of resources, demonstrations, recipes, and examples of various analytical techniques.

Detailed Course Description

Upon the successful completion of this course, you will be able to:

Prerequisite Knowledge required

Students taking this course should be familiar with command-line interfaces and possess basic quantitative skills, including elementary statistics, and basic programming skills in SQL and either R or Python.

Textbooks

Methods of Evaluation

NOTE TO INSTRUCTORS It is important to mix several methods of evaluation such as individual practice, group work, and assessment. The following table should be used as a guideline for weighting each activity:

Assessment | Examples of Activity | At Least | No More Than
Individual Homework | Labs, Homework, Papers, Problem Sets, Discussion, Programming Exercises | 20% | 50%
Individual Assessment | Exams, Tests, Quizzes | 20% | 50%
Group Activities | Group Projects, Group Papers, Group Homework | 20% | 30%

Topics to be covered

NOTE TO INSTRUCTORS At minimum, the following topics should be covered in the course. Full course preparations are provided for these topics:

This course will revolve around three use cases: sentiment analysis, prediction with Random Forests, and object recognition with Deep Learning.

Students will first learn to program in a big data analytics environment with Hadoop and Apache Spark. Then they will learn some fundamental machine learning concepts.

The following is the outline:

Some notes:

  1. The machine learning component will be based on Chapter 2 of ISLR, and the Random Forest component on Chapter 8 of ISLR.
  2. The introduction to probability will be based on I.3 of DL, and the Deep Learning component on I.2, I.5, 4.3, and II.6 of DL.

There is latitude for individual instructors to cover their own topics of interest. Consider using no more than 2 weeks to do this. Some suggestions might be:

  1. Big Data in the Cloud
  2. Machine learning in the Cloud
  3. Spark Streaming
  4. Spark GraphX
  5. Future trends in the Hadoop Ecosystem
  6. Kaggle Competitions
  7. Operationalizing a big data analytics project
  8. Deep learning on Hadoop