Purdue School of Engineering and Technology

Big Data Analytics

CIT 49900 / 3 Cr.

This course introduces students to big data analytics using the Hadoop architecture and the Hadoop ecosystem of tools. These technologies are at the foundation of the Big Data movement, and they enable scalable management and processing of vast quantities of data in various formats. Students will learn the architecture of Hadoop clusters at both the hardware and system software levels, and will learn to apply Hadoop and related Big Data technologies such as MapReduce, Hive, Impala, and HDFS to develop analytics and solve the kinds of problems that enterprises face.




Course Outcomes

  • Explain the Hadoop architecture (CIT a)
  • Develop and test a MapReduce application (CIT i)
  • Write and view log files (CIT i)
  • Use the Hadoop API and the ToolRunner class (CIT i)
  • Write a custom Partitioner (CIT i)
  • Sort and search large data sets (CIT j)
  • Compute term frequency–inverse document frequency (CIT j)
  • Import data from a relational database using Sqoop (CIT j)

CIT Student Outcomes

(a) An ability to apply knowledge of computing and mathematics appropriate to the program’s student outcomes and to the discipline.

(i) An ability to use current techniques, skills, and tools necessary for computing practice.

(j) An ability to use and apply current technical concepts and practices in the core information technologies.

Course Topics

  • Hadoop ecosystem and core technologies
  • Basic MapReduce API concepts
  • Common MapReduce algorithms
  • Using the ToolRunner class
  • The Hadoop API library
  • Log files
  • Partitioners and Reducers
  • Loading Data with Sqoop
  • Data storage and analysis with Pig, Hive, and Impala


Principles of Undergraduate Learning (PULs)

2.  Critical Thinking

3.  Integration and Application of Knowledge

4.  Intellectual Depth, Breadth, and Adaptiveness