Big Data Analytics
CIT 42100 / 3 Cr.
This course covers the fundamentals and core concepts of data analytics. It focuses on emerging advanced analytics techniques and their application to practical problems in disciplines such as IT, health care, and economics. Students also explore machine learning algorithms and distributed computing environments.
- Available Online: Yes
- Credit by Exam: No
- Laptop Required: No
Prerequisites/Co-requisites:
P: CIT 31400 and CIT 32000 and CIT 38800.
Software
Hadoop
Outcomes
Course Outcomes
- Explain the Hadoop architecture (CIT a)
- Develop and test a MapReduce application (CIT i)
- Write and view log files (CIT i)
- Use the Hadoop API and the ToolRunner class (CIT i)
- Write a custom Partitioner (CIT i)
- Sort and search large data sets (CIT j)
- Compute term frequency-inverse document frequency (TF-IDF) (CIT j)
- Import data from a relational database using Sqoop (CIT j)
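As an illustration of the TF-IDF outcome above, here is a minimal single-machine sketch in plain Python. It is not course material and does not use Hadoop. The function name `tf_idf` is hypothetical, and the weighting shown (term count divided by document length, multiplied by the natural log of total documents over documents containing the term) is only one common variant, chosen here as an assumption.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumption: tf = count / document length, idf = ln(N / df).
    Other TF-IDF variants use different smoothing or log bases.
    """
    n_docs = len(docs)
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores
```

A term that appears in every document scores zero under this variant, which is why common words contribute little to the ranking.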
CIT Student Outcomes
(a) An ability to apply knowledge of computing and mathematics appropriate to the program’s student outcomes and to the discipline.
(i) An ability to use current techniques, skills, and tools necessary for computing practice.
(j) An ability to use and apply current technical concepts and practices in the core information technologies.
Topics
- Hadoop ecosystem and core technologies
- Basic MapReduce API concepts
- Common MapReduce algorithms
- Using the ToolRunner class
- The Hadoop API library
- Log files
- Partitioners and Reducers
- Loading data with Sqoop
- Data storage and analysis with Pig, Hive, and Impala
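The map, partition, and reduce steps named in the topics above can be sketched as a small in-memory word count in plain Python. This is a conceptual simulation only, not the Hadoop Java API. All function names (`map_phase`, `partition`, `reduce_phase`, `run_job`) are hypothetical, and the first-letter partitioning rule is an arbitrary assumption made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Hypothetical custom partitioner: route each key by its first character."""
    return ord(key[0]) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all counts collected for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Run map, shuffle, and reduce in memory; return one result list per reducer."""
    # Shuffle: group mapper output by key inside the partition chosen for that key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_phase(line):
            partitions[partition(key, num_reducers)][key].append(value)

    # Reduce: each simulated reducer processes its keys in sorted order.
    return [
        [reduce_phase(key, part[key]) for key in sorted(part)]
        for part in partitions
    ]
```

Because the partitioner decides which reducer receives a key, every occurrence of the same word lands in the same partition, so each per-word total is computed in exactly one place.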
Principles of Undergraduate Learning (PULs)
2. Critical Thinking
3. Integration and Application of Knowledge
4. Intellectual Depth, Breadth, and Adaptiveness