India's first PG program of its kind!
- Comprehensive curriculum created by BITS & Industry Experts
- 5 practical industry projects, sponsored by Saavn
- Industry mentors, mock interviews and career support
- Offline workshops with industry, peer and faculty interactions
Program Syllabus
The curriculum has been developed by BITS faculty and leading Big Data companies. Most courses include an independent, industry-sourced project that you will deploy on AWS Cloud. The syllabus teaches end-to-end skills: a thorough understanding of fundamental concepts and thinking beyond tools!
Preparatory Sessions
If you don’t have previous experience in programming or databases (SQL), don't worry! By enrolling in the program, you get access to completely free pre-program preparatory sessions, which will strengthen your skills in fundamental Computer Science concepts.
Topics Covered:
- Object-Oriented Programming (OOP) using Java
- Data Structures
- Design and Analysis of Algorithms
- Relational Database Management Systems (SQL)
Prep Sessions will be available to students upon enrolment.
Foundations of Big Data Systems
Duration: 8 weeks
In this course you will be given an introduction to Big Data and its common industry applications. You will also develop important foundations in data structures and algorithms that form the basis of the Big Data Systems used in the industry.
Topics Covered:
- Introduction to Big Data and its Applications
- Data Abstraction
- Linear data structures such as hash tables, hash maps and Bloom filters
- Non-linear data structures such as binary search trees and k-d trees
- Distributed Algorithm Design
- Algorithm Design using MapReduce
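As a taste of the data structures listed above, here is a minimal sketch of a Bloom filter in plain Java. It is an illustration only, not course material: the class name `BloomFilterSketch`, the double-hashing scheme built on `String.hashCode()`, and the sample keys are all assumptions chosen for brevity.

```java
import java.util.BitSet;

// Illustrative sketch only: names and hashing scheme are assumptions, not course code.
public class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Set every bit position derived from the item.
    public void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(item, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    public boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("song-1"); // hypothetical key
        System.out.println(filter.mightContain("song-1")); // always true for added items
        System.out.println(filter.mightContain("song-2")); // usually false, may be a false positive
    }
}
```

The filter never reports an added item as absent, but it can report an item that was never added as present. That trade-off is what lets it use far less memory than a full hash set.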
Course Outcomes:
You will be able to select and implement appropriate data structures to solve big data problems, and write Map and Reduce code for distributed processing of data.
Programming Language Used: Java
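To give a feel for the Map and Reduce code mentioned in the course outcomes, the following sketch runs the map, shuffle and reduce phases of a word count on a single machine in plain Java. It deliberately avoids the Hadoop API so that it runs without any dependencies. The class name `WordCountSketch`, its method names and the sample input are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: a single-machine imitation of the MapReduce phases.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three phases in sequence over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {big=2, cloud=1, data=2, on=1, systems=1}
        System.out.println(run(Arrays.asList("big data systems", "Big Data on cloud")));
    }
}
```

In a real cluster the map and reduce functions run on many machines and the framework performs the shuffle; the per-key logic keeps the same shape.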
Processing Big Data - ETL & Batch Processing
Duration: 7 weeks
Learn to collect and process structured and unstructured data by performing ETL operations, and use workflow management tools to automate task flows.
Topics Covered:
- Performing ETL Operations
- Concepts in Data Warehousing and its Relevance for Big Data
- Ingesting data into Big Data Platforms using Apache Sqoop & Flume
- Workflow management for Hadoop using Apache Oozie
- Batch Processing on Cloud
Course Outcomes:
You will learn to choose and use tools to ingest structured and unstructured data into big data processing systems, and to use Hive for data transformations. You will also be able to process Big Data on the cloud using Amazon EMR and manage your workflows with Apache Oozie.
Tools & Technologies Used: Sqoop, Apache Flume, Apache Hive, HBase, Amazon EMR
Processing of Real Time Data & Streaming Data
Duration: 4 weeks
Ever wondered how you receive a notification based on your location? The answer lies in real-time and streaming data. This course will expose you to the exciting world of processing real-time data.
Topics Covered:
- Applications of Streaming Data in Industry
- Sourcing Streaming data using Apache Flume
- Building real-time data pipeline using Apache Storm
- Streaming on Apache Spark
Course Outcomes:
You will be able to build real-time data processing systems using Apache Storm and Apache Spark.
Tools & Technologies Used: Apache Storm, Apache Flume, Apache Spark
Big Data Analytics
Duration: 5 weeks
This course introduces the field of Big Data Analytics. You will learn about the Apache Spark libraries used to perform regression, classification and clustering on Big Data.
Topics Covered:
- Regression, Clustering & Classification using Spark MLlib
- Building visualizations using Big Data
- Case Studies on applications of Big Data Analytics
Course Outcomes:
- You will be able to perform analytics on big data using Spark MLlib and will get to know tools for visualizing the results.
- Interested students will also have an opportunity to learn the basics of functional programming in Scala*
Tools & Technologies Used: Spark (MLlib) and Scala*