## University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Revised Syllabus

Units of the University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics course and their unit-wise marks distribution.

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Course Structure 2021-2022 With Marking Scheme

| # | Unit/Topic | Weightage |
|---|---|---|
| 100 | Introduction to Big Data | |
| 200 | Introduction to Hadoop | |
| 300 | NoSQL | |
| 400 | MapReduce and the New Software Stack | |
| 401 | Distributed File Systems | |
| 402 | MapReduce | |
| 403 | Algorithms Using MapReduce | |
| 500 | Finding Similar Items | |
| 600 | Mining Data Streams | |
| 601 | The Stream Data Model | |
| 602 | Sampling Data in a Stream | |
| 603 | Filtering Streams | |
| 604 | Counting Distinct Elements in a Stream | |
| 605 | Counting Ones in a Window | |
| 700 | Link Analysis | |
| 800 | Frequent Itemsets | |
| 801 | Handling Larger Datasets in Main Memory | |
| 802 | The SON Algorithm and MapReduce | |
| 803 | Counting Frequent Items in a Stream | |
| 900 | Clustering | |
| 1000 | Recommendation Systems | |
| 1100 | Mining Social-Network Graphs | |
| | Total | - |

## Syllabus

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Introduction to Big Data

- Introduction to Big Data, Big Data characteristics, types of Big Data, Traditional vs. Big Data business approach, Case Study of Big Data Solutions.

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Introduction to Hadoop

- What is Hadoop?
- Core Hadoop Components
- Hadoop Ecosystem
- Physical Architecture
- Hadoop limitations

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for NoSQL

- What is NoSQL? NoSQL business drivers; NoSQL case studies.
- NoSQL data architecture patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, Variations of NoSQL architectural patterns.
- Using NoSQL to manage big data:- What is a big data NoSQL solution? Understanding the types of big data problems; Analyzing big data with a shared-nothing architecture; Choosing distribution models: master-slave versus peer-to-peer; Four ways that NoSQL systems handle big data problems.

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for MapReduce and the New Software Stack

- Physical Organization of Compute Nodes, Large-Scale File-System Organization.

- The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution, Coping With Node Failures.

- Matrix-Vector Multiplication by MapReduce, Relational-Algebra Operations, Computing Selections by MapReduce, Computing Projections by MapReduce, Union, Intersection, and Difference by MapReduce, Computing Natural Join by MapReduce, Grouping and Aggregation by MapReduce, Matrix Multiplication, Matrix Multiplication with One MapReduce Step.
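
As an illustration of the Map, Grouping by Key, and Reduce steps listed above, here is a minimal sketch of matrix-vector multiplication simulated in plain Python on a single machine. It is not a Hadoop job; the function names (`map_phase`, `group_by_key`, `reduce_phase`) and the `(i, j, m_ij)` triple representation of the matrix are assumptions made for this example, and the vector is assumed to fit in memory.

```python
from collections import defaultdict


def map_phase(matrix_entries, vector):
    """Map: each matrix entry (i, j, m_ij) emits the pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]


def group_by_key(pairs):
    """Grouping by key: collect all values that share the same row index."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the partial products for each row index."""
    return {i: sum(values) for i, values in groups.items()}


def matrix_vector_multiply(matrix_entries, vector):
    """Chain the three phases, as a MapReduce framework would."""
    return reduce_phase(group_by_key(map_phase(matrix_entries, vector)))


# M = [[1, 2], [3, 4]] stored as (row, column, value) triples; v = [10, 20]
entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
result = matrix_vector_multiply(entries, [10, 20])
print(result)  # {0: 50, 1: 110}
```

Because addition is associative and commutative, the same summation could also run as a combiner on each map node before the shuffle.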

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Finding Similar Items

- Applications of Near-Neighbor Search, Jaccard Similarity of Sets, Similarity of Documents, Collaborative Filtering as a Similar-Sets Problem.
- Distance Measures:- Definition of a Distance Measure, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance.
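
Four of the distance measures named above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are chosen for this example, and the cosine distance is returned as an angle in radians, which is one common convention and an assumption here.

```python
import math


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)


def euclidean_distance(x, y):
    """L2 distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine_distance(x, y):
    """Angle (in radians) between two nonzero vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)


def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(xi != yi for xi, yi in zip(x, y))


print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
print(euclidean_distance((0, 0), (3, 4)))      # 5.0
print(hamming_distance("10101", "11110"))      # 3
```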

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Mining Data Streams

- A Data-Stream-Management System, Examples of Stream Sources, Stream Queries, Issues in Stream Processing.

- Obtaining a Representative Sample, The General Sampling Problem, Varying the Sample Size.

- The Bloom Filter, Analysis.

- The Count-Distinct Problem, The Flajolet-Martin Algorithm, Combining Estimates, Space Requirements.

- The Cost of Exact Counts, The Datar-Gionis-Indyk-Motwani Algorithm, Query Answering in the DGIM Algorithm, Decaying Windows.
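
To make the stream-filtering topic concrete, the following is a minimal sketch of a Bloom filter in Python. The class name, the `might_contain` method, and the use of seeded SHA-256 digests as the hash functions are all assumptions made for this example; practical implementations usually pack the bits and use cheaper hash functions.

```python
import hashlib


class BloomFilter:
    """A bit array plus several hash functions; answers 'possibly seen' or 'never seen'."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; it is not space-efficient.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        """Derive num_hashes bit positions by hashing the item with different seeds."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set every bit the item hashes to."""
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        """False means definitely never added; True means possibly added."""
        return all(self.bits[position] for position in self._positions(item))


seen = BloomFilter(num_bits=1024, num_hashes=3)
for address in ["alice@example.com", "bob@example.com"]:
    seen.add(address)

print(seen.might_contain("alice@example.com"))  # True: added items are never missed
```

A stream element that was never added can still pass the filter (a false positive), and the chance of that happening depends on the number of bits, the number of hash functions, and the number of items inserted.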

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Link Analysis

- PageRank Definition, Structure of the Web, Dead Ends, Using PageRank in a Search Engine, Efficient Computation of PageRank:- PageRank Iteration Using MapReduce, Use of Combiners to Consolidate the Result Vector.
- Topic-Sensitive PageRank, Link Spam, Hubs and Authorities.
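
The PageRank iteration can be sketched as a simple power iteration in Python. This is a single-machine illustration, not the MapReduce formulation: the `pagerank` function, the damping parameter `beta=0.85`, the fixed iteration count, and the choice to spread a dead end's rank evenly over all pages are assumptions made for this example.

```python
def pagerank(links, beta=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page receives the teleport share (1 - beta) / n.
        new_rank = {node: (1.0 - beta) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's damped rank evenly among its out-links.
                share = beta * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dead end: spread its damped rank evenly over all pages,
                # so the ranks keep summing to 1.
                share = beta * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

In this small graph, page B has a single in-link carrying only half of A's rank, so it ends up with the lowest score.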

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Frequent Itemsets

- Algorithm of Park, Chen, and Yu, The Multistage Algorithm, The Multihash Algorithm.

- Sampling Methods for Streams, Frequent Itemsets in Decaying Windows.

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Clustering

- CURE Algorithm, Stream-Computing, A Stream-Clustering Algorithm, Initializing & Merging Buckets, Answering Queries.

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Recommendation Systems

- A Model for Recommendation Systems, Content-Based Recommendations, Collaborative Filtering.

### University of Mumbai Semester 8 (BE Fourth Year) Big Data Analytics Syllabus for Mining Social-Network Graphs

- Social Networks as Graphs, Clustering of Social-Network Graphs, Direct Discovery of Communities, SimRank, Counting Triangles Using MapReduce.