01 - Introduction to MongoDB

MongoDB Introduction

MongoDB is a document-oriented NoSQL database. It is an open-source product. We all understand what a database is and what it is used for: a database is a container that stores physical data in a specified format, and each database has its own set of files on the filesystem. Now we will try to understand what "document-oriented" and "NoSQL" mean.

 

Document Type

Being document-oriented means that records (the equivalent of rows) are stored as JSON-like documents. Internally, MongoDB stores them as BSON, a binary encoding of JSON.

NoSQL

NoSQL stands for "Not Only SQL". A NoSQL database handles both structured and unstructured data. Compared to an RDBMS, it offers less functionality but higher performance in terms of the volume of data it can handle and execution time.

Since we are all familiar with RDBMS, let us first compare the popular MongoDB terminology with RDBMS terms:

COLLECTION – A Collection in a document database is similar to a table in RDBMS.

DOCUMENT – A Document in a document database is similar to a row in RDBMS.

FIELD – A Field in a document database is similar to a column in RDBMS.

DATABASE – The term means the same as it does in RDBMS.

Just as a Table contains many Rows in RDBMS, a Collection contains many Documents. There is one important difference: in RDBMS, all the Rows of a single Table have the same Columns. In a Collection, however, the Documents need not have the same Fields, so different Documents can have different Fields.
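To make this difference concrete, here is a minimal Python sketch that models a collection as a plain list of dictionaries. The collection name students and all field values are hypothetical, and the code only illustrates the shape of documents; it does not talk to a MongoDB server.

```python
import json

# A "collection" modelled as a list of dicts (an illustration, not MongoDB's
# storage format). Each dict plays the role of one document.
students = [
    {"_id": 1, "name": "Asha", "age": 21},
    {"_id": 2, "name": "Ravi", "email": "ravi@example.com"},
    {"_id": 3, "name": "Meera", "age": 22, "hobbies": ["chess", "music"]},
]

# Unlike rows of one RDBMS table, these documents do not share a fixed set of fields.
field_sets = [sorted(doc.keys()) for doc in students]
for fields in field_sets:
    print(fields)

# Documents map naturally to JSON (MongoDB itself stores them as BSON).
print(json.dumps(students[2]))
```

With a real server, dictionaries like these could be inserted as they are through a driver such as PyMongo (for example with insert_many), because no table schema has to be declared first.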

Benefits of MongoDB

The following are some of the benefits that MongoDB offers:

  • Provides higher flexibility: MongoDB is a document-oriented database, so Documents in the same Collection need not have the same Fields
  • Used for both Structured and Unstructured data
  • There are no typical joins as such
  • Changing the structure of a Collection is far simpler
  • Offers a rich query language that is nearly as powerful as SQL
  • As mentioned earlier, compared to RDBMS it has less functionality but higher performance
  • Highly scalable
  • Data retrieval is faster due to its architecture
  • Highly available
  • Indexes can be created on any field

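MongoDB makes data integration easy and fast. MongoDB was originally released under 2 licenses:

  1. GNU Affero General Public License (the database server)
  2. Apache License (the drivers)

Since October 2018, new releases of the server have used the Server Side Public License (SSPL) instead of the AGPL.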

MongoDB is free to use and its source code is publicly available; MongoDB Inc. also offers proprietary licenses. MongoDB was first developed by 10gen in October 2007, initially as part of a platform product; in 2009 the company shifted to an open-source development model, with 10gen offering commercial support and other services. 10gen has since been renamed MongoDB Inc., and MongoDB is the backend software for many websites and services. It is also one of the most popular NoSQL databases.

Main Features of MongoDB

  • Document Oriented

In a typical relational design, entity data is divided into many related tables. MongoDB instead stores each business subject in a document. Rather than storing Student and Subject as 2 separate relational tables, all of this information can be stored together in a single School document. This makes the data more intuitive and easier to work with.
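The following sketch shows how rows from two relational tables can be folded into one document per student, with the subjects embedded as an array. The names students_table, subjects_table and embed_subjects are hypothetical, and this is plain Python rather than MongoDB code.

```python
# Relational style: two separate "tables" linked by the student_id foreign key.
students_table = [{"student_id": 1, "name": "Asha"}]
subjects_table = [
    {"student_id": 1, "subject": "Math"},
    {"student_id": 1, "subject": "Physics"},
]

def embed_subjects(students, subjects):
    """Build one document per student with its subjects embedded as an array."""
    docs = []
    for s in students:
        doc = {"_id": s["student_id"], "name": s["name"]}
        # Collect every subject row that belongs to this student.
        doc["subjects"] = [
            row["subject"] for row in subjects if row["student_id"] == s["student_id"]
        ]
        docs.append(doc)
    return docs

school = embed_subjects(students_table, subjects_table)
print(school)  # [{'_id': 1, 'name': 'Asha', 'subjects': ['Math', 'Physics']}]
```

Reading a student together with all of their subjects then means fetching a single document instead of joining two tables.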

  • Indexing

Any field in a MongoDB document can be indexed. Secondary indexes (indexes on fields other than the primary key _id) are also supported.
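As a rough mental model, an index on a field is an ordered structure that maps field values to documents, which is why it speeds up both equality and range lookups. The sketch below is a simplified, hypothetical illustration (build_index and lookup_range are made-up helpers); MongoDB's real indexes are B-trees maintained by the server.

```python
import bisect

docs = [
    {"_id": 1, "name": "Asha", "age": 21},
    {"_id": 2, "name": "Ravi", "age": 25},
    {"_id": 3, "name": "Meera", "age": 22},
    {"_id": 4, "name": "Kiran"},  # no "age" field
]

def build_index(collection, field):
    """Return a sorted list of (value, _id) pairs for documents that have `field`.
    Skipping documents without the field mimics a sparse index; a regular
    MongoDB index would record them under a null key instead."""
    return sorted((doc[field], doc["_id"]) for doc in collection if field in doc)

def lookup_range(index, low, high):
    """Return the _ids whose indexed value v satisfies low <= v <= high."""
    keys = [value for value, _ in index]  # rebuilt per call for simplicity
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [doc_id for _, doc_id in index[start:end]]

age_index = build_index(docs, "age")
print(age_index)                        # [(21, 1), (22, 3), (25, 2)]
print(lookup_range(age_index, 21, 22))  # [1, 3]
```

Because the keys are kept in order, a range lookup only has to binary-search the two boundaries instead of scanning every document.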

  • Replication

MongoDB is a highly available system that provides Replica Sets. A Replica Set is a group of two or more copies of the data, where each replica can act as either the Primary or a Secondary at any point in time. By default, all read and write operations are handled by the Primary, while each Secondary maintains a copy of the data. If the Primary fails, an automatic election is held to select which Secondary becomes the new Primary.
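The failover rule can be illustrated with a deliberately simplified election function. The Member class, the last_optime counter and elect_primary are hypothetical; MongoDB's actual election protocol involves voting, terms and heartbeats. The sketch keeps three ideas: an election needs a majority of members to be reachable, members with priority 0 never become Primary, and the member with the most recent data is favoured.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Member:
    name: str
    healthy: bool
    last_optime: int  # hypothetical counter: position of the last write this member applied
    priority: int = 1  # priority 0 means the member can never become Primary

def elect_primary(members: List[Member]) -> Optional[str]:
    """Simplified election: succeed only if a strict majority of members is
    reachable, then pick the electable member with the most recent data
    (ties broken by priority)."""
    healthy = [m for m in members if m.healthy]
    if len(healthy) * 2 <= len(members):
        return None  # no majority: no Primary can be elected
    electable = [m for m in healthy if m.priority > 0]
    if not electable:
        return None
    best = max(electable, key=lambda m: (m.last_optime, m.priority))
    return best.name

replica_set = [
    Member("node-a", healthy=False, last_optime=100),  # the failed Primary
    Member("node-b", healthy=True, last_optime=98),
    Member("node-c", healthy=True, last_optime=100),
]
print(elect_primary(replica_set))  # node-c
```

The majority requirement is what prevents two halves of a split network from each electing their own Primary.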

  • Ad Hoc Queries

MongoDB supports the following kinds of searches:

               By Field

               By Range Query

               By Regular Expression
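The three kinds of search can be expressed with MongoDB-style filter documents such as {"city": "Pune"} or {"age": {"$gte": 21, "$lte": 25}}, which is the form a driver's find method (for example PyMongo's collection.find(filter)) accepts. To show what such filters mean without a server, the sketch below evaluates a small subset of them in plain Python; matches and find_names are hypothetical helpers, and only equality, $gt, $gte, $lt, $lte and $regex are handled.

```python
import operator
import re

docs = [
    {"name": "Asha", "city": "Pune", "age": 21},
    {"name": "Ravi", "city": "Delhi", "age": 25},
    {"name": "Meera", "city": "Pune", "age": 30},
]

COMPARISONS = {"$gt": operator.gt, "$gte": operator.ge,
               "$lt": operator.lt, "$lte": operator.le}

def matches(doc, query):
    """Return True if `doc` satisfies every condition in `query`."""
    for field, cond in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):  # operator expression, e.g. {"$gte": 21}
            for op, arg in cond.items():
                if op == "$regex":
                    if not (isinstance(value, str) and re.search(arg, value)):
                        return False
                elif op in COMPARISONS:
                    if not COMPARISONS[op](value, arg):
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:  # plain equality on a field
            return False
    return True

def find_names(collection, query):
    return [doc["name"] for doc in collection if matches(doc, query)]

print(find_names(docs, {"city": "Pune"}))                   # by field: ['Asha', 'Meera']
print(find_names(docs, {"age": {"$gte": 21, "$lte": 25}}))  # by range: ['Asha', 'Ravi']
print(find_names(docs, {"name": {"$regex": "^M"}}))         # by regex: ['Meera']
```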

  • Load Balancing

MongoDB uses sharding to implement horizontal scaling. The user chooses a shard key, which determines how the data in a collection is distributed. Based on the shard key, the data is split into ranges, and these ranges are distributed across multiple shards. Each shard is typically deployed as a replica set, that is, a primary with one or more secondaries. By running over multiple servers in this way, MongoDB balances the load, and because it also replicates the data, the system keeps running in the event of hardware failure.
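To show how range-based sharding routes documents, here is a small Python sketch. The shard key user_id, the split points and the shard names are hypothetical; in a real cluster the config servers store the chunk ranges and the mongos router performs this lookup. Each range includes its lower bound and excludes its upper bound.

```python
import bisect

# Hypothetical split points on the shard key "user_id":
#   shard0: user_id < 1000
#   shard1: 1000 <= user_id < 5000
#   shard2: user_id >= 5000
SPLIT_POINTS = [1000, 5000]
SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(shard_key_value):
    """Route a document to the shard whose key range contains its shard key value."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key_value)]

docs = [{"user_id": 42}, {"user_id": 1000}, {"user_id": 4999}, {"user_id": 7200}]
placement = {}
for doc in docs:
    placement.setdefault(shard_for(doc["user_id"]), []).append(doc["user_id"])
print(placement)
# {'shard0': [42], 'shard1': [1000, 4999], 'shard2': [7200]}
```

Choosing a shard key whose values spread evenly across these ranges is what keeps any single shard from becoming a hotspot.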

  • Automatic Configuration

MongoDB's Automatic Configuration makes deployment easy: new machines can be added to a running database with little effort.

  • File Storage

MongoDB can also be used as a file storage system. It builds on the 2 features described above (1. Load Balancing 2. Data Replication), so stored files can be distributed across multiple machines. This is done through GridFS (Grid File System), which is implemented by the MongoDB drivers and is readily available to developers.

GridFS is used, for example, in plugins for:

  • NGINX
  • lighttpd

Here is how GridFS works:

GridFS does not store a file in a single document. Instead, it divides the file into chunks and stores each chunk as a separate document. In a MongoDB system with multiple machines, these chunks can be replicated and distributed across machines transparently, which provides effective load balancing and fault tolerance.
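The chunking step can be sketched in Python as below. It produces dictionaries shaped like the documents GridFS keeps in its fs.files and fs.chunks collections (files_id, n, data), using the default chunk size of 255 KiB. This is a simplified illustration: split_into_chunks and reassemble are hypothetical helpers, the uuid-based id stands in for the ObjectId a driver would generate, and in practice a driver (for example PyMongo's gridfs module) does this work for you.

```python
import uuid

CHUNK_SIZE = 255 * 1024  # GridFS's default chunk size (255 KiB)

def split_into_chunks(data: bytes, filename: str, chunk_size: int = CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like simplified fs.files / fs.chunks documents."""
    file_id = uuid.uuid4().hex  # stand-in for an ObjectId
    files_doc = {"_id": file_id, "filename": filename,
                 "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs

def reassemble(chunk_docs):
    """Concatenate the chunks in order of their sequence number n."""
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))

payload = bytes(600 * 1024)  # a 600 KiB file of zero bytes
meta, chunks = split_into_chunks(payload, "report.pdf")
print(len(chunks))  # 3 (255 KiB + 255 KiB + 90 KiB)
```

Because every chunk is an ordinary document, chunks can be replicated and sharded like any other data, and a reader can fetch just the chunks covering the byte range it needs.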

  • Aggregation

For data aggregation, the MapReduce paradigm can be used, in which data is processed in batches. This framework lets users obtain the same kind of results they would get from an SQL GROUP BY clause. (Starting with MongoDB 5.0, map-reduce is deprecated in favor of the aggregation pipeline.)

MapReduce is a framework for processing large volumes of data in parallel across a cluster of computers. Its basic idea is to divide the task among several machines, execute it on each of them, and then collect the output from all of them. In the Map step, each machine processes its share of the data and emits key-value pairs; in the Reduce step, the values emitted for each key are collected and combined into the final result.
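The Map and Reduce steps can be sketched in a single Python process as follows. The orders data and the helper names are hypothetical, and a real MapReduce job would run the map phase on many machines in parallel. The result matches what an SQL GROUP BY with SUM would return. Note that the reduce function should be associative, because MongoDB may call it repeatedly on partial results.

```python
from collections import defaultdict

orders = [
    {"customer": "A", "amount": 250},
    {"customer": "B", "amount": 100},
    {"customer": "A", "amount": 50},
]

def map_phase(doc):
    # Emit one (key, value) pair per document.
    yield doc["customer"], doc["amount"]

def reduce_phase(key, values):
    # Combine all values emitted for one key (sum is associative).
    return sum(values)

def map_reduce(docs):
    grouped = defaultdict(list)  # "shuffle": group emitted values by key
    for doc in docs:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return {key: reduce_phase(key, vals) for key, vals in grouped.items()}

print(map_reduce(orders))  # {'A': 300, 'B': 100}
# SQL equivalent: SELECT customer, SUM(amount) FROM orders GROUP BY customer;
```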

  • Capped Collections

MongoDB supports Capped Collections: fixed-size collections that preserve insertion order. Once the specified size is reached, the oldest documents are overwritten by new ones, so the collection behaves like a circular queue.
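The circular-queue behaviour can be mimicked with Python's collections.deque and its maxlen argument. This is only an analogy under an assumed cap of 3 documents: real capped collections are bounded by a size in bytes, with an optional maximum document count.

```python
from collections import deque

# Hypothetical cap of 3 documents; once full, appending drops the oldest entry.
capped_log = deque(maxlen=3)

for i in range(1, 6):
    capped_log.append({"_id": i, "msg": f"event {i}"})

# Insertion order is preserved, and documents 1 and 2 were overwritten.
print([doc["_id"] for doc in capped_log])  # [3, 4, 5]
```

This makes capped collections a natural fit for logs and other data where only the most recent entries matter.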
