Thursday, 16 June 2022

Basics of MongoDB



MongoDB

MongoDB is a NoSQL database. It stores data as documents in an organized way. MongoDB is designed to meet the demands of modern apps with a technology foundation that enables:

  • Document data model – presenting the best way to work with semi-structured data.

  • Distributed systems design – allowing data to be placed intelligently across different servers and locations, and to scale up and out dynamically.

  • OS and cloud agnostic – allowing MongoDB to run on different operating systems and clouds, eliminating vendor lock-in.


MongoDB Architecture

In a data-driven world we generate and save huge amounts of data. As data grows exponentially, it becomes impossible to store it all on a single host and keep it durable at all times. MongoDB resolves this through replica sets and sharding, which bring high resiliency and a distributed system design. Let's walk through each component of the architecture below and see what happens in the background when the application server hits the MongoDB server to fetch or feed data. This architecture diagram was the clearest one I found in my research; great job by the author, credits mentioned below.


Credits: https://www.livescript.in/2018/10/sharding-in-mongodb.html

Replica Sets

  • Replica sets are designed for high resiliency
  • If the primary node goes down, a new primary is automatically elected from the secondary nodes in the cluster
  • For the client this is seamless, or at most a small glitch that can be fixed with a retry
  • A replica set can hold up to 50 nodes in total
  • The new primary is elected algorithmically by voting, based on conditions such as:
    • Most recent updates from the primary data node
    • Latest timestamp and heartbeat status
    • History of connectivity to the other secondary nodes
    • User-defined priority
  • Node Types
    • Primary
      • All writes happen on this node
      • There is only one primary node at a time
      • At times another node may temporarily act as primary, causing the split-brain issue: two leaders exist when there is a disconnect between sets of nodes. 
      • Ex: We have 6 nodes across 2 data centres, and 1 node is primary. A network issue stops connectivity between the data centres, causing a network partition. As the other data centre has no primary node, one of its 3 nodes is elected. As you can see, we now have 2 primary nodes, which is not recommended.
      • This is a temporary issue; a replica set can have at most 7 voting members, and they need to be distributed across the data centres. 
      • If the current primary cannot see a majority of voting members, it will step down and become a secondary.
      • Meanwhile, only the one primary that can reach a majority will be able to confirm writes with the { w: "majority" } write concern. 
    • Secondary
      • All writes done on the primary node are replicated to all the secondary nodes by applying the oplog
      • When the primary node goes down, one of these secondary nodes is elected as primary
    • Arbiter
      • Arbiters are mongod instances that are part of a replica set but do not hold data (i.e. do not provide data redundancy). 
      • But they can participate in elections.
      • Arbiters have minimal resource requirements and do not require dedicated hardware. 
  • Consistency
    • Read Concern
      • The readConcern option lets you control the consistency and isolation properties of data read from replica sets and replica set shards.
      • Levels
        • local: The query returns data from the instance with no guarantee that the data has been written to a majority of the replica set members
        • available: Identical to "local" for unsharded collections. In a sharded cluster, however, there can be orphaned documents: documents on a shard that also exist in chunks on other shards as a result of failed migrations or incomplete migration cleanup after an abnormal shutdown. With "local", reads require communication with the shard's primary (if the read is on a secondary) or with the config servers to service the read, whereas "available" contacts neither for updated metadata and may therefore return orphaned documents.
        • majority: Only returns data that was written to the majority of voting nodes and will not be rolled back.
        • linearizable: The query returns data that reflects all successful majority-acknowledged writes that completed prior to the start of the read operation. The query may wait for concurrently executing writes to propagate to a majority of replica set members before returning results. This also helps during a network partition: the latest write must not be missed, which would give stale information to the client. Beautifully explained in the link below.
          • https://stackoverflow.com/questions/42615319/the-difference-between-majority-and-linearizable
        • snapshot: The query reads from a majority-committed snapshot of the data, synchronized across the nodes at a given point in time. The documentation was not entirely clear to me; the link below explains the concept, though I cannot confirm it is right.
          • https://stackoverflow.com/questions/53908672/whats-the-difference-of-majority-committed-data-and-the-snapshot-of-majority
    • Write Concern
      • Write concern describes the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters.
      • Levels
        • majority: Requests acknowledgment that write operations have propagated to the primary and to the calculated majority of the voting members. Eg: in a P-S-S replica set, the write must propagate to the primary and one secondary. Another eg: in a P-S-A replica set, the write must propagate to both the primary and the secondary, since the arbiter holds no data.
        • <number>: Requests acknowledgment that the write operation has propagated to the specified number of mongod instances. With w set to 0, no acknowledgment is requested.
        • <custom write concern name>: Requests acknowledgment that the write operations have propagated to tagged members that satisfy the custom write concern defined in settings.getLastErrorModes.
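The majority calculation above can be sketched in a few lines. This is an illustration of the rule, not the server's internal code:

```javascript
// Sketch: how the "majority" write-concern number is derived from the
// count of voting members in a replica set.
function majority(votingMembers) {
  // A strict majority: more than half of the voting members.
  return Math.floor(votingMembers / 2) + 1;
}

// P-S-S: 3 voting members -> a majority write must reach 2 nodes.
console.log(majority(3)); // 2

// 5 voting members -> 3 nodes; 7 (the maximum) -> 4 nodes.
console.log(majority(5), majority(7)); // 3 4

// For P-S-A the calculated majority is also 2, but the arbiter holds no
// data, so the primary and the lone secondary must both acknowledge.
```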


Sharding

  • Since all the data cannot be stored on a single disk, we need horizontal scaling.
  • Sharding is the process of scaling horizontally and seamlessly beyond the hardware limits of a single server.
  • Shards store the data, distributed across the cluster using the shard key.
  • The shard key determines how data is distributed across a sharded cluster. If we modify the shard key, MongoDB automatically rebalances the data across shards as needed, without manual intervention.
  • As seen in the diagram above, it's advisable to have a separate replica set for each shard, providing high availability and data consistency.
  • Chunks are subsets of sharded data. MongoDB partitions sharded data into chunks that are distributed across the shards in the sharded cluster. Each chunk has an inclusive lower and exclusive upper range based on the shard key. A balancer specific to each cluster handles chunk distribution.

  • Sharding Options
    • Ranged Sharding: Documents are partitioned across shards according to the shard key value.
    • Hashed Sharding: Documents are distributed according to an MD5 hash of the shard key value, providing even distribution.
    • Zoned Sharding: Allows developers to define specific rules governing data placement in a sharded cluster.
  • Note: Sharding requires careful planning, as the shard key directly impacts the overall performance of the cluster: it is used to locate the documents within the collection.


Config Server

  • Config servers store the cluster's metadata. 
  • This data contains a mapping of the cluster's data set to the shards. 
  • The metadata includes the list of chunks on every shard and the ranges that define the chunks.
  • The query router (mongos process) uses and caches this metadata to route operations to the targeted shards. 
  • Config servers need to be deployed as a replica set to ensure consistency and resiliency; like any replica set, it can have up to 50 members.
  • To deploy config servers as a replica set, the config servers must run the WiredTiger storage engine (we will see this below).

  • If the config server replica set loses its primary and cannot elect a primary, the cluster’s metadata becomes read only. You can still read and write data from the shards, but no chunk migration or chunk splits will occur until the replica set can elect a primary.
  • In a sharded cluster, mongod and mongos instances monitor the replica sets in the sharded cluster (e.g. shard replica sets, config server replica set)

Daemon mongod 

  • mongod is the primary daemon process for the MongoDB system.
  • It handles data requests, manages data access, and performs background management operations.
  • mongod in Linux and mongod.exe in Windows
  • Usually the data is stored in /data/db
  • Two MongoDB instances cannot run on the same port (default 27017)

Daemon mongos

  • For a sharded cluster, the mongos instances provide the interface between the client applications and the sharded cluster.
  • The mongos instances cache the metadata from the config servers and use it to route read and write operations to the correct shards. They are also called query routers.
  • mongos updates the cache when there are metadata changes for the cluster, such as Chunk Splits or adding a shard. 
  • A sharded cluster can contain more than one query router to divide the client request load.



MongoDB Storage Engine

  • The storage engine is the component of the database that is responsible for managing how data is stored, both in memory and on disk. 
  • MongoDB supports multiple storage engines, as different engines perform better for specific workloads. 
  • Choosing the engine varies based on the need of the application.
  • Below are three engines:
    • WiredTiger
      • The WiredTiger storage engine is the default storage engine starting in MongoDB version 3.2. 
      • It provides a document-level concurrency model, allowing multiple clients to write to different documents of the same collection at the same time, plus checkpointing, compression, and other features.
    • Encrypted storage engine
      • Data at rest is encrypted using this engine; reading the data requires the decryption key.
      • This engine is available in the MongoDB Enterprise edition.
    • In-Memory storage engine
      • All the data, including configuration, is stored in memory.
      • Provides quicker, lower-latency access by avoiding disk I/O.
      • Caution: data is not persistent


MongoDB Data Storage Hierarchy

  • Clusters

    • Database - <Collection of Collections>

      • Collections - <Similar to RDBMS Table>

        • Document - <JSON single record> <Single Row>

          • Key and Value pair


Document

A document is a way to organise and store data as a set of field-value pairs. 

  •     Field - a unique identifier for a datapoint.
    • Key -> should not contain \0, . or $

  •     Value - the data related to a given identifier.

The documents are viewed in the JSON format. JSON stands for JavaScript Object Notation. 

JSON object is defined in following format:

  • starts and ends with curly braces {}
  • each key and value is separated with a colon :
  • each key:value pair is separated with a comma ,
  • keys must be written within double quotes ""
  • values are represented according to their datatype
  • a value can be a number, a string, or an object (sub-document)

Since there are a few limitations to storing data in the JSON format, it is stored in the BSON format.

JSON limitations:

  • It is very readable, but being text based it consumes more space

  • Limited datatypes are supported

  • JSON only supports UTF-8 format

The documents are stored in the BSON format. BSON stands for Binary JSON. BSON addresses the limitations of JSON by storing the data in a binary format. It is optimized for speed, low space usage, and high performance, and it supports more datatypes. BSON is faster to parse and lighter to store than JSON. Its limitation is that it is not human-readable; only machines can read it. 
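One of the JSON limitations above (limited datatypes) is easy to demonstrate: JSON has no Date type, so type information is lost on a round trip, while BSON keeps a real 64-bit date type:

```javascript
// Demonstration: a Date survives in the live object, but a JSON round trip
// turns it into a plain string, losing the type.
const doc = { name: "Ram", createdAt: new Date() };

const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.createdAt instanceof Date);          // true
console.log(roundTripped.createdAt instanceof Date); // false
console.log(typeof roundTripped.createdAt);          // "string"
```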

Syntax: 

{
  <field>: <value>,
  <field>: <value>,
  <field>: <value>
}

Example:

{
  _id: ObjectId("5099803df3f4948bd2f98391"),
  name: "Ram",
  age: 12,
  standard: 8,
  address: {
    door: 12,
    area: "xyz"
  }
}

_id:

  • every document must have a unique _id value
  • ObjectId(): default value for the _id value
  • if not mentioned, its autogenerated
  • structure -> total 12bytes
    • bytes 0-3: timestamp in seconds since the epoch
    • bytes 4-8: random value
    • bytes 9-11: incrementing counter -> to avoid collisions with ObjectIds generated on other machines
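Because the first 4 bytes are a timestamp, the creation time can be read straight out of the hex string. A small sketch (mongosh exposes the same value via ObjectId().getTimestamp()):

```javascript
// Sketch: decode the creation time embedded in an ObjectId. The first
// 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Using the _id from the example document above:
console.log(objectIdTimestamp("5099803df3f4948bd2f98391"));
// -> a date in November 2012
```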

Data Types:

- Null

        {"x": null}

- Boolean

        {"x": true}

- Number

        {"x": 3}

        {"x": 3.14}

        {"x": NumberInt(3)}

        {"x": NumberLong(3)}

- String

        {"x": "foobar"}

- Date

        Stores dates as 64-bit integers: milliseconds since the epoch

        Note: the timezone is not stored; it can be stored in another key

        {"x": new Date()}

- Regex

        Queries can use JavaScript regular expressions

        {"x": /foobar/i}

- Array

        {"x": ["a", "b", "c"]}

- Embedded document

        {"x": {"a": "1", "b": "2", "c": "3"}}

- Object ID

        {"x": ObjectId()}

- Binary data

- Code

        {"x": function() {/*..*/} }



Collection

An organized store of documents in MongoDB, usually with common fields between documents. Documents are stored in the collections. There can be many collections per database and many documents per collection.

Notes:

  • Collection:
    • Should not contain \0, . and $
    • Should not start with system.
  • SubCollection: 
    • Syntactic sugar
    • Collections namespaced with a . to group related collections (e.g. blog.posts, blog.authors)

Example:

> show collections

companies

grades

inspections

posts

routes

trips

zips


> db.zips.find({"state": "NY"})

[

  {

    _id: ObjectId("5c8eccc1caa187d17ca72f89"),

    city: 'FISHERS ISLAND',

    zip: '06390',

    loc: { y: 41.263934, x: 72.017834 },

    pop: 329,

    state: 'NY'

  },

  {

    _id: ObjectId("5c8eccc1caa187d17ca72f8a"),

    city: 'NEW YORK',

    zip: '10001',

    loc: { y: 40.74838, x: 73.996705 },

    pop: 18913,

    state: 'NY'

  },

....

]


> db.zips.find({"state": "NY"}).count()

1596


Database

A database is a structured way to access data. MongoDB is a NoSQL database, which means the data is saved neither in rows nor in columns.

Data in MongoDB is stored in Document as described above. Collections are stored in the Database.

Folder structure:

  • admin
    • authentication and authorization
  • local
    • stores data specific to a single server
    • stores the data used in the replication process
    • the local database itself is never replicated
  • config
    • stores information about each shard

Example:

> show dbs

sample_airbnb       55.1 MB

sample_analytics    9.94 MB

sample_geospatial   1.06 MB

sample_mflix        47.9 MB

sample_restaurants  7.18 MB

sample_supplies     1.02 MB

sample_training       51 MB

sample_weatherdata  2.52 MB

admin                377 kB

local               10.1 GB


> use sample_training

switched to db sample_training


MongoDB Cloud - Atlas

Atlas is MongoDB's cloud database service. An Atlas free shared cluster creates a three-node replica set.

Replica Set - a few connected machines that store the same data, ensuring that if something happens to one of the machines the data remains intact. Comes from the word replicate - to copy something.

Instance - a single machine locally or in the cloud, running a certain software, in our case it is the MongoDB database.

Cluster - group of servers that store your data.

Below are three common ways to connect to MongoDB Cloud (or any MongoDB instance): the Mongo Shell, Compass, and an application.


MongoDB Client

  • Mongo Shell
    • written in JavaScript
    • accepts all JavaScript functions
    • use help to list example commands

                > help

                > db.listingsAndReviews.updateOne.help

                        db.collection.updateOne(filter, update, options):

                        Updates a single document within the collection based on the filter.

                        

                > db.listingsAndReviews.updateOne

                [Function: updateOne] AsyncFunction {

                  apiVersions: [ 1, Infinity ],

                  serverVersions: [ '3.2.0', '999.999.999' ],

                  returnsPromise: true,

                  topologies: [ 'ReplSet', 'Sharded', 'LoadBalanced', 'Standalone' ],

                  returnType: { type: 'unknown', attributes: {} },

                  deprecated: false,

                  platforms: [ 0, 1, 2 ],

                  isDirectShellCommand: false,

                  acceptsRawInput: false,

                  shellCommandCompleter: undefined,

                  help: [Function (anonymous)] Help

                }

    • running scripts with the shell

                option 1> mongo script1.js script2.js

                option 2> load("script1.js")

                example>

                        show dbs

                        db.getMongo().getDBs()

    • mongorc.js
      • frequently used scripts can be loaded from here
    • editor
      • sets the external editor used by the shell, e.g. EDITOR="/usr/bin/emacs"

  • Compass
    • Compass is an interactive tool for querying, optimizing, and analyzing the MongoDB data.
    • Get key insights, drag and drop to build pipelines, and more.
  • Application

    • MongoDB is widely used across various web applications as the primary data store. 
    • One of the most popular web development stacks, the MEAN stack employs MongoDB as the data store (MEAN stands for MongoDB, ExpressJS, AngularJS, and NodeJS).
    • Other languages also have client libraries to work with MongoDB such as Python, Java, Ruby, etc.


Import and Export Data

The commands below help to get data out of and into a MongoDB database.

JSON:

  • Export

    • mongoexport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --collection=sales --out=sales.json

      • --uri (uniform resource identifier; srv establishes a secure connection)

      • srv : connection string - a specific format used to establish a connection between your application and a MongoDB instance.

  • Import

    • mongoimport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop sales.json

BSON:

  • Export

    • mongodump --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies"

  • Import

    • mongorestore --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop dump


Query

  • findOne -> returns one document (row) from the collection
  • insertOne -> one document will be inserted
  • updateOne -> one document will be updated
  • deleteOne -> one document will be deleted
  • find -> returns all matching documents; the returned records are not ordered

db.zips.find({"state": "NY"})

        # the results are iterated through a cursor

        # cursor: a pointer to the result set of a query

        # pointer: a direct address of a memory location

db.zips.find({"state": "NY"}).count()

db.zips.find({"state": "NY", "city": "ALBANY"})

db.zips.find({"state": "NY", "city": "ALBANY"}).pretty()

  • updateOne and updateMany,
    • take a filter document as their first parameter
    • and a modifier document which describes changes to make as the second parameter
  • replaceOne
    • take a filter document as their first parameter
    • and the second parameter will replace the document matching the filter
  • drop
    • when all collections are dropped from a database, the database no longer appears in the list of databases when you run show dbs.
  • To avoid race conditions below are preferred methods
    • findOneAndDelete
    • findOneAndUpdate
    • findOneAndReplace
  • Aggregation Framework:
    • Aggregation operations process multiple documents and return computed results. 
    • Different Examples:
    • Find all documents that have Wifi as one of the amenities. Only include price and address in the resulting cursor.
db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1, "_id": 0 }).pretty()

    • Using the aggregation framework, find all documents that have Wifi as one of the amenities. Only include price and address in the resulting cursor.

db.listingsAndReviews.aggregate([

                                  { "$match": { "amenities": "Wifi" } },

                                  { "$project": { "price": 1,

                                                  "address": 1,

                                                  "_id": 0 }}]).pretty()

    • Find one document in the collection and only include the address field in the resulting cursor.

db.listingsAndReviews.findOne({ },{ "address": 1, "_id": 0 })

    • Project only the address field value for each document, then group all documents into one document per address.country value.

db.listingsAndReviews.aggregate([ { "$project": { "address": 1, "_id": 0 }},

                                  { "$group": { "_id": "$address.country" }}])

    • Project only the address field value for each document, then group all documents into one document per address.country value, and count one for each document in each group.

db.listingsAndReviews.aggregate([
                                  { "$project": { "address": 1, "_id": 0 }},
                                  { "$group": { "_id": "$address.country",
                                                "count": { "$sum": 1 } } }

])

  • Sort and Filter:

db.zips.find().sort({ "pop": 1 }).limit(1)


db.zips.find({ "pop": 0 }).count()


db.zips.find().sort({ "pop": -1 }).limit(1)


db.zips.find().sort({ "pop": -1 }).limit(10)


db.zips.find().sort({ "pop": 1, "city": -1 })

  • Cursor Methods: 
    • A cursor is a pointer to the result set, from which we can access the data iteratively.
    • For example, when the find() method is used to find the documents present in a given collection, it returns a pointer to those documents; this pointer is known as the cursor
    • Queries return a database cursor, which lazily returns batches of documents as needed.
    • There are a lot of meta operations one can perform on a cursor, including skipping a certain number of results, limiting the number of results returned and sorting the results.
    • Applied to the results Methods are:
      • sort
      • limit
      • pretty
      • count
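The lazy batching behaviour described above can be mimicked with a generator. This is a conceptual sketch, not the driver's real API; the real client issues a getMore command to the server when a batch is exhausted:

```javascript
// Conceptual sketch of cursor batching: documents arrive in batches, and
// the next batch is "fetched" only when iteration actually needs it.
function* lazyCursor(allDocs, batchSize) {
  for (let i = 0; i < allDocs.length; i += batchSize) {
    // In a real driver, this is where a getMore round trip would happen.
    yield* allDocs.slice(i, i + batchSize);
  }
}

const docs = Array.from({ length: 10 }, (_, n) => ({ _id: n }));
const cursor = lazyCursor(docs, 4);

console.log(cursor.next().value); // { _id: 0 } - only the first batch so far
```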
  • Indexes:
    • Indexes support the efficient execution of queries in MongoDB. 
    • Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. 
    • If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
    • Examples:

db.trips.find({ "birth year": 1989 })


db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )


db.trips.createIndex({ "birth year": 1 })


db.trips.createIndex({ "start station id": 1, "birth year": 1 })

    • The index is a special data structure - B-Tree, which stores the value of a specific field or a set of fields, ordered by the value of the field.
    • The ordering of the index entries supports efficient equality matches and range-based query operations. 
    • Using B-Tree indexes significantly reduces the number of comparisons needed to find a document.
    • The diagram in the link below shows a query selecting documents using an index.
Credits: https://www.mongodb.com/docs/manual/indexes/
    • Index Types
      • Single Field
        • By default _id field is indexed
        • Additionally any field from the document can be indexed
        • This improves query performance for operations on the indexed field, such as sorting or searching by it
      • Compound Index
        • A compound index supports queries on multiple fields
        • The order of the indexed fields has a strong impact on the effectiveness of a particular index for a given query
      • Multikey Index
        • A multikey index is used to index the content stored in an array
        • It creates a separate index entry for each value in the array
      • Geospatial Index
        • Index on the geospatial data for better performance
        • Two special indexes: 2d indexes, which use planar geometry when returning results, and 2dsphere indexes, which use spherical geometry

      • Text Index
        • Text index supports queries on the string content in a collection
        • Can have more than one string field for creating this index
        • The weight of an indexed field denotes its significance relative to the other indexed fields in terms of the text search score. Eg: below, 3 fields are used to create the index, and each carries its own weight:

          db.blog.createIndex(
            {
              content: "text",
              keywords: "text",
              about: "text"
            },
            {
              weights: {
                content: 10,
                keywords: 5
              },
              name: "TextIndex"
            }
          )
        • For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results
        • Using this sum, MongoDB then calculates the score for the document.
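The multiply-and-sum rule above can be sketched as follows. This is a simplification; the server's real text score also normalizes for term frequency and field length:

```javascript
// Sketch of the text-score rule: multiply the number of matches in each
// indexed field by that field's weight, then sum the results.
// Fields not listed in the weights document default to weight 1.
const weights = { content: 10, keywords: 5, about: 1 };

function textScore(matchesPerField) {
  let score = 0;
  for (const [field, matches] of Object.entries(matchesPerField)) {
    score += matches * (weights[field] || 1);
  }
  return score;
}

// 2 matches in content, 1 in keywords, 3 in about:
console.log(textScore({ content: 2, keywords: 1, about: 3 }));
// 2*10 + 1*5 + 3*1 = 28
```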
      • Hashed Index
    • Index Properties
      • Unique
      • Partial
      • Sparse
      • TTL
      • Hidden
  • Upsert:
    • A hybrid of update and insert
    • By default set to false
    • If a match is found an update happens, else an insert happens
    • Example

db.iot.updateOne({ "sensor": r.sensor, "date": r.date,

                   "valcount": { "$lt": 48 } },

                         { "$push": { "readings": { "v": r.value, "t": r.time } },

                        "$inc": { "valcount": 1, "total": r.value } },

                 { "upsert": true })

  • Transaction
    • Multi-document transactions that contain read operations must use the read preference primary. 
    • Until a transaction commits, the data changes made in the transaction are not visible outside the transaction

MQL Operators

  • Update Operators:
    • $inc: increment
    • $set: set the value
    • $unset: unset the value
  • Query Operators: Locate the data
    • Used to query for ranges, set inclusions, and many more by using $ conditionals.
    • Below are few $ conditionals.
      • $ne
      • $eq
      • $gt
      • $lt
      • $gte
      • $lte
      • $in -> in array
      • $nin -> not in array
      • $not
      • $or -> condition
      • $and -> condition
      • $mod -> modulus; queries the keys whose values, when divided by the first value given, have a remainder equal to the second value
      • $regex -> regular expression, using Perl Compatible Regular Expressions
      • $all -> if you need to match arrays by more than one element
      • $size -> size of the array
      • $slice -> return a subset of elements for an array key
      • $elemMatch
    • Find all documents where the tripduration was less than or equal to 70 seconds and the usertype was not Subscriber:

db.trips.find({ "tripduration": { "$lte" : 70 },

                "usertype": { "$ne": "Subscriber" } }).pretty()

    • Find all documents where the tripduration was less than or equal to 70 seconds and the usertype was Customer using a redundant equality operator:

db.trips.find({ "tripduration": { "$lte" : 70 },

                "usertype": { "$eq": "Customer" }}).pretty()

    • Find all documents where the tripduration was less than or equal to 70 seconds and the usertype was Customer using the implicit equality operator:

db.trips.find({ "tripduration": { "$lte" : 70 },

                "usertype": "Customer" }).pretty()

  • Logic Operators
    • $and
    • $or
    • $nor
    • $not
    • Find all documents where airplanes CR2 or A81 left or landed in the KZN airport:

db.routes.find({ "$and": [ { "$or" :[ { "dst_airport": "KZN" },

                                    { "src_airport": "KZN" }

                                  ] },

                          { "$or" :[ { "airplane": "CR2" },

                                     { "airplane": "A81" } ] }

                         ]}).pretty()

  • Expressive Operators
    • Allows the use of aggregation expressions within the query language.
    • Examples:
    • Find all documents where the trip started and ended at the same station:

db.trips.find({ "$expr": { "$eq": [ "$end station id", "$start station id"] }

              }).count()

    • Find all documents where the trip lasted longer than 1200 seconds, and started and ended at the same station:

db.trips.find({ "$expr": { "$and": [ { "$gt": [ "$tripduration", 1200 ]},

                         { "$eq": [ "$end station id", "$start station id" ]}

                       ]}}).count()

  • Array Operators
    • $push -> adds elements to an array
    • $pop / $pull -> removes elements from an array
    • $each -> modifier adds multiple values to an array
    • $slice -> projection operator specifies the number of elements in an array to return in the query result
    • $sort -> modifier that orders the elements of an array during a $push
    • $addToSet -> adds only unique values to an array
    • upsert (option) -> if no matching record is found, an insert happens
    • Examples:
    • Find all documents with exactly 20 amenities which include all the amenities listed in the query array:

db.listingsAndReviews.find({ "amenities": {
                                  "$size": 20,
                                  "$all": [ "Internet", "Wifi",  "Kitchen",
                                           "Heating", "Family/kid friendly",
                                           "Washer", "Dryer", "Essentials",
                                           "Shampoo", "Hangers",
                                           "Hair dryer", "Iron",
                                           "Laptop friendly workspace" ]
                                         }

                            }).pretty()

  • Project and $elemMatch
    • A projection specifies the fields that should or should not be included in the result cursor
    • Syntax: db.<collection>.find({<query>}, {<projection>})
    • Do not combine 1s and 0s in the projection, except for {"_id": 0, <field>: 1}
    • {<field>: {"$elemMatch": {<field>: <value>}}}
      •  Matches documents that contain an array field with at least one element that matches specified query criteria
      • (or)
      • Projects only the array elements with at least one element that matches the specified criteria
    • Examples:
    • Find all documents with exactly 20 amenities which include all the amenities listed in the query array, and display their price and address:

db.listingsAndReviews.find({ "amenities":
        { "$size": 20, "$all": [ "Internet", "Wifi",  "Kitchen", "Heating",
                                 "Family/kid friendly", "Washer", "Dryer",
                                 "Essentials", "Shampoo", "Hangers",
                                 "Hair dryer", "Iron",
                                 "Laptop friendly workspace" ] } },

                            {"price": 1, "address": 1}).pretty()

    • Find all documents that have Wifi as one of the amenities; only include price and address in the resulting cursor:

db.listingsAndReviews.find({ "amenities": "Wifi" },

                           { "price": 1, "address": 1, "_id": 0 }).pretty()

    • Find all documents that have Wifi as one of the amenities; only include price and address in the resulting cursor, and also exclude "maximum_nights". This will be an error:

db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1,
                             "_id": 0, "maximum_nights": 0 }).pretty()

    • Get one document from the collection:

db.grades.findOne()

    • Find all documents where the student in class 431 received a grade higher than 85 for any type of assignment:

db.grades.find({ "class_id": 431 },
               { "scores": { "$elemMatch": { "score": { "$gt": 85 } } }
             }).pretty()

    • Find all documents where the student had an extra credit score:

db.grades.find({ "scores": { "$elemMatch": { "type": "extra credit" } }
               }).pretty()
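The two roles of $elemMatch above can also be seen outside the shell. Below is a minimal pure-Python sketch of both behaviours (an illustration, not MongoDB's implementation), assuming only equality and $gt criteria:

```python
# Pure-Python sketch of $elemMatch's two roles (illustration only, not MongoDB code).

def elem_matches(elem, criteria):
    """True if a single array element satisfies every criterion."""
    for field, cond in criteria.items():
        value = elem.get(field)
        if isinstance(cond, dict) and "$gt" in cond:
            if value is None or not value > cond["$gt"]:
                return False
        elif value != cond:
            return False
    return True

def query_elem_match(docs, array_field, criteria):
    """As a query operator: keep whole documents whose array has a matching element."""
    return [d for d in docs
            if any(elem_matches(e, criteria) for e in d.get(array_field, []))]

def project_elem_match(doc, array_field, criteria):
    """As a projection operator: keep only the first matching array element."""
    matches = [e for e in doc.get(array_field, []) if elem_matches(e, criteria)]
    return {array_field: matches[:1]} if matches else {}

grades = [
    {"class_id": 431, "scores": [{"type": "exam", "score": 90},
                                 {"type": "quiz", "score": 70}]},
    {"class_id": 431, "scores": [{"type": "homework", "score": 60}]},
]

# Query role: only the first document has any score above 85.
print(len(query_elem_match(grades, "scores", {"score": {"$gt": 85}})))  # 1
# Projection role: only the first matching element is returned.
print(project_elem_match(grades[0], "scores", {"score": {"$gt": 85}}))
```

The sample documents mirror the grades collection used above, but the helper functions are hypothetical names introduced only for this sketch.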

  • Querying Arrays and Sub-Documents:
    • Use dot notation to reach into sub-documents (here, a field whose name contains spaces):

db.trips.findOne({ "start station location.type": "Point" })

    • Use a numeric position in dot notation to query a specific array element (here, the first relationship):

db.companies.find({ "relationships.0.person.last_name": "Zuckerberg" },
                  { "name": 1 }).pretty()

    • Combine positional dot notation with operators such as $regex:

db.companies.find({ "relationships.0.person.first_name": "Mark",
                    "relationships.0.title": { "$regex": "CEO" } },
                  { "name": 1 }).count()

db.companies.find({ "relationships.0.person.first_name": "Mark",
                    "relationships.0.title": { "$regex": "CEO" } },
                  { "name": 1 }).pretty()

    • Use $elemMatch when several criteria must hold for the same array element:

db.companies.find({ "relationships":
                      { "$elemMatch": { "is_past": true,
                                        "person.first_name": "Mark" } } },
                  { "name": 1 }).pretty()

db.companies.find({ "relationships":
                      { "$elemMatch": { "is_past": true,
                                        "person.first_name": "Mark" } } },
                  { "name": 1 }).count()
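The dot notation used in these queries, including numeric array positions such as relationships.0, can be sketched in plain Python (an illustration with a hypothetical resolve helper, not MongoDB code):

```python
# Pure-Python sketch of MongoDB-style dot notation (illustration only).

def resolve(doc, path):
    """Walk a dotted path; numeric segments index into arrays."""
    current = doc
    for segment in path.split("."):
        if isinstance(current, list) and segment.isdigit():
            index = int(segment)
            current = current[index] if index < len(current) else None
        elif isinstance(current, dict):
            current = current.get(segment)
        else:
            return None
        if current is None:
            return None
    return current

company = {
    "name": "Facebook",
    "relationships": [
        {"is_past": False, "title": "Founder and CEO",
         "person": {"first_name": "Mark", "last_name": "Zuckerberg"}},
    ],
}

print(resolve(company, "relationships.0.person.last_name"))  # Zuckerberg
```

Note that field names may contain spaces (as in "start station location" above); only the dots themselves act as path separators.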


Data modeling

  • Data modeling is a way to organize the fields in a document to support your application's performance and querying needs.
  • Avoid storing the same data redundantly under different _id values; it occupies unnecessary memory.
  • Avoid documents in the same collection carrying completely different sets of key-value pairs.
  • If you foresee heavy query usage, consider indexes in your data model to improve the efficiency of queries.
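One recurring decision these guidelines feed into is embedding related data versus referencing it. A minimal sketch with hypothetical blog-style documents (field names are illustrative, not from any real collection):

```python
# Two common ways to model a one-to-many relationship (illustrative field names).

# Embedded: comments live inside the post document.
# One read fetches everything, but the document grows toward the 16 MB cap.
post_embedded = {
    "_id": "post1",
    "title": "Basics of MongoDB",
    "comments": [
        {"author": "alice", "text": "Nice post"},
        {"author": "bob", "text": "Thanks"},
    ],
}

# Referenced: each comment stores the parent's _id, like a foreign key.
# The post stays small; fetching comments needs a second query (or $lookup).
post_referenced = {"_id": "post1", "title": "Basics of MongoDB"}
comments = [
    {"_id": "c1", "post_id": "post1", "author": "alice", "text": "Nice post"},
    {"_id": "c2", "post_id": "post1", "author": "bob", "text": "Thanks"},
]

# Resolving the reference manually:
post_comments = [c for c in comments if c["post_id"] == post_referenced["_id"]]
print(len(post_comments))  # 2
```

Embedding favours read performance for data that is always fetched together; referencing avoids duplication and unbounded document growth.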

Pros of using MongoDB

  • A more flexible data model, which makes for change-friendly design and quicker releases
  • Easy data scalability
  • Distributed-system and cloud-computing design delivers resiliency
  • Supports various data types with ease, such as time series, geospatial, and polymorphic data
  • Enables rich, data-driven applications
  • Balanced high-performance reads (via indexes) and writes
  • As data grows, sharding helps by scaling horizontally and spreading data across multiple instances
  • Cost-effective, as an open-source version is available
  • The simple JSON-like data model is faster to understand and develop against
  • Easy installation

Challenges and Disadvantages

  • Being distributed in nature, transactions are a challenge; multi-document transactions are supported, and starting in MongoDB 4.2 distributed transactions with strong consistency can span multiple operations, collections, databases, documents, and shards
  • Joins are not done the traditional way; the aggregation stage $lookup provides them, but at a high memory cost
  • Document size is limited to 16 MB
  • Nesting is limited to 100 levels per document
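To make the $lookup point concrete, the following pure-Python sketch mimics its left-outer-join semantics (a hypothetical lookup helper, not the actual aggregation stage); building an in-memory index over the whole foreign collection hints at why it can be memory-hungry:

```python
from collections import defaultdict

# Pure-Python sketch of $lookup's left outer join (illustration, not MongoDB code).
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    # Index the entire foreign collection in memory -- this is where the cost lives.
    index = defaultdict(list)
    for doc in foreign_docs:
        index[doc.get(foreign_field)].append(doc)
    # Every local document is kept; unmatched ones get an empty array.
    return [{**doc, as_field: index.get(doc.get(local_field), [])}
            for doc in local_docs]

orders = [{"_id": 1, "item": "pen"}, {"_id": 2, "item": "book"}]
inventory = [{"sku": "pen", "qty": 30}]

joined = lookup(orders, inventory, "item", "sku", "stock")
print(joined[0]["stock"])  # [{'sku': 'pen', 'qty': 30}]
print(joined[1]["stock"])  # []
```

As with SQL's LEFT OUTER JOIN, local documents with no match still appear in the result, with an empty joined array.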

Conclusion

We have seen MongoDB at a high level. Each of these topics can be explored to its full length and breadth, and several MongoDB topics are not covered here at all. For now we have got a sense of its ease of use, resiliency, performance, and consistency, along with effective reads and writes. As MongoDB is still evolving, its shortcomings are likely to be addressed in the near future.


References

https://www.mongodb.com
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne
https://medium.com/swlh/mongodb-indexes-deep-dive-understanding-indexes-9bcec6ed7aa6
https://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf
https://stackoverflow.com/questions/58814041/in-mongodb-why-is-read-concern-available-default-option-for-secondaries-in-no
https://www.bmc.com/blogs/mongodb-sharding-explained
MongoDB Course

