Thursday, 16 June 2022

Basics of MongoDB



MongoDB

MongoDB is a NoSQL database. It stores data as documents in an organized way. MongoDB is designed to meet the demands of modern apps with a technology foundation that enables:

  • Document data model – presenting the best way to work with semi-structured data.

  • Distributed systems design – allowing data to be placed intelligently across different servers and locations, and to scale up and out dynamically.

  • OS and cloud agnostic – allowing MongoDB to run on different operating systems and clouds, eliminating vendor lock-in.


MongoDB Architecture

In a data-driven world we generate and save huge amounts of data. As data grows exponentially, it becomes impossible to store it all on a single host and keep it durable at all times. MongoDB resolves this through replica sets and sharding, which bring high resiliency and a distributed system design. Let's walk through each component of the architecture below and see what happens in the background when the application server hits the MongoDB server to fetch or feed data. This architecture diagram was the clearest one I found in my research; great job by the author, credits mentioned below.


Credits: https://www.livescript.in/2018/10/sharding-in-mongodb.html

Replica Sets

  • Replica sets are designed for high resiliency
  • If the primary node goes down, a new primary is automatically elected from the secondary nodes in the cluster
  • For the client this is seamless, or at most a small glitch that can be fixed with a retry
  • A replica set can hold up to 50 nodes in total
  • The new primary is elected algorithmically by voting, based on conditions such as:
    • Most recent updates from the primary data node
    • Latest timestamp and heartbeat status
    • History of connectivity to the other secondary nodes
    • User-defined priority
  • Node Types
    • Primary
      • All writes happen on this node
      • There is only one primary node at a time
      • At times another node may temporarily act as primary, causing the split-brain issue: two leaders exist when there is a disconnect between sets of nodes. 
      • Ex: We have 6 nodes across 2 data centres, and 1 node is primary. A network issue stops connectivity between the data centres, causing a network partition. As the other data centre has no primary node, one of its 3 nodes is elected. As you can see, we now have 2 primary nodes, which is not recommended.
      • This is a temporary issue; a replica set can have at most 7 voting members, and they need to be distributed across the data centres. 
      • If the current primary cannot see a majority of voting members, it will step down and become a secondary.
      • Meanwhile, only the one primary that can reach a majority will be able to confirm writes with the { w: "majority" } write concern. 
    • Secondary
      • All writes done on the primary node are replicated to all the secondary nodes by applying the oplog
      • When the primary node goes down, one of these secondary nodes is elected as primary
    • Arbiter
      • Arbiters are mongod instances that are part of a replica set but do not hold data (i.e. do not provide data redundancy). 
      • But they can participate in elections.
      • Arbiters have minimal resource requirements and do not require dedicated hardware. 
  • Consistency
    • Read Concern
      • The readConcern option lets you control the consistency and isolation properties of data read from replica sets and replica set shards.
      • Levels
        • local: The query returns data from the instance with no guarantee that the data has been written to a majority of the replica set members
        • available: Identical to "local" for unsharded collections. In a sharded cluster, however, there can be orphaned documents: documents on a shard that also exist in chunks on other shards as a result of failed migrations or incomplete migration cleanup after an abnormal shutdown. With "local", reads require communication with the shard's primary (if the read is on a secondary) or with the config servers to service the read, whereas "available" contacts neither for updated metadata and may therefore return orphaned documents.
        • majority: Only returns data that was written to the majority of voting nodes and will not be rolled back.
        • linearizable: The query returns data that reflects all successful majority-acknowledged writes that completed prior to the start of the read operation. The query may wait for concurrently executing writes to propagate to a majority of replica set members before returning results. This also helps during a network partition: the latest write must not be missed, which would give stale information to the client. Beautifully explained in the link below.
          • https://stackoverflow.com/questions/42615319/the-difference-between-majority-and-linearizable
        • snapshot: The query reads from a majority-committed snapshot of the data, synchronized across the nodes at a given point in time. The documentation was not entirely clear to me; the link below explains the concept, though I cannot confirm it is right.
          • https://stackoverflow.com/questions/53908672/whats-the-difference-of-majority-committed-data-and-the-snapshot-of-majority
    • Write Concern
      • Write concern describes the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters.
      • Levels
        • majority: Requests acknowledgment that write operations have propagated to the primary and to the calculated majority of the voting members. Eg: in a P-S-S replica set, the write must propagate to the primary and one secondary. Another eg: in a P-S-A replica set, the write must propagate to both the primary and the secondary, since the arbiter holds no data.
        • <number>: Requests acknowledgment that the write operation has propagated to the specified number of mongod instances. With w set to 0, no acknowledgment is requested.
        • <custom write concern name>: Requests acknowledgment that the write operations have propagated to tagged members that satisfy the custom write concern defined in settings.getLastErrorModes.
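The majority calculation above can be sketched in a few lines. This is an illustration of the rule, not the server's internal code:

```javascript
// Sketch: how the "majority" write-concern number is derived from the
// count of voting members in a replica set.
function majority(votingMembers) {
  // A strict majority: more than half of the voting members.
  return Math.floor(votingMembers / 2) + 1;
}

// P-S-S: 3 voting members -> a majority write must reach 2 nodes.
console.log(majority(3)); // 2

// 5 voting members -> 3 nodes; 7 (the maximum) -> 4 nodes.
console.log(majority(5), majority(7)); // 3 4

// For P-S-A the calculated majority is also 2, but the arbiter holds no
// data, so the primary and the lone secondary must both acknowledge.
```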


Sharding

  • Since all the data cannot be stored on a single disk, we need horizontal scaling.
  • Sharding is the process of scaling horizontally and seamlessly beyond the hardware limits of a single server.
  • Shards store the data, distributed across the cluster using the shard key.
  • The shard key determines how data is distributed across a sharded cluster. If we modify the shard key, MongoDB automatically rebalances the data across shards as needed, without manual intervention.
  • As seen in the diagram above, it's advisable to have a separate replica set for each shard, providing high availability and data consistency.
  • Chunks are subsets of sharded data. MongoDB partitions sharded data into chunks that are distributed across the shards in the sharded cluster. Each chunk has an inclusive lower and exclusive upper range based on the shard key. A balancer specific to each cluster handles chunk distribution.

  • Sharding Options
    • Ranged Sharding: Documents are partitioned across shards according to the shard key value.
    • Hashed Sharding: Documents are distributed according to an MD5 hash of the shard key value, providing even distribution.
    • Zoned Sharding: Allows developers to define specific rules governing data placement in a sharded cluster.
  • Note: Sharding requires careful planning, as the shard key directly impacts the overall performance of the cluster: it is used to locate the documents within the collection.


Config Server

  • Config servers store the cluster's metadata. 
  • This data contains a mapping of the cluster's data set to the shards. 
  • The metadata includes the list of chunks on every shard and the ranges that define the chunks.
  • The query router (mongos process) uses and caches this metadata to route operations to the targeted shards. 
  • Config servers need to be deployed as a replica set to ensure consistency and resiliency; like any replica set, it can have up to 50 members.
  • To deploy config servers as a replica set, the config servers must run the WiredTiger storage engine (we will see this below).

  • If the config server replica set loses its primary and cannot elect a primary, the cluster’s metadata becomes read only. You can still read and write data from the shards, but no chunk migration or chunk splits will occur until the replica set can elect a primary.
  • In a sharded cluster, mongod and mongos instances monitor the replica sets in the sharded cluster (e.g. shard replica sets, config server replica set)

Daemon mongod 

  • mongod is the primary daemon process for the MongoDB system.
  • It handles data requests, manages data access, and performs background management operations.
  • mongod in Linux and mongod.exe in Windows
  • Usually the data is stored in /data/db
  • Two MongoDB instances cannot run on the same port (default 27017)

Daemon mongos

  • For a sharded cluster, the mongos instances provide the interface between the client applications and the sharded cluster.
  • The mongos instances cache the metadata from the config servers and use it to route read and write operations to the correct shards. They are also called query routers.
  • mongos updates the cache when there are metadata changes for the cluster, such as Chunk Splits or adding a shard. 
  • A sharded cluster can contain more than one query router to divide the client request load.



MongoDB Storage Engine

  • The storage engine is the component of the database that is responsible for managing how data is stored, both in memory and on disk. 
  • MongoDB supports multiple storage engines, as different engines perform better for specific workloads. 
  • Choosing the engine varies based on the need of the application.
  • Below are three engines:
    • WiredTiger
      • The WiredTiger storage engine is the default storage engine starting in MongoDB version 3.2. 
      • It provides a document-level concurrency model, allowing multiple clients to write to different documents of the same collection at the same time, plus checkpointing, compression, and other features.
    • Encrypted storage engine
      • Data at rest is encrypted using this engine; reading the data requires the decryption key.
      • This engine is available in the MongoDB Enterprise edition.
    • In-Memory storage engine
      • All the data, including configuration, is stored in memory.
      • Provides quicker, lower-latency access by avoiding disk I/O.
      • Caution: data is not persistent


MongoDB Data Storage Hierarchy

  • Clusters

    • Database - <Collection of Collections>

      • Collections - <Similar to RDBMS Table>

        • Document - <JSON single record> <Single Row>

          • Key and Value pair


Document

A document is a way to organise and store data as a set of field-value pairs. 

  •     Field - a unique identifier for a datapoint.
    • Key -> should not contain \0, . or $

  •     Value - the data related to a given identifier.

The documents are viewed in the JSON format. JSON stands for JavaScript Object Notation. 

JSON object is defined in following format:

  • starts and ends with curly braces {}
  • each key and value is separated with a colon :
  • each key:value pair is separated with a comma ,
  • keys must be written within double quotes ""
  • values are represented according to their datatype
  • a value can be a number, a string, or an object (sub-document)

Since there are a few limitations to storing data in the JSON format, it is stored in the BSON format.

JSON limitations:

  • It is very readable, but being text based it consumes more space

  • Limited datatypes are supported

  • JSON only supports UTF-8 format

The documents are stored in the BSON format. BSON stands for Binary JSON. BSON addresses the limitations of JSON by storing the data in a binary format. It is optimized for speed, low space usage, and high performance, and it supports more datatypes. BSON is faster to parse and lighter to store than JSON. Its limitation is that it is not human-readable; only machines can read it. 
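One of the JSON limitations above (limited datatypes) is easy to demonstrate: JSON has no Date type, so type information is lost on a round trip, while BSON keeps a real 64-bit date type:

```javascript
// Demonstration: a Date survives in the live object, but a JSON round trip
// turns it into a plain string, losing the type.
const doc = { name: "Ram", createdAt: new Date() };

const roundTripped = JSON.parse(JSON.stringify(doc));

console.log(doc.createdAt instanceof Date);          // true
console.log(roundTripped.createdAt instanceof Date); // false
console.log(typeof roundTripped.createdAt);          // "string"
```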

Syntax: 

{
  <field>: <value>,
  <field>: <value>,
  <field>: <value>
}

Example:

{
  _id: ObjectId("5099803df3f4948bd2f98391"),
  name: "Ram",
  age: 12,
  standard: 8,
  address: {
    door: 12,
    area: "xyz"
  }
}

_id:

  • every document must have a unique _id value
  • ObjectId(): default value for the _id value
  • if not mentioned, its autogenerated
  • structure -> total 12bytes
    • bytes 0-3: timestamp in seconds since the epoch
    • bytes 4-8: random value
    • bytes 9-11: incrementing counter -> to avoid collisions with ObjectIds generated on other machines
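Because the first 4 bytes are a timestamp, the creation time can be read straight out of the hex string. A small sketch (mongosh exposes the same value via ObjectId().getTimestamp()):

```javascript
// Sketch: decode the creation time embedded in an ObjectId. The first
// 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdTimestamp(hexId) {
  const seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Using the _id from the example document above:
console.log(objectIdTimestamp("5099803df3f4948bd2f98391"));
// -> a date in November 2012
```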

Data Types:

- Null

        {"x": null}

- Boolean

        {"x": true}

- Number

        {"x": 3}

        {"x": 3.14}

        {"x": NumberInt(3)}

        {"x": NumberLong(3)}

- String

        {"x": "foobar"}

- Date

        Stores dates as 64-bit integers: milliseconds since the epoch

        Note: the timezone is not stored; it can be stored in another key

        {"x": new Date()}

- Regex

        Queries can use JavaScript regular expressions

        {"x": /foobar/i}

- Array

        {"x": ["a", "b", "c"]}

- Embedded document

        {"x": {"a": "1", "b": "2", "c": "3"}}

- Object ID

        {"x": ObjectId()}

- Binary data

- Code

        {"x": function() {/*..*/} }



Collection

An organized store of documents in MongoDB, usually with common fields between documents. Documents are stored in the collections. There can be many collections per database and many documents per collection.

Notes:

  • Collection:
    • Should not contain \0, . and $
    • Should not start with system.
  • SubCollection: 
    • Syntactic sugar
    • Collections namespaced with a . to group related collections (e.g. blog.posts, blog.authors)

Example:

> show collections

companies

grades

inspections

posts

routes

trips

zips


> db.zips.find({"state": "NY"})

[

  {

    _id: ObjectId("5c8eccc1caa187d17ca72f89"),

    city: 'FISHERS ISLAND',

    zip: '06390',

    loc: { y: 41.263934, x: 72.017834 },

    pop: 329,

    state: 'NY'

  },

  {

    _id: ObjectId("5c8eccc1caa187d17ca72f8a"),

    city: 'NEW YORK',

    zip: '10001',

    loc: { y: 40.74838, x: 73.996705 },

    pop: 18913,

    state: 'NY'

  },

....

]


> db.zips.find({"state": "NY"}).count()

1596


Database

A database is a structured way to access data. MongoDB is a NoSQL database, which means the data is saved neither in rows nor in columns.

Data in MongoDB is stored in Document as described above. Collections are stored in the Database.

Folder structure:

  • admin
    • authentication and authorization
  • local
    • stores data specific to a single server
    • stores the data used in the replication process
    • the local database itself is never replicated
  • config
    • stores information about each shard

Example:

> show dbs

sample_airbnb       55.1 MB

sample_analytics    9.94 MB

sample_geospatial   1.06 MB

sample_mflix        47.9 MB

sample_restaurants  7.18 MB

sample_supplies     1.02 MB

sample_training       51 MB

sample_weatherdata  2.52 MB

admin                377 kB

local               10.1 GB


> use sample_training

switched to db sample_training


MongoDB Cloud - Atlas

Atlas is MongoDB's cloud database service. An Atlas free shared cluster creates a three-node replica set.

Replica Set - a few connected machines that store the same data, ensuring that if something happens to one of the machines the data remains intact. Comes from the word replicate - to copy something.

Instance - a single machine locally or in the cloud, running a certain software, in our case it is the MongoDB database.

Cluster - group of servers that store your data.

Below are three common ways to connect to MongoDB Cloud (or any MongoDB instance): the Mongo Shell, Compass, and an application.


MongoDB Client

  • Mongo Shell
    • written in JavaScript
    • accepts all JavaScript functions
    • use help to list example commands

                > help

                > db.listingsAndReviews.updateOne.help

                        db.collection.updateOne(filter, update, options):

                        Updates a single document within the collection based on the filter.

                        

                > db.listingsAndReviews.updateOne

                [Function: updateOne] AsyncFunction {

                  apiVersions: [ 1, Infinity ],

                  serverVersions: [ '3.2.0', '999.999.999' ],

                  returnsPromise: true,

                  topologies: [ 'ReplSet', 'Sharded', 'LoadBalanced', 'Standalone' ],

                  returnType: { type: 'unknown', attributes: {} },

                  deprecated: false,

                  platforms: [ 0, 1, 2 ],

                  isDirectShellCommand: false,

                  acceptsRawInput: false,

                  shellCommandCompleter: undefined,

                  help: [Function (anonymous)] Help

                }

    • running scripts with the shell

                option 1> mongo script1.js script2.js

                option 2> load("script1.js")

                example>

                        show dbs

                        db.getMongo().getDBs()

    • mongorc.js
      • frequently used scripts can be loaded from here
    • editor
      • sets the external editor used by the shell, e.g. EDITOR="/usr/bin/emacs"

  • Compass
    • Compass is an interactive tool for querying, optimizing, and analyzing the MongoDB data.
    • Get key insights, drag and drop to build pipelines, and more.
  • Application

    • MongoDB is widely used across various web applications as the primary data store. 
    • One of the most popular web development stacks, the MEAN stack employs MongoDB as the data store (MEAN stands for MongoDB, ExpressJS, AngularJS, and NodeJS).
    • Other languages also have client libraries to work with MongoDB such as Python, Java, Ruby, etc.


Import and Export Data

The commands below help to get data out of and into a MongoDB database.

JSON:

  • Export

    • mongoexport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --collection=sales --out=sales.json

      • --uri (uniform resource identifier; srv establishes a secure connection)

      • srv : connection string - a specific format used to establish a connection between your application and a MongoDB instance.

  • Import

    • mongoimport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop sales.json

BSON:

  • Export

    • mongodump --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies"

  • Import

    • mongorestore --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop dump


Query

  • findOne -> returns one document (row) from the collection
  • insertOne -> one document will be inserted
  • updateOne -> one document will be updated
  • deleteOne -> one document will be deleted
  • find -> returns all matching documents; the returned records are not ordered

db.zips.find({"state": "NY"})

        # the results are iterated through a cursor

        # cursor: a pointer to the result set of a query

        # pointer: a direct address of a memory location

db.zips.find({"state": "NY"}).count()

db.zips.find({"state": "NY", "city": "ALBANY"})

db.zips.find({"state": "NY", "city": "ALBANY"}).pretty()

  • updateOne and updateMany,
    • take a filter document as their first parameter
    • and a modifier document which describes changes to make as the second parameter
  • replaceOne
    • take a filter document as their first parameter
    • and the second parameter will replace the document matching the filter
  • drop
    • when all collections are dropped from a database, the database no longer appears in the list of databases when you run show dbs.
  • To avoid race conditions below are preferred methods
    • findOneAndDelete
    • findOneAndUpdate
    • findOneAndReplace
  • Aggregation Framework:
    • Aggregation operations process multiple documents and return computed results. 
    • Different Examples:
    • Find all documents that have Wifi as one of the amenities. Only include price and address in the resulting cursor.
db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1, "_id": 0 }).pretty()

    • Using the aggregation framework, find all documents that have Wifi as one of the amenities. Only include price and address in the resulting cursor.

db.listingsAndReviews.aggregate([

                                  { "$match": { "amenities": "Wifi" } },

                                  { "$project": { "price": 1,

                                                  "address": 1,

                                                  "_id": 0 }}]).pretty()

    • Find one document in the collection and only include the address field in the resulting cursor.

db.listingsAndReviews.findOne({ },{ "address": 1, "_id": 0 })

    • Project only the address field value for each document, then group all documents into one document per address.country value.

db.listingsAndReviews.aggregate([ { "$project": { "address": 1, "_id": 0 }},

                                  { "$group": { "_id": "$address.country" }}])

    • Project only the address field value for each document, then group all documents into one document per address.country value, and count one for each document in each group.

db.listingsAndReviews.aggregate([
                                  { "$project": { "address": 1, "_id": 0 }},
                                  { "$group": { "_id": "$address.country",
                                                "count": { "$sum": 1 } } }

])

  • Sort and Filter:

db.zips.find().sort({ "pop": 1 }).limit(1)


db.zips.find({ "pop": 0 }).count()


db.zips.find().sort({ "pop": -1 }).limit(1)


db.zips.find().sort({ "pop": -1 }).limit(10)


db.zips.find().sort({ "pop": 1, "city": -1 })

  • Cursor Methods: 
    • A cursor is a pointer to the result set, from which we can access the data iteratively.
    • For example, when the find() method is used to find the documents present in a given collection, it returns a pointer to those documents; this pointer is known as the cursor
    • Queries return a database cursor, which lazily returns batches of documents as needed.
    • There are a lot of meta operations one can perform on a cursor, including skipping a certain number of results, limiting the number of results returned and sorting the results.
    • Applied to the results Methods are:
      • sort
      • limit
      • pretty
      • count
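The lazy batching behaviour described above can be mimicked with a generator. This is a conceptual sketch, not the driver's real API; the real client issues a getMore command to the server when a batch is exhausted:

```javascript
// Conceptual sketch of cursor batching: documents arrive in batches, and
// the next batch is "fetched" only when iteration actually needs it.
function* lazyCursor(allDocs, batchSize) {
  for (let i = 0; i < allDocs.length; i += batchSize) {
    // In a real driver, this is where a getMore round trip would happen.
    yield* allDocs.slice(i, i + batchSize);
  }
}

const docs = Array.from({ length: 10 }, (_, n) => ({ _id: n }));
const cursor = lazyCursor(docs, 4);

console.log(cursor.next().value); // { _id: 0 } - only the first batch so far
```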
  • Indexes:
    • Indexes support the efficient execution of queries in MongoDB. 
    • Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. 
    • If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
    • Examples:

db.trips.find({ "birth year": 1989 })


db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )


db.trips.createIndex({ "birth year": 1 })


db.trips.createIndex({ "start station id": 1, "birth year": 1 })

    • The index is a special data structure - B-Tree, which stores the value of a specific field or a set of fields, ordered by the value of the field.
    • The ordering of the index entries supports efficient equality matches and range-based query operations. 
    • Using B-Tree indexes significantly reduces the number of comparisons needed to find a document.
    • The diagram in the link below shows a query selecting documents using an index.
Credits: https://www.mongodb.com/docs/manual/indexes/
    • Index Types
      • Single Field
        • By default _id field is indexed
        • Additionally any field from the document can be indexed
        • This improves query performance for operations on the indexed field, such as sorting or searching by it
      • Compound Index
        • A compound index supports queries on multiple fields
        • The order of the indexed fields has a strong impact on the effectiveness of a particular index for a given query
      • Multikey Index
        • A multikey index is used to index the content stored in an array
        • It creates a separate index entry for each value in the array
      • Geospatial Index
        • Index on the geospatial data for better performance
        • Two special indexes: 2d indexes, which use planar geometry when returning results, and 2dsphere indexes, which use spherical geometry

      • Text Index
        • Text index supports queries on the string content in a collection
        • Can have more than one string field for creating this index
        • The weight of an indexed field denotes its significance relative to the other indexed fields in terms of the text search score. Eg: below, 3 fields are used to create the index, and each carries its own weight:

          db.blog.createIndex(
            {
              content: "text",
              keywords: "text",
              about: "text"
            },
            {
              weights: {
                content: 10,
                keywords: 5
              },
              name: "TextIndex"
            }
          )
        • For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results
        • Using this sum, MongoDB then calculates the score for the document.
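The multiply-and-sum rule above can be sketched as follows. This is a simplification; the server's real text score also normalizes for term frequency and field length:

```javascript
// Sketch of the text-score rule: multiply the number of matches in each
// indexed field by that field's weight, then sum the results.
// Fields not listed in the weights document default to weight 1.
const weights = { content: 10, keywords: 5, about: 1 };

function textScore(matchesPerField) {
  let score = 0;
  for (const [field, matches] of Object.entries(matchesPerField)) {
    score += matches * (weights[field] || 1);
  }
  return score;
}

// 2 matches in content, 1 in keywords, 3 in about:
console.log(textScore({ content: 2, keywords: 1, about: 3 }));
// 2*10 + 1*5 + 3*1 = 28
```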
      • Hashed Index
    • Index Properties
      • Unique
      • Partial
      • Sparse
      • TTL
      • Hidden
  • Upsert:
    • A hybrid of update and insert
    • By default set to false
    • If a match is found an update happens, else an insert happens
    • Example

db.iot.updateOne({ "sensor": r.sensor, "date": r.date,

                   "valcount": { "$lt": 48 } },

                         { "$push": { "readings": { "v": r.value, "t": r.time } },

                        "$inc": { "valcount": 1, "total": r.value } },

                 { "upsert": true })

  • Transaction
    • Multi-document transactions that contain read operations must use the read preference primary. 
    • Until a transaction commits, the data changes made in the transaction are not visible outside the transaction

MQL Operators

  • Update Operators:
    • $inc: increment
    • $set: set the value
    • $unset: unset the value
  • Query Operators: Locate the data
    • Used to query for ranges, set inclusions, and many more by using $ conditionals.
    • Below are few $ conditionals.
      • $ne
      • $eq
      • $gt
      • $lt
      • $gte
      • $lte
      • $in -> in array
      • $nin -> not in array
      • $not
      • $or -> condition
      • $and -> condition
      • $mod -> modulus; queries the keys whose values, when divided by the first value given, have a remainder equal to the second value
      • $regex -> regular expression, using Perl Compatible Regular Expressions
      • $all -> if you need to match arrays by more than one element
      • $size -> size of the array
      • $slice -> return a subset of elements for an array key
      • $elemMatch
    • Find all documents where the tripduration was less than or equal to 70 seconds and the usertype was not Subscriber:

db.trips.find({ "tripduration": { "$lte" : 70 },

                "usertype": { "$ne": "Subscriber" } }).pretty()

    • Find all documents where the tripduration was less than or equal to 70 seconds and the usertype was Customer using a redundant equality operator:

db.trips.find({ "tripduration": { "$lte" : 70 },

                "usertype": { "$eq": "Customer" }}).pretty()

    • Find all documents where the tripduration was less than or equal to 70 seconds and the usertype was Customer using the implicit equality operator:

db.trips.find({ "tripduration": { "$lte" : 70 },

                "usertype": "Customer" }).pretty()

  • Logic Operators
    • $and
    • $or
    • $nor
    • $not
    • Find all documents where airplanes CR2 or A81 left or landed in the KZN airport:

db.routes.find({ "$and": [ { "$or" :[ { "dst_airport": "KZN" },

                                    { "src_airport": "KZN" }

                                  ] },

                          { "$or" :[ { "airplane": "CR2" },

                                     { "airplane": "A81" } ] }

                         ]}).pretty()

  • Expressive Operators
    • Allows the use of aggregation expressions within the query language.
    • Examples:
    • Find all documents where the trip started and ended at the same station:

db.trips.find({ "$expr": { "$eq": [ "$end station id", "$start station id"] }

              }).count()

    • Find all documents where the trip lasted longer than 1200 seconds, and started and ended at the same station:

db.trips.find({ "$expr": { "$and": [ { "$gt": [ "$tripduration", 1200 ]},

                         { "$eq": [ "$end station id", "$start station id" ]}

                       ]}}).count()

  • Array Operators
    • $push -> adds elements to an array
    • $pop / $pull -> removes elements from an array
    • $each -> modifier adds multiple values to an array
    • $slice -> projection operator specifies the number of elements in an array to return in the query result
    • $sort -> modifier that orders the elements of an array during a $push
    • $addToSet -> adds only unique values to an array
    • upsert (option) -> if no matching record is found, an insert happens
    • Examples:
    • Find all documents with exactly 20 amenities which include all the amenities listed in the query array:

db.listingsAndReviews.find({ "amenities": {
                                  "$size": 20,
                                  "$all": [ "Internet", "Wifi",  "Kitchen",
                                           "Heating", "Family/kid friendly",
                                           "Washer", "Dryer", "Essentials",
                                           "Shampoo", "Hangers",
                                           "Hair dryer", "Iron",
                                           "Laptop friendly workspace" ]
                                         }

                            }).pretty()

  • Project and $elemMatch
    • A projection specifies the fields that should or should not be included in the result cursor
    • Syntax: db.<collection>.find({<query>}, {<projection>})
    • Do not combine 1s and 0s in the projection, except for {"_id": 0, <field>: 1}
    • {<field>: {"$elemMatch": {<field>: <value>}}}
      •  Matches documents that contain an array field with at least one element that matches specified query criteria
      • (or)
      • Projects only the array elements with at least one element that matches the specified criteria
    • Examples:
    • Find all documents with exactly 20 amenities which include all the amenities listed in the query array, and display their price and address:

db.listingsAndReviews.find({ "amenities":
        { "$size": 20, "$all": [ "Internet", "Wifi",  "Kitchen", "Heating",
                                 "Family/kid friendly", "Washer", "Dryer",
                                 "Essentials", "Shampoo", "Hangers",
                                 "Hair dryer", "Iron",
                                 "Laptop friendly workspace" ] } },

                            {"price": 1, "address": 1}).pretty()

    • Find all documents that have Wifi as one of the amenities; only include price and address in the resulting cursor:

db.listingsAndReviews.find({ "amenities": "Wifi" },

                           { "price": 1, "address": 1, "_id": 0 }).pretty()

    • Find all documents that have Wifi as one of the amenities; only include price and address in the resulting cursor, and also exclude "maximum_nights". This will be an error:

db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1,
                             "_id": 0, "maximum_nights": 0 }).pretty()

    • Get one document from the collection:

db.grades.findOne()

    • Find all documents where the student in class 431 received a grade higher than 85 for any type of assignment:

db.grades.find({ "class_id": 431 },
               { "scores": { "$elemMatch": { "score": { "$gt": 85 } } }
             }).pretty()

    • Find all documents where the student had an extra credit score:

db.grades.find({ "scores": { "$elemMatch": { "type": "extra credit" } }
               }).pretty()
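The two roles of $elemMatch above can also be seen outside the shell. Below is a minimal pure-Python sketch of both behaviours (an illustration, not MongoDB's implementation), assuming only equality and $gt criteria:

```python
# Pure-Python sketch of $elemMatch's two roles (illustration only, not MongoDB code).

def elem_matches(elem, criteria):
    """True if a single array element satisfies every criterion."""
    for field, cond in criteria.items():
        value = elem.get(field)
        if isinstance(cond, dict) and "$gt" in cond:
            if value is None or not value > cond["$gt"]:
                return False
        elif value != cond:
            return False
    return True

def query_elem_match(docs, array_field, criteria):
    """As a query operator: keep whole documents whose array has a matching element."""
    return [d for d in docs
            if any(elem_matches(e, criteria) for e in d.get(array_field, []))]

def project_elem_match(doc, array_field, criteria):
    """As a projection operator: keep only the first matching array element."""
    matches = [e for e in doc.get(array_field, []) if elem_matches(e, criteria)]
    return {array_field: matches[:1]} if matches else {}

grades = [
    {"class_id": 431, "scores": [{"type": "exam", "score": 90},
                                 {"type": "quiz", "score": 70}]},
    {"class_id": 431, "scores": [{"type": "homework", "score": 60}]},
]

# Query role: only the first document has any score above 85.
print(len(query_elem_match(grades, "scores", {"score": {"$gt": 85}})))  # 1
# Projection role: only the first matching element is returned.
print(project_elem_match(grades[0], "scores", {"score": {"$gt": 85}}))
```

The sample documents mirror the grades collection used above, but the helper functions are hypothetical names introduced only for this sketch.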

  • Querying Arrays and Sub-Documents:
    • Use dot notation to reach into sub-documents (here, a field whose name contains spaces):

db.trips.findOne({ "start station location.type": "Point" })

    • Use a numeric position in dot notation to query a specific array element (here, the first relationship):

db.companies.find({ "relationships.0.person.last_name": "Zuckerberg" },
                  { "name": 1 }).pretty()

    • Combine positional dot notation with operators such as $regex:

db.companies.find({ "relationships.0.person.first_name": "Mark",
                    "relationships.0.title": { "$regex": "CEO" } },
                  { "name": 1 }).count()

db.companies.find({ "relationships.0.person.first_name": "Mark",
                    "relationships.0.title": { "$regex": "CEO" } },
                  { "name": 1 }).pretty()

    • Use $elemMatch when several criteria must hold for the same array element:

db.companies.find({ "relationships":
                      { "$elemMatch": { "is_past": true,
                                        "person.first_name": "Mark" } } },
                  { "name": 1 }).pretty()

db.companies.find({ "relationships":
                      { "$elemMatch": { "is_past": true,
                                        "person.first_name": "Mark" } } },
                  { "name": 1 }).count()
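The dot notation used in these queries, including numeric array positions such as relationships.0, can be sketched in plain Python (an illustration with a hypothetical resolve helper, not MongoDB code):

```python
# Pure-Python sketch of MongoDB-style dot notation (illustration only).

def resolve(doc, path):
    """Walk a dotted path; numeric segments index into arrays."""
    current = doc
    for segment in path.split("."):
        if isinstance(current, list) and segment.isdigit():
            index = int(segment)
            current = current[index] if index < len(current) else None
        elif isinstance(current, dict):
            current = current.get(segment)
        else:
            return None
        if current is None:
            return None
    return current

company = {
    "name": "Facebook",
    "relationships": [
        {"is_past": False, "title": "Founder and CEO",
         "person": {"first_name": "Mark", "last_name": "Zuckerberg"}},
    ],
}

print(resolve(company, "relationships.0.person.last_name"))  # Zuckerberg
```

Note that field names may contain spaces (as in "start station location" above); only the dots themselves act as path separators.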


Data modeling

  • Data modeling is a way to organize the fields in a document to support your application's performance and querying needs.
  • Avoid storing the same data redundantly under different _id values; it occupies unnecessary memory.
  • Avoid documents in the same collection carrying completely different sets of key-value pairs.
  • If you foresee heavy query usage, consider indexes in your data model to improve the efficiency of queries.
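One recurring decision these guidelines feed into is embedding related data versus referencing it. A minimal sketch with hypothetical blog-style documents (field names are illustrative, not from any real collection):

```python
# Two common ways to model a one-to-many relationship (illustrative field names).

# Embedded: comments live inside the post document.
# One read fetches everything, but the document grows toward the 16 MB cap.
post_embedded = {
    "_id": "post1",
    "title": "Basics of MongoDB",
    "comments": [
        {"author": "alice", "text": "Nice post"},
        {"author": "bob", "text": "Thanks"},
    ],
}

# Referenced: each comment stores the parent's _id, like a foreign key.
# The post stays small; fetching comments needs a second query (or $lookup).
post_referenced = {"_id": "post1", "title": "Basics of MongoDB"}
comments = [
    {"_id": "c1", "post_id": "post1", "author": "alice", "text": "Nice post"},
    {"_id": "c2", "post_id": "post1", "author": "bob", "text": "Thanks"},
]

# Resolving the reference manually:
post_comments = [c for c in comments if c["post_id"] == post_referenced["_id"]]
print(len(post_comments))  # 2
```

Embedding favours read performance for data that is always fetched together; referencing avoids duplication and unbounded document growth.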

Pros of using MongoDB

  • A more flexible data model, which makes for change-friendly design and quicker releases
  • Easy data scalability
  • Distributed-system and cloud-computing design delivers resiliency
  • Supports various data types with ease, such as time series, geospatial, and polymorphic data
  • Enables rich, data-driven applications
  • Balanced high-performance reads (via indexes) and writes
  • As data grows, sharding helps by scaling horizontally and spreading data across multiple instances
  • Cost-effective, as an open-source version is available
  • The simple JSON-like data model is faster to understand and develop against
  • Easy installation

Challenges and Disadvantages

  • Being distributed in nature, transactions are a challenge; multi-document transactions are supported, and starting in MongoDB 4.2 distributed transactions with strong consistency can span multiple operations, collections, databases, documents, and shards
  • Joins are not done the traditional way; the aggregation stage $lookup provides them, but at a high memory cost
  • Document size is limited to 16 MB
  • Nesting is limited to 100 levels per document
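To make the $lookup point concrete, the following pure-Python sketch mimics its left-outer-join semantics (a hypothetical lookup helper, not the actual aggregation stage); building an in-memory index over the whole foreign collection hints at why it can be memory-hungry:

```python
from collections import defaultdict

# Pure-Python sketch of $lookup's left outer join (illustration, not MongoDB code).
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    # Index the entire foreign collection in memory -- this is where the cost lives.
    index = defaultdict(list)
    for doc in foreign_docs:
        index[doc.get(foreign_field)].append(doc)
    # Every local document is kept; unmatched ones get an empty array.
    return [{**doc, as_field: index.get(doc.get(local_field), [])}
            for doc in local_docs]

orders = [{"_id": 1, "item": "pen"}, {"_id": 2, "item": "book"}]
inventory = [{"sku": "pen", "qty": 30}]

joined = lookup(orders, inventory, "item", "sku", "stock")
print(joined[0]["stock"])  # [{'sku': 'pen', 'qty': 30}]
print(joined[1]["stock"])  # []
```

As with SQL's LEFT OUTER JOIN, local documents with no match still appear in the result, with an empty joined array.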

Conclusion

We have seen MongoDB at a high level. Each of these topics can be explored to its full length and breadth, and several MongoDB topics are not covered here at all. For now we have got a sense of its ease of use, resiliency, performance, and consistency, along with effective reads and writes. As MongoDB is still evolving, its shortcomings are likely to be addressed in the near future.


References

https://www.mongodb.com
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne
https://medium.com/swlh/mongodb-indexes-deep-dive-understanding-indexes-9bcec6ed7aa6
https://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf
https://stackoverflow.com/questions/58814041/in-mongodb-why-is-read-concern-available-default-option-for-secondaries-in-no
https://www.bmc.com/blogs/mongodb-sharding-explained
MongoDB Course

