MongoDB Database Views

Introduction

Today I’m going to talk to you about a MongoDB Database View. This is a relatively new concept that was introduced in MongoDB version 3.4. For me, this is just a natural progression of the MongoDB database engine as it continues to evolve. Many of the concepts that I describe here will be familiar to those of you who have used MongoDB’s Aggregation Framework before. For the complete documentation on creating a database view in MongoDB, please visit Views in the Introduction to MongoDB.

A view in MongoDB acts the same way that a view does in a relational database. A database view is a searchable object in a database that is defined by a query. The view created in MongoDB is a read only collection that can combine data from two or more collections. Just like in a relational database where joins are used to connect two or more tables, a $lookup can be used to combine two collections.  You can also return a subset of information by using the $project operator.  This allows you to return a smaller document when a smaller portion is required.  This can be used to save network bandwidth as a smaller document will be returned from the database.

Aggregation

Aggregation allows you to define a one or more stages that will read the defined collections and process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. MongoDB provides many different ways to perform aggregation:

  • The aggregation pipeline
  • The map-reduce function
  • The single purpose aggregation methods

Today I am going to be talking about the first one the aggregation pipeline. for more information on aggregation, please refer to the MongoDB documentation on this topic.

The JSON way

First let’s walk through this as we would any other aggregation.  I will do this by showing you an aggregation using JSON steps in the pipeline first before I show performing the same actions using the C# driver.  I feel that it is important to understand what is going on behind the scenes before covering it up with an easy to use .NET API. In my code sample below, I will be joining a Customer Template with a Customer. I will then be projecting a subset of that information out for a selection grid.

Join $lookup

The $lookup operator performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. Here is an example of the lookup definition that I used in my sample code.

{
  $lookup: {
    "from": "TemplateCollection",
    "localField": "_id",
    "foreignField": "_id",
    "as": "Template"
  }
}

I will be applying the $lookup described above to my “Customer” collection. When I execute this as my first stage in my pipeline, I will get the following result.

{ 
    "_id" : BinData(3, "XAZge7gK1UqtmWbYbepbVw=="), 
    "CustomerIdentifier" : NumberInt(100000), 
    "CustomerName" : "John Frank's Company", 
    "CreatedDate" : ISODate("2017-08-24T00:00:00.000-0500"), 
    "ChangeHistory" : [
        { 
            "_id" : NumberInt(1),
            "Comments" : "Customer created"
        }, 
        // Omitted for brevity
        { 
            "_id" : NumberInt(10),
            "Comments" : "Modified Purchase Order"
        },         
    ], 
    "Template" : [
       {
          "_id": BinData(3, "XAZge7gK1UqtmWbYbepbVw=="),
          "PlannedNumberOfOrders": NumberInt(20),
          // Omitted for brevity
       }
    ]
}

Please not that not all of the data is displayed here. Some has been removed as it is domain to the discussion and it adds brevity to this post. Please be aware as this acts like a left join, if there is no match in the CustomerTemplate collection, then an empty array will be returned in the “Template” property.

Initial $project

Next we will project the information that was want from each collection into a smaller subset of data. Please note that my object has now two nested arrays. I only want one of the items in these nested arrays, the latest one. I will do this with the next stage of my pipeline. Here is the JSON object that I will use to perform this.

{
  "CustomerIdentifier": 1,
  "CurrentHistory": { "$arrayElemAt": ["$ChangeHistory", -1] },
  "Template": { 
    "$cond": { 
       if: { "$ne": [ "$Template", [] ] },
       then: { "$arrayElemAt": ["$Template", 0] }, 
       else: {
          _id: "$_id", 
          PlannedNumberOfOrders: 0
       }
    } 
  }
}

There are several things at play here. First the -1 in the $arrayElemAt causes the database to pull the last element in the $ChangeHistory property on the object returned from the first stage of the pipeline.

Second is the $cond operator. This operation acts like the If/Then/Else operator that you may be used to. In my case, IF my Template property array is not empty, THEN I can return the first item in that array. Since I only will every have a maximum of one CustomerTemplate for every customer, this is a logical 1 to 1 relationship given that a template exists for the given customer. If a template does not exist for the customer, I am creating an object that has only the properties that I will need in my final projection. That is my CustomerTemplate has 17 properties on it. However, since I am only interested in the _id, for linking purposes, and PlannedNumberOfOrders, that is all that I return if not CustomerTemplate has been found.

{ 
    "_id" : BinData(3, "XAZge7gK1UqtmWbYbepbVw=="), 
    "CustomerIdentifier" : NumberInt(100000), 
    "CustomerName" : "John Frank's Company", 
    "CreatedDate" : ISODate("2017-08-24T00:00:00.000-0500"), 
    "CurrentHistory" : { 
            "_id" : NumberInt(10),
            "Comments" : "Modified Purchase Order"
        }, 
    "Template" : {
          "_id": BinData(3, "XAZge7gK1UqtmWbYbepbVw=="),
          "PlannedNumberOfOrders": NumberInt(20),
          // Omitted for brevity
        }
}

Final $project

Now that I have selected out the version of customer history that I wanted as well as ensure that the Template object will be there. Instead of nested arrays, I how have single objects for the CurrentHistory and Template properties. In my final projection I will flatten out this document even further to ensure that it will provide a quick lookup for my selection grid.

{
    "_id": 1,
    "CustomerIdentifier": 1,
    "CustomerName": 1,
    "CustomerStatus": "$CurrentHistory.CustomerStatus",
    "CustomerTypeId": "$CurrentHistory.CustomerTypeId",
    "LocationId": "$CurrentHistory.LocationId",
    "NumberOfOrders": "$CurrentHistory.NumberOfOrders",
    "PlannedNumberOfOrders": "$Template.PlannedNumberOfOrders",
    "Orders": "$CurrentHistory.Orders",
    "CancelledOrders": "$CurrentHistory.CancelledOrders",
    "LastEditDate": "$CurrentHistory.EditDateTime",
    "CreatedDate" : 1, 
    "HasOutstandingOrders" : "$CurrentHistory.OrdersOutstanding",
    "Comment": "$CurrentHistory.Comments"
}

This projection definition will provide me with the small subset of data that was needed for the selection grid, as seen in the following result.

{ 
    "Id" : BinData(3, "XAZge7gK1UqtmWbYbepbVw=="),
    "CustomerIdentifier" : NumberInt(100000), 
    "CustomerOrigin" : NumberInt(2),     
    "CustomerStatus" : NumberInt(2), 
    "CustomerTypeId" : NumberInt(101), 
    "LocationId" : NumberInt(113), 
    "NumberOfOrders" : NumberInt(2), 
    "PlannedNumberOfOrders" : 0.0, 
    "Orders" : [
        BinData(3, "MtfSFn9MTUmlnyPib0Q81g=="), 
        BinData(3, "3x4VKF3r6k6AgSRTsgz+sQ==")
    ], 
    "CancelledOrders" : [ ], 
    "LastEditDate" : ISODate("2017-10-05T00:00:00.000-0500"), 
    "HasOutstandingOrders" : true
}

The .NET way

Now that I’ve shown you how to do this with JSON objects, I will demonstrate how to accomplish the same result with the C# .NET MongoDB API. Take the following fluent aggregation query.

var aggregateFluent = collection.Aggregate(aggregateOptions)
.Lookup<CustomerTemplate, LookupView>("CustomerTemplate", "Id", "Id", "TemplateView")
.Project<LookupView, TemporaryProjection>(func => new TemporaryProjection
  Id = func.Id,
  CustomerIdentifier = func.CustomerIdentifier,
  AlternateCustomerIdentifier = func.AlternateCustomerIdentifier,
  CustomerOrigin = func.CustomerOrigin,
  CurrentHistory = func.Versions.Last(),
  Template = func.TemplateView.Any()
	? func.TemplateView[0]
	: new OrderTemplateState {
	  Id = func.Id,
	  PlannedNumberOfOrders = 0
  },
  CreatedDate = func.CreatedDate,
})
.Project<TemporaryProjection, CustomerView>(func => new CustomerView
{
  _id = func.Id,
  CustomerIdentifier = func.CustomerIdentifier,
  AlternateCustomerIdentifier = func.AlternateCustomerIdentifier,
  CustomerOrigin = func.CustomerOrigin,
  CustomerStatus = func.CurrentHistory.CustomerStatus,
  CustomerTypeId = func.CurrentHistory.CustomerTypeId,
  LocationId = func.CurrentHistory.LocationId,
  NumberOfOrders = func.CurrentHistory.NumberOfOrders,
  PlannedNumberOfOrders = func.OrderTemplate.PlannedNumberOfOrders,
  Orders = func.CurrentHistory.Orders,
  TerminatedOrders = func.CurrentHistory.TerminatedOrders,
  LastEditDate = func.CurrentHistory.EditDateTime,
  CreatedDate = func.CreatedDate,
  HasCancelledOrders = func.CurrentHistory.OrdersCancelled
});
 

As you can see it is extremely similar to the JSON object queries that we constructed above. This can then be used to create the new view. Here is the final code.

var pipeline = PipelineDefinition<CustomerState, CustomerView>.Create(aggregateFluent.Stages);
db.CreateView("MySpecialView", "CustomerState", pipeline);
 

View Speed

Now when I pull my documents directly from the “CustomerView” that collection is populated with real documents. I need to consider that I am not pulling documents at all from “MySpecialView”. I am actually just causing the execution in the view to execute pulling back the objects in the aggregation pipeline. That means that it will be a slightly slower than pulling real documents.

Test Performed Reading 15000 Customer Records
 -> Performed Read All Records From CustomerView in 0:00:00.3450469
 -> Performed Read All Records From MySpecialView in 0:00:00.4732765

Why do it then? Well for one thing since there are no physical documents so the amount of storage space is reduced. This same thing occurs when you pull data from a database view from an RDBMS. Another reason is that your view or read model will be perfectly in sync with your write model at all times. Because the database engine is doing the work, you don’t have to embed complex code in your application. You can simply read the aggregated view as if you were reading any other top level document. For that, I am willing to give up 13 milliseconds.

I hope that this post has been enlightening to you. If you have any questions, please feel free to drop me a line below. Thank you!

Advertisements

2 thoughts on “MongoDB Database Views

  1. Excellent blog right here! Additionally your website rather a lot up fast! What host are you the usage of? Can I get your associate hyperlink in your host? I wish my website loaded up as fast as yours lol

  2. I’m now not sure the place you are getting your information, however good topic. I must spend a while finding out much more or understanding more. Thanks for magnificent information I was on the lookout for this information for my mission.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s