mongodb

Is it time to rethink your MongoDB indexing strategy?

By October 17, 2016 August 29th, 2019 No Comments

As of version 3.2 MongoDB supports partial Indexes. A partial index only indexes the documents that meet a specified filter expression, before 3.2 the closest index type to partial was the sparse index. A sparse index only indexes documents where the index field(s) is not equal to null. The sparse index is a great tool to create smaller and higher performing indexes. Partial indexes bring more flexibility to the equation.

Let’s review an example:

Let’s assume you are the owner of an application. Your application has users connect to it, you have defined two type of access for your users, VIP and NORMAL according to their subscription. You store your user information in a USERS collection, and you choose to add a field “vip” to that collection to distinguish your users, you set this field to true for VIP users and you don’t set it at all for every other user because you want to save space in your documents (flexible schema, yeah!).

When a user login to the application you need to enable the features that match the access level, you achieve this by querying your Mongo database for this user and the VIP flag. If the query returns a result you unlock the VIP access otherwise the user continues with the free version of your app.

In Mongo your query will look like:

db.USERS.find({userID:"foo", "vip":true}, {"userID":1, "_id":0})

Obviously you need an index to cover the query. An index on {“userID”:1} is able to satisfy your query and since userID is unique it will only touch one document. The downside is that since the “vip” field is not included in the index you always have to fetch the entire document in order to examine if the user is a VIP.

A covering index can solve this issue. By adding the “vip” field to your index {“userID”:1, “vip”:1} your query doesn’t have to fetch the entire document back, the downside is that the index is bigger and more complex.

A common misconception is that a sparse index on {userID:1, vip:1} will be smaller, thus perform better. This is wrong because both values must be equal to NULL in order a document not to be indexed and since userID is never equal to NULL, a non-sparse index on {“userID”:1, “vip”:1} has the same size as sparse one on {“userID”:1 , “vip”:1}. In order to use a sparse index efficiently a solution could be to assign the UserID value to the “vip” field only if a user has VIP access and then apply a sparse index on {“vip”:1}. The document will become slightly bigger but the index will be relatively small. Your query in this case will be:

db.USERS.find({vip:"foo"}, {"vip":1, "_id":0})

In version 3.2 a partial index can simulate the above workaround without any schema changes. An index on {“userID”:1}, { partialFilterExpression: {“vip”:true}} can fully cover the following query:

db.USERS.find({userID:"foo", "vip":true}, {"userID":1, "_id":0})

The partialFilterExpression tells MongoDB to include only documents with {vip:true} to the index. The downside is that you will get an exception when creating the index if you already have an index on {userID:1}, which you probably will. MongoDB won’t allow two indexes that differs only on options. You may overcome this behavior by deploying a partial index on: {“userID”:1, “vip”:1}, {partialFilterExpression: {“vip”:true}}

Too many options and I am getting confused!

Let’s try to quantify all the strategies mentioned above. Let’s assume that our USER collection has 1 million users and 10 thousand users are VIPs.

Our document structure should be either:

{
  "_id" : ObjectId("57fc0c204323819151f90a65"),
  "userID" : ObjectId("57fc0c204323819151f90a64"),
  "cDate" : ISODate("2016-10-10T21:46:08.995Z"),
  "vip" : true
}

or, if we want to follow the workaround with the {“vip”:1} sparse index:

{
  "_id" : ObjectId("57fc0c204323819151f90a65"),
  "userID" : ObjectId("57fc0c204323819151f90a64"),
  "cDate" : ISODate("2016-10-10T21:46:08.995Z"),
  "vipID" : ObjectId("57fc0c204323819151f90a64")
}

It’s obvious if we check the following matrix that Sparse and Partial options produce way smaller indexes.

Index DefinitionIndex OptionsSize (bytes)Size (MiB)
{userID: 1}2921284827.85
{userID: 1, vip: 1}3030025628.89
{vipID: 1}{sparse: true}3025120.28
{userID: 1}{partialFilterExpression: {vip: true}}3025120.28
{userID: 1, vip: 1}{partialFilterExpression: {vip: true}}3106880.29

Matrix numbers were taken from MongoDB version 3.2.8 with MMAPv1 storage engine

The capabilities of partial indexes are not limited to our generic example where things were relatively easy as we had one flag: vip, and a unique field: userID. Imagine having multiple flags like isLocked (which is true if a user account is locked by an administrator), isActive (which is true if a use account is activated) etc. and some non-unique fields. A partial index could save the day in this case!

Partial indexes may be extremely helpful for non-B-tree-friendly operators like {$exists:true} and $type and multiple ranges ($gt, $lt, $gte, $lte).

Partial indexes have some limitations:

  • Are only available in MongoDB 3.2, which means you might need to upgrade your instance.
  • You can’t use a partial index for sharding
  • _id indexes cannot be partial, mainly because replication uses them.
  • You cannot create multiple partial indexes that differ only by the filter expression
  • You cannot specify both the partialFilterExpression option and the sparse option
  • You cannot use a partial index for a query if you are missing the exact partial Filter Expression from the query predicates.

The Datastores team is here to help you design and deploy optimal indexes for your MongoDB instances. All you have to do is create a ticket to support@objectrocket.com and let us help you improve your indexes!