Sunday, March 19, 2023

The Nuances of Mongoose Select

Developing strategies for more effective use of select/project statements with mongoose.

Selecting and Projecting: SQL vs NoSQL

One of the biggest differences I've noticed when working with document storage compared to relational database is the default behaviour of selecting fields in a query.

In SQL, while the * operator is always available to select all available fields, it's an active choice to use it. The user must explicitly choose to select whichever fields are required, with the option to return all of them.

Contrast this with MongoDB, where the default behaviour is to select the entire document with any query. To narrow down the results the user must opt in to selecting their desired fields, rather than be forced to make the choice up front (even if that choice is to use *).

While this is a fundamental difference between these two systems, I have never really seen it, or the implications of it, discussed in any detail. I believe that in many situations the decision to have Mongo implicitly return all fields without the user making a selection is harmful and causes more problems than the convenience that it affords.

Even while using an ORM, I see how easy it can be to mistakenly return fields that expose private information, compromise security, or degrade performance by unintentionally increasing request payload sizes.

I would go as far as to say that a linter that enforces that all find() queries require explicit select() statements would significantly improve security and performance for many programs. Unfortunately, I don't think this is a viable option for the codebase I work on.

In that case, what kind of system or approach can be used to prevent unintended data from reaching any client?

Mongoose Select

I use Mongoose to interact with Mongo and I like it quite a bit. The query abstractions are very composable, and the documentation is comprehensive and strikes the right balance between technical and approachable. The abstractions that it offers over top of regular mongo projections will be the focus of the rest of this post.

Mongoose has three different approaches you can use to select fields out of a table, and a couple of what could be called "modes".

This will serve as our example document to illustrate the different examples.

// Image this doc is an instance of Doc in a database somewhere
doc = {
	a: 1,
	b: 2,
	c: 3,
	d: 4
};

Three Select Syntaxes

Mongoose is quite flexible and allows three different ways of constructing a select statement. A string, an array of strings, and an object are all valid input to the select() function.

All of the following statements are equal:

// 1) String
Doc.findOne().select('a b');

// 2) Array of strings
Doc.findOne().select(['a', 'b']);

// 3) Object
Doc.findOne().select({ a: 1, b: 1 });

// resulting document: { a: 1, b: 2 }
// c and d have both been omitted

Inclusive and Exclusive Selects

Each of three of the examples above allows the user to include which fields are desired.

This is called an inclusive select.

The opposite of an inclusive select is an exclusive select, certain fields can be excluded by prefixing them with - or marking them as a 0 instead of 1 inside a select object:

Doc.findOne().select('-a -b');
// or
Doc.findOne().select({ a: 0, b: 0 });

// resulting document: { c: 3, d: 4 }

It's important to think of inclusive and exclusive as a pair of flags that both default to false, rather than a switch or a single boolean value. A query without any select clause is neither inclusive nor exclusive. This can be easy to forget and has some interesting implications.

Selects defined directly on schema fields do not count as a select statement. We can override that select using the + modifier. Because select() has never been called, the query has yet to declared exclusive or exclusive.

// Schema with field level select
Doc = new Schema({
	a: {
		type: String,
		select: false
	},
	b: String
});

// Valid - Exclusive query
Doc.findOne().select('-a +b'); // returns { b: 'something' }

// Error!
Doc.findOne().select('-a b');

Select Interactions

Now, with this many different permutations of select inclusion, there are a lot of ways things can break down or be combined with varying results.

What happens when we chain select statements?

Doc.findOne().select('a b').select('');

Path Collisions

In Mongo 4.4 and higher, selecting specific fields from a nested object can cause problems. Anytime you include an entire object in a projection, this means that you may not include individual fields from the subdocument.

// example object
Doc.create({
	a: {
		b: 1,
		c: 2
	}
});

Doc.findOne().select('a a.b'); // ERROR path collision

In the above example, either a can be selected, or any combination of a.b and a.c.

While this seems like a fairly simple behaviour to work around, bringing select: false at the model level into the equation can path collisions more difficult to track down and cause unexpected problems.

// subdocument example
const Doc = mongoose.Schema({
	a: {
		// nested document that is projected out by default
		type: new mongoose.Schema({
			b: {
				type: String,
				select: false
			},
			c: String
		})
	}
});

Doc.findOne(); // no problems, returns { a: {c: 1} }, ignores b

// what if we want to ignore the entire parent document?
Doc.findOne().select('-a'); // ERROR path collision

With this behaviour in mind, we should consider two main situations when dealing with embedded subdocuments:

  • When you need the subdoc, you need all of it

For situations when the subdocument should default to private, do not add select: false to any of the fields within the subdocument. Only select the entire subdocument or ignore it completely.

  • When you need finer grained access

Do not apply select: false to the subdocument, only to the fields within it. This situation causes more complicated selects to be required more frequently.

If you find yourself running into this, it might be worth considering migrating the subdocument into its own collection in order to have full control over it.

Populate

Speaking of migrating documents into their own collection, I should probably talk about the behaviour of populate() as well.

populate() replaces an ObjectId that references another collection with the referenced document. populate() also contains its own select statement, since it's essentially a helper for executing a find() query based on the target field.

The select part of populate behaves in much the same way as any regular select statement. It does, however, have one interesting trait, which is that it can be used to override a prior exclusive select.

// pretend that "a" refernces another collection
Doc.create({
	a: ObjectId('...'),
	b: String
});

Doc.findOne().select('-a').populate('a'); // works as intended, no Error

What's The Best Approach?

With all of the prior examples in mind, I recommend the following approach for structuring schemas:

  • Use schema level select: false on top level (not nested) fields that shouldn't always be available.
    • It's easier to select things when you need them than to have to remember to get rid of them.
  • Use select: false on the top level of subdocuments if you either need all of it or none of it.
    • Otherwise, stick to selecting individual fields and be prepared to return the whole thing everytime.
  • While I didn't discuss them in detail here, Query Helpers are useful!
    • Move complicated select logic into Query Helpers for repeatability and consistency.
    • An individual UI element on the front end that requires specific data can be supported by a query helper behind it.

These are pretty safe tips for navigating selecting and projecting with mongoose. I realize this isn't the most interesting topic, but I also think it's deceptively important due to mongo's "include everything by default" design. Think carefully about selections at the beginning of the design process can save a disproportionately large amount of time and effort down the line.

Once you have logic scattered throughout your system that relies on specific default select behaviour, it can cause you to get stuck in a position where you're painstakingly combing your app for places where you need to exclude or include a specific field to preserve some kind of functionality. This is compounded by select not having any real form of type checking or validation to tell you where to look while refactoring. Having a plan up front and using query helpers to define default behaviour can save hours of headaches.

Also, don't use the autopopulate plugin! Please. All it does is cause pain. The extra 11 characters of calling .populate() is worth it. Trust me. All it takes is one time where you forget it's lurking there in your schema for it to autopopulate something sensitive and ruin your day. Or week.

Thanks for reading!

If you have any questions, anecdotes, or corrections, hit me up!