The Nuances of Mongoose Select
Developing strategies for more effective use of select/project statements with mongoose.
Selecting and Projecting: SQL vs NoSQL
One of the biggest differences I've noticed when working with document storage compared to relational database is the default behaviour of selecting fields in a query.
In SQL, while the *
operator is always available to select all available fields, it's
an active choice to use it. The user must explicitly choose to select whichever fields are required, with
the option to return all of them.
Contrast this with MongoDB, where the default behaviour is to select the entire document
with any query. To narrow down the results the user must opt in to
selecting their desired fields, rather than be forced to make the choice up front (even if that
choice is to use *
).
While this is a fundamental difference between these two systems, I have never really seen it, or the implications of it, discussed in any detail. I believe that in many situations the decision to have Mongo implicitly return all fields without the user making a selection is harmful and causes more problems than the convenience that it affords.
Even while using an ORM, I see how easy it can be to mistakenly return fields that expose private information, compromise security, or degrade performance by unintentionally increasing request payload sizes.
I would go as far as to say that a linter that enforces that all find()
queries require explicit
select()
statements would significantly improve security and performance for many programs. Unfortunately,
I don't think this is a viable option for the codebase I work on.
In that case, what kind of system or approach can be used to prevent unintended data from reaching any client?
Mongoose Select
I use Mongoose to interact with Mongo and I like it quite a bit. The query abstractions are very composable, and the documentation is comprehensive and strikes the right balance between technical and approachable. The abstractions that it offers over top of regular mongo projections will be the focus of the rest of this post.
Mongoose has three different approaches you can use to select fields out of a table, and a couple of what could be called "modes".
This will serve as our example document to illustrate the different examples.
// Image this doc is an instance of Doc in a database somewhere
doc = {
a: 1,
b: 2,
c: 3,
d: 4
};
Three Select Syntaxes
Mongoose is quite flexible and allows three different ways of constructing a select statement.
A string, an array of strings, and an object are all valid input to the select()
function.
All of the following statements are equal:
// 1) String
Doc.findOne().select('a b');
// 2) Array of strings
Doc.findOne().select(['a', 'b']);
// 3) Object
Doc.findOne().select({ a: 1, b: 1 });
// resulting document: { a: 1, b: 2 }
// c and d have both been omitted
Inclusive and Exclusive Selects
Each of three of the examples above allows the user to include which fields are desired.
This is called an inclusive select.
The opposite of an inclusive select is an exclusive select, certain fields can be
excluded by prefixing them with -
or marking them as a 0
instead of 1
inside a select
object:
Doc.findOne().select('-a -b');
// or
Doc.findOne().select({ a: 0, b: 0 });
// resulting document: { c: 3, d: 4 }
It's important to think of inclusive and exclusive as a pair of flags that both default to false,
rather than a switch or a single boolean value. A query without any select
clause is neither
inclusive nor exclusive. This can be easy to forget and has some interesting implications.
Selects defined directly on schema fields do not count as a select statement. We can override
that select using the +
modifier. Because select()
has never been called, the query has yet to
declared exclusive or exclusive.
// Schema with field level select
Doc = new Schema({
a: {
type: String,
select: false
},
b: String
});
// Valid - Exclusive query
Doc.findOne().select('-a +b'); // returns { b: 'something' }
// Error!
Doc.findOne().select('-a b');
Select Interactions
Now, with this many different permutations of select inclusion, there are a lot of ways things can break down or be combined with varying results.
What happens when we chain select statements?
Doc.findOne().select('a b').select('');
Path Collisions
In Mongo 4.4 and higher, selecting specific fields from a nested object can cause problems. Anytime you include an entire object in a projection, this means that you may not include individual fields from the subdocument.
// example object
Doc.create({
a: {
b: 1,
c: 2
}
});
Doc.findOne().select('a a.b'); // ERROR path collision
In the above example, either a
can be selected, or any combination of a.b
and a.c
.
While this seems like a fairly simple behaviour to work around, bringing select: false
at the
model level into the equation can path collisions more difficult to track down and cause unexpected problems.
// subdocument example
const Doc = mongoose.Schema({
a: {
// nested document that is projected out by default
type: new mongoose.Schema({
b: {
type: String,
select: false
},
c: String
})
}
});
Doc.findOne(); // no problems, returns { a: {c: 1} }, ignores b
// what if we want to ignore the entire parent document?
Doc.findOne().select('-a'); // ERROR path collision
With this behaviour in mind, we should consider two main situations when dealing with embedded subdocuments:
- When you need the subdoc, you need all of it
For situations when the subdocument should default to private, do not add select: false
to any
of the fields within the subdocument. Only select the entire subdocument or ignore it completely.
- When you need finer grained access
Do not apply select: false
to the subdocument, only to the fields within it. This situation causes
more complicated selects to be required more frequently.
If you find yourself running into this, it might be worth considering migrating the subdocument into its own collection in order to have full control over it.
Populate
Speaking of migrating documents into their own collection, I should probably talk about the behaviour of
populate()
as well.
populate()
replaces an ObjectId
that references another collection with the referenced document. populate()
also contains its own select
statement, since it's essentially a helper for executing a find()
query based on the
target field.
The select
part of populate
behaves in much the same way as any regular select statement. It does,
however, have one interesting trait, which is that it can be used to override a prior exclusive select.
// pretend that "a" refernces another collection
Doc.create({
a: ObjectId('...'),
b: String
});
Doc.findOne().select('-a').populate('a'); // works as intended, no Error
What's The Best Approach?
With all of the prior examples in mind, I recommend the following approach for structuring schemas:
- Use schema level
select: false
on top level (not nested) fields that shouldn't always be available.- It's easier to select things when you need them than to have to remember to get rid of them.
- Use
select: false
on the top level of subdocuments if you either need all of it or none of it.- Otherwise, stick to selecting individual fields and be prepared to return the whole thing everytime.
- While I didn't discuss them in detail here, Query Helpers are useful!
- Move complicated select logic into Query Helpers for repeatability and consistency.
- An individual UI element on the front end that requires specific data can be supported by a query helper behind it.
These are pretty safe tips for navigating selecting and projecting with mongoose. I realize this isn't the most interesting topic, but I also think it's deceptively important due to mongo's "include everything by default" design. Think carefully about selections at the beginning of the design process can save a disproportionately large amount of time and effort down the line.
Once you have logic scattered throughout your system that relies on specific default
select behaviour, it can cause you to get stuck in a position where you're painstakingly combing your app for places where
you need to exclude or include a specific field to preserve some kind of functionality. This is compounded by select
not having
any real form of type checking or validation to tell you where to look while refactoring. Having a plan up front and using query
helpers to define default behaviour can save hours of headaches.
Also, don't use the autopopulate plugin! Please. All it does is cause pain. The extra 11 characters of calling
.populate()
is worth it. Trust me. All it takes is one time where you forget it's lurking there in your schema
for it to autopopulate something sensitive and ruin your day. Or week.
Thanks for reading!
If you have any questions, anecdotes, or corrections, hit me up!