If you’re storing binary files inside MongoDB, the convention is called GridFS. It stores each logical file across two collections: a metadata document in one, and a sequence of binary chunks in the other. This post is a cheat sheet for inspecting and tweaking those documents from the Mongo shell. 🍃
When using MongoDB to store files, we have two collections:
- The place where MongoDB stores the file metadata: store.files
- And the place where MongoDB stores the file content: store.chunks
Depending on the size of the file, one entry in store.files can point to many entries in store.chunks. The bigger the file, the more entries you’ll encounter.
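For orientation, here is the typical shape of those two kinds of documents, sketched as plain JS objects. The field names follow the GridFS spec; the `_id` values and the `metadata` contents are placeholders (in mongosh the ids would be `ObjectId(...)` and `data` would be BSON binary):

```js
// Typical shape of a GridFS metadata document in store.files.
// Id values are shown as plain strings so this sketch runs outside mongosh.
const fileDoc = {
  _id: "5b02d232cbce1d07e08401c7", // referenced by store.chunks as files_id
  length: 10485760,                // total file size in bytes (10 MB here)
  chunkSize: 261120,               // 255 KB, the default
  uploadDate: new Date("2018-05-21T10:00:00Z"),
  filename: "report.pdf",
  metadata: { owner: "demo" }      // optional, application-defined
};

// Each chunk document pairs a files_id with a zero-based index n:
const chunkDoc = {
  _id: "5b02d232cbce1d07e08401c8",
  files_id: fileDoc._id, // back-reference to the store.files entry
  n: 0,                  // chunk index
  data: "<BinData>"      // the raw bytes (BSON binary in reality)
};
```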
```js
// Show all / list all entries from store.files
db.getCollection('store.files').find({});

// Show only a particular entry from store.files
db.getCollection('store.files').find({ _id: ObjectId("5b02d232cbce1d07e08401c7") });

// The same can be used for store.chunks.
db.getCollection('store.chunks').find({});
```
The metadata fields in store.files can be augmented at query time (the new field exists only in the result, not in the database):
```js
db.getCollection('store.files').aggregate([
  { $match: { _id: ObjectId("5b02d232cbce1d07e08401c7") } },
  { $addFields: { 'key_reference': '1234' } }
]);
```
Or we can do an update on store.files, which actually persists the new field into the database:
```js
db.getCollection('store.files').updateMany(
  { _id: ObjectId("5b02d232cbce1d07e08401c7") },
  { $set: { 'key_reference': '1234' } }
);
```
A few useful additions.
Why files are split into chunks. MongoDB’s per-document hard limit is 16 MB. GridFS works around that by splitting any file larger than the chunk size into many small chunk documents and writing one metadata doc that links them together. The default chunk size is 255 KB, configurable per bucket. So a 10 MB upload becomes one *.files doc and roughly 40 *.chunks docs, all sharing the same files_id. To inspect that relationship for a specific file:
```js
db.getCollection('store.chunks')
  .find({ files_id: ObjectId("5b02d232cbce1d07e08401c7") })
  .sort({ n: 1 }); // n is the chunk index, 0..N-1
```
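As a sanity check on the arithmetic above: the expected chunk count is just the file length divided by the chunk size, rounded up. A quick sketch:

```js
// Expected number of chunk documents for a file of a given size.
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 261120 bytes

function expectedChunks(fileLength, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileLength / chunkSize);
}

// A 10 MB file needs 41 chunks: 40 full ones plus a final partial chunk.
console.log(expectedChunks(10 * 1024 * 1024)); // → 41
```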
The bucket name store.* is custom. The default GridFS bucket is named fs, so out of the box you’d see fs.files and fs.chunks. The bucket name is whatever the application set when it opened the GridFS handle. If your app uses store, replace fs with store in any docs example you find online.
Putting and getting files in the first place. The shell snippets above are for inspecting files that are already there — they don’t help you upload or download the binary content. For that, use the mongofiles CLI or the driver-level GridFS API:
```sh
# Upload
mongofiles --uri "mongodb://localhost/mydb" --prefix store put /path/to/file.pdf

# Download
mongofiles --uri "mongodb://localhost/mydb" --prefix store get file.pdf

# List
mongofiles --uri "mongodb://localhost/mydb" --prefix store list
```
From application code, every official driver has a GridFS class — GridFSBucket in Node and Java, GridFS in PyMongo, IGridFSBucket in C#. They handle the chunking and reassembly for you.
Don’t delete files by hand. A common pitfall: deleting a document from store.files directly leaves the matching chunks orphaned in store.chunks, slowly bloating the collection. Either use mongofiles delete <filename>, or your driver’s GridFSBucket.delete(fileId); both remove the metadata document and all of its chunks together.
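To check whether a bucket already has orphans, the logic is: collect the `_id`s present in store.files and flag any chunk whose `files_id` is not among them. A local sketch of that set logic (in mongosh you would feed it the results of `find()` on the two collections, or express the same check as a `$lookup` aggregation):

```js
// Given the documents of store.files and store.chunks, return the
// files_id values that no longer have a matching metadata document.
function findOrphanedChunkIds(fileDocs, chunkDocs) {
  const known = new Set(fileDocs.map(f => String(f._id)));
  const orphans = new Set();
  for (const chunk of chunkDocs) {
    if (!known.has(String(chunk.files_id))) {
      orphans.add(String(chunk.files_id));
    }
  }
  return [...orphans];
}

// Example: file "a" was deleted by hand, so its chunks are orphaned.
const files = [{ _id: "b" }];
const chunks = [
  { files_id: "a", n: 0 },
  { files_id: "a", n: 1 },
  { files_id: "b", n: 0 },
];
console.log(findOrphanedChunkIds(files, chunks)); // → [ 'a' ]
```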
Should you actually use GridFS? A practical heads-up: if your files are bigger than 16 MB and you already use MongoDB, GridFS is a reasonable fit and keeps backups simple. But for most modern stacks, putting the bytes in object storage (S3, GCS, MinIO, R2) and keeping only a URL or key in MongoDB is cheaper, faster, and easier to scale. GridFS is most defensible when you genuinely want files transactionally co-located with the database — e.g. mobile/embedded scenarios, or when network egress to S3 is a non-starter. 💡