There are many different layers and perspectives when it comes to optimization. Throughout my career as a software developer, I can’t recall a single project where optimization wasn’t mentioned in some shape or form — whether early on or much later in the process. It is a topic that comes up repeatedly and, hopefully, not too late.
When I was studying, optimization was mainly taught as a concern about algorithmic execution time: the well-known Big-O notation that frequently appears in job interviews, and for good reason. Once I started working professionally, I realized that in practice we often want to optimize across many other dimensions throughout the software development lifecycle.
On a personal note, I genuinely enjoy optimization-related tasks, particularly those focused on execution time, memory usage, network latency, and other technical concerns. These problems tend to bring you back to programming fundamentals and are often universal across languages, which makes them a great way to learn new ones by applying familiar algorithms in a different syntax.
A couple of months ago, I worked on a task involving the optimization of a long-running process. This post covers the strategies I applied. As a result, it leans toward the technical side and includes some JavaScript snippets. The example problem is simplified and made up, but it closely resembles the original scenario and should still convey the core ideas.
The Problem
Imagine we need to read data from a third-party database that stores information about students and their attendance history. Since this is external data, we want to periodically synchronize it into our own database, potentially transforming it along the way to suit our needs.
Let’s start with a basic and intuitive approach:
```javascript
async function syncAllStudents() {
  const students = await fetchStudents()

  for (const student of students) {
    // Two additional queries plus one write for every single student
    const records = await fetchAssistanceRecords(student.id)
    const profile = await fetchProfile(student.id)

    const mappedStudent = mapStudent(student, records, profile)
    await db.students.upsert(student.id, mappedStudent)
  }
}
```
This implementation is straightforward and works fine for a relatively small number of students. However, as the dataset grows, execution time increases significantly, to the point where the process can take hours to complete.
Analysis & Solution
It is clear that the function’s execution time scales with the number of students. For each one, we fetch additional data and perform database operations. Since fetching and iterating over all students is unavoidable, the real question becomes: how can we make this faster?
Input/output operations are generally much slower than in-memory operations. Reducing the number of IO calls can therefore have a major impact. Consider the following refactoring:
```javascript
async function syncAllStudents() {
  const students = await fetchStudents()
  const ids = students.map(student => student.id)

  // Fetch everything up front: one bulk query per collection
  // instead of two queries per student
  const recordsByStudent = await fetchAllAssistanceRecords(ids)
  const profilesByStudent = await fetchAllProfiles(ids)

  for (const student of students) {
    const records = recordsByStudent.get(student.id)
    const profile = profilesByStudent.get(student.id)

    const mappedStudent = mapStudent(student, records, profile)
    await db.students.upsert(student.id, mappedStudent)
  }
}
```
From a Big-O perspective, both implementations have the same complexity. In practice, however, the second version already performs much better: the number of queries against the external source drops from 2n + 1 (one to list the students, plus two per student) to just 3.
The lookup operations remain efficient because the fetched data is stored in Maps, where access by key is cheap. Building these maps adds a couple of O(n) passes, but this is still far more efficient than repeatedly querying external systems.
At this point, we are left with one IO operation inside the loop. Many databases support bulk operations, so instead of issuing individual upserts, we can batch them:
```javascript
async function syncAllStudents() {
  const students = await fetchStudents()
  const ids = students.map(student => student.id)

  const recordsByStudent = await fetchAllAssistanceRecords(ids)
  const profilesByStudent = await fetchAllProfiles(ids)

  // Build the write operations in memory...
  const operations = students.map(student => {
    const records = recordsByStudent.get(student.id)
    const profile = profilesByStudent.get(student.id)
    const mappedStudent = mapStudent(student, records, profile)

    return {
      op: 'upsert',
      where: { id: student.id },
      data: mappedStudent,
    }
  })

  // ...and send them to the database in a single call
  await db.students.bulk(operations)
}
```
Finally, while `async/await` greatly improves readability, it can sometimes obscure the fact that we are working with **Promises**. Independent IO operations can be executed concurrently using `Promise.all`:
```javascript
async function syncAllStudents() {
  const students = await fetchStudents()
  const ids = students.map(student => student.id)

  // The two bulk fetches are independent, so they can run concurrently
  const [recordsByStudent, profilesByStudent] = await Promise.all([
    fetchAllAssistanceRecords(ids),
    fetchAllProfiles(ids),
  ])

  const operations = students.map(student => {
    const records = recordsByStudent.get(student.id)
    const profile = profilesByStudent.get(student.id)
    const mappedStudent = mapStudent(student, records, profile)

    return {
      op: 'upsert',
      where: { id: student.id },
      data: mappedStudent,
    }
  })

  await db.students.bulk(operations)
}
```
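A nice side effect is that the two fetches now wait roughly as long as the slower of the two requests rather than the sum of both. Keep in mind that `Promise.all` rejects as soon as either promise rejects; here that seems acceptable, since the sync cannot proceed without both datasets anyway.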
Conclusion
At first glance, this approach introduces more code and slightly reduces readability. However, the payoff is substantial. In the original task that inspired this post, execution time dropped from roughly seven hours to ten minutes, which is a clear win.
There are additional concerns worth exploring — for example, what happens when all this data no longer fits comfortably in memory. This post has already exceeded its original scope, so I will leave those topics for another time.