High-Performance Bulk Insert with relationships

July 29, 20256 min read
High-Performance Bulk Insert with relationships

Pattern to bulk insert thousands of related records with multi-level relationships

Ever tried to bulk insert thousands of records that depend on each other?

If you have, you probably know the pain.

If you haven't, we're about to dive into a problem that'll make you question your life choices, and then show you a neat trick to solve it.

The Problem

Picture this: You're building a project management system and the templating feature requires the ability to save project templates and then create project instances for all clients.

Your project structure looks like this:

ProjectPhasesTasksSubtasks

We store a similar structure for templates, so we can reuse them across different clients - make sense?

Template ProjectTemplate PhasesTemplate TasksTemplate Subtasks

For simplicity, let’s assume - A Project template has about 5 phases, 10 tasks per phase, and 3 subtasks per task.

We have 100 clients, and are we’ll create a project for each of them from our project template.

So you're looking at generating:

  • 100 projects × 5 phases × 10 tasks × 3 subtasks = 15,000 related records

Sounds simple, right? Wrong. Here's the catch:

  • You need the Project ID to create Phases

  • You need the Phase ID to create Tasks

  • You need the Task ID to create Subtasks

  • You want to do this FAST (because reasons)

  • You want to avoid 15,000+ individual database queries

We’ll explore a scenario with Laravel, but the pattern can apply to any other language or framework.

The "Normal" Approaches (And Why They Hurt)

Approach 1: Individual Inserts

foreach ($clients as $client) { $project = Project::create([ 'name' => 'Website Development - ' . $client->name, 'client_id' => $client->id, 'start_date' => $startDate, ]); foreach ($template->phases as $phaseTemplate) { $phase = Phase::create([ 'project_id' => $project->id, 'name' => $phaseTemplate->name, 'order' => $phaseTemplate->order, ]); foreach ($phaseTemplate->tasks as $taskTemplate) { $task = Task::create([ 'phase_id' => $phase->id, 'name' => $taskTemplate->name, 'estimated_hours' => $taskTemplate->estimated_hours, ]); foreach ($taskTemplate->subtasks as $subtaskTemplate) { Subtask::create([ 'task_id' => $task->id, 'name' => $subtaskTemplate->name, 'description' => $subtaskTemplate->description, ]); } } } }

Pros: Simple to understand and write

Cons: You're looking at 15,000+ database queries


Approach 2: Memory Mapping

// Step 1: Bulk insert all projects and keep references $projectData = []; foreach ($clients as $client) { $projectData[] = [ 'name' => 'Website Development - ' . $client->name, 'client_id' => $client->id, 'start_date' => $startDate, 'created_at' => now(), ]; } // Bulk insert projects Project::insert($projectData); // Get the inserted projects (this is where it gets tricky) $insertedProjects = Project::where('name', 'LIKE', 'Website Development%') ->where('created_at', '>=', now()->subMinute()) ->orderBy('id') ->get(); // Step 2: Prepare phases with real project IDs $phaseData = []; foreach ($insertedProjects as $index => $project) { foreach ($template->phases as $phaseTemplate) { $phaseData[] = [ 'project_id' => $project->id, 'name' => $phaseTemplate->name, 'order' => $phaseTemplate->order, 'created_at' => now(), ]; } } // Bulk insert phases Phase::insert($phaseData); // Get inserted phases and map them back... $insertedPhases = Phase::whereIn('project_id', $insertedProjects->pluck('id')) ->where('created_at', '>=', now()->subMinute()) ->orderBy('project_id', 'order') ->get(); // Step 3: Continue this pattern for tasks and subtasks... // (This gets increasingly complex and error-prone)

Pros:

  • Uses bulk inserts for better performance than individual creates

Cons:

  • Complex ID mapping logic that's prone to race conditions

  • Requires careful ordering and filtering to match inserted records

  • Gets exponentially more complex with each relationship level


Our Solution: The temp_identifier Trick

Here's where we got creative.

What if we could get the benefits of bulk inserts AND handle relationships properly?

The idea is simple:

  1. Add a temporary column to your tables

  2. Use it to track relationships during bulk operations

  3. Clean it up when you're done

Step 1: Add Temporary Columns

ALTER TABLE projects ADD COLUMN temp_identifier BIGINT NULL, ADD COLUMN batch_id VARCHAR(255) NULL; ALTER TABLE phases ADD COLUMN temp_identifier BIGINT NULL, ADD COLUMN batch_id VARCHAR(255) NULL; ALTER TABLE tasks ADD COLUMN temp_identifier BIGINT NULL, ADD COLUMN batch_id VARCHAR(255) NULL; ALTER TABLE subtasks ADD COLUMN batch_id VARCHAR(255) NULL;

Step 2: Prepare Data with Temporary IDs

$batchId = uniqid(); // Unique identifier for this batch to prevent concurrency issues $projectData = []; $phaseData = []; $taskData = []; $subtaskData = []; $projectCounter = 0; $phaseCounter = 0; $taskCounter = 0; foreach ($clients as $client) { $tempProjectId = $projectCounter++; // Prepare project data $projectData[] = [ 'temp_identifier' => $tempProjectId, 'batch_id' => $batchId, 'name' => 'Recurring Task - ' . $client->name, 'client_id' => $client->id, 'start_date' => $startDate, 'created_at' => now(), 'updated_at' => now(), ]; foreach ($template->phases as $phaseTemplate) { $tempPhaseId = $phaseCounter++; // Prepare phase data $phaseData[] = [ 'temp_identifier' => $tempPhaseId, 'batch_id' => $batchId, 'project_temp_identifier' => $tempProjectId, // Reference to project's temp_identifier 'name' => $phaseTemplate->name, 'order' => $phaseTemplate->order, 'created_at' => now(), 'updated_at' => now(), ]; foreach ($phaseTemplate->tasks as $taskTemplate) { $tempTaskId = $taskCounter++; // Prepare task data $taskData[] = [ 'temp_identifier' => $tempTaskId, 'batch_id' => $batchId, 'phase_temp_identifier' => $tempPhaseId, // Reference to phase's temp_identifier 'name' => $taskTemplate->name, 'estimated_hours' => $taskTemplate->estimated_hours, 'created_at' => now(), 'updated_at' => now(), ]; foreach ($taskTemplate->subtasks as $subtaskTemplate) { $subtaskData[] = [ 'batch_id' => $batchId, 'task_temp_identifier' => $tempTaskId, // Reference to task's temp_identifier 'name' => $subtaskTemplate->name, 'description' => $subtaskTemplate->description, 'created_at' => now(), 'updated_at' => now(), ]; } } } }

Step 3: Bulk Insert and Map IDs

// Bulk insert projects Project::insert($projectData); // Get the mapping of temp_identifier to real ID $projectMapping = Project::where('batch_id', $batchId) ->whereNotNull('temp_identifier') ->pluck('id', 'temp_identifier') ->toArray(); // Update phase data with real project IDs foreach ($phaseData as &$phase) { $phase['project_id'] = $projectMapping[$phase['project_temp_identifier']] ?? throw new Exception('Project temp_identifier not found'); unset($phase['project_temp_identifier']); } // Bulk insert phases Phase::insert($phaseData); // Get phase mapping $phaseMapping = Phase::where('batch_id', $batchId) ->whereNotNull('temp_identifier') ->pluck('id', 'temp_identifier') ->toArray(); // Update task data with real phase IDs foreach ($taskData as &$task) { $task['phase_id'] = $phaseMapping[$task['phase_temp_identifier']] ?? throw new Exception('Phase temp_identifier not found'); unset($task['phase_temp_identifier']); } // Bulk insert tasks Task::insert($taskData); // Get task mapping $taskMapping = Task::where('batch_id', $batchId) ->whereNotNull('temp_identifier') ->pluck('id', 'temp_identifier') ->toArray(); // Update subtask data with real task IDs foreach ($subtaskData as &$subtask) { $subtask['task_id'] = $taskMapping[$subtask['task_temp_identifier']] ?? throw new Exception('Task temp_identifier not found'); unset($subtask['task_temp_identifier']); } // Bulk insert subtasks Subtask::insert($subtaskData);

This should be wrapped in a DB::transaction with temporarily disabling foriegn key checks if required.

DB::transaction(function () use ($projectData, $phaseData, $taskData, $subtaskData, $batchId) { // Temporarily disable foreign key checks for bulk inserts DB::statement('SET FOREIGN_KEY_CHECKS=0;'); // Bulk Operation // Re-enable foreign key checks DB::statement('SET FOREIGN_KEY_CHECKS=1;'); });

Step 4: Clean Up

Project::where('batch_id', $batchId)->update(['temp_identifier' => null, 'batch_id' => null]); Phase::where('batch_id', $batchId)->update(['temp_identifier' => null, 'batch_id' => null]); Task::where('batch_id', $batchId)->update(['temp_identifier' => null, 'batch_id' => null]); Subtask::where('batch_id', $batchId)->update(['batch_id' => null]);

Pros:

  • Achieves true bulk insert performance with clean relationship handling

  • Simple integer lookups eliminate complex field matching

  • More predictable than approaches relying on insertion order or field matching

Cons:

  • Requires temporary database columns and schema changes

  • More complex setup than simpler approaches

  • Cleanup step needed after completion

When Should You Use This?

This approach works best when:

  • You're dealing with thousands of related records - multi-level relationships

Don't use it when:

  • You're inserting just a few hundred records (the overhead isn't worth it)

  • Your relationships are simple (just use insertGetId loops)

  • Your team isn't comfortable with the added complexity

Comparison

Let's say you're generating projects for 100 clients from a single template, each with 5 phases, 10 tasks per phase, and 3 subtasks per task:

ApproachDatabase QueriesMemory UsageReliability
Individual Inserts~20,600LowHigh
Memory Mapping~7HighMedium
temp_identifier~10HighHigh

The next time you're staring at a bulk insert problem that seems too complex, remember: sometimes the answer is as simple as adding a temporary column and getting creative with how you use it.


I hope you find this valuable, if you did - share it with your folks who'll find it useful too.

If you have any suggestions/comments please feel free.

Happy coding!