Drupal 8 PDF Export Optimization: How We Handled 600+ Node Reports

Reading time: 4 minutes

Last modified: 24 May 2018

The Challenge: Large-Scale PDF Generation in Drupal 8

When a client needed to generate comprehensive PDF reports from hundreds of Drupal nodes, we knew the standard Entity Print module wouldn’t cut it. The requirements were clear:

  • Export individual content nodes as separate PDFs
  • Combine up to 600 nodes into a single, unified PDF document
  • Operate within Pantheon’s strict execution time and memory limits
  • Maintain responsive performance for content editors
  • Ensure data integrity throughout the export process

Why Standard Drupal Solutions Weren’t Enough

Drupal 8’s Entity Print module is excellent for small to medium-sized exports, but it hits critical limitations with larger datasets. Here’s why traditional approaches failed:

Performance Bottlenecks

  • Memory Overload: Processing 600 nodes simultaneously required nearly 1GB of RAM
  • Execution Timeouts: The process could take up to 2 hours, exceeding standard web server timeouts
  • Server Restrictions: Pantheon’s platform enforces resource limits that prevented successful completion
  • Unreliable Processing: Standard methods often crashed midway, forcing restarts

[Diagram: Drupal PDF export process architecture]

Our Optimized Solution: Batch Processing with Drush

After extensive testing, we developed a robust solution that combines Drupal’s batch API with command-line processing. Here’s how we tackled the challenge:

Phase 1: Chunked PDF Generation

We implemented an intelligent batch processing system that:

  1. Processes nodes in optimized chunks (20 nodes per batch; the batch assembly is sketched after the callback below)
  2. Generates individual PDFs using Drupal’s Entity Print
  3. Tracks progress and handles failures gracefully
  4. Uses temporary storage efficiently
The heart of this phase is the batch operation callback:

/**
 * Batch operation callback for PDF generation.
 *
 * @param array $nids
 *   Array of node IDs to process in this run.
 * @param array $context
 *   Batch context array for progress tracking.
 */
function mymodule_generate_pdf_batch_operation($nids, &$context) {
  $node_storage = \Drupal::entityTypeManager()->getStorage('node');
  $nodes = $node_storage->loadMultiple($nids);

  // Initialize the running totals on the first invocation.
  if (!isset($context['results']['processed'])) {
    $context['results']['processed'] = 0;
    $context['results']['files'] = [];
  }

  // Make sure the scratch directory exists before writing into it.
  $directory = 'temporary://pdf-export';
  file_prepare_directory($directory, FILE_CREATE_DIRECTORY);

  foreach ($nodes as $node) {
    // Let Entity Print's configured PDF engine render the node and
    // save the result straight to the temporary directory.
    $print_engine = \Drupal::service('plugin.manager.entity_print.print_engine')
      ->createSelectedInstance('pdf');
    $uri = \Drupal::service('entity_print.print_builder')
      ->savePrintable([$node], $print_engine, 'temporary', 'pdf-export/node-' . $node->id() . '.pdf');
    $context['results']['files'][] = $uri;
  }

  // Accumulate across runs instead of overwriting the total.
  $context['results']['processed'] += count($nodes);
  $context['message'] = t('Processed @count nodes so far', ['@count' => $context['results']['processed']]);
}
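
For context, here is how that callback is wired into a batch. This is a minimal sketch: the function names are from our module, and the chunk size of 20 is simply what kept each run inside our memory budget. The finished callback it references is shown later in this post.

/**
 * Builds and starts the PDF export batch.
 *
 * @param array $nids
 *   All node IDs to export.
 */
function mymodule_generate_pdf_batch(array $nids) {
  $operations = [];
  // Twenty nodes per operation kept each run well inside the memory limit.
  foreach (array_chunk($nids, 20) as $chunk) {
    $operations[] = ['mymodule_generate_pdf_batch_operation', [$chunk]];
  }
  batch_set([
    'title' => t('Generating PDF exports'),
    'operations' => $operations,
    'finished' => 'mymodule_generate_pdf_batch_finished',
  ]);
}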

Phase 2: Efficient PDF Merging with Ghostscript

After generating individual PDFs, we used Ghostscript to merge them into a single document. This approach is significantly more efficient than PHP-based merging solutions:

# Example Ghostscript merge command
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf node-*.pdf
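
In our workflow this merge runs from PHP once the batch completes. The sketch below is illustrative: it assumes the gs binary is available on the server's PATH and that the per-node PDFs were saved to temporary://pdf-export, as in the batch callback above.

use Symfony\Component\Process\Process;

/**
 * Merges the per-node PDFs into a single document with Ghostscript.
 */
function mymodule_merge_pdfs() {
  $dir = \Drupal::service('file_system')->realpath('temporary://pdf-export');
  // Build the same gs command shown above, one argument per array item.
  // Note: glob() sorts lexically, so zero-pad the node IDs in the
  // filenames if page order matters.
  $command = array_merge(
    ['gs', '-dBATCH', '-dNOPAUSE', '-q', '-sDEVICE=pdfwrite', '-sOutputFile=' . $dir . '/combined.pdf'],
    glob($dir . '/node-*.pdf')
  );
  $process = new Process($command);
  // Merging hundreds of PDFs takes minutes; raise the 60-second default timeout.
  $process->setTimeout(3600);
  $process->run();
  if (!$process->isSuccessful()) {
    \Drupal::logger('mymodule')->error('Ghostscript merge failed: @error', ['@error' => $process->getErrorOutput()]);
    return FALSE;
  }
  return 'temporary://pdf-export/combined.pdf';
}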

Performance Results: From Hours to Minutes

Our optimized solution delivered remarkable performance improvements:

Metric          | Before Optimization | After Optimization          | Improvement
----------------|---------------------|-----------------------------|---------------
Processing Time | 120+ minutes        | 14 minutes                  | 88% faster
Memory Usage    | ~1 GB               | ~256 MB                     | 75% reduction
Merge Operation | 30+ minutes         | 9 minutes                   | 70% faster
Final Output    | Unreliable          | Single 596-page PDF (25 MB) | 100% reliable

Key Technical Insights

  1. Batch Processing is Essential

    • Breaks large operations into manageable chunks
    • Prevents timeouts and memory issues
    • Provides better progress tracking
  2. Drush for Reliability (see the command sketch after this list)

    • More stable than web-based processing
    • Better error handling and logging
    • Can run as a background process
  3. Ghostscript for PDF Merging

    • Native binary execution is faster than PHP solutions
    • Better memory management for large documents
    • Advanced compression options
  4. Modular Architecture

    • Works through both UI and command line
    • Easy to extend for different content types
    • Configurable batch sizes and processing parameters
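
To make the Drush point concrete, here is a sketch of a Drush 8 command that starts the same batch from the CLI. The command name and the report content type are placeholders for the client's actual setup.

/**
 * Implements hook_drush_command().
 */
function mymodule_drush_command() {
  return [
    'pdf-export' => [
      'description' => 'Generate and merge PDF exports for all report nodes.',
      'aliases' => ['pdfx'],
    ],
  ];
}

/**
 * Command callback for "drush pdf-export".
 */
function drush_mymodule_pdf_export() {
  // Collect every published node of the target type.
  $nids = \Drupal::entityQuery('node')
    ->condition('type', 'report')
    ->condition('status', 1)
    ->execute();
  mymodule_generate_pdf_batch($nids);
  // Run the batch in this CLI process, free of web server timeouts.
  drush_backend_batch_process();
}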

Implementation Best Practices

Error Handling & Recovery

  • Implemented comprehensive error logging
  • Added automatic retry for failed operations
  • Created resume functionality for interrupted processes (the retry/resume helper is sketched after this list)
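
As an illustration of the retry and resume pieces, here is a hedged sketch of the helper our batch callback can delegate to. The helper name, attempt count, and state key are ours, not part of any API.

use Drupal\node\NodeInterface;

/**
 * Generates one node's PDF with retries, recording permanent failures.
 */
function mymodule_generate_pdf_with_retry(NodeInterface $node, $max_attempts = 3) {
  for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
    try {
      $print_engine = \Drupal::service('plugin.manager.entity_print.print_engine')
        ->createSelectedInstance('pdf');
      return \Drupal::service('entity_print.print_builder')
        ->savePrintable([$node], $print_engine, 'temporary', 'pdf-export/node-' . $node->id() . '.pdf');
    }
    catch (\Exception $e) {
      \Drupal::logger('mymodule')->warning('Attempt @n failed for node @nid: @message', [
        '@n' => $attempt,
        '@nid' => $node->id(),
        '@message' => $e->getMessage(),
      ]);
    }
  }
  // Record the failure so a follow-up run can retry just these nodes.
  $failed = \Drupal::state()->get('mymodule.pdf_export.failed', []);
  $failed[] = $node->id();
  \Drupal::state()->set('mymodule.pdf_export.failed', $failed);
  return FALSE;
}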

Resource Management

  • Used Drupal’s temporary file system
  • Implemented automatic cleanup of temporary files (see the finished callback after this list)
  • Added memory usage monitoring
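
The cleanup lives in the batch finished callback referenced earlier. A minimal version, assuming the merge helper sketched in Phase 2:

/**
 * Batch 'finished' callback: merge the PDFs, report, and clean up.
 */
function mymodule_generate_pdf_batch_finished($success, $results, $operations) {
  if ($success && mymodule_merge_pdfs()) {
    drupal_set_message(t('Exported and merged @count nodes.', ['@count' => $results['processed']]));
    // The combined document exists, so the per-node scratch files can go.
    foreach ($results['files'] as $uri) {
      file_unmanaged_delete($uri);
    }
  }
  else {
    drupal_set_message(t('The PDF export did not complete.'), 'error');
  }
}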

User Experience

  • Clear progress indicators
  • Email notifications on completion
  • Download links for generated reports

Real-World Applications

This solution can be extended for various use cases:

  1. Scheduled Report Generation

    • Weekly/Monthly executive reports
    • Compliance documentation
    • Data dumps for archival
  2. E-commerce Applications

    • Bulk order processing
    • Invoice generation
    • Catalog exports
  3. Educational Platforms

    • Course material compilation
    • Student progress reports
    • Certification generation

Conclusion

By implementing this optimized batch processing solution, we transformed a previously unreliable, hours-long process into a dependable 14-minute operation. The key to success was combining Drupal's batch API with command-line tools like Drush and Ghostscript, creating a robust solution that works within platform constraints while delivering excellent performance. For even larger datasets, the natural next step is integrating the same pipeline with Drupal's queue system.

Ready to Optimize Your Drupal Site?

If you’re facing similar challenges with large-scale content processing in Drupal, our team can help. Contact us today to discuss how we can optimize your Drupal implementation.

Further Reading

  1. Drupal Batch API Documentation
  2. Ghostscript PDF Optimization Guide
  3. Pantheon Performance Best Practices
  4. Drupal 9 Migration Guide
