Drupal 8 PDF Export Optimization: How We Handled 600+ Node Reports

Reading time: 4 minutes

Last modified: 24 May 2018

The Challenge: Large-Scale PDF Generation in Drupal 8

When a client needed to generate comprehensive PDF reports from hundreds of Drupal nodes, we knew the standard Entity Print module wouldn’t cut it. The requirements were clear:

  • Export individual content nodes as separate PDFs
  • Combine up to 600 nodes into a single, unified PDF document
  • Operate within Pantheon’s strict execution time and memory limits
  • Maintain responsive performance for content editors
  • Ensure data integrity throughout the export process

Why Standard Drupal Solutions Weren’t Enough

Drupal 8’s Entity Print module is excellent for small to medium-sized exports, but it hits critical limitations with larger datasets. Here’s why traditional approaches failed:

Performance Bottlenecks

  • Memory Overload: Processing 600 nodes simultaneously required nearly 1GB of RAM
  • Execution Timeouts: The process could take up to 2 hours, exceeding standard web server timeouts
  • Server Restrictions: Pantheon’s platform enforces resource limits that prevented successful completion
  • Unreliable Processing: Standard methods often crashed midway, forcing restarts

[Diagram: Drupal PDF export process architecture]

Our Optimized Solution: Batch Processing with Drush

After extensive testing, we developed a robust solution that combines Drupal’s batch API with command-line processing. Here’s how we tackled the challenge:

Phase 1: Chunked PDF Generation

We implemented an intelligent batch processing system that:

  1. Processes nodes in optimized chunks (20 nodes per batch; the batch assembly is sketched after the callback below)
  2. Generates individual PDFs using Drupal’s Entity Print
  3. Tracks progress and handles failures gracefully
  4. Uses temporary storage efficiently
The heart of this phase is the batch operation callback:

/**
 * Batch operation callback for PDF generation.
 *
 * @param array $nids
 *   Array of node IDs to process in this run.
 * @param array $context
 *   Batch context array for progress tracking.
 */
function mymodule_generate_pdf_batch_operation($nids, &$context) {
  $node_storage = \Drupal::entityTypeManager()->getStorage('node');
  $nodes = $node_storage->loadMultiple($nids);

  // Initialize the running totals on the first invocation.
  if (!isset($context['results']['processed'])) {
    $context['results']['processed'] = 0;
    $context['results']['files'] = [];
  }

  // Make sure the scratch directory exists before writing into it.
  $directory = 'temporary://pdf-export';
  file_prepare_directory($directory, FILE_CREATE_DIRECTORY);

  foreach ($nodes as $node) {
    // Let Entity Print's configured PDF engine render the node and
    // save the result straight to the temporary directory.
    $print_engine = \Drupal::service('plugin.manager.entity_print.print_engine')
      ->createSelectedInstance('pdf');
    $uri = \Drupal::service('entity_print.print_builder')
      ->savePrintable([$node], $print_engine, 'temporary', 'pdf-export/node-' . $node->id() . '.pdf');
    $context['results']['files'][] = $uri;
  }

  // Accumulate across runs instead of overwriting the total.
  $context['results']['processed'] += count($nodes);
  $context['message'] = t('Processed @count nodes so far', ['@count' => $context['results']['processed']]);
}
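
For context, here is how that callback is wired into a batch. This is a minimal sketch: the function names are from our module, and the chunk size of 20 is simply what kept each run inside our memory budget. The finished callback it references is shown later in this post.

/**
 * Builds and starts the PDF export batch.
 *
 * @param array $nids
 *   All node IDs to export.
 */
function mymodule_generate_pdf_batch(array $nids) {
  $operations = [];
  // Twenty nodes per operation kept each run well inside the memory limit.
  foreach (array_chunk($nids, 20) as $chunk) {
    $operations[] = ['mymodule_generate_pdf_batch_operation', [$chunk]];
  }
  batch_set([
    'title' => t('Generating PDF exports'),
    'operations' => $operations,
    'finished' => 'mymodule_generate_pdf_batch_finished',
  ]);
}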

Phase 2: Efficient PDF Merging with Ghostscript

After generating individual PDFs, we used Ghostscript to merge them into a single document. This approach is significantly more efficient than PHP-based merging solutions:

# Example Ghostscript merge command
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf node-*.pdf
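
In our workflow this merge runs from PHP once the batch completes. The sketch below is illustrative: it assumes the gs binary is available on the server's PATH and that the per-node PDFs were saved to temporary://pdf-export, as in the batch callback above.

use Symfony\Component\Process\Process;

/**
 * Merges the per-node PDFs into a single document with Ghostscript.
 */
function mymodule_merge_pdfs() {
  $dir = \Drupal::service('file_system')->realpath('temporary://pdf-export');
  // Build the same gs command shown above, one argument per array item.
  // Note: glob() sorts lexically, so zero-pad the node IDs in the
  // filenames if page order matters.
  $command = array_merge(
    ['gs', '-dBATCH', '-dNOPAUSE', '-q', '-sDEVICE=pdfwrite', '-sOutputFile=' . $dir . '/combined.pdf'],
    glob($dir . '/node-*.pdf')
  );
  $process = new Process($command);
  // Merging hundreds of PDFs takes minutes; raise the 60-second default timeout.
  $process->setTimeout(3600);
  $process->run();
  if (!$process->isSuccessful()) {
    \Drupal::logger('mymodule')->error('Ghostscript merge failed: @error', ['@error' => $process->getErrorOutput()]);
    return FALSE;
  }
  return 'temporary://pdf-export/combined.pdf';
}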

Performance Results: From Hours to Minutes

Our optimized solution delivered remarkable performance improvements:

Metric          | Before Optimization | After Optimization          | Improvement
----------------|---------------------|-----------------------------|---------------
Processing Time | 120+ minutes        | 14 minutes                  | 88% faster
Memory Usage    | ~1 GB               | ~256 MB                     | 75% reduction
Merge Operation | 30+ minutes         | 9 minutes                   | 70% faster
Final Output    | Unreliable          | Single 596-page PDF (25 MB) | 100% reliable

Key Technical Insights

  1. Batch Processing is Essential

    • Breaks large operations into manageable chunks
    • Prevents timeouts and memory issues
    • Provides better progress tracking
  2. Drush for Reliability (see the command sketch after this list)

    • More stable than web-based processing
    • Better error handling and logging
    • Can run as a background process
  3. Ghostscript for PDF Merging

    • Native binary execution is faster than PHP solutions
    • Better memory management for large documents
    • Advanced compression options
  4. Modular Architecture

    • Works through both UI and command line
    • Easy to extend for different content types
    • Configurable batch sizes and processing parameters
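
To make the Drush point concrete, here is a sketch of a Drush 8 command that starts the same batch from the CLI. The command name and the report content type are placeholders for the client's actual setup.

/**
 * Implements hook_drush_command().
 */
function mymodule_drush_command() {
  return [
    'pdf-export' => [
      'description' => 'Generate and merge PDF exports for all report nodes.',
      'aliases' => ['pdfx'],
    ],
  ];
}

/**
 * Command callback for "drush pdf-export".
 */
function drush_mymodule_pdf_export() {
  // Collect every published node of the target type.
  $nids = \Drupal::entityQuery('node')
    ->condition('type', 'report')
    ->condition('status', 1)
    ->execute();
  mymodule_generate_pdf_batch($nids);
  // Run the batch in this CLI process, free of web server timeouts.
  drush_backend_batch_process();
}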

Implementation Best Practices

Error Handling & Recovery

  • Implemented comprehensive error logging
  • Added automatic retry for failed operations
  • Created resume functionality for interrupted processes (the retry/resume helper is sketched after this list)
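
As an illustration of the retry and resume pieces, here is a hedged sketch of the helper our batch callback can delegate to. The helper name, attempt count, and state key are ours, not part of any API.

use Drupal\node\NodeInterface;

/**
 * Generates one node's PDF with retries, recording permanent failures.
 */
function mymodule_generate_pdf_with_retry(NodeInterface $node, $max_attempts = 3) {
  for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
    try {
      $print_engine = \Drupal::service('plugin.manager.entity_print.print_engine')
        ->createSelectedInstance('pdf');
      return \Drupal::service('entity_print.print_builder')
        ->savePrintable([$node], $print_engine, 'temporary', 'pdf-export/node-' . $node->id() . '.pdf');
    }
    catch (\Exception $e) {
      \Drupal::logger('mymodule')->warning('Attempt @n failed for node @nid: @message', [
        '@n' => $attempt,
        '@nid' => $node->id(),
        '@message' => $e->getMessage(),
      ]);
    }
  }
  // Record the failure so a follow-up run can retry just these nodes.
  $failed = \Drupal::state()->get('mymodule.pdf_export.failed', []);
  $failed[] = $node->id();
  \Drupal::state()->set('mymodule.pdf_export.failed', $failed);
  return FALSE;
}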

Resource Management

  • Used Drupal’s temporary file system
  • Implemented automatic cleanup of temporary files (see the finished callback after this list)
  • Added memory usage monitoring
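
The cleanup lives in the batch finished callback referenced earlier. A minimal version, assuming the merge helper sketched in Phase 2:

/**
 * Batch 'finished' callback: merge the PDFs, report, and clean up.
 */
function mymodule_generate_pdf_batch_finished($success, $results, $operations) {
  if ($success && mymodule_merge_pdfs()) {
    drupal_set_message(t('Exported and merged @count nodes.', ['@count' => $results['processed']]));
    // The combined document exists, so the per-node scratch files can go.
    foreach ($results['files'] as $uri) {
      file_unmanaged_delete($uri);
    }
  }
  else {
    drupal_set_message(t('The PDF export did not complete.'), 'error');
  }
}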

User Experience

  • Clear progress indicators
  • Email notifications on completion
  • Download links for generated reports

Real-World Applications

This solution can be extended for various use cases:

  1. Scheduled Report Generation

    • Weekly/Monthly executive reports
    • Compliance documentation
    • Data dumps for archival
  2. E-commerce Applications

    • Bulk order processing
    • Invoice generation
    • Catalog exports
  3. Educational Platforms

    • Course material compilation
    • Student progress reports
    • Certification generation

Conclusion

By implementing this optimized batch processing solution, we transformed a previously unreliable, hours-long process into a dependable 14-minute operation. The key to success was combining Drupal's batch API with command-line tools like Drush and Ghostscript, creating a robust solution that works within platform constraints while delivering excellent performance. For even larger datasets, the natural next step is integrating the same pipeline with Drupal's queue system.

Ready to Optimize Your Drupal Site?

If you’re facing similar challenges with large-scale content processing in Drupal, our team can help. Contact us today to discuss how we can optimize your Drupal implementation.

Further Reading

  1. Drupal Batch API Documentation
  2. Ghostscript PDF Optimization Guide
  3. Pantheon Performance Best Practices
  4. Drupal 9 Migration Guide
