The Challenge: Large-Scale PDF Generation in Drupal 8
When a client needed to generate comprehensive PDF reports from hundreds of Drupal nodes, we knew the standard Entity Print module wouldn’t cut it. The requirements were clear:
- Export individual content nodes as separate PDFs
- Combine up to 600 nodes into a single, unified PDF document
- Operate within Pantheon’s strict execution time and memory limits
- Maintain responsive performance for content editors
- Ensure data integrity throughout the export process
Why Standard Drupal Solutions Weren’t Enough
Drupal 8’s Entity Print module is excellent for small to medium-sized exports, but it hits critical limitations with larger datasets. Here’s why traditional approaches failed:
Performance Bottlenecks
- Memory Overload: Processing 600 nodes simultaneously required nearly 1GB of RAM
- Execution Timeouts: The process could take up to 2 hours, exceeding standard web server timeouts
- Server Restrictions: Pantheon’s platform enforces resource limits that prevented successful completion
- Unreliable Processing: Standard methods often crashed midway, forcing restarts

Our Optimized Solution: Batch Processing with Drush
After extensive testing, we developed a robust solution that combines Drupal’s batch API with command-line processing. Here’s how we tackled the challenge:
Phase 1: Chunked PDF Generation
We implemented an intelligent batch processing system that:
- Processes nodes in optimized chunks (20 nodes per batch)
- Generates individual PDFs using Drupal’s Entity Print
- Tracks progress and handles failures gracefully
- Uses temporary storage efficiently
```php
/**
 * Batch operation callback for PDF generation.
 *
 * @param array $nids
 *   Array of node IDs to process.
 * @param array $context
 *   Batch context array for progress tracking.
 */
function mymodule_generate_pdf_batch_operation(array $nids, array &$context) {
  $node_storage = \Drupal::entityTypeManager()->getStorage('node');
  $nodes = $node_storage->loadMultiple($nids);

  // Initialize results on the first pass only.
  if (!isset($context['results']['processed'])) {
    $context['results']['processed'] = 0;
    $context['results']['files'] = [];
  }

  // Make sure the temporary export directory exists.
  $directory = 'temporary://pdf-export';
  \Drupal::service('file_system')->prepareDirectory($directory, \Drupal\Core\File\FileSystemInterface::CREATE_DIRECTORY);

  // Entity Print renders entities through a print engine plugin; the
  // plugin manager returns the engine configured for PDF export.
  $print_engine = \Drupal::service('plugin.manager.entity_print.print_engine')->createSelectedInstance('pdf');
  $print_builder = \Drupal::service('entity_print.print_builder');

  foreach ($nodes as $node) {
    // Generate and save an individual PDF for each node.
    $filename = 'pdf-export/node-' . $node->id() . '.pdf';
    $print_builder->savePrintable([$node], $print_engine, 'temporary', $filename);
    $context['results']['files'][] = 'temporary://' . $filename;
  }

  // Accumulate the total instead of overwriting it on each pass.
  $context['results']['processed'] += count($nodes);
  $context['message'] = t('Processed @count nodes', ['@count' => $context['results']['processed']]);
}
```
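On the command-line side, a thin Drush command can chunk the node IDs, queue one batch operation per chunk, and run the batch outside the web server's timeout. This is a sketch under stated assumptions: the command name, default bundle, and finished-callback name are illustrative, not from the original implementation.

```php
<?php

namespace Drupal\mymodule\Commands;

use Drush\Commands\DrushCommands;

/**
 * Drush command that runs the PDF export as a batch (illustrative sketch).
 */
class PdfExportCommands extends DrushCommands {

  /**
   * Exports all published nodes of a bundle to individual PDFs.
   *
   * @command mymodule:export-pdfs
   */
  public function exportPdfs($bundle = 'report') {
    $nids = \Drupal::entityQuery('node')
      ->condition('type', $bundle)
      ->condition('status', 1)
      ->execute();

    // One operation per 20-node chunk, matching the batch size above.
    $operations = [];
    foreach (array_chunk($nids, 20) as $chunk) {
      $operations[] = ['mymodule_generate_pdf_batch_operation', [$chunk]];
    }

    batch_set([
      'title' => t('Generating PDFs'),
      'operations' => $operations,
      'finished' => 'mymodule_generate_pdf_batch_finished',
    ]);

    // Process the batch immediately in this CLI process, free of web timeouts.
    drush_backend_batch_process();
  }

}
```

Running the export as `drush mymodule:export-pdfs` also means it can be wrapped in `nohup` or a cron job and left to run in the background.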
Phase 2: Efficient PDF Merging with Ghostscript
After generating individual PDFs, we used Ghostscript to merge them into a single document. This approach is significantly more efficient than PHP-based merging solutions:
```bash
# Example Ghostscript merge command
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf node-*.pdf
```
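The same merge can be driven from PHP once the batch finishes. A hedged sketch (it assumes the `gs` binary is installed on the host and that the per-node files were written by the batch above); note the natural sort, since a plain glob would order `node-10.pdf` before `node-2.pdf`:

```php
$file_system = \Drupal::service('file_system');
$dir = $file_system->realpath('temporary://pdf-export');

// Collect the per-node PDFs and sort them naturally by node ID so
// node-2.pdf comes before node-10.pdf in the combined document.
$files = glob($dir . '/node-*.pdf');
usort($files, 'strnatcmp');

// Build the Ghostscript command with every argument shell-escaped.
$parts = array_merge(
  ['gs', '-dBATCH', '-dNOPAUSE', '-q', '-sDEVICE=pdfwrite', '-sOutputFile=' . $dir . '/combined.pdf'],
  $files
);
exec(implode(' ', array_map('escapeshellarg', $parts)), $output, $status);

if ($status !== 0) {
  \Drupal::logger('mymodule')->error('Ghostscript merge exited with status @status.', ['@status' => $status]);
}
```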
Performance Results: From Hours to Minutes
Our optimized solution delivered remarkable performance improvements:
| Metric | Before Optimization | After Optimization | Improvement |
| --- | --- | --- | --- |
| Processing Time | 120+ minutes | 14 minutes | 88% faster |
| Memory Usage | ~1GB | ~256MB | 75% reduction |
| Merge Operation | 30+ minutes | 9 minutes | 70% faster |
| Final Output | Unreliable | Single 596-page PDF (25MB) | 100% reliable |
Key Technical Insights
- Batch Processing is Essential
  - Breaks large operations into manageable chunks
  - Prevents timeouts and memory issues
  - Provides better progress tracking
- Drush for Reliability
  - More stable than web-based processing
  - Better error handling and logging
  - Can run as a background process
- Ghostscript for PDF Merging
  - Native binary execution is faster than PHP solutions
  - Better memory management for large documents
  - Advanced compression options
- Modular Architecture
  - Works through both UI and command line
  - Easy to extend for different content types
  - Configurable batch sizes and processing parameters
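On the configurability point, the chunk size need not be hard-coded; a short sketch that reads it from module configuration when building the batch (the config object name is an assumption):

```php
// Read the chunk size from config, falling back to 20 nodes per batch.
$batch_size = \Drupal::config('mymodule.settings')->get('batch_size') ?? 20;

$operations = [];
foreach (array_chunk($nids, $batch_size) as $chunk) {
  $operations[] = ['mymodule_generate_pdf_batch_operation', [$chunk]];
}
```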
Implementation Best Practices
Error Handling & Recovery
- Implemented comprehensive error logging
- Added automatic retry for failed operations
- Created resume functionality for interrupted processes
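Resume functionality can be as simple as recording completed node IDs in Drupal's State API and filtering them out on the next run (a sketch; the state key is illustrative):

```php
// Inside the batch operation, after a node's PDF is saved successfully,
// record it as done.
$done = \Drupal::state()->get('mymodule.pdf_export.done', []);
$done[] = $node->id();
\Drupal::state()->set('mymodule.pdf_export.done', $done);

// When rebuilding the batch after an interruption, skip completed nodes
// so the export resumes where it stopped.
$done = \Drupal::state()->get('mymodule.pdf_export.done', []);
$nids = array_diff($nids, $done);
```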
Resource Management
- Used Drupal’s temporary file system
- Implemented automatic cleanup of temporary files
- Added memory usage monitoring
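Cleanup and monitoring can both live in the batch pipeline; a hedged sketch of the two pieces (the logger channel is illustrative):

```php
// Log peak memory after each pass so regressions show up in the logs.
\Drupal::logger('mymodule')->info('Peak memory usage: @mb MB', [
  '@mb' => round(memory_get_peak_usage(TRUE) / 1048576),
]);

// Once the merged PDF is stored permanently, delete the per-node files.
$file_system = \Drupal::service('file_system');
foreach (glob($file_system->realpath('temporary://pdf-export') . '/node-*.pdf') as $file) {
  $file_system->unlink($file);
}
```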
User Experience
- Clear progress indicators
- Email notifications on completion
- Download links for generated reports
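These editor-facing touches hang off the batch "finished" callback. A sketch (the function name, mail key, and messages are illustrative):

```php
/**
 * Batch 'finished' callback: report status and notify the editor.
 */
function mymodule_generate_pdf_batch_finished($success, array $results, array $operations) {
  if ($success) {
    \Drupal::messenger()->addStatus(t('Generated PDFs for @count nodes.', [
      '@count' => $results['processed'],
    ]));
    // Send a completion email with a link to the combined PDF.
    $account = \Drupal::currentUser();
    \Drupal::service('plugin.manager.mail')->mail('mymodule', 'pdf_export_done',
      $account->getEmail(), $account->getPreferredLangcode(), [
        'processed' => $results['processed'],
        'download_url' => file_create_url('temporary://pdf-export/combined.pdf'),
      ]);
  }
  else {
    \Drupal::messenger()->addError(t('The PDF export did not complete; it can be re-run to resume.'));
  }
}
```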
Real-World Applications
This solution can be extended for various use cases:
- Scheduled Report Generation
  - Weekly/monthly executive reports
  - Compliance documentation
  - Data dumps for archival
- E-commerce Applications
  - Bulk order processing
  - Invoice generation
  - Catalog exports
- Educational Platforms
  - Course material compilation
  - Student progress reports
  - Certification generation
Conclusion
By implementing this optimized batch processing solution, we transformed a previously unreliable, hours-long process into a reliable 14-minute operation. The key to success was combining Drupal’s batch API with command-line tools like Drush and Ghostscript, creating a robust solution that works within platform constraints while delivering excellent performance.
Ready to Optimize Your Drupal Site?
If you’re facing similar challenges with large-scale content processing in Drupal, our team can help. Contact us today to discuss how we can optimize your Drupal implementation.
Further Reading
- Drupal Batch API Documentation
- Ghostscript PDF Optimization Guide
- Pantheon Performance Best Practices
- Drupal 9 Migration Guide
- Drupal Queue API documentation (for scaling this pattern to even larger datasets)