1 of 53

Drupal Backend Performance and Scalability

Ashok Modi (btmash)

Christoph Weber (ChristophWeber)

DrupalCamp LA 2014

2 of 53

About Presentation

  • Similar content to other performance related presentations.
    • There are tons of performance presentations at camp on specific topics.
  • Lot of material.
    • May cover rest in BoFs
  • Won't really go into shared hosting.
    • Small part of presentation (code related optimizations) may apply.

3 of 53

About Presentation

  • May talk a bit about cloud (dedicated sessions at camp).
  • Have a question? Ask!
  • Have something to share? Come on up!
    • We can all learn from each other.

4 of 53

Performance related Sessions

  • Asynchronous PHP and Real Time Messaging
    • Steve Rhoades
  • Headless Drupal (multiple sessions!)
    • Josh Koenig (Drupal 8 demos)
    • Steve Rifkin (AngularJS)
    • Matt Chapman (KnockoutJS / Backbone)
    • Matt Wrather (Drupal 7 demos)
  • HHVM (multiple sessions!)
    • Sara Golemon
    • Josh Koenig

5 of 53

Performance related Sessions

  • Get your head “in the cloud” with AWS
    • Jeremy Lindblom
  • Cloud Lightning for Drupal
    • Chris Charlton
  • Performance for Fun and Profit
    • Christoph Weber

  • Possible BoFs
  • Talk outside the sessions (BoFs, hallways, lunch, dinner, parking lot)
    • Everybody

6 of 53

Goals and Objectives

  • Define them.
    • Do you want a faster page response for end user?
    • Handle more traffic?
    • Minimize downtime?
  • Related...but different.
  • Gets harder and harder to achieve better performance.
    • More Infrastructure.
    • Patching / Hacking Drupal.
    • Change site architecture.
    • Make Drupal faster by not using Drupal.

7 of 53

Diagnosis

  • Proper diagnosis is essential before proposing and implementing a solution.
  • Based on proper data.
  • Analysis of data.
    • Possible paths of optimization.

8 of 53

Validation

  • Avoid the 'wild goose chase'.
  • Validate results on a test server.
  • Replicate the data.
    • Backup and Migrate is useful.
    • Migrate is a heavier but also useful approach.
  • Recreate the site.
  • Gather a performance difference between test and production server.
  • Measure again and see if relative times remain the same.

9 of 53

Points of Optimization.

  • Introduction
  • Tools to measure and diagnose issues.
  • Speed optimizations.

10 of 53

Hardware - Introduction

  • Physical device matters (maybe not as much now?).
  • Multiple cores are the norm.
    • 32 > 16 > 8 > 4 > 2 > 1
  • Lots of RAM (caching file system, db, and page rendering as much as possible).
  • Multiple disks (split one server into various disks / servers)
    • SSD is much faster than regular HD.
    • Tuning DB on SSD is different from DB on HD.

11 of 53

LAMP Stack

  • Traditionally most common stack for hosting Drupal / similar applications.
    • Linux
    • Apache
    • MySQL
    • PHP
  • Not discussing Windows.
    • Can discuss outside.
  • There are many options out there.
  • Grow out your stack to many other technologies.

12 of 53

Multiple Servers

  • Master DB server, multiple web servers.
    • Use a load balancer (or something like HAProxy or other round robin).
    • Set up slave DB servers for select queries.
    • Or set up a cluster (MySQL/Galera in open source).
  • Do it only if you have the budget or resources.
  • Complexity is expensive.
  • Tuning a system can avoid or delay a split.
  • Site by 2bits runs on 1 server.
    • Read more at http://goo.gl/XueVY

13 of 53

Testing Tools

  • Apache Benchmark (DEMO)
    • ab -n 100 -c 10 http://www.example.com/
    • ab -n 10 -c 10 -C PHPSESSID=<sessid> http://www.example.com/
    • First command: 100 requests in total, 10 at a time.
    • Average response time per request.
    • How many requests handled per second.
  • Jmeter
    • Similar to Apache Benchmark.
    • Can natively use on Windows.
    • Can test POST functionality.
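The two headline numbers from an ab run (requests per second and mean time per request) can be pulled out of its report with a small filter. This is a minimal sketch, assuming the standard report lines "Requests per second:" and "Time per request:"; the function name ab_summary is made up for illustration.

```shell
#!/bin/sh
# Print requests per second and the mean time per request (ms)
# from an ApacheBench report read on standard input.
# ab_summary is a hypothetical helper name, not part of ab.
ab_summary() {
  awk '
    /^Requests per second:/ { rps = $4 }
    # ab prints two "Time per request" lines; keep only the first (mean).
    /^Time per request:/ && !seen { ms = $4; seen = 1 }
    END { printf "rps=%s mean_ms=%s\n", rps, ms }
  '
}

# Usage (assumes ab is installed and the URL is reachable):
#   ab -n 100 -c 10 http://www.example.com/ | ab_summary
```

Saving one such line per run makes it easier to compare before and after a tuning change.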

14 of 53

Testing Tools

  • Grinder
  • Gatling
    • http://gatling.io/
  • LoadStorm
    • Web service to load test site.

15 of 53

Console Monitoring Tools

  • Top
    • Real time monitoring.
    • Load average.
    • CPU utilization.
    • Memory usage.
    • List of processes.
  • htop
    • Similar to top but for multiple cores.
    • Faster.
    • Very slick.
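The load average shown by top is easier to judge once it is divided by the number of cores. A rough sketch, assuming the common rule of thumb that a sustained value above 1.0 per core means work is queuing; load_per_core is a hypothetical helper name.

```shell
#!/bin/sh
# Divide a load average by the number of CPU cores.
# Arguments: load average, core count.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on Linux (assumes /proc/loadavg and nproc are available):
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```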

16 of 53

Console Monitoring Tools (cont'd)

  • atop
    • Shows network statistics.
    • Runs a collection daemon in the background.
  • vmstat
    • Report memory statistics
  • netstat
    • Shows active network connections
    • netstat -an
    • netstat -an | grep EST
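Rather than grepping for one state, the same netstat output can be summarized by connection state. A minimal sketch, assuming the state is the last column of each tcp line (as in Linux and BSD netstat -an output); count_states is a hypothetical helper name.

```shell
#!/bin/sh
# Count TCP connections per state from `netstat -an` output on stdin.
# Only lines whose first field starts with "tcp" are counted.
count_states() {
  awk '$1 ~ /^tcp/ { count[$NF]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Usage:
#   netstat -an | count_states
```

A large and growing number of connections in one state during a load test is a hint worth following up with the other tools.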

17 of 53

Graphical Monitoring Tools

  • Cacti
    • http://www.cacti.net
    • Available as a package on Ubuntu, Debian (various other *nix/bsd flavors).
    • Easy to understand graphs.
    • Displays history over day, week, month, year.
    • Graphs available to display stats for CPU, memory, network, Apache, MySQL.
    • Many others written by others available online.

18 of 53

Graphical Monitoring Tools (cont'd)

  • Munin
    • http://munin.projects.linpro.no
    • Very similar to Cacti (doesn’t require a db).
  • Nagios
    • Very powerful (alerts by email, sms, etc).
    • Drupal module for integration.
    • Lots of configuration.
  • Panopta and New Relic offer hosted monitoring.
  • VPS providers may also offer *something*.

19 of 53

Linux / BSD

  • Use proven, stable distribution (Debian, Ubuntu LTS, RHEL, CentOS)
  • Use recent versions.
  • Use whatever your staff has expertise in.
  • Try to avoid bloat.
    • Don't install PostgreSQL if you are using MySQL; skip the desktop environment, Java, etc.
  • Balance compiling own version vs. using packages.
    • Compiling gives full control.
    • Possible pain to upgrade.

20 of 53

Apache

  • Most popular, supported, feature rich.
  • Stable.
  • Usually enabled with too many modules.
    • mod_proxy, mod_cgi, mod_dav, etc may be unnecessary.
    • Smaller processes = less memory
      • More users can access site.
    • apachectl -M - show all enabled modules.
  • apachetop
    • Reads / Analyzes apache access logs.
    • Good to detect crawlers.
    • apachetop -f /path/to/access.log
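When apachetop is not installed, a rough list of the busiest clients (often crawlers) can be produced from the access log with standard tools. A sketch, assuming the common or combined log format where the client address is the first field; top_clients is a hypothetical helper name.

```shell
#!/bin/sh
# List the busiest client addresses in an access log read on stdin.
# Optional argument: how many clients to show (default 10).
top_clients() {
  awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Usage (log path is a placeholder, as in the apachetop example):
#   top_clients 5 < /path/to/access.log
```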

21 of 53

Apache Optimizations

  • MaxClients (prevent swapping / thrashing)
    • Too low - cannot serve enough clients.
    • Too high - you run out of memory and start swapping. Server dies and you cannot serve any clients.
  • MaxRequestsPerChild
    • Tune it to terminate process faster and free up memory.
  • KeepAlive
    • Keep it enabled.
    • A new connection does not have to be opened for every request.
  • mod_gzip/deflate
    • Fewer bytes = serve content more quickly.
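A starting value for MaxClients can be estimated from memory figures. This is a rule-of-thumb sketch, not an official formula: it assumes you know the total RAM, how much to reserve for everything else (OS, MySQL, caches), and the average size of one Apache process. The function name and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# Estimate MaxClients as (total RAM - reserved RAM) / average process size.
# All three arguments are in MB; the result is rounded down.
estimate_max_clients() {
  total_mb=$1
  reserved_mb=$2
  per_process_mb=$3
  echo $(( (total_mb - reserved_mb) / per_process_mb ))
}

# Example: 4 GB server, 1 GB reserved, 40 MB per Apache process.
estimate_max_clients 4096 1024 40   # prints 76
```

Treat the result as an upper bound to test with ab, then lower it if the server starts to swap.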

22 of 53

Varnish

  • HTTP accelerator
    • Serve millions of pages of content with little impact on server.
  • Used on Drupal.org, grammys.com, etc.
  • Set up as a reverse proxy in front of the web server (Apache, Nginx, etc.), which handles whatever Varnish cannot serve itself.
  • Serve anonymous page requests and static files.
    • D6 core does not serve anonymous pages in a way Varnish can cache - use Pressflow.
    • D7 core and varnish play nicely.
  • Requires tuning.

23 of 53

Varnish (cont'd)

  • Define IP/Port in backend <name> for each server.
    • Define multiple backends for different servers.
    • backend b1 { .host = "127.0.0.1"; .port = "81"; }
  • Use director to group backends for round robin.
  • return (pass); // Do not cache
  • return (lookup); // Serve from cache, or fetch from backend, cache, and serve.
  • unset beresp.http.Set-Cookie; // Remove the cookie so the response can be cached.
  • Lullabot’s article: http://goo.gl/7JFrP
  • Basic setup for D7: http://goo.gl/l7601
    • Tested on own blog last year - handled about 3k requests per second. Apache w/ mod_php handled about 30 - 50 in comparison.

24 of 53

Nginx

  • http://nginx.net
  • Stable
  • More lightweight than Apache.
    • Uses less memory.
    • Less functionality.
    • Good enough for 90% of use cases out there.
  • Easy to set up.
    • http://wiki.nginx.org/Drupal for base setup.
  • Run PHP as FastCGI process.
    • Can also do this with Apache (install php-fpm and don't look back).
  • Uses less memory.
  • Act as proxy server (load balancer).
  • Built-in file caching (like boost / varnish).
  • Built-in ESI (called SSI - also like varnish).

25 of 53

Nginx (cont'd)

  • Lots of options.
    • worker_processes 24; # Number of worker processes, typically 1 per CPU core.
    • worker_connections 1000; # Max connections per worker (similar to "MaxClients").
    • keepalive_timeout 30; # Keepalive timeout in seconds.
    • gzip on; # Equivalent of mod_gzip/mod_deflate.
  • http://dak1n1.com/blog/12-nginx-performance-tuning
  • https://github.com/perusio/drupal-with-nginx
    • Tuning specifically for Drupal (uses nginx cache).
    • Server went from serving 200+ reqs / second from base to 3000+ reqs / second from varnish to 4000+ reqs / second from nginx.

26 of 53

Other web servers.

  • Cherokee
    • http://www.cherokee-project.com/
    • Benchmarks are very promising.
    • Configured through web ui.
    • Have not used it.
    • At least 2x as fast as Nginx.
  • G-WAN
    • http://gwan.ch/
    • Very new.
    • Potentially 3x - 4x faster than Nginx.
  • Any others?

27 of 53

MySQL/MariaDB

  • Most popular database for Drupal.
  • Easy to set up, lots to tune.
    • Pressflow, D7+ install with InnoDB as default, which requires tuning even for small sites.
  • Various pluggable engines (InnoDB, MyISAM, Aria, etc)
  • Forks
    • Percona - Closest to MySQL 5.5
    • MariaDB - More changes.
    • Drizzle - Rewritten in C++.
  • MySQL 5.5 is a big difference.
    • More to tune.
    • http://goo.gl/hU8tW

28 of 53

MySQL Monitoring

  • mtop / mytop
    • Like top but for MySQL.
    • Real time monitoring (no history).
    • Shows slow queries and locks.
    • If you have neither - SHOW FULL PROCESSLIST;
  • mysqlreport
    • Deprecated but still useful.
    • http://hackmysql.com/mysqlreport
    • Reports on server - no recommendations (documentation explains everything about stats)
  • mysqltuner
    • Comes with percona / mariadb.
    • Simple but useful.

29 of 53

MySQL Engines

  • MyISAM
    • Fast reads.
    • Less overhead.
    • Poor concurrency (table level locking).
  • InnoDB
    • Transactional.
    • Slower for some queries (e.g. SELECT COUNT(...)).
    • Better concurrency (row level locking).
  • Forks
    • Percona comes with XtraDB (InnoDB replacement).
    • MariaDB also comes with XtraDB, Aria (MyISAM).
    • XtraDB contains patches that did not get into upstream InnoDB.
    • Same tuning settings.

30 of 53

MySQL Tuning

  • Lots of things - focus on a few.
  • innodb_buffer_pool_size
    • Very important.
    • Give it up to 80% of the memory allocated to the DB.
      • If DB is small, use memory elsewhere.
  • innodb_flush_log_at_trx_commit
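    • Default (1): each commit writes and flushes the log to disk (expensive).
    • 0 => Nothing written at commit; log is written and flushed about once per second.
    • 2 => Log is written to the OS cache at commit.
      • Log is still flushed to disk about once per second.
    • 0 can lose about 1 second of transactions if mysqld crashes. 2 loses them only on an OS crash or power loss.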
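The sizing advice for innodb_buffer_pool_size can be sketched as a small calculation: take up to 80% of the memory set aside for the database, but do not take more than the data needs. The 20% headroom over the data size, the function name, and the sample numbers are assumptions for illustration, not figures from this talk.

```shell
#!/bin/sh
# Suggest innodb_buffer_pool_size in MB.
# Arguments: memory allocated to the DB (MB), size of InnoDB data (MB).
suggest_buffer_pool_mb() {
  db_memory_mb=$1
  data_size_mb=$2
  cap=$(( db_memory_mb * 80 / 100 ))      # up to 80% of DB memory
  want=$(( data_size_mb * 120 / 100 ))    # data plus 20% headroom (assumption)
  if [ "$want" -lt "$cap" ]; then
    echo "$want"    # small DB: leave the rest of the memory for other uses
  else
    echo "$cap"
  fi
}

suggest_buffer_pool_mb 8192 2000    # prints 2400
suggest_buffer_pool_mb 8192 20000   # prints 6553
```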

31 of 53

MySQL Tuning (cont'd)

  • innodb_log_file_size
    • Important for sites with lots of writes.
  • table_cache (table_open_cache in MySQL 5.1+)
    • Opening tables can be expensive.
    • Keep tables open in cache.
    • See output from mysqltuner (usually >1024)
  • thread_cache (thread_cache_size)
    • Increase if lots of quick connections.
  • query_cache_size
    • Will cache query results.
    • Generally 32M - 512M.
  • Use mysqltuner to help you get started.
  • Use mysqlreport to give finer tuning options.

32 of 53

MySQL Replication

  • Used on Drupal.org
    • INSERT/UPDATE/DELETE goes to master.
    • SELECT goes to slave(s).
  • Provide noticeable improvements.
  • Supported in D7.
    • D6 => Pressflow.
  • Beware of complexity.
    • If the connection between master and slave goes down, you will have a bad day.
  • Extensive tuning could alleviate need for slave.

33 of 53

MySQL Replication (cont'd)

  • MySQL Cluster
    • Scales well.
    • High availability.
    • Expensive.
  • Galera
    • Relatively new (2009)
    • Allows Master/Master setup.
    • Recommend at least 3 servers (and odd numbers).
    • 1 out of sync -> quorum decides which one needs to be brought in line with the others.
  • Various cloud options (Amazon RDS, etc.)
    • Slower but higher throughput.
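The recommendation of at least 3 servers and odd numbers follows from majority arithmetic. A minimal sketch, assuming a plain majority quorum with equally weighted nodes; the function names are made up for illustration.

```shell
#!/bin/sh
# Majority quorum for a cluster of N equally weighted nodes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# How many nodes can fail while the rest still form a majority.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5; do
  echo "$n nodes: quorum $(quorum_size "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

Two nodes tolerate no failure, and four nodes tolerate no more failures than three, which is why an extra even-numbered node adds cost without adding safety.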

34 of 53

MongoDB

  • Document Oriented.
    • 'no-sql'
    • db.collection.insert|update({parameters})
  • Retrieve subsets.
  • Manages collections of objects in json-like format.
  • Supports up to 64 indexes per collection.
    • 1 for ascending order, -1 for descending order.
  • Supports replication.
  • Built-in clustering.
  • Very fast.
  • Still need to architect despite being ‘schemaless’

35 of 53

MongoDB (cont'd)

  • http://drupal.org/project/mongodb
    • D6, D7
    • Cache, Field Storage, Blocks, Queues, Sessions, Watchdog.
    • Does a lot of heavy lifting.
    • See most gains from field storage, queues, watchdog.
    • Ad-hoc test to update 50000 nodes. Took 3.5 hours w/ reg. database. Took 40 minutes with mongo.
    • For anything exported into mongodb, previous sql queries need to become mongo queries.
      • Use EntityFieldQuery for entities.
      • Backwards compatible queries :)
      • http://drupal.org/project/efq_views

36 of 53

PHP

  • Use a recent, stable release.
    • D7 requires 5.2.x, as do a few 6.x contributed modules.
    • D8 will require 5.4.x.
  • Install an opcode cacher / accelerator.
    • Useful in bringing down memory usage.
      • APC
      • eAccelerator
      • XCache
      • Zend optimizer (commercial)
  • PHP 5.5 comes with OPcache (yay!)
  • Compile into hiphop. (2 sessions on topic)

37 of 53

Running PHP

  • mod_php
    • Standard module used by Apache.
    • Well tested, supported.
    • Resource hog.
  • FastCGI (PHP-FPM)
    • Proxy requests from web server to FastCGI process
    • Supported by Apache, Nginx, Cherokee, etc.
    • Runs as a separate process (or pool).
    • Secure.
    • Slightly slower than mod_php.
    • Much more stable behavior (limited # of php processes).
    • Lots of online documentation.

38 of 53

Debugging PHP

  • Xdebug
    • http://xdebug.org
    • Display traces on error conditions.
    • Trace functions.
    • Profile PHP scripts.
    • Manually used for testing D8 performance on WSCCI initiative.
  • KCachegrind
    • Visualize the profiles that Xdebug produces.

39 of 53

Op-code caching

  • Lower memory usage.
  • Decrease in CPU utilization.
  • Usage on http://calarts.edu lowered memory usage from 45M down to less than 10M.
  • May crash.
  • May require restarts after updating code.
  • Won't fix everything.
    • Slow network connections.
    • Sorting arrays.
    • Slow queries.
    • Bad code is bad.

40 of 53

Drupal

  • Database intensive.
  • Resource hog.
  • Memory Intensive.
    • (D8 >) D7 > D6 > D5
    • Disable unnecessary modules.
      • Views UI, Rules UI, <module> UI in production.
      • Create your own custom field consisting of all the data you need in one row.
    • Make sure cron runs regularly.

41 of 53

Hosted Drupal providers known for high performance

  • Acquia (Drupal)
  • Omega8.cc (High perf. Drupal/Aegir)
  • Pantheon (WordPress and Drupal)
  • Platform.sh (Any PHP project?)

42 of 53

Debugging Drupal

  • Everything from before.
  • Devel
    • http://drupal.org/project/devel
    • Total page execution.
    • Query execution times.
    • Memory utilization.
    • Combine with stress testing.
  • DB Tuner
  • Trace

43 of 53

Drupal Caching

  • Helpful in not repeatedly processing same content.
  • Great for anonymous users, who all see the same content.
  • Many caches in core.
    • Bootstrap
    • Field
    • ...
    • Page
  • Many from contrib modules (views, rules).

44 of 53

Useful contrib caching modules.

  • EntityCache
    • http://drupal.org/project/entitycache
    • Stays in cache until expiry or content is deleted/updated.
  • Boost
    • http://drupal.org/project/boost
    • Create html versions of pages and serve those.
    • Requires changes to .htaccess file (apache)
    • Does not load drupal once content is cached.
    • Can display site while in maintenance mode.
    • Varnish/Nginx have this built-in.
  • Views content cache.
  • Block Cache Alter
  • Panels Cache - serve authenticated users.

45 of 53

Pluggable caching

  • Use $conf variable in settings.php
    • $conf['cache_backends'][] = '/path/to/cache_type_1.inc';
    • $conf['cache_backends'][] = '/path/to/cache_type_2.inc';
    • $conf['cache_class_<bin>'] = 'CacheClass1';
    • $conf['cache_class_<bin>'] = 'CacheClass2';
  • Allows you to use a custom caching module.
    • APC (http://drupal.org/project/apc)
      • Very Fast.
      • Cannot use across multiple web servers.
    • Memcache (http://drupal.org/project/memcache)
      • Scalable.
    • Redis (http://drupal.org/project/redis)
      • Fast.

46 of 53

Memcached

  • Distributed object caching in memory.
  • Can span multiple servers.
  • D6, D7.
  • Scalable.
  • Does not clear cache on cron.
    • Call from CLI.
  • Not persistent.

47 of 53

Redis

  • More than just for caching.
    • Queue, Watchdog
    • Create your own complex data structure.
    • http://drupal.org/project/redis_ssi for powering logged in users solely through redis.
      • 1000s of requests per second.
      • Also takes drupal out of the picture.
  • Very fast.
  • Can store data on HD.
    • Recoverable cache!

48 of 53

Search

  • Drupal core search.
    • Slow.
  • Google Custom Search Engine
    • Better alternative.
  • Search API
    • Very promising alternative.
    • Pluggable system to support various backends.
      • MongoDB.
      • ApacheSolr / Lucene
      • SphinxSE
    • Facets.
    • Views Support.

49 of 53

ApacheSolr

  • Fast.
  • Scalable.
  • Easy-ish to configure.
  • Various companies offer Solr as a Service.
  • Available as standalone Drupal module.
    • Search API Backend module.
  • Views plugins
    • Drive non-search pages using Solr!

50 of 53

Other Options

  • Optimized Distribution
    • Pressflow (D6).
      • Only supports MySQL.
      • Supports reverse proxies.
      • Requires PHP5.
    • Cocomore.
      • Same as pressflow.
  • Pressflow for D7 is very similar to D7 core.
    • Need performance backports from D8.

51 of 53

Other Options (cont'd)

  • 'Patch' Drupal.
    • Hack Core.
    • Need to know what you're doing.
    • Sometimes it is necessary.
    • Create a patches directory.
      • Better yet, use drush make and call your patches in there.
    • Create own module and alter DB schema from there.

52 of 53

Advice for developers.

  • Take advantage of caching.
    • Whole session dedicated to just that!
  • Use memory wisely.
    • Unset a variable if you don't need it anymore.
    • Save a variable to static memory (see drupal_static())
  • Take advantage of AHAH functionality (the AJAX framework in D7).
    • Fewer queries.
    • No page rendering.
    • Save bandwidth.
  • Learn to use jQuery.

53 of 53

Thank you

  • Have a question?
  • Want to talk more about performance?
    • Let's talk after :)