LAMP considerations, installation and tuning

It has been three years since I was involved in mass hosting at one of the biggest ISPs in my country. Well, it’s time to refresh my memory and catch up with the change log. This article presents a fairly basic overview of the building blocks of the traditional LAMP application stack. It also provides a few tips on installing and optimizing some of those components on a VPS with limited resources.

Apache httpd

Versions

As of 21st February 2012 the new major release of Apache httpd is 2.4, while the previous one was 2.2. There are numerous welcome enhancements in the latest major version, many of them well suited to cloud environments. The primary goal of the new release is to deliver more performance, directly aiming at nginx (which has the reputation of providing better performance and a lower memory footprint).

New features include:

  • Improved performance (lower resource utilization and better concurrency)
  • Reduced memory usage
  • Dynamic reverse proxy configuration
  • Performance on par with, or better than, pure event-driven Web servers
  • More granular timeout and rate/resource limiting capability
  • More finely-tuned caching support, tailored for high traffic servers and proxies
  • Dynamically loadable MPMs
  • mod_ssl with OCSP support
  • Event MPM is now fully supported

For an overview of all the new features, see the “Overview of new features in Apache HTTP Server 2.4” page in the official documentation.

Architecture

Instead of implementing a single architecture, Apache provides several Multi-Processing Modules (MPMs) which allow it to run in process-based, hybrid (processes and threads) and event-based modes, to better match different demands. These modules are responsible for basic web server operation: binding to network ports on the machine, accepting requests and dispatching children to handle requests.

The following paragraphs describe the Apache MPMs that are available for GNU/Linux.

prefork – safer choice, traditional, non-threaded, forking

A single control process is responsible for spawning child processes which listen for connections and serve them when they arrive. The control process is started as root to bind to port 80, while child processes are launched as a less-privileged user (User and Group directives). Apache always tries to maintain a number of spare (idle) processes which stand ready to serve incoming requests. Prefork is the best choice if Apache has to use non-thread-safe libraries (as is often the case with mod_php and its extensions) and it is ideal for request (process) isolation. It is a process-per-request model.

worker – hybrid multi-process, multi-threaded

A single process is responsible for spawning child processes. Each child process creates a fixed number of server threads, as well as a listener thread which listens for connections and passes them to a server thread for processing. Because worker uses threads to serve requests, it can serve a large number of requests using fewer system resources than a prefork, process-based web server. It retains much of the stability of the process-based server by keeping multiple processes available, each with many threads. Apache always tries to maintain a pool of spare (idle) server threads which stand ready to serve incoming requests. The main process is started as root, while the child processes and threads are launched as a less-privileged user (User and Group directives). Keep in mind, though, that PHP’s thread safety is highly disputed.

event – consuming threads only for connections with active processing, based on worker

The event MPM is designed to allow more requests to be served simultaneously by passing some processing work to supporting threads, freeing up the main threads to work on new requests. It is experimental in 2.2, but stable and supported in 2.4.

I will also mention two more MPMs written to address the privilege separation problem (running virtual hosts under different UIDs and GIDs): mpm-itk and mpm-peruser. I won’t describe them in detail, as each has its own shortcomings. The peruser MPM is not considered production ready, while mpm-itk processes request headers as root, switches to the target UID, and then kills the httpd process when it has finished serving the connection. Processing headers as root is very dangerous and opens the web server up to many potential security problems.

To conclude on MPMs: prefork is the safer choice, while the threaded MPMs should offer better performance with less overhead (and should be more effective in multiprocessor environments).

Apache allows the use of filters to process incoming or outgoing data in a configurable manner. Some examples of filter modules include mod_ssl for https, mod_deflate for compression/decompression on the fly and mod_ext_filter for running external programs as filters.

Apache also uses handlers, which define the action to be performed when a file is requested. Handlers may be configured based on file type, on filename extension, or on location regardless of file type. Examples of built-in handlers are server-status and server-info.

Installation and optimization

Since I want everything to install quickly and easily for the purpose of this article, I will not go for the newest version, but will instead install httpd version 2.2.15 from CentOS 6.2 repositories:

[pawwa@www2 ~]$ sudo yum install httpd

List the modules that are compiled directly into the binary:

[pawwa@www2 ~]$ httpd -l
Compiled in modules:
  core.c
  prefork.c
  http_core.c
  mod_so.c

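The default httpd binary from the repositories has the common prefork MPM compiled in. In 2.2 the MPM is chosen at build time, so this binary cannot be switched to worker. The CentOS package does ship a separate /usr/sbin/httpd.worker binary, which can be selected through the HTTPD variable in /etc/sysconfig/httpd; otherwise we would have to compile httpd from source with the worker MPM enabled.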

Next, ensure that the service starts on boot:

[pawwa@www2 ~]$ chkconfig --list httpd
httpd          	0:off	1:off	2:off	3:off	4:off	5:off	6:off
[pawwa@www2 ~]$ sudo chkconfig --level 35 httpd on
[pawwa@www2 ~]$ chkconfig --list httpd
httpd          	0:off	1:off	2:off	3:on	4:off	5:on	6:off

Start the service with the default configuration and check its memory footprint and number of processes:

[pawwa@www2 ~]$ sudo service httpd start
Starting httpd:                                            [  OK  ]
[pawwa@www2 ~]$ sudo netstat -tlnp | grep httpd
tcp        0      0 :::80                       :::*                        LISTEN      4408/httpd
[pawwa@www2 ~]$ sudo ps -C httpd -O user,ppid,vsz,rss --forest
  PID USER      PPID    VSZ   RSS S TTY          TIME COMMAND
 4408 root         1 175524  3796 S ?        00:00:00 /usr/sbin/httpd
 4410 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4411 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4412 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4413 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4414 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4415 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4416 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd
 4417 apache    4408 175524  2412 S ?        00:00:00  \_ /usr/sbin/httpd

This confirms that we are using the prefork MPM: there is one parent httpd process listening on the HTTP port, which has launched several child processes. Eight server processes are started because of the ‘StartServers 8’ directive in httpd.conf. The listing shows a virtual memory size of 175524 KB (VSZ is the entire virtual memory of a process, roughly VmLib + VmExe + VmData + VmStk, and pretty much irrelevant here) and a resident set size of 2412 KB for each child process (RSS is the non-swapped physical memory that a task has used, including code, data and stack segments).

Keep in mind that the GNU/Linux ps utility does not report the real memory usage of a process, only an approximation. It accounts for each process as if it were the only one running on the system, while in reality dozens of processes run at any given time and share single copies of the libraries they reference (such as libc). A more realistic picture of memory usage can be obtained with the pmap -d PID command, but its output is somewhat hard to interpret. In short, “r-x” mappings are code segments, while “rw-” mappings are data segments. If you factor out the shared library code segments, you end up with the writeable/private figure shown at the end of the pmap output, which in my example is 1996K for an httpd child process:

[pawwa@www2 ~]$ sudo pmap -d 4417 | tail -1
mapped: 175524K    writeable/private: 1996K    shared: 580K

This is lower than the approximate 2412K reported by ps. This information becomes crucial when configuring the MaxClients directive, which determines how many concurrent requests the web server can actually handle. Another trick for getting detailed stats is to check /proc/PID/status of a process.
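To boil the per-process numbers down to a single figure for later calculations, the RSS column from ps can be averaged with a small shell helper. This is only a rough sketch: the function name avg_kb is made up for illustration, and the result inherits the over-estimation of ps described above.

```shell
# avg_kb: hypothetical helper, not part of any package.
# Reads one size in KB per line on stdin and prints the
# integer (truncated) average, or 0 when there is no input.
avg_kb() {
    awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# Usage on a live system (needs a running httpd):
#   ps -C httpd -o rss= | avg_kb
```

The trailing = in "-o rss=" suppresses the column header, so only numbers reach the helper.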

The next step is to comment out the LoadModule directives in httpd.conf for the dynamic shared objects (DSOs) that are not being used. After removing unnecessary modules from my configuration, the size of a single httpd child process dropped by ~500K. Here’s a dump of the modules at this point:

[pawwa@www2 ~]$ httpd -M
Loaded Modules:
 core_module (static)
 mpm_prefork_module (static)
 http_module (static)
 so_module (static)
 auth_basic_module (shared)
 authn_file_module (shared)
 authn_default_module (shared)
 authz_host_module (shared)
 authz_user_module (shared)
 authz_groupfile_module (shared)
 authz_default_module (shared)
 log_config_module (shared)
 logio_module (shared)
 setenvif_module (shared)
 mime_module (shared)
 autoindex_module (shared)
 negotiation_module (shared)
 dir_module (shared)
 actions_module (shared)
 alias_module (shared)
Syntax OK

Later on I will be adding modules such as mod_rewrite, mod_deflate and mod_ssl, which will increase httpd’s memory footprint by about 500K.

Next, move on to prefork MPM optimization. The most important directive is MaxClients, which defines the number of simultaneous requests that can be served. It sets the hard limit on the number of running child httpd processes. The default value of 256 sounds like a lot and should be reduced on smaller systems. Too high a setting lets Apache fork a lot of httpd processes, which could use up all of the server’s available memory. This could push the system into thrashing (spending more time on memory management and paging in and out than on any real work) or into an out-of-memory state. This parameter needs even more careful tuning if the server also runs a DBMS or other subsystems. The more memory you leave for the operating system, the more space it will have for file system caching. Choosing a value for MaxClients involves some trial and error, and measuring the number of processes at peak times. Basically, the figure could be calculated as:

( total memory – operating system memory – DBMS memory ) / memory footprint of httpd
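The formula above can be sketched as a small shell function. The function name and the example numbers are assumptions made purely for illustration; measure your own per-child footprint as described earlier.

```shell
# max_clients: hypothetical helper implementing the formula above.
# All arguments are in MB: total RAM, RAM reserved for the OS,
# RAM reserved for the DBMS, and the footprint of one httpd child.
max_clients() {
    total=$1; os=$2; dbms=$3; per_child=$4
    if [ "$per_child" -le 0 ]; then
        echo "per-child footprint must be positive" >&2
        return 1
    fi
    avail=$(( total - os - dbms ))
    # Nothing left for httpd: report zero rather than a negative count.
    if [ "$avail" -lt 0 ]; then avail=0; fi
    # Integer division rounds down, which errs on the safe side.
    echo $(( avail / per_child ))
}

# Made-up example: 512 MB VPS, 128 MB for the OS, 128 MB for MySQL,
# 10 MB per httpd child with PHP loaded. Prints 25.
max_clients 512 128 128 10
```

The result is only a starting point for the MaxClients line in the prefork section of httpd.conf. Note that MaxClients cannot exceed ServerLimit.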

Eliminate all the extra checks the web server must do. Some well-known optimization techniques are to disable hostname lookups, enable FollowSymLinks (it saves extra lstat() system calls on every path component), disable .htaccess checks on every directory (move .htaccess configuration to httpd.conf whenever possible), avoid wildcards as parts of directives, and experiment with keep-alive values for handling persistent connections.
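As a rough illustration, the techniques above might look like this in httpd.conf. The directory path is the CentOS default DocumentRoot and the keep-alive values are illustrative starting points, not recommendations:

```apache
# Skip reverse DNS lookups for every request.
HostnameLookups Off

# /var/www/html is assumed here; adjust to your DocumentRoot.
<Directory "/var/www/html">
    # Follow symlinks without extra per-component checks.
    Options FollowSymLinks
    # Do not look for .htaccess files anywhere in this tree.
    AllowOverride None
</Directory>

# Persistent connections: values to experiment with.
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
```

With prefork, every kept-alive connection holds a whole child process while idle, so a short KeepAliveTimeout is usually the safer direction to experiment in on a small VPS.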

From the hardware perspective, RAM is crucial for performance (caching, mod_mem_cache for dynamic content), while heavily dynamic websites have higher CPU requirements. RAID arrays with striping modes of operation increase file serving performance.

 

PHP: Hypertext Preprocessor

PHP is a general purpose scripting language that is most frequently embedded into HTML. Ultimately, the code is interpreted by a web server containing a PHP processor module which generates a web page.

Versions

PHP 5 is the current generation of PHP. At the time of writing the supported versions are 5.3.10 and 5.4.0. Several versions, including debug builds, can be installed side by side on a single system.

PHP handling

There are different ways to integrate PHP with Apache httpd. Each handler affects web server performance and the features that are available. The basics of a few popular PHP handlers are presented below.

mod_php runs as an Apache DSO module. This means excellent performance, but little flexibility and poor security, as all PHP code runs as the user that the httpd child processes run as (defined with the User directive in httpd.conf). It is nevertheless the most common way to run PHP.

suPHP runs PHP as CGI, but executes scripts as the user who owns them. It consists of an Apache module (mod_suphp) and a setuid root binary (suphp) that is called by the Apache module to change the uid of the process executing the PHP interpreter. It is more secure, as scripts not owned by a particular user will not be executable. Likewise, files with world-writable permissions will not be executable. You can have a custom php.ini per site and run PHP 4 and PHP 5 at the same time. The drawback is that suPHP uses more CPU, making it slower than mod_php (mod_php is about 20-30% faster), and it does not support opcode caching.

FastCGI is high-performance CGI. It has the security benefits of user separation, very good performance and support for opcode caching. The caveat is high memory usage: rather than creating a process per PHP request, it keeps persistent processes running in the background. One benefit is that FastCGI can run on a different machine.

PHP-FPM (FastCGI Process Manager) is an alternative PHP FastCGI implementation with some additional features useful for sites of any size, especially busier ones. It does well in terms of security and performance, but I am curious about its stability.

There is also mod_ruid2, which isn’t a PHP handler but can work with one (except with FastCGI). It is an Apache extension that allows all requests to a domain to run as the owner of that domain. It is usually deployed in conjunction with mod_php to add the security of user separation through POSIX capabilities while keeping excellent performance. It also supports opcode caching, as opposed to suPHP, but allows only one php.ini for all web sites. I am curious how stable this module really is, as one benchmark rated it poorly in terms of stability.

In terms of security, almost all of these modules leave the server open to compromise if Apache itself is hacked. Because of that I would recommend employing additional security layers, such as mandatory access control mechanisms.

Installation

To deploy PHP on a prefork-based Apache for a VPS that is tight on CPU and RAM, I choose mod_php, with mod_ruid2 for user-based security. Follow the installation instructions from mod_ruid2’s README file, while a simple ‘yum install php’ will install mod_php and all the necessary files such as php.ini, a few PHP modules, the php.conf configuration for httpd and such.

If you need additional PHP modules for your applications, search for their corresponding package names by executing ‘yum search php. module’. For example, if your PHP scripts connect to a MySQL database, you will need to install the php-mysql package. Keep in mind that you should install only the modules you actually need!

PHP files don’t need to be executable, since they are handled by the module directly (AddHandler Apache directive). The php.ini configuration file is read when PHP starts up, which happens only once: when the web server starts. Remember to copy the recommended php.ini-production file over /etc/php.ini for a production environment. You can also set PHP configuration directives in httpd.conf and .htaccess files, as long as you have ‘AllowOverride Options’ or ‘AllowOverride All’ privileges.

Keep an eye on the following PHP resource settings:

  • max_execution_time – how many seconds a script is allowed to run (30)
  • max_input_time – how long (seconds) a script can wait for input data (60)
  • memory_limit – how much memory (bytes) a script can consume before being killed (32M)
  • output_buffering – how much data (bytes) to buffer before sending it out to the client (4096)
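With mod_php these limits can also be set per site from httpd.conf rather than globally in php.ini. The fragment below is a sketch under assumptions: the directory path is the CentOS default DocumentRoot, and which settings to lock down is a matter of taste.

```apache
# /var/www/html is assumed here; adjust to the site's directory.
<Directory "/var/www/html">
    # php_value settings can still be changed by .htaccess or ini_set().
    php_value max_execution_time 30
    php_value max_input_time 60
    # php_admin_value settings cannot be overridden by .htaccess or ini_set().
    php_admin_value memory_limit 32M
</Directory>
```

Boolean settings use the matching php_flag and php_admin_flag directives instead.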

Since my VPS has a small amount of RAM I will not employ opcode caching. If you have more RAM, install an opcode cache such as eAccelerator, APC or ionCube Accelerator. Opcode is a binary representation of the code to be executed. An opcode cache saves the compiled opcode and reuses it the next time the page is called, which saves a considerable amount of time.

 

MySQL

MySQL is a popular database for use in web applications. It is also used by many large-scale web sites such as Wikipedia, Google, Facebook, Twitter, YouTube and others.

Versions

Versions 5.0.x and below are no longer actively developed. The current stable release is 5.5.23. As of version 5.1 there are two offerings that share a common code base: the open source MySQL Community Server and the commercial MySQL Enterprise Server.

After Oracle acquired Sun Microsystems, MySQL got a community-developed fork under the name of MariaDB. The aim of the project is to provide a drop-in replacement for MySQL while being licensed under the GNU GPL.

Features and database engines

MySQL has grown into a full-featured RDBMS. Among other features, it supports transactions, stored procedures, triggers, SSL, query caching, ACID compliance, replication and clustering. It supports several different storage engines which provide CRUD (create, read, update, delete) functions on the database. The most popular ones are:

MyISAM – optimized for read speed, but has no transactions. Table-level locking (performance can suffer with high-traffic applications). Full-text indexes and searches. MyISAM tables get corrupted relatively often. After a crash, MySQL has to rescan whole indexes and possibly tables to recover. There is a myisamchk utility to repair tables in the case of data corruption, but it is not guaranteed to work. Many hosting providers only support MyISAM. Each table is stored as three files on disk with the following extensions: frm (definition), MYD (data) and MYI (index data).

InnoDB – default as of 5.5, ACID compliant transactional features, referential integrity through foreign keys, higher concurrency. Row-level locking. In the case of a crash it recovers faster from the transactional log.

Installation and tuning

Install mysql server and client packages:

[pawwa@www2 ~]$ sudo yum install mysql mysql-server

Start mysqld service and run mysql_secure_installation to set the root password, remove test database and anonymous users:

[pawwa@www2 ~]$ sudo service mysqld start
[pawwa@www2 ~]$ sudo mysql_secure_installation

If the goal of database tuning is to reduce the memory footprint of the database, eliminating various buffers will certainly help, at the expense of query speed and application performance. Instead, one of the metrics should be application response time, which opens up tuning possibilities other than just the database’s memory usage. Always look at optimizing your queries first though – the most dramatic benefits usually come from proper indexing and carefully written queries.

Here are some parameters to tune:

  • key_buffer_size – the most useful single variable to tweak (some rough suggestions are to set it to at least a quarter of available memory). The larger you set it, the more of your MyISAM table indexes you store in memory.
  • innodb_buffer_pool_size – while the key_buffer_size is the variable to target for MyISAM tables, for InnoDB tables it is innodb_buffer_pool_size.
  • table_open_cache – each time MySQL accesses a table, it places it in the cache. If your system accesses many tables, it is faster to have these in the cache. A good way to see whether your system needs to increase this is to examine the value of open_tables at peak times. Variables to watch are Open_tables and Opened_tables.
  • sort_buffer_size – it can be useful to increase it if performing large numbers of sorts.
  • read_rnd_buffer_size – if you use many queries with ORDER BY, increasing this parameter can improve performance.
  • tmp_table_size – This variable determines the maximum size for a temporary table in memory. If the table becomes too large, a MyISAM table is created on disk. Try to avoid temporary tables by optimizing the queries where possible, but where this is not possible, try to ensure temporary tables are always stored in memory. Watching the processlist for queries with temporary tables that take too long to resolve can give you an early warning that tmp_table_size needs to be increased.
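As a sketch, the parameters above would go into the [mysqld] section of /etc/my.cnf. Every value below is a made-up starting point for a small VPS, not a recommendation; measure before and after each change.

```ini
[mysqld]
# MyISAM index cache.
key_buffer_size         = 32M
# InnoDB data and index cache.
innodb_buffer_pool_size = 64M
# Number of open tables cached for all threads.
table_open_cache        = 256
# Allocated per connection, so keep these modest.
sort_buffer_size        = 1M
read_rnd_buffer_size    = 512K
# In-memory temporary tables are capped by the smaller of these two.
tmp_table_size          = 32M
max_heap_table_size     = 32M
```

To judge table_open_cache, compare the counters returned by SHOW GLOBAL STATUS LIKE 'Open%tables' at peak times: a steadily climbing Opened_tables suggests the cache is too small.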

 

This is the end of this longish article 🙂 As a final tip remember that constant measurement of performance really helps in tuning and optimizing. Watch CPU, IO, bandwidth, etc. Spot trends. React.
