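Emiller's Guide to Nginx Module Development

By Evan Miller

The author opens with some banter about how nginx is a lot like Batman: both are fast, nginx makes excellent use of CPU and memory, and it keeps working happily under enormous load. But Batman depends on his utility belt; without it he is not much use.

Point 1: something about Batman's utility belt

Readers interested in Batman's utility belt should go watch the films. This passage is left untranslated, mainly because I could not follow it.

Instead of a utility belt, nginx has a "module chain" (Translator's note: the chain merely holds the modules; it has little to do with how modules are invoked or run). When nginx needs to gzip or chunk-encode a response (Translator's note: readers who want background on gzip and chunked encoding can consult the articles at http://www.w3schools.com), it calls a module to do the job. When nginx needs to block an IP range or an invalid HTTP request, it again does so by calling a module on the chain. When nginx talks to Memcached or FastCGI, modules serve as the communication layer.

Another passage about Batman. As before, left untranslated, and not understood either.

Point 2: This passage says that once you have read this guide you should be able to write some fairly impressive extensions that make nginx do new kinds of work. However, nginx has its share of unwritten rules, and you need a few tricks to control it comfortably, so you may find yourself coming back to this document again and again. Finally, the author says that writing nginx modules is somewhat difficult (Translator's note: standard filler).

(Translator's note: still, this article really did teach me about Batman. Thanks, author.)

(Translator's note: the table of contents is not translated.)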

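0. Warm-up (Translator's note: forgive my blunt rendering of this heading)

Writing nginx extensions requires a working knowledge of C. That means more than syntax: you should be comfortable with data structures, pointers, function references, and the preprocessor. If you are not there yet, read K&R (Translator's note: the author's recommendation; I have not read it myself).

A basic understanding of HTTP is also useful. You are working on a web server, after all.

You should also know nginx's configuration file very well. In any case, here is a quick rundown (Translator's note: skip this if you already know the config file): there are four contexts (Translator's note: "context" in the original; I could not find a better Chinese term, so the original word is used from here on), and each holds directives that take arguments. Directives in the main context apply to everything; directives in a server context apply to a particular host and port; directives in an upstream context refer to a set of backend servers; and directives in a location context apply only to matching web locations (such as "/" or "/images"). A location context inherits from the surrounding server context (Translator's note: much the same idea as VirtualHost in apache), and a server context inherits from the main context. An upstream context neither inherits nor imparts its properties; it has its own special directives that do not really apply elsewhere. I will refer to these four contexts quite a bit, so remember them.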

Let's get started.

(Translator's note: finally, we begin. Tears of joy.)

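1. High-Level Overview of Nginx's Module Delegation

Nginx modules play three roles: handlers process a request and produce output; filters manipulate the output produced by a handler; and load-balancers choose a backend server to send a request to.

Almost everything you would associate with a web server is done by modules: when nginx serves a file or proxies a request to another server, a handler module does the work; when nginx gzips the output or executes a server-side include, it uses a filter module. The "core" of nginx takes care of all the network and application protocols and sets up the sequence of modules that are eligible to process a request. This de-centralized architecture (Translator's note: I like to render "de-centralized architecture" as "loosely coupled architecture"; that may not be exact, so substitute whatever term you prefer) lets developers do what they love with great freedom (Translator's note: this is my favourite kind of architecture too; one of my Perl projects uses it and is very flexible, and the LotusPHP framework for PHP uses it as well, to powerful effect).

Note: unlike apache modules, nginx modules are not dynamically loaded; in other words, they are compiled right into the nginx binary (Translator's note: this makes debugging a little painful, although copying binaries around eases it somewhat). That was the case when this guide was written; nginx 1.9.11 and later can also load modules dynamically.

How do modules get invoked? Typically, at server startup, each handler gets a chance to attach itself to particular locations defined in the configuration. If more than one handler attaches to the same location, only one will "win" (a well-written configuration will not let that happen) (Translator's note: do not set handlers up to compete, or it ends badly). A handler can return in three ways: all is well, there was an error, or it can decline to process the request and defer to the default handler (typically something that serves static files; Translator's note: or a 404 and the like).

If the handler happens to be a reverse proxy, the load-balancer gets a turn. A load-balancer takes a request and a set of backend servers and decides which server gets the request. nginx ships with two load-balancing modules: round-robin (Translator's note: the author compares it to dealing cards; I do not think it needs an analogy), and IP hash, which ensures that requests from the same IP go to the same backend (Translator's note: in performance tuning, IP hash is probably the handier of the two).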
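To make the two strategies concrete, here is a minimal standalone sketch. It is not nginx's implementation: lb_state_t, lb_round_robin, and lb_ip_hash are hypothetical names, and the "hash" is a plain modulo on an IPv4 address chosen only to show that the same client always lands on the same backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picker state; not an nginx type. */
typedef struct {
    size_t n_backends;   /* number of backend servers */
    size_t next;         /* index the next round-robin pick will return */
} lb_state_t;

/* Round-robin: hand out backends in order, wrapping around. */
static size_t lb_round_robin(lb_state_t *s)
{
    size_t pick;

    assert(s != NULL && s->n_backends > 0);
    pick = s->next;
    s->next = (s->next + 1) % s->n_backends;
    return pick;
}

/* IP hash (simplified to a modulo): same address, same backend. */
static size_t lb_ip_hash(const lb_state_t *s, uint32_t ipv4)
{
    assert(s != NULL && s->n_backends > 0);
    return (size_t) (ipv4 % s->n_backends);
}
```

Round-robin needs mutable state shared between requests, while the hash variant is a pure function of the request, which is why it keeps a client pinned to one backend.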

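If the handler does not produce an error, the filters are called. Any number of filters can hook into each location, so a response can, for example, be compressed and then chunked. Their order of execution is determined at compile time (Translator's note: I forget how apache handles the ordering; I would like to know, so pointers are welcome). Filters use the classic "CHAIN OF RESPONSIBILITY" design pattern: one filter is called, does its work, and then calls the next filter, until the final filter is called and nginx finishes the response.

The really cool part about the filter chain is that a filter does not wait for the previous filter to finish; it can process the previous filter's output as it is being produced, much like a Unix pipeline (Translator's note: the response is a stream rather than a single block, so it is not processed atomically by the filters, which greatly improves their efficiency). Filters operate on buffers, which are usually the size of a page (4K), although this can be changed in nginx.conf. This means, for example, that a module can start compressing the response from a backend and streaming it to the client before it has received the entire response from the backend. Nice!

To wrap up the conceptual overview, the typical processing cycle goes:

Client sends an HTTP request → nginx chooses a handler based on the location config → (if applicable) the load-balancer picks a backend server → the handler does its thing and passes each output buffer to the first filter → the first filter passes the output to the second filter → second to third → and so on → the final response is sent to the client

I say "typically" because nginx's module invocation is extremely customizable. It gives module developers a great deal of freedom to define when and how a module runs (sometimes the freedom seems almost too great) (Translator's note: the original uses the word "burden"; I prefer to see this control over a module's run-time behaviour as a freedom). Invocation is performed through a series of callbacks, which means you write a set of functions that nginx calls at particular points, such as when a directive is parsed, when configuration structs are created and merged, and when a request is handled or its response is filtered.

(Translator's note: the author gets reflective again.) Whew! That was a lot to take in. You now have plenty of background, but so far all you can really do is work with these hooks. Let's dig into the modules themselves.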

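2. Components of an Nginx Module

As I said, you have a great deal of flexibility when building an nginx module. This section describes the parts that are almost always present. It is intended both as a guide for understanding a module and as a reference for when you think you are ready to start writing one. (Translator's note: this is the most important part, and it can serve as a dictionary during later development.)

2.1. Module Configuration Struct(s)

Modules can define up to three configuration structs, one each for the main, server, and location contexts. Most modules need only a location configuration. The naming convention is ngx_http_<module name>_(main|srv|loc)_conf_t. Here is an example taken from the dav module (Translator's note: the dav module is regarded as the "hello world" of nginx module development and is required reading):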

typedef struct {

    ngx_uint_t  methods;

    ngx_flag_t  create_full_put_path;

    ngx_uint_t  access;

} ngx_http_dav_loc_conf_t;

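Notice that nginx has special data types (ngx_uint_t and ngx_flag_t); these are just aliases for the primitive data types (see core/ngx_config.h for the definitions).

The members of the configuration struct are populated by the module's directives (Translator's note: in other words, each member is the storage slot for one directive in the configuration file).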

2.2. Module Directives


A module's directives appear in a static array of ngx_command_t. Here is an example of how they are declared:

static ngx_command_t  ngx_http_circle_gif_commands[] = {

    { ngx_string("circle_gif"),

      NGX_HTTP_LOC_CONF|NGX_CONF_NOARGS,

      ngx_http_circle_gif,

      NGX_HTTP_LOC_CONF_OFFSET,

      0,

      NULL },

    { ngx_string("circle_gif_min_radius"),

      NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_TAKE1,

      ngx_conf_set_num_slot,

      NGX_HTTP_LOC_CONF_OFFSET,

      offsetof(ngx_http_circle_gif_loc_conf_t, min_radius),

      NULL },

      ...

      ngx_null_command

};

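(Translator's note: the string defined with ngx_string() here is the directive name as it appears in the configuration file.)

Here is the declaration of ngx_command_t (in core/ngx_conf_file.h):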

struct ngx_command_s {    /* typedef'd as ngx_command_t */

    ngx_str_t             name;

    ngx_uint_t            type;

    char               *(*set)(ngx_conf_t *cf, ngx_command_t *cmd, void *conf);

    ngx_uint_t            conf;

    ngx_uint_t            offset;

    void                 *post;

};

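It looks a little complicated, but every member has a purpose.

name is the directive string, without spaces. It is an ngx_str_t, which is usually instantiated with something like ngx_string("proxy_pass") (Translator's note: the original guide wrote ngx_str, which should be ngx_string; "instantiate" is an OOP term, but it fits well here). Note that an ngx_str_t is a struct with a data member, which holds the string, and an integer len member, which holds the string's length. nginx uses this struct almost everywhere a string is needed (Translator's note: largely for convenient memory management; because the length is stored explicitly, the data need not be NUL-terminated).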
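The length-plus-pointer layout is easy to try outside nginx. The sketch below uses stand-in names (my_str_t, my_string, my_str_eq are not nginx identifiers) and mirrors the idea behind ngx_string: sizeof on a string literal counts the trailing NUL, so one is subtracted to get the length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for ngx_str_t: explicit length plus a data pointer. */
typedef struct {
    size_t         len;
    unsigned char *data;
} my_str_t;

/* Static initializer from a literal; sizeof includes the NUL, hence "- 1". */
#define my_string(literal) { sizeof(literal) - 1, (unsigned char *) literal }

/* Compare a counted string with a C string without relying on a
 * NUL terminator inside s->data. Returns 1 when equal, 0 otherwise. */
static int my_str_eq(const my_str_t *s, const char *cstr)
{
    size_t n;

    assert(s != NULL && cstr != NULL);
    n = strlen(cstr);
    return s->len == n && memcmp(s->data, cstr, n) == 0;
}
```

Because the macro expands to a brace initializer, it can only be used where a declaration is being initialized, which is how the directive table in this section uses ngx_string.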

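type is a set of flags that indicate where the directive is legal and how many arguments it takes. In practice it is a bitwise OR of values such as those used in the declaration above: NGX_HTTP_MAIN_CONF, NGX_HTTP_SRV_CONF, and NGX_HTTP_LOC_CONF say in which context the directive is valid, while NGX_CONF_NOARGS and NGX_CONF_TAKE1 say that it takes no arguments or exactly one.

There are a few other options as well; see core/ngx_conf_file.h.

set is a pointer to a function that sets up part of the module's configuration; typically this function processes the arguments passed to the directive and stores the resulting value in the appropriate configuration struct. The setup function takes three arguments:

  1. a pointer to an ngx_conf_t struct, which contains the arguments passed to the directive
  2. a pointer to the current ngx_command_t struct
  3. a pointer to the module's custom configuration struct

The setup function is called when the directive is encountered. nginx provides a number of functions for setting particular types of values in the custom configuration struct; ngx_conf_set_num_slot, used in the declaration above, is one of them.

There are several others, and none of them is complicated (see core/ngx_conf_file.h). Module developers can also supply a reference to their own function here if the built-ins are not sufficient.

How do the built-in functions know where to save the data? That is where the next two members of ngx_command_t come in: conf and offset. conf tells nginx whether the value is stored in the module's main, server, or location configuration (with NGX_HTTP_MAIN_CONF_OFFSET, NGX_HTTP_SRV_CONF_OFFSET, or NGX_HTTP_LOC_CONF_OFFSET). offset then specifies which member of that configuration struct to write to.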
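The offset mechanism can be illustrated with a small standalone sketch. Everything here is hypothetical (demo_loc_conf_t, demo_command_t, demo_set_num_slot); it only shows how a generic setter can write into a struct it knows nothing about, given a byte offset produced by offsetof. The real built-in setters take the three arguments listed above and do more validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical location configuration, modelled on the circle_gif example. */
typedef struct {
    unsigned long min_radius;
    unsigned long max_radius;
} demo_loc_conf_t;

/* Hypothetical, cut-down command record: only the field offset is kept. */
typedef struct {
    size_t offset;   /* where in the configuration struct the value lives */
} demo_command_t;

/* Generic numeric setter: it never names demo_loc_conf_t, it only uses
 * the byte offset recorded in the command. Returns 0 on success, -1 if
 * the argument is not a plain decimal number. */
static int demo_set_num_slot(const demo_command_t *cmd, void *conf,
                             const char *arg)
{
    char          *end;
    unsigned long  value;
    unsigned long *field;

    assert(cmd != NULL && conf != NULL && arg != NULL);

    value = strtoul(arg, &end, 10);
    if (end == arg || *end != '\0') {
        return -1;
    }

    field = (unsigned long *) ((char *) conf + cmd->offset);
    *field = value;
    return 0;
}
```

One setter can therefore serve every numeric directive of every module; only the offset stored in the directive table differs.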

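Finally, post is just a pointer to anything else the module might need while it is reading the configuration; it is usually NULL.

ngx_null_command terminates the array.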

2.3. The Module Context


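This is a static ngx_http_module_t struct made up of references to functions that create the configurations and merge them together. Its name is ngx_http_<module name>_module_ctx. In order, the function references are: preconfiguration, postconfiguration, creating the main configuration, initializing the main configuration, creating the server configuration, merging the server configuration, creating the location configuration, and merging the location configuration.

These take different arguments depending on what they do. Here is the struct definition, taken from http/ngx_http_config.h, which shows the different signatures of the callbacks: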

typedef struct {

    ngx_int_t   (*preconfiguration)(ngx_conf_t *cf);

    ngx_int_t   (*postconfiguration)(ngx_conf_t *cf);

    void       *(*create_main_conf)(ngx_conf_t *cf);

    char       *(*init_main_conf)(ngx_conf_t *cf, void *conf);

    void       *(*create_srv_conf)(ngx_conf_t *cf);

    char       *(*merge_srv_conf)(ngx_conf_t *cf, void *prev, void *conf);

    void       *(*create_loc_conf)(ngx_conf_t *cf);

    char       *(*merge_loc_conf)(ngx_conf_t *cf, void *prev, void *conf);

} ngx_http_module_t;

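You can set the functions you do not need to NULL, and nginx will figure it out.

Most handlers use only the last two: one to allocate memory for the location-specific configuration (called ngx_http_<module name>_create_loc_conf), and one to set defaults and merge this configuration with any inherited configuration (called ngx_http_<module name>_merge_loc_conf). The merge function is also responsible for producing an error if the configuration is invalid; these errors halt server startup.

Here is the module context struct of an example module: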

static ngx_http_module_t  ngx_http_circle_gif_module_ctx = {

    NULL,                          /* preconfiguration */

    NULL,                          /* postconfiguration */

    NULL,                          /* create main configuration */

    NULL,                          /* init main configuration */

    NULL,                          /* create server configuration */

    NULL,                          /* merge server configuration */

    ngx_http_circle_gif_create_loc_conf,  /* create location configuration */

    ngx_http_circle_gif_merge_loc_conf /* merge location configuration */

};

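Time to dig in a little deeper. These configuration callbacks look much the same in almost every module and use the same parts of the nginx API, so they are well worth knowing.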

2.3.1. create_loc_conf


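Here is a bare-bones create_loc_conf function, taken from the circle_gif module (see the source). It takes a directive struct (ngx_conf_t) and returns a newly created module configuration struct (in this case ngx_http_circle_gif_loc_conf_t).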

static void *

ngx_http_circle_gif_create_loc_conf(ngx_conf_t *cf)

{

    ngx_http_circle_gif_loc_conf_t  *conf;

    conf = ngx_pcalloc(cf->pool, sizeof(ngx_http_circle_gif_loc_conf_t));

    if (conf == NULL) {

        return NULL; /* nginx checks the result of create_loc_conf against NULL */

    }

    conf->min_radius = NGX_CONF_UNSET_UINT;

    conf->max_radius = NGX_CONF_UNSET_UINT;

    return conf;

}

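First, notice nginx's memory allocation: as long as the module uses ngx_palloc (a malloc wrapper) or ngx_pcalloc (a calloc wrapper), the memory is freed automatically (Translator's note: the memory comes from a pool and is released together with that pool).

The possible UNSET constants are NGX_CONF_UNSET_UINT, NGX_CONF_UNSET_PTR, NGX_CONF_UNSET_SIZE, NGX_CONF_UNSET_MSEC, and the catch-all NGX_CONF_UNSET. UNSET tells the merging function which values should be overridden.

2.3.2. merge_loc_conf

Here is the merging function used in the circle_gif module: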

static char *

ngx_http_circle_gif_merge_loc_conf(ngx_conf_t *cf, void *parent, void *child)

{

    ngx_http_circle_gif_loc_conf_t *prev = parent;

    ngx_http_circle_gif_loc_conf_t *conf = child;

    ngx_conf_merge_uint_value(conf->min_radius, prev->min_radius, 10);

    ngx_conf_merge_uint_value(conf->max_radius, prev->max_radius, 20);

    if (conf->min_radius < 1) {

        ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,

            "min_radius must be equal or more than 1");

        return NGX_CONF_ERROR;

    }

    if (conf->max_radius < conf->min_radius) {

        ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,

            "max_radius must be equal or more than min_radius");

        return NGX_CONF_ERROR;

    }

    return NGX_CONF_OK;

}

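First, notice that nginx provides handy merging functions for different data types (ngx_conf_merge_<data type>_value). Their arguments are:

  1. this location's value
  2. the value to inherit if #1 is not set
  3. the default if neither #1 nor #2 is set

The result is stored in the first argument. Available merge functions include ngx_conf_merge_size_value and ngx_conf_merge_msec_value; see core/ngx_conf_file.h for the full list.

Trivia question: how do these functions write to the first argument, given that it is passed in by value?

Answer: these "functions" are actually macros defined by the preprocessor (they are expanded into a few "if" statements before the code reaches the compiler).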

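Notice also how errors are produced: the function writes to the log file and returns NGX_CONF_ERROR. That return code halts server startup. (Because the message is logged at level NGX_LOG_EMERG, it is also printed to standard error; core/ngx_log.h lists all the log levels.)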

2.4. The Module Definition


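Next we add one more layer of indirection, the ngx_module_t struct. The variable is named ngx_http_<module name>_module. This is where references to the context and the directives go, along with the remaining callbacks (exit thread, exit process, and so on). The module definition is sometimes used as a key to look up data associated with a particular module. It often looks like this: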

ngx_module_t  ngx_http_<module name>_module = {

    NGX_MODULE_V1,

    &ngx_http_<module name>_module_ctx, /* module context */

    ngx_http_<module name>_commands,   /* module directives */

    NGX_HTTP_MODULE,               /* module type */

    NULL,                          /* init master */

    NULL,                          /* init module */

    NULL,                          /* init process */

    NULL,                          /* init thread */

    NULL,                          /* exit thread */

    NULL,                          /* exit process */

    NULL,                          /* exit master */

    NGX_MODULE_V1_PADDING

};

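Substitute the appropriate module name for <module name>. Modules can add callbacks for process and thread creation and destruction, but most modules keep things simple. (For the arguments passed to each callback, see core/ngx_conf_file.h.)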

2.5. Module Installation


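How a module is installed depends on whether it is a handler, a filter, or a load-balancer, so the details are left to the respective sections. (Translator's note: did the author write this section just to make me curse him?)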

3. Handlers

Now we'll put some trivial modules under the microscope to see how they work.

3.1. Anatomy of a Handler (Non-proxying)

Handlers typically do four things: get the location configuration, generate an appropriate response, send the header, and send the body. A handler has one argument, the request struct. A request struct has a lot of useful information about the client request, such as the request method, URI, and headers. We'll go over these steps one by one.

3.1.1. Getting the location configuration

This part's easy. All you need to do is call ngx_http_get_module_loc_conf and pass in the current request struct and the module definition. Here's the relevant part of my circle gif handler:

static ngx_int_t

ngx_http_circle_gif_handler(ngx_http_request_t *r)

{

    ngx_http_circle_gif_loc_conf_t  *circle_gif_config;

    circle_gif_config = ngx_http_get_module_loc_conf(r, ngx_http_circle_gif_module);

    ...

Now I've got access to all the variables that I set up in my merge function.

3.1.2. Generating a response

This is the interesting part where modules actually do work.

The request struct will be helpful here, particularly these elements:

typedef struct {

...

/* the memory pool, used in the ngx_palloc functions */

    ngx_pool_t                       *pool;

    ngx_str_t                         uri;

    ngx_str_t                         args;

    ngx_http_headers_in_t             headers_in;

...

} ngx_http_request_t;

uri is the path of the request, e.g. "/query.cgi".

args is the part of the request after the question mark (e.g. "name=john").

headers_in has a lot of useful stuff, such as cookies and browser information, but many modules don't need anything from it. See http/ngx_http_request.h if you're interested.

This should be enough information to produce some useful output. The full ngx_http_request_t struct can be found in http/ngx_http_request.h.

3.1.3. Sending the header

The response headers live in a struct called headers_out referenced by the request struct. The handler sets the ones it wants and then calls ngx_http_send_header(r). Some useful parts of headers_out include:

typedef struct {

...

    ngx_uint_t                        status;

    size_t                            content_type_len;

    ngx_str_t                         content_type;

    ngx_table_elt_t                  *content_encoding;

    off_t                             content_length_n;

    time_t                            date_time;

    time_t                            last_modified_time;

...

} ngx_http_headers_out_t;

(The rest can be found in http/ngx_http_request.h.)

So for example, if a module were to set the Content-Type to "image/gif", Content-Length to 100, and return a 200 OK response, this code would do the trick:

    r->headers_out.status = NGX_HTTP_OK;

    r->headers_out.content_length_n = 100;

    r->headers_out.content_type.len = sizeof("image/gif") - 1;

    r->headers_out.content_type.data = (u_char *) "image/gif";

    ngx_http_send_header(r);

Most legal HTTP headers are available (somewhere) for your setting pleasure. However, some headers are a bit trickier to set than the ones you see above; for example, content_encoding has type (ngx_table_elt_t*), so the module must allocate memory for it. This is done with a function called ngx_list_push, which takes in an ngx_list_t (similar to an array) and returns a reference to a newly created member of the list (of type ngx_table_elt_t). The following code sets the Content-Encoding to "deflate" and sends the header:

    r->headers_out.content_encoding = ngx_list_push(&r->headers_out.headers);

    if (r->headers_out.content_encoding == NULL) {

        return NGX_ERROR;

    }

    r->headers_out.content_encoding->hash = 1;

    r->headers_out.content_encoding->key.len = sizeof("Content-Encoding") - 1;

    r->headers_out.content_encoding->key.data = (u_char *) "Content-Encoding";

    r->headers_out.content_encoding->value.len = sizeof("deflate") - 1;

    r->headers_out.content_encoding->value.data = (u_char *) "deflate";

    ngx_http_send_header(r);

This mechanism is usually used when a header can have multiple values simultaneously; it (theoretically) makes it easier for filter modules to add and delete certain values while preserving others, because they don't have to resort to string manipulation.

3.1.4. Sending the body

Now that the module has generated a response and put it in memory, it needs to assign the response to a special buffer, and then assign the buffer to a chain link, and then call the "send body" function on the chain link.

What are the chain links for? Nginx lets handler modules generate (and filter modules process) responses one buffer at a time; each chain link keeps a pointer to the next link in the chain, or NULL if it's the last one. We'll keep it simple and assume there is just one buffer.

First, a module will declare the buffer and the chain link:

    ngx_buf_t    *b;

    ngx_chain_t   out;

The next step is to allocate the buffer and point our response data to it:

    b = ngx_pcalloc(r->pool, sizeof(ngx_buf_t));

    if (b == NULL) {

        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,

            "Failed to allocate response buffer.");

        return NGX_HTTP_INTERNAL_SERVER_ERROR;

    }

    b->pos = some_bytes; /* first position in memory of the data */

    b->last = some_bytes + some_bytes_length; /* last position */

    b->memory = 1; /* content is in read-only memory */

    /* (i.e., filters should copy it rather than rewrite in place) */

    b->last_buf = 1; /* there will be no more buffers in the request */

Now the module attaches it to the chain link:

    out.buf = b;

    out.next = NULL;

FINALLY, we send the body, and return the status code of the output filter chain all in one go:

    return ngx_http_output_filter(r, &out);

Buffer chains are a critical part of Nginx's IO model, so you should be comfortable with how they work.

Trivia question: Why does the buffer have the last_buf variable, when we can tell we're at the end of a chain by checking "next" for NULL?

Answer: A chain might be incomplete, i.e., have multiple buffers, but not all the buffers in this request or response. So some buffers are at the end of the chain but not the end of a request. This brings us to...

3.2. Anatomy of an Upstream (a.k.a Proxy) Handler

I waved my hands a bit about having your handler generate a response. Sometimes you'll be able to get that response just with a chunk of C code, but often you'll want to talk to another server (for example, if you're writing a module to implement another network protocol). You could do all of the network programming yourself, but what happens if you receive a partial response? You don't want to block the primary event loop with your own event loop while you're waiting for the rest of the response. You'd kill Nginx's performance. Fortunately, Nginx lets you hook right into its own mechanisms for dealing with back-end servers (called "upstreams"), so your module can talk to another server without getting in the way of other requests. This section describes how a module talks to an upstream, such as Memcached, FastCGI, or another HTTP server.

3.2.1. Summary of upstream callbacks

Unlike the handler function for other modules, the handler function of an upstream module does little "real work". It does not call ngx_http_output_filter. It merely sets callbacks that will be invoked when the upstream server is ready to be written to and read from. There are actually 6 available hooks:

create_request crafts a request buffer (or chain of them) to be sent to the upstream

reinit_request is called if the connection to the back-end is reset (just before create_request is called for the second time)

process_header processes the first bit of the upstream's response, and usually saves a pointer to the upstream's "payload"

abort_request is called if the client aborts the request

finalize_request is called when Nginx is finished reading from the upstream

input_filter is a body filter that can be called on the response body (e.g., to remove a trailer)

How do these get attached? An example is in order. Here's a simplified version of the proxy module's handler:

static ngx_int_t

ngx_http_proxy_handler(ngx_http_request_t *r)

{

    ngx_int_t                   rc;

    ngx_http_upstream_t        *u;

    ngx_http_proxy_loc_conf_t  *plcf;

    plcf = ngx_http_get_module_loc_conf(r, ngx_http_proxy_module);

/* set up our upstream struct */

    u = ngx_pcalloc(r->pool, sizeof(ngx_http_upstream_t));

    if (u == NULL) {

        return NGX_HTTP_INTERNAL_SERVER_ERROR;

    }

    u->peer.log = r->connection->log;

    u->peer.log_error = NGX_ERROR_ERR;

    u->output.tag = (ngx_buf_tag_t) &ngx_http_proxy_module;

    u->conf = &plcf->upstream;

/* attach the callback functions */

    u->create_request = ngx_http_proxy_create_request;

    u->reinit_request = ngx_http_proxy_reinit_request;

    u->process_header = ngx_http_proxy_process_status_line;

    u->abort_request = ngx_http_proxy_abort_request;

    u->finalize_request = ngx_http_proxy_finalize_request;

    r->upstream = u;

    rc = ngx_http_read_client_request_body(r, ngx_http_upstream_init);

    if (rc >= NGX_HTTP_SPECIAL_RESPONSE) {

        return rc;

    }

    return NGX_DONE;

}

It does a bit of housekeeping, but the important parts are the callbacks. Also notice the bit about ngx_http_read_client_request_body. That's setting another callback for when Nginx has finished reading from the client.

What will each of these callbacks do? Usually, reinit_request, abort_request, and finalize_request will set or reset some sort of internal state and are only a few lines long. The real workhorses are create_request and process_header.

3.2.2. The create_request callback

For the sake of simplicity, let's suppose I have an upstream server that reads in one character and prints out two characters. What would my functions look like?

The create_request needs to allocate a buffer for the single-character request, allocate a chain link for that buffer, and then point the upstream struct to that chain link. It would look like this:

static ngx_int_t

ngx_http_character_server_create_request(ngx_http_request_t *r)

{

/* make a buffer and chain */

    ngx_buf_t *b;

    ngx_chain_t *cl;

    b = ngx_create_temp_buf(r->pool, sizeof("a") - 1);

    if (b == NULL)

        return NGX_ERROR;

    cl = ngx_alloc_chain_link(r->pool);

    if (cl == NULL)

        return NGX_ERROR;

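/* hook the buffer to the chain */

    cl->buf = b;

    cl->next = NULL; /* ngx_alloc_chain_link does not zero the link */

/* chain to the upstream */

    r->upstream->request_bufs = cl;

/* now write to the buffer; b->last starts at the beginning of the allocated space */

    b->last = ngx_cpymem(b->last, "a", sizeof("a") - 1);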

    return NGX_OK;

}

That wasn't so bad, was it? Of course, in reality you'll probably want to use the request URI in some meaningful way. It's available as an ngx_str_t in r->uri, and the GET parameters are in r->args, and don't forget you also have access to the request headers and cookies.

3.2.3. The process_header callback

Now it's time for the process_header. Just as create_request added a pointer to the request body, process_header shifts the response pointer to the part that the client will receive. It also reads in the header from the upstream and sets the client response headers accordingly.

Here's a bare-minimum example, reading in that two-character response. Let's suppose the first character is the "status" character. If it's a question mark, we want to return a 404 File Not Found to the client and disregard the other character. If it's a space, then we want to return the other character to the client along with a 200 OK response. All right, it's not the most useful protocol, but it's a good demonstration. How would we write this process_header function?

static ngx_int_t

ngx_http_character_server_process_header(ngx_http_request_t *r)

{

    ngx_http_upstream_t       *u;

    u = r->upstream;

    /* read the first character */

    switch(u->buffer.pos[0]) {

        case '?':

            r->header_only = 1; /* suppress this buffer from the client */

            u->headers_in.status_n = 404;

            break;

        case ' ':

            u->buffer.pos++; /* move the buffer to point to the next character */

            u->headers_in.status_n = 200;

            break;

    }

    return NGX_OK;

}

That's it. Manipulate the header, change the pointer, it's done. Notice that headers_in is actually a response header struct like we've seen before (cf. http/ngx_http_request.h), but it can be populated with the headers from the upstream. A real proxying module will do a lot more header processing, not to mention error handling, but you get the main idea.

But.. what if we don't have the whole header from the upstream in one buffer?

3.2.4. Keeping state

Well, remember how I said that abort_request, reinit_request, and finalize_request could be used for resetting internal state? That's because many upstream modules have internal state. The module will need to define a custom context struct to keep track of what it has read so far from an upstream. This is NOT the same as the "Module Context" referred to above. That's of a pre-defined type, whereas the custom context can have whatever elements and data you need (it's your struct). This context struct should be instantiated inside the create_request function, perhaps like this:

    ngx_http_character_server_ctx_t   *p;   /* my custom context struct */

    p = ngx_pcalloc(r->pool, sizeof(ngx_http_character_server_ctx_t));

    if (p == NULL) {

        return NGX_HTTP_INTERNAL_SERVER_ERROR;

    }

    ngx_http_set_ctx(r, p, ngx_http_character_server_module);

That last line essentially registers the custom context struct with a particular request and module name for easy retrieval later. Whenever you need this context struct (probably in all the other callbacks), just do:

    ngx_http_character_server_ctx_t  *p;

    p = ngx_http_get_module_ctx(r, ngx_http_character_server_module);

And p will have the current state. Set it, reset it, increment, decrement, shove arbitrary data in there, whatever you want. This is a great way to use a persistent state machine when reading from an upstream that returns data in chunks, again without blocking the primary event loop. Nice!
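As an illustration of such a state machine, here is a standalone sketch for the two-character protocol from the previous section. It does not use nginx types: demo_ctx_t, demo_feed, and the DEMO_* codes are hypothetical, and the context is a plain struct rather than one registered with ngx_http_set_ctx. The point is only that the context remembers what has already been read, so the function can be called again whenever more bytes arrive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request context for the two-character protocol. */
typedef struct {
    int  have_status;   /* set once the status character has been read */
    int  status;        /* 200 or 404 once known, 0 before that */
    int  have_payload;  /* set once the payload character has been read */
    char payload;
} demo_ctx_t;

enum { DEMO_AGAIN = 0, DEMO_DONE = 1, DEMO_ERROR = -1 };

/* Feed whatever bytes have arrived so far. Returns DEMO_AGAIN when more
 * input is needed, DEMO_DONE when the response is complete, and
 * DEMO_ERROR on an unknown status character. */
static int demo_feed(demo_ctx_t *ctx, const char *data, size_t len)
{
    size_t i;

    assert(ctx != NULL && (data != NULL || len == 0));

    for (i = 0; i < len; i++) {
        if (!ctx->have_status) {
            if (data[i] == '?') {
                ctx->status = 404;
            } else if (data[i] == ' ') {
                ctx->status = 200;
            } else {
                return DEMO_ERROR;
            }
            ctx->have_status = 1;
            if (ctx->status == 404) {
                return DEMO_DONE;   /* the second character is ignored */
            }
        } else {
            ctx->payload = data[i];
            ctx->have_payload = 1;
            return DEMO_DONE;
        }
    }

    return DEMO_AGAIN;
}
```

If the upstream delivers the status character in one read and the payload in the next, the first call returns DEMO_AGAIN and the second completes the response, with no blocking in between.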

3.3. Handler Installation

Handlers are installed by adding code to the callback of the directive that enables the module. For example, my circle gif ngx_command_t looks like this:

    { ngx_string("circle_gif"),

      NGX_HTTP_LOC_CONF|NGX_CONF_NOARGS,

      ngx_http_circle_gif,

      0,

      0,

      NULL }

The callback is the third element, in this case ngx_http_circle_gif. Recall that the arguments to this callback are the directive struct (ngx_conf_t, which holds the user's arguments), the relevant ngx_command_t struct, and a pointer to the module's custom configuration struct. For my circle gif module, the function looks like:

static char *

ngx_http_circle_gif(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)

{

    ngx_http_core_loc_conf_t  *clcf;

    clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module);

    clcf->handler = ngx_http_circle_gif_handler;

    return NGX_CONF_OK;

}

There are two steps here: first, get the "core" struct for this location, then assign a handler to it. Pretty simple, eh?

I've said all I know about handler modules. It's time to move onto filter modules, the components in the output filter chain.

4. Filters

Filters manipulate responses generated by handlers. Header filters manipulate the HTTP headers, and body filters manipulate the response content.

4.1. Anatomy of a Header Filter

A header filter consists of three basic steps:

  1. Decide whether to operate on this response
  2. Operate on the response
  3. Call the next filter

To take an example, here's a simplified version of the "not modified" header filter, which sets the status to 304 Not Modified if the client's If-Modified-Since header matches the response's Last-Modified header. Note that header filters take in the ngx_http_request_t struct as the only argument, which gets us access to both the client headers and soon-to-be-sent response headers.

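static ngx_int_t

ngx_http_not_modified_header_filter(ngx_http_request_t *r)

{

    time_t  if_modified_since;

/* nothing to compare against if the client sent no If-Modified-Since */

    if (r->headers_in.if_modified_since == NULL) {

        return ngx_http_next_header_filter(r);

    }

    if_modified_since = ngx_http_parse_time(r->headers_in.if_modified_since->value.data,

                              r->headers_in.if_modified_since->value.len);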

/* step 1: decide whether to operate */

    if (if_modified_since != NGX_ERROR &&

        if_modified_since == r->headers_out.last_modified_time) {

/* step 2: operate on the header */

        r->headers_out.status = NGX_HTTP_NOT_MODIFIED;

        r->headers_out.content_type.len = 0;

        ngx_http_clear_content_length(r);

        ngx_http_clear_accept_ranges(r);

    }

/* step 3: call the next filter */

    return ngx_http_next_header_filter(r);

}

The headers_out structure is just the same as we saw in the section about handlers (cf. http/ngx_http_request.h), and can be manipulated to no end.

4.2. Anatomy of a Body Filter

The buffer chain makes it a little tricky to write a body filter, because the body filter can only operate on one buffer (chain link) at a time. The module must decide whether to overwrite the input buffer, replace the buffer with a newly allocated buffer, or insert a new buffer before or after the buffer in question. To complicate things, sometimes a module will receive several buffers so that it has an incomplete buffer chain that it must operate on. Unfortunately, Nginx does not provide a high-level API for manipulating the buffer chain, so body filters can be difficult to understand (and to write). But, here are some operations you might see in action.

A body filter's prototype might look like this (example taken from the "chunked" filter in the Nginx source):

static ngx_int_t ngx_http_chunked_body_filter(ngx_http_request_t *r, ngx_chain_t *in);

The first argument is our old friend the request struct. The second argument is a pointer to the head of the current partial chain (which could contain 0, 1, or more buffers).

Let's take a simple example. Suppose we want to insert the text "<!-- Served by Nginx -->" at the end of every response. First, we need to figure out if the response's final buffer is included in the buffer chain we were given. Like I said, there's not a fancy API, so we'll be rolling our own for loop:

    ngx_chain_t *chain_link;

    int chain_contains_last_buffer = 0;

    for ( chain_link = in; chain_link != NULL; chain_link = chain_link->next ) {

        if (chain_link->buf->last_buf) {

            chain_contains_last_buffer = 1;

            break; /* keep chain_link pointing at the final link */

        }

    }

Now let's bail out if we don't have that last buffer:

    if (!chain_contains_last_buffer)

        return ngx_http_next_body_filter(r, in);

Super, now the last buffer is stored in chain_link. Now we allocate a new buffer:

    ngx_buf_t    *b;

    b = ngx_calloc_buf(r->pool);

    if (b == NULL) {

        return NGX_ERROR;

    }

And put some data in it:

    b->pos = (u_char *) "<!-- Served by Nginx -->";

    b->last = b->pos + sizeof("<!-- Served by Nginx -->") - 1;

    b->memory = 1; /* the string literal is read-only memory */

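And hook the buffer into a new chain link, allocated from the request pool so that it outlives this function call:

    ngx_chain_t  *added_link;

    added_link = ngx_alloc_chain_link(r->pool);

    if (added_link == NULL)

        return NGX_ERROR;

    added_link->buf = b;

    added_link->next = NULL;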

Finally, hook the new chain link to the final chain link we found before:

    chain_link->next = added_link;

And reset the "last_buf" variables to reflect reality:

    chain_link->buf->last_buf = 0;

    added_link->buf->last_buf = 1;

And pass along the modified chain to the next output filter:

    return ngx_http_next_body_filter(r, in);

The resulting function takes much more effort than what you'd do with, say, mod_perl ($response->body =~ s/$/<!-- Served by mod_perl -->/), but the buffer chain is a very powerful construct, allowing programmers to process data incrementally so that the client gets something as soon as possible. However, in my opinion, the buffer chain desperately needs a cleaner interface so that programmers can't leave the chain in an inconsistent state. For now, manipulate it at your own risk.
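The chain surgery described in this section can be exercised outside nginx with a reduced model. The sketch below is an assumption-laden stand-in, not nginx code: demo_buf_t and demo_chain_t keep only the fields used here, and demo_find_last and demo_append_trailer are hypothetical helpers that perform the same steps (find the link flagged last_buf, splice a new link after it, move the flag).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for ngx_buf_t and ngx_chain_t, reduced to what is used here. */
typedef struct {
    const char *pos;
    const char *last;
    unsigned    last_buf:1;
} demo_buf_t;

typedef struct demo_chain_s demo_chain_t;
struct demo_chain_s {
    demo_buf_t   *buf;
    demo_chain_t *next;
};

/* Return the link whose buffer is flagged last_buf, or NULL if this
 * partial chain does not contain the final buffer. */
static demo_chain_t *demo_find_last(demo_chain_t *in)
{
    demo_chain_t *cl;

    for (cl = in; cl != NULL; cl = cl->next) {
        if (cl->buf->last_buf) {
            return cl;
        }
    }
    return NULL;
}

/* Splice `added` in after the final link and move the last_buf flag
 * onto it. Returns 0 on success, -1 if the final buffer is not here. */
static int demo_append_trailer(demo_chain_t *in, demo_chain_t *added)
{
    demo_chain_t *tail;

    assert(added != NULL && added->buf != NULL);

    tail = demo_find_last(in);
    if (tail == NULL) {
        return -1;
    }

    added->next = tail->next;
    tail->next = added;
    tail->buf->last_buf = 0;
    added->buf->last_buf = 1;
    return 0;
}
```

Keeping the search and the splice in separate helpers makes the "bail out if the last buffer is not in this chain" case explicit: the caller simply passes the chain along unchanged when -1 is returned.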

4.3. Filter Installation

Filters are installed in the post-configuration step. We install both header filters and body filters in the same place.

Let's take a look at the chunked filter module for a simple example. Its module context looks like this:

static ngx_http_module_t  ngx_http_chunked_filter_module_ctx = {

    NULL,                                  /* preconfiguration */

    ngx_http_chunked_filter_init,          /* postconfiguration */

  ...

};

Here's what happens in ngx_http_chunked_filter_init:

static ngx_int_t

ngx_http_chunked_filter_init(ngx_conf_t *cf)

{

    ngx_http_next_header_filter = ngx_http_top_header_filter;

    ngx_http_top_header_filter = ngx_http_chunked_header_filter;

    ngx_http_next_body_filter = ngx_http_top_body_filter;

    ngx_http_top_body_filter = ngx_http_chunked_body_filter;

    return NGX_OK;

}

What's going on here? Well, if you remember, filters are set up with a CHAIN OF RESPONSIBILITY. When a handler generates a response, it calls two functions: ngx_http_output_filter, which calls the global function reference ngx_http_top_body_filter; and ngx_http_send_header, which calls the global function reference ngx_http_top_header_filter.

ngx_http_top_body_filter and ngx_http_top_header_filter are the respective "heads" of the body and header filter chains. Each "link" on the chain keeps a function reference to the next link in the chain (the references are called ngx_http_next_body_filter and ngx_http_next_header_filter). When a filter is finished executing, it just calls the next filter, until a specially defined "write" filter is called, which wraps up the HTTP response. What you see in this filter_init function is the module adding itself to the filter chains; it keeps a reference to the old "top" filters in its own "next" variables and declares its functions to be the new "top" filters. (Thus, the last filter to be installed is the first to be executed.)

Side note: how does this work exactly?

Each filter either returns an error code or uses this as the return statement:

return ngx_http_next_body_filter(r, in);

Thus, if the filter chain reaches the (specially-defined) end of the chain, an "OK" response is returned, but if there's an error along the way, the chain is cut short and Nginx serves up the appropriate error message. It's a singly-linked list with fast failures implemented solely with function references. Brilliant.
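
To make the mechanics concrete, here is a minimal stand-alone sketch of the same pattern outside Nginx. Everything prefixed demo_ is hypothetical and exists only for illustration; real filters take a request and a buffer chain rather than a string and a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: real Nginx body filters take
 * (ngx_http_request_t *r, ngx_chain_t *in) and return ngx_int_t. */
typedef int (*demo_body_filter_pt)(const char *in, int *calls);

enum { DEMO_OK = 0, DEMO_ERROR = -1 };

/* The specially defined end of the chain (the role of the "write" filter). */
static int
demo_write_filter(const char *in, int *calls)
{
    (void) in;
    (*calls)++;
    return DEMO_OK;
}

/* Head of the chain; initially only the final filter is installed. */
static demo_body_filter_pt  demo_top_body_filter = demo_write_filter;

/* Each module keeps a private pointer to whatever was "top" before it. */
static demo_body_filter_pt  demo_a_next_body_filter;
static demo_body_filter_pt  demo_b_next_body_filter;

static int
demo_a_body_filter(const char *in, int *calls)
{
    (*calls)++;
    return demo_a_next_body_filter(in, calls);
}

static int
demo_b_body_filter(const char *in, int *calls)
{
    (*calls)++;
    if (in == NULL) {
        return DEMO_ERROR;              /* cut the chain short */
    }
    return demo_b_next_body_filter(in, calls);
}

/* Installation: remember the old head, then become the new head. */
static void
demo_a_init(void)
{
    demo_a_next_body_filter = demo_top_body_filter;
    demo_top_body_filter = demo_a_body_filter;
}

static void
demo_b_init(void)
{
    demo_b_next_body_filter = demo_top_body_filter;
    demo_top_body_filter = demo_b_body_filter;
}

/* What a handler would call to push output through the chain. */
static int
demo_output_filter(const char *in, int *calls)
{
    assert(demo_top_body_filter != NULL);
    return demo_top_body_filter(in, calls);
}
```

Calling demo_a_init() and then demo_b_init() makes filter b run first, because it was installed last. A normal call passes through b, a, and the write filter; an error in b returns immediately without touching the rest of the chain.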

5. Load-Balancers

A load-balancer is just a way to decide which backend server will receive a particular request; implementations exist for distributing requests in round-robin fashion or hashing some information about the request. This section will describe both a load-balancer's installation and its invocation, using the upstream_hash module (full source) as an example. upstream_hash chooses a backend by hashing a variable specified in nginx.conf.

A load-balancing module has six pieces:

  1. The enabling configuration directive (e.g., hash;) will call a registration function
  2. The registration function will define the legal server options (e.g., weight=) and register an upstream initialization function
  3. The upstream initialization function is called just after the configuration is validated, and it:
     - resolves the server names to particular IP addresses
     - allocates space for sockets
     - sets a callback to the peer initialization function
  4. The peer initialization function, called once per request, sets up data structures that the load-balancing function will access and manipulate
  5. The load-balancing function decides where to route requests; it is called at least once per client request (more, if a backend request fails). This is where the interesting stuff happens.
  6. Finally, the peer release function can update statistics after communication with a particular backend server has finished (whether successfully or not)

It's a lot, but I'll break it down into pieces.

5.1. The enabling directive

Directive declarations, recall, specify both where they're valid and a function to call when they're encountered. A directive for a load-balancer should have the NGX_HTTP_UPS_CONF flag set, so that Nginx knows this directive is only valid inside an upstream block. It should provide a pointer to a registration function. Here's the directive declaration from the upstream_hash module:

    { ngx_string("hash"),
      NGX_HTTP_UPS_CONF|NGX_CONF_NOARGS,
      ngx_http_upstream_hash,
      0,
      0,
      NULL },

Nothing new there.

5.2. The registration function

The callback ngx_http_upstream_hash above is the registration function, so named (by me) because it registers an upstream initialization function with the surrounding upstream configuration. In addition, the registration function defines which options to the server directive are legal inside this particular upstream block (e.g., weight=, fail_timeout=). Here's the registration function of the upstream_hash module:

static char *
ngx_http_upstream_hash(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
    ngx_http_upstream_srv_conf_t  *uscf;
    ngx_http_script_compile_t      sc;
    ngx_str_t                     *value;
    ngx_array_t                   *vars_lengths, *vars_values;

    value = cf->args->elts;

    /* the following is necessary to evaluate the argument to "hash" as a $variable */
    ngx_memzero(&sc, sizeof(ngx_http_script_compile_t));

    vars_lengths = NULL;
    vars_values = NULL;

    sc.cf = cf;
    sc.source = &value[1];
    sc.lengths = &vars_lengths;
    sc.values = &vars_values;
    sc.complete_lengths = 1;
    sc.complete_values = 1;

    if (ngx_http_script_compile(&sc) != NGX_OK) {
        return NGX_CONF_ERROR;
    }
    /* end of $variable stuff */

    uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module);

    /* the upstream initialization function */
    uscf->peer.init_upstream = ngx_http_upstream_init_hash;

    uscf->flags = NGX_HTTP_UPSTREAM_CREATE;

    /* OK, more $variable stuff */
    uscf->values = vars_values->elts;
    uscf->lengths = vars_lengths->elts;

    /* set a default value for "hash_method" */
    if (uscf->hash_function == NULL) {
        uscf->hash_function = ngx_hash_key;
    }

    return NGX_CONF_OK;
}

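Aside from jumping through hoops so we can evaluate the $variable later, it's pretty straightforward; assign a callback, set some flags. What flags are available?

- NGX_HTTP_UPSTREAM_CREATE: let there be server directives in this upstream block
- NGX_HTTP_UPSTREAM_WEIGHT: let the server directives take a weight= option
- NGX_HTTP_UPSTREAM_MAX_FAILS: allow the max_fails= option
- NGX_HTTP_UPSTREAM_FAIL_TIMEOUT: allow the fail_timeout= option
- NGX_HTTP_UPSTREAM_DOWN: allow the down option
- NGX_HTTP_UPSTREAM_BACKUP: allow the backup option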

Each module will have access to these configuration values. It's up to the module to decide what to do with them. That is, max_fails will not be automatically enforced; all the failure logic is up to the module author. More on that later. For now, we still haven't finished following the trail of callbacks. Next up, we have the upstream initialization function (the init_upstream callback in the previous function).
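
As a rough illustration of how such flags behave, the sketch below treats them as bits in a mask: the registration function ORs together the options it accepts, and a later check tests one bit. The DEMO_ names and their numeric values are made up for this example; the real constants are defined in the Nginx headers.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for illustration. */
enum {
    DEMO_UPSTREAM_CREATE    = 0x01,
    DEMO_UPSTREAM_WEIGHT    = 0x02,
    DEMO_UPSTREAM_MAX_FAILS = 0x04
};

/* What a registration function might store: several options OR'd together. */
static unsigned
demo_register_flags(void)
{
    return DEMO_UPSTREAM_CREATE | DEMO_UPSTREAM_WEIGHT;
}

/* Returns 1 if the given server option is permitted by the stored flags. */
static int
demo_option_allowed(unsigned flags, unsigned option)
{
    assert(option != 0);
    return (flags & option) == option;
}
```

With these flags, weight= would be accepted on a server line while max_fails= would be rejected. Accepting an option only lets it through the parser; acting on its value is still the module's job.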

5.3. The upstream initialization function

The purpose of the upstream initialization function is to resolve the host names, allocate space for sockets, and assign (yet another) callback. Here's how upstream_hash does it:

ngx_int_t
ngx_http_upstream_init_hash(ngx_conf_t *cf, ngx_http_upstream_srv_conf_t *us)
{
    ngx_uint_t                       i, j, n;
    ngx_http_upstream_server_t      *server;
    ngx_http_upstream_hash_peers_t  *peers;

    /* set the callback */
    us->peer.init = ngx_http_upstream_init_hash_peer;

    if (!us->servers) {
        return NGX_ERROR;
    }

    server = us->servers->elts;

    /* figure out how many IP addresses are in this upstream block. */
    /* remember a domain name can resolve to multiple IP addresses. */
    for (n = 0, i = 0; i < us->servers->nelts; i++) {
        n += server[i].naddrs;
    }

    /* allocate space for sockets, etc */
    peers = ngx_pcalloc(cf->pool, sizeof(ngx_http_upstream_hash_peers_t)
            + sizeof(ngx_peer_addr_t) * (n - 1));

    if (peers == NULL) {
        return NGX_ERROR;
    }

    peers->number = n;

    /* one port/IP address per peer */
    for (n = 0, i = 0; i < us->servers->nelts; i++) {
        for (j = 0; j < server[i].naddrs; j++, n++) {
            peers->peer[n].sockaddr = server[i].addrs[j].sockaddr;
            peers->peer[n].socklen = server[i].addrs[j].socklen;
            peers->peer[n].name = server[i].addrs[j].name;
        }
    }

    /* save a pointer to our peers for later */
    us->peer.data = peers;

    return NGX_OK;
}

This function is a bit more involved than one might hope. Most of the work seems like it should be abstracted, but it's not, so that's what we live with. One strategy for simplifying things is to call the upstream initialization function of another module, have it do all the dirty work (peer allocation, etc), and then override the us->peer.init callback afterwards. For an example, see http/modules/ngx_http_upstream_ip_hash_module.c.
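
The core idiom in that function, counting addresses and then flattening them into one block with a trailing array, can be sketched without Nginx. The demo_ types below are hypothetical stand-ins, calloc replaces the pool allocator, and the sketch uses a C99 flexible array member, so it allocates room for n entries instead of the n - 1 extra entries seen above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for sockaddr/socklen/name. */
typedef struct {
    const char  *name;
} demo_peer_addr_t;

/* Header plus a trailing array sized at allocation time. */
typedef struct {
    unsigned          number;
    demo_peer_addr_t  peer[];
} demo_peers_t;

/* One configured server; a host name may resolve to several addresses. */
typedef struct {
    unsigned      naddrs;
    const char  **addrs;
} demo_server_t;

/* Flatten all addresses of all servers into one peer list.
 * Returns NULL if there are no addresses or allocation fails. */
static demo_peers_t *
demo_flatten_servers(const demo_server_t *server, unsigned nservers)
{
    unsigned       i, j, n;
    demo_peers_t  *peers;

    assert(server != NULL);

    /* first pass: count the addresses */
    for (n = 0, i = 0; i < nservers; i++) {
        n += server[i].naddrs;
    }

    if (n == 0) {
        return NULL;
    }

    /* one allocation for the header and all n entries */
    peers = calloc(1, sizeof(demo_peers_t) + sizeof(demo_peer_addr_t) * n);
    if (peers == NULL) {
        return NULL;
    }

    peers->number = n;

    /* second pass: one entry per address */
    for (n = 0, i = 0; i < nservers; i++) {
        for (j = 0; j < server[i].naddrs; j++, n++) {
            peers->peer[n].name = server[i].addrs[j];
        }
    }

    return peers;
}
```

The caller owns the result and releases it with free(); in Nginx the configuration pool takes care of that instead.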

The important bit from our point of view is setting a pointer to the peer initialization function, in this case ngx_http_upstream_init_hash_peer.

5.4. The peer initialization function

The peer initialization function is called once per request. It sets up a data structure that the module will use as it tries to find an appropriate backend server to service that request; this structure is persistent across backend re-tries, so it's a convenient place to keep track of the number of connection failures, or a computed hash value. By convention, this struct is called ngx_http_upstream_<module name>_peer_data_t.

In addition, the peer initialization function sets up two callbacks:

- get: the load-balancing function
- free: the peer release function

As if that weren't enough, it also initializes a variable called tries. As long as tries is positive, Nginx will keep retrying this load-balancer. When tries is zero, Nginx will give up. It's up to the get and free functions to set tries appropriately.

Here's a peer initialization function from the upstream_hash module:

static ngx_int_t
ngx_http_upstream_init_hash_peer(ngx_http_request_t *r,
    ngx_http_upstream_srv_conf_t *us)
{
    ngx_http_upstream_hash_peer_data_t     *uhpd;
    ngx_str_t val;

    /* evaluate the argument to "hash" */
    if (ngx_http_script_run(r, &val, us->lengths, 0, us->values) == NULL) {
        return NGX_ERROR;
    }

    /* data persistent through the request */
    uhpd = ngx_pcalloc(r->pool, sizeof(ngx_http_upstream_hash_peer_data_t)
            + sizeof(uintptr_t)
              * ((ngx_http_upstream_hash_peers_t *)us->peer.data)->number
                  / (8 * sizeof(uintptr_t)));

    if (uhpd == NULL) {
        return NGX_ERROR;
    }

    /* save our struct for later */
    r->upstream->peer.data = uhpd;

    uhpd->peers = us->peer.data;

    /* set the callbacks and initialize "tries" to "hash_again" + 1 */
    r->upstream->peer.free = ngx_http_upstream_free_hash_peer;
    r->upstream->peer.get = ngx_http_upstream_get_hash_peer;
    r->upstream->peer.tries = us->retries + 1;

    /* do the hash and save the result */
    uhpd->hash = us->hash_function(val.data, val.len);

    return NGX_OK;
}

That wasn't so bad. Now we're ready to pick an upstream server.

5.5. The load-balancing function

It's time for the main course. The real meat and potatoes. This is where the module picks an upstream. The load-balancing function's prototype looks like:

static ngx_int_t
ngx_http_upstream_get_<module name>_peer(ngx_peer_connection_t *pc, void *data);

data is our struct of useful information concerning this client connection. pc will have information about the server we're going to connect to. The job of the load-balancing function is to fill in values for pc->sockaddr, pc->socklen, and pc->name. If you know some network programming, then those variable names might be familiar; but they're actually not very important to the task at hand. We don't care what they stand for; we just want to know where to find appropriate values to fill them.

This function must find a list of available servers, choose one, and assign its values to pc. Let's look at how upstream_hash does it.

upstream_hash previously stashed the server list into the ngx_http_upstream_hash_peer_data_t struct back in the call to ngx_http_upstream_init_hash_peer (above). This struct is now available as data:

    ngx_http_upstream_hash_peer_data_t *uhpd = data;

The list of peers is now stored in uhpd->peers->peer. Let's pick a peer from this array by taking the computed hash value modulo the number of servers:

    ngx_peer_addr_t *peer = &uhpd->peers->peer[uhpd->hash % uhpd->peers->number];

Now for the grand finale:

    pc->sockaddr = peer->sockaddr;
    pc->socklen  = peer->socklen;
    pc->name     = &peer->name;

    return NGX_OK;

That's all! If the load-balancer returns NGX_OK, it means, "go ahead and try this server". If it returns NGX_BUSY, it means all the backend hosts are unavailable, and Nginx should try again.
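
Put together, the selection step looks roughly like the stand-alone sketch below. The demo_ types are simplified, hypothetical versions of the Nginx structs, and demo_hash_key is a simple multiply-by-31 string hash standing in for whatever hash_function points to. Returning DEMO_BUSY for an empty peer list is an addition made here so the sketch never divides by zero.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_OK = 0, DEMO_BUSY = -1 };   /* hypothetical return codes */

typedef struct {
    const char  *name;
} demo_peer_addr_t;

typedef struct {
    size_t                   number;
    const demo_peer_addr_t  *peer;
} demo_peers_t;

/* Per-request data: the computed hash and the shared peer list. */
typedef struct {
    unsigned long        hash;
    const demo_peers_t  *peers;
} demo_peer_data_t;

/* What the caller connects to once a peer has been chosen. */
typedef struct {
    const char  *name;
} demo_peer_connection_t;

/* Stand-in hash: key = key * 31 + byte, over all bytes. */
static unsigned long
demo_hash_key(const unsigned char *data, size_t len)
{
    unsigned long  key = 0;
    size_t         i;

    for (i = 0; i < len; i++) {
        key = key * 31 + data[i];
    }

    return key;
}

/* The "get" callback: choose peer[hash % number] and fill in pc. */
static int
demo_get_hash_peer(demo_peer_connection_t *pc, void *data)
{
    demo_peer_data_t        *uhpd = data;
    const demo_peer_addr_t  *peer;

    assert(pc != NULL && uhpd != NULL);

    if (uhpd->peers->number == 0) {
        return DEMO_BUSY;
    }

    peer = &uhpd->peers->peer[uhpd->hash % uhpd->peers->number];
    pc->name = peer->name;

    return DEMO_OK;
}
```

Because the hash is stored in the per-request struct, the same request keeps mapping to the same backend until something changes the stored hash.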

But... how do we keep track of what's unavailable? And what if we don't want it to try again?

5.6. The peer release function

The peer release function operates after an upstream connection takes place; its purpose is to track failures. Here is its function prototype:

void
ngx_http_upstream_free_<module name>_peer(ngx_peer_connection_t *pc, void *data,
    ngx_uint_t state);

The first two parameters are just the same as we saw in the load-balancer function. The third parameter is a state variable, which indicates whether the connection was successful. It may contain two values bitwise OR'd together: NGX_PEER_FAILED (the connection failed) and NGX_PEER_NEXT (either the connection failed, or it succeeded but the application returned an error). Zero means the connection succeeded.

It's up to the module author to decide what to do about these failure events. If they are to be used at all, the results should be stored in data, a pointer to the custom per-request data struct.

But the crucial purpose of the peer release function is to set pc->tries to zero if you don't want Nginx to keep trying this load-balancer during this request. The simplest peer release function would look like this:

    pc->tries = 0;

That would ensure that if there's ever an error reaching a backend server, a 502 Bad Gateway error will be returned to the client.
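
To see how tries, the load-balancing function, and the peer release function interact, here is a deliberately simplified model of the retry loop. It is not Nginx's actual control flow: demo_run_request, the demo_ structs, and the backend_up array are all hypothetical, and a "connection" is just a lookup in that array.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_OK = 0, DEMO_BAD_GATEWAY = 502 };
enum { DEMO_PEER_FAILED = 1 };          /* hypothetical state bit */

typedef struct demo_pc_s  demo_pc_t;

struct demo_pc_s {
    unsigned    tries;                  /* attempts still allowed */
    int         current;                /* backend chosen by get */
    int       (*get)(demo_pc_t *pc, void *data);
    void      (*free)(demo_pc_t *pc, void *data, unsigned state);
    void       *data;                   /* per-request balancer data */
};

typedef struct {
    int  next;                          /* next backend index to try */
    int  number;                        /* how many backends exist */
} demo_data_t;

/* Load-balancing function: pick a backend by index. */
static int
demo_get(demo_pc_t *pc, void *data)
{
    demo_data_t  *d = data;

    pc->current = d->next % d->number;
    return DEMO_OK;
}

/* Simplest release policy: never retry. */
static void
demo_free_give_up(demo_pc_t *pc, void *data, unsigned state)
{
    (void) data;
    (void) state;
    pc->tries = 0;
}

/* Retrying policy: on failure, spend one try and move to the next backend. */
static void
demo_free_try_next(demo_pc_t *pc, void *data, unsigned state)
{
    demo_data_t  *d = data;

    if (state & DEMO_PEER_FAILED) {
        pc->tries--;
        d->next++;
    }
}

/* Simplified driver: get, "connect", release, repeat while tries remain. */
static int
demo_run_request(demo_pc_t *pc, const int *backend_up, unsigned *attempts)
{
    assert(pc->get != NULL && pc->free != NULL);

    *attempts = 0;

    while (pc->tries > 0) {
        if (pc->get(pc, pc->data) != DEMO_OK) {
            break;
        }

        (*attempts)++;

        if (backend_up[pc->current]) {
            pc->free(pc, pc->data, 0);
            return DEMO_OK;
        }

        pc->free(pc, pc->data, DEMO_PEER_FAILED);
    }

    return DEMO_BAD_GATEWAY;
}
```

With backend 0 down and backend 1 up, the give-up policy fails after one attempt, while the try-next policy succeeds on the second attempt. The only difference between the two is what the release function does to tries.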

Here's a more complicated example, taken from the upstream_hash module. If a backend connection fails, it marks it as failed in a bit-vector (called tried, an array of type uintptr_t), then keeps choosing a new backend until it finds one that has not failed.

#define ngx_bitvector_index(index) ((index) / (8 * sizeof(uintptr_t)))
#define ngx_bitvector_bit(index) ((uintptr_t) 1 << ((index) % (8 * sizeof(uintptr_t))))

static void
ngx_http_upstream_free_hash_peer(ngx_peer_connection_t *pc, void *data,
    ngx_uint_t state)
{
    ngx_http_upstream_hash_peer_data_t  *uhpd = data;
    ngx_uint_t                           current;

    if (state & NGX_PEER_FAILED
            && --pc->tries)
    {
        /* the backend that failed */
        current = uhpd->hash % uhpd->peers->number;

        /* mark it in the bit-vector */
        uhpd->tried[ngx_bitvector_index(current)] |= ngx_bitvector_bit(current);

        do { /* rehash until we're out of retries or we find one that hasn't been tried */
            uhpd->hash = ngx_hash_key((u_char *)&uhpd->hash, sizeof(ngx_uint_t));
            current = uhpd->hash % uhpd->peers->number;
        } while ((uhpd->tried[ngx_bitvector_index(current)] & ngx_bitvector_bit(current)) && --pc->tries);
    }
}

This works because the load-balancing function will just look at the new value of uhpd->hash.
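
The bit-vector bookkeeping is easy to try in isolation. The sketch below reimplements the two macros with demo_ names (hypothetical, with full parenthesization) and adds small helpers for marking and testing a backend; demo_bitvector_words is an extra helper, not something the module defines, showing one way to size the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WORD_BITS  (8 * sizeof(uintptr_t))

/* Which array element holds bit i, and which bit inside that element. */
#define demo_bitvector_index(i)  ((i) / DEMO_WORD_BITS)
#define demo_bitvector_bit(i)    ((uintptr_t) 1 << ((i) % DEMO_WORD_BITS))

/* Number of words needed to hold `number` bits, rounded up. */
static size_t
demo_bitvector_words(size_t number)
{
    return (number + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS;
}

/* Record that backend i has been tried. */
static void
demo_mark_tried(uintptr_t *tried, size_t i)
{
    assert(tried != NULL);
    tried[demo_bitvector_index(i)] |= demo_bitvector_bit(i);
}

/* Returns 1 if backend i was already tried, 0 otherwise. */
static int
demo_was_tried(const uintptr_t *tried, size_t i)
{
    assert(tried != NULL);
    return (tried[demo_bitvector_index(i)] & demo_bitvector_bit(i)) != 0;
}
```

One bit per backend keeps the per-request state tiny: a single uintptr_t covers 32 or 64 backends depending on the platform.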

Many applications won't need retry or high-availability logic, but it's possible to provide it with just a few lines of code like you see here.

6. Writing and Compiling a New Nginx Module

So by now, you should be prepared to look at an Nginx module and try to understand what's going on (and you'll know where to look for help). Take a look in src/http/modules/ to see the available modules. Pick a module that's similar to what you're trying to accomplish and look through it. Stuff look familiar? It should. Go back and forth between this guide and the module source to get an understanding of what's going on.

But Emiller didn't write a Balls-In Guide to Reading Nginx Modules. Hell no. This is a Balls-Out Guide. We're not reading. We're writing. Creating. Sharing with the world.

First thing, you're going to need a place to work on your module. Make a folder for your module anywhere on your hard drive, but separate from the Nginx source (and make sure you have the latest copy from nginx.net). Your new folder should contain two files to start with:

- "config"
- "ngx_http_<your module>_module.c"

The "config" file will be included by ./configure, and its contents will depend on the type of module.

"config" for filter modules:

ngx_addon_name=ngx_http_<your module>_module
HTTP_AUX_FILTER_MODULES="$HTTP_AUX_FILTER_MODULES ngx_http_<your module>_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_<your module>_module.c"

"config" for other modules:

ngx_addon_name=ngx_http_<your module>_module
HTTP_MODULES="$HTTP_MODULES ngx_http_<your module>_module"
NGX_ADDON_SRCS="$NGX_ADDON_SRCS $ngx_addon_dir/ngx_http_<your module>_module.c"

Now for your C file. I recommend copying an existing module that does something similar to what you want, but rename it "ngx_http_<your module>_module.c". Let this be your model as you change the behavior to suit your needs, and refer to this guide as you understand and refashion the different pieces.

When you're ready to compile, just go into the Nginx directory and type

./configure --add-module=path/to/your/new/module/directory

and then make and make install like you normally would. If all goes well, your module will be compiled right in. Nice, huh? No need to muck with the Nginx source, and adding your module to new versions of Nginx is a snap, just use that same ./configure command. By the way, if your module needs any dynamically linked libraries, you can add this to your "config" file:

CORE_LIBS="$CORE_LIBS -lfoo"

Where foo is the library you need. If you make a cool or useful module, be sure to send a note to the Nginx mailing list and share your work.

7. Advanced Topics

This guide covers the basics of Nginx module development. For tips on writing more sophisticated modules, be sure to check out Emiller's Advanced Topics In Nginx Module Development.

Appendix A: Code References

Nginx source tree (cross-referenced)

Nginx module directory (cross-referenced)

Example addon: circle_gif

Example addon: upstream_hash

Example addon: upstream_fair

Appendix B: Changelog
