pgsql-4170

Version:

8.3.1

Bug Link:

http://postgresql.1045698.n5.nabble.com/BUG-4170-Rows-estimation-which-are-cast-from-TEXT-is-inaccurate-td2126457.html

Patch Link:

http://archives.postgresql.org/pgsql-committers/2008-05/msg00264.php

Symptom:

Rows estimation which are cast from TEXT  is inaccurate.

Failure type:

Incorrect output

Is there any log message?

No

How it is diagnosed:

We reproduced the failure.

How to reproduce:

test=# SELECT count(*) FROM test WHERE t < '2008-05-14 23:55:00';

 count

-------

 86099

(1 row)

test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14 23:55:00';

                             QUERY PLAN

--------------------------------------------------------------------

 Seq Scan on test  (cost=0.00..1727.00 rows=85721 width=12)

   Filter: (t < '2008-05-14 23:55:00'::timestamp without time zone)

(2 rows)

test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14

23:55:00'::text::timestamp;

                                 QUERY PLAN

----------------------------------------------------------------------------

 Seq Scan on test  (cost=0.00..2209.00 rows=32133 width=12)     

        <- too little number of the estimates

  Filter: (t < ('2008-05-14 23:55:00'::text)::timestamp without time zone)

(2 rows)

We can avoid this problem by setting appropriate cast-function.

test=# CREATE FUNCTION text2timestamp(text) RETURNS timestamp AS

$$

    SELECT timestamp_in(textout($1), 0, 0);

$$

LANGUAGE sql STRICT STABLE;

test=#  CREATE CAST (text AS timestamp) WITH FUNCTION text2timestamp(text)                 AS ASSIGNMENT;

test=# EXPLAIN SELECT * FROM test WHERE t < '2008-05-14

23:55:00'::text::timestamp;

                               QUERY PLAN

-------------------------------------------------------------------------

 Seq Scan on test  (cost=0.00..1968.00 rows=85721 width=12)

   Filter: (t < timestamp_in('2008-05-14 23:55:00'::cstring, 0::oid, 0))

(2 rows)

Root Cause:

eval_const_expressions() does not have handling logic for supporting const-simplification of CoerceViaIO nodes.  This improves the ability of the planner to deal with cases where the node input is a constant.

Note:

/* ----------------

 * CoerceViaIO

 *

 * CoerceViaIO represents a type coercion between two types whose textual

 * representations are compatible, implemented by invoking the source type's

 * typoutput function then the destination type's typinput function.

 * ----------------

 */

typedef struct CoerceViaIO

{

        Expr                xpr;

        Expr           *arg;                        /* input expression */

        Oid                        resulttype;                /* output type of coercion */

        /* output typmod is not stored, but is presumed -1 */

        CoercionForm coerceformat;        /* how to display this node */

        int                        location;                /* token location, or -1 if unknown */

} CoerceViaIO;

In postgresql:

CREATE TYPE name (

    INPUT = input_function,

    OUTPUT = output_function

    ...

)

The input_function converts the type's external textual representation to the internal representation used by the operators and functions defined for the type. output_function performs the reverse transformation.

pgsql/src/backend/optimizer/util/clauses.c

eval_const_expressions_mutator(...)

{

        if (IsA(node, Param))

                …

        if (IsA(node, FuncExpr))

                ...

        ...

+         if (IsA(node, CoerceViaIO))

+         {

                ...

+                 arg = (Expr *) eval_const_expressions_mutator((Node *)                                                 expr->arg, context);

                /*

                CoerceViaIO represents calling the source type's output                                 function then the result type's input function.So, try to simplify it

                 as though it were a stack of two such function calls.  First we

                 need to know what the functions are.

                 */

+                 getTypeOutputInfo(exprType((Node *) arg), &outfunc,

&outtypisvarlena);

+                 getTypeInputInfo(expr->resulttype, &infunc, &intypioparam);

+                 simple = simplify_function(...);

+                 if (simple)                /* successfully simplified output fn */

+                 {

                        ...

+                         List           *args;

+                         args = list_make3(...);

+                         simple = simplify_function(...);

+                         if (simple)        /* successfully simplified input fn */

+                                 return (Node *) simple;

+                 }

                ...

          }

        …

        if (IsA(node, NullTest))

                ...

        if (IsA(node, BooleanTest))

        …

...

        /*

         * For any node type not handled above, we recurse using

         * expression_tree_mutator...

         */

        return expression_tree_mutator(...);

}

Can ErrLog insert error message:

Yes, via general switch-default pattern