Version: 1.0.16

seg

This module implements a data type seg for representing line segments or floating-point intervals. seg can represent uncertainty in interval endpoints, making it particularly useful for expressing laboratory measurements.

This module is considered "trusted", that is, it can be installed by non-superusers who have CREATE privilege on the current database.

1. Rationale

The geometric structure of a measurement is often more complex than a single point in a numeric continuum. A measurement is typically a segment of a continuum with somewhat fuzzy boundaries. Measurements take the form of intervals due to uncertainty and randomness, and also because the measured value may inherently be an interval indicating some condition (such as the stable-state temperature range of a protein).

Common sense suggests that storing such data as intervals is more convenient than storing them as pairs of numbers. In practice, it also turns out to be more efficient in most applications.

Following the same line of reasoning, the fuzziness of the boundaries means that storing measurements in traditional numeric data types loses information about the data. Consider: your instrument reads 6.50, and you enter this reading into the database. What do you get back when you retrieve it? Take a look:

test=> select 6.50 :: float8 as "pH";
 pH
---
6.5
(1 row)

In the world of measurement, 6.50 and 6.5 are not the same. Sometimes they can be very different. Experimenters typically write down (and publish) the digits they trust. 6.50 is actually a fuzzy interval contained within a larger and even fuzzier interval 6.5, and their center points (probably) are the only characteristic they share. We absolutely do not want such different data items to appear the same.
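To make the information loss concrete, here is a small Python sketch. It assumes a hypothetical helper, `significant_digits`, that counts the digits an experimenter wrote down (leading zeros excluded, trailing zeros kept). It is an illustration only, not part of seg. A plain float cannot tell 6.50 from 6.5, while the digit count can:

```python
def significant_digits(text: str) -> int:
    """Count significant digits in a decimal literal such as '6.50'.

    Hypothetical helper for illustration: leading zeros are not
    significant, trailing zeros are kept because they were written
    down on purpose, and any exponent is ignored.
    """
    mantissa = text.strip().lstrip("+-").lower().split("e")[0]
    digits = mantissa.replace(".", "").lstrip("0")
    return len(digits)


readings = ["6.50", "6.5"]
# As floats, the two readings are indistinguishable.
print(float(readings[0]) == float(readings[1]))   # True
# Their recorded precision differs.
print([significant_digits(r) for r in readings])  # [3, 2]
```

Leading zeros do not count, so `significant_digits("0.0067")` returns 2, matching the rule stated under Precision.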

The conclusion? A special data type that can record the boundaries of intervals with arbitrarily variable precision would be ideal. In this sense, each data element records its own precision.

Consider this:

test=> select '6.25 .. 6.50'::seg as "pH";
     pH
------------
6.25 .. 6.50
(1 row)

2. Syntax

The external representation of an interval consists of one or two floating-point numbers joined by the range operator (.. or ...). Alternatively, it can be specified as a center point plus or minus a deviation. Optional certainty indicators (<, >, or ~) can also be stored; however, all built-in operators ignore them. Table C.26 shows all allowed forms, and Table C.27 shows some examples.

In Table C.26, x, y, and delta represent floating-point numbers. x and y can be preceded by a certainty indicator, but delta cannot.

Table C.26. seg External Representation

Form            Meaning
x               Single value (zero-length interval)
x .. y          Interval from x to y
x (+-) delta    Interval from x - delta to x + delta
x ..            Open interval with lower bound x
.. x            Open interval with upper bound x
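A minimal Python sketch of how the forms in Table C.26 map to lower and upper bounds may help. It is an illustrative model, not the seg extension's parser: `parse_seg` is a hypothetical name, an open end is represented as `None`, certainty indicators are accepted and discarded (mirroring the fact that operators ignore them), and the exact number syntax is an assumption. It also applies the sanity check, described in this section, that rejects a lower bound greater than the upper bound.

```python
import re

# Assumed number syntax: a digit must precede any decimal point.
NUM = r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"
BOUND = rf"[<>~]?\s*({NUM})"   # optional certainty indicator, then a number
RANGE = r"\.\.\.?"              # '..' or the alternative spelling '...'

# Tried in order; each pattern must match the whole input.
FORMS = [
    ("plus_minus", re.compile(rf"\s*{BOUND}\s*\(\+-\)\s*({NUM})\s*")),
    ("closed", re.compile(rf"\s*{BOUND}\s*{RANGE}\s*{BOUND}\s*")),
    ("lower_only", re.compile(rf"\s*{BOUND}\s*{RANGE}\s*")),
    ("upper_only", re.compile(rf"\s*{RANGE}\s*{BOUND}\s*")),
    ("point", re.compile(rf"\s*{BOUND}\s*")),
]


def parse_seg(text):
    """Return (lower, upper) for a seg literal; None marks an open end."""
    for form, pattern in FORMS:
        m = pattern.fullmatch(text)
        if m is None:
            continue
        if form == "plus_minus":
            x, delta = float(m.group(1)), float(m.group(2))
            lower, upper = x - delta, x + delta
        elif form == "closed":
            lower, upper = float(m.group(1)), float(m.group(2))
        elif form == "lower_only":
            lower, upper = float(m.group(1)), None
        elif form == "upper_only":
            lower, upper = None, float(m.group(1))
        else:  # point: a zero-length interval
            lower = upper = float(m.group(1))
        # Sanity check: reject intervals such as '5 .. 2'.
        if lower is not None and upper is not None and lower > upper:
            raise ValueError(f"lower bound exceeds upper bound: {text!r}")
        return lower, upper
    raise ValueError(f"not a valid seg literal: {text!r}")


print(parse_seg("50 .."))           # (50.0, None)
print(parse_seg("1.5e-2 .. 2E-2"))  # (0.015, 0.02)
```

Like the real type, this model drops the (+-) notation as soon as the bounds are computed; unlike the real type, it also drops the certainty indicators instead of keeping them as comments.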

Table C.27. Examples of Valid seg Input

Input            Result
5.0              Creates a zero-length segment (a point)
~5.0             Creates a zero-length segment and records ~ in the data. ~ is ignored by seg operators but is preserved as a comment.
<5.0             Creates a point at 5.0. < is ignored but is preserved as a comment.
>5.0             Creates a point at 5.0. > is ignored but is preserved as a comment.
5(+-)0.3         Creates an interval 4.7 .. 5.3. Note that the (+-) notation is not preserved.
50 ..            Everything greater than or equal to 50
.. 0             Everything less than or equal to 0
1.5e-2 .. 2E-2   Creates an interval 0.015 .. 0.02
1 ... 2          Same as 1...2, 1 .. 2, or 1..2 (whitespace around the range operator is ignored)

Because ... is widely used in data sources, it is allowed as an alternative spelling of the .. operator. Unfortunately, this creates a parsing ambiguity: it is unclear whether the upper bound in 0...23 is 23 or 0.23. This is resolved by requiring every number in seg input to have at least one digit before the decimal point, so 0...23 is read as the interval 0 .. 23, and 0.23 must be written with its leading zero.
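The effect of the leading-digit rule shows up clearly in a small tokenizer sketch. The token patterns below are assumptions made for illustration, not the extension's actual lexer, and `tokenize_seg` is a hypothetical name. In particular, this sketch also requires at least one digit after a decimal point. Because no number can begin with a decimal point, 0...23 can only split into 0, ..., and 23:

```python
import re

# Hypothetical token rules: a number starts with a digit (after an optional
# sign), and a decimal point must be followed by at least one digit.
TOKEN = re.compile(
    r"\s*(?:(?P<num>[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
    r"|(?P<range>\.\.\.?)"
    r"|(?P<pm>\(\+-\))"
    r"|(?P<ind>[<>~]))"
)


def tokenize_seg(text):
    """Split a seg literal into (kind, lexeme) pairs."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        # lastgroup is the name of the alternative that matched.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


print(tokenize_seg("0...23"))  # [('num', '0'), ('range', '...'), ('num', '23')]
print(tokenize_seg("0.23"))    # [('num', '0.23')]
```

Under these rules, an input such as `.23` is rejected outright rather than being read as 0.23.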

As a sanity check, seg will reject intervals where the lower bound is greater than the upper bound, such as 5 .. 2.

3. Precision

seg values are internally stored as a pair of 32-bit floating-point numbers. This means that numbers with more than 7 significant digits will be truncated.

Numbers with 7 or fewer significant digits retain their original precision. That is, if your query returns 0.00, you can be confident that the trailing zeros are not artificial — they reflect the precision of the original data. The number of leading zeros does not affect precision: the value 0.0067 is considered to have only 2 significant digits.
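One way to observe the effect of single-precision storage is to round-trip values through a 32-bit float in Python with the standard `struct` module. This only approximates what seg does internally, and `to_float32` is a hypothetical helper. Formatting the results shows which digits survive:

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip a Python float (64-bit) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A value with 7 significant digits reads back unchanged at 7 digits.
print(f"{to_float32(1.234567):.7g}")     # 1.234567
# A value with 9 significant digits does not survive intact.
print(f"{to_float32(1.23456789):.9g}")   # 1.23456788 (input was 1.23456789)
# Exactly representable values such as 6.5 are unaffected.
print(to_float32(6.5) == 6.5)            # True
```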

4. Usage

The seg module includes a GiST index operator class for seg values. The operators supported by the GiST operator class are shown in Table C.28.

Table C.28. Seg GiST Operators

Operator                 Description
seg << seg → boolean     Is the first seg entirely to the left of the second? [a, b] << [c, d] is true if b < c.
seg >> seg → boolean     Is the first seg entirely to the right of the second? [a, b] >> [c, d] is true if a > d.
seg &< seg → boolean     Does the first seg not extend to the right of the second? [a, b] &< [c, d] is true if b <= d.
seg &> seg → boolean     Does the first seg not extend to the left of the second? [a, b] &> [c, d] is true if a >= c.
seg = seg → boolean      Are the two segs equal?
seg && seg → boolean     Do the two segs overlap?
seg @> seg → boolean     Does the first seg contain the second?
seg <@ seg → boolean     Is the first seg contained by the second?
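The bracket definitions in Table C.28 translate directly into comparisons on endpoints. The Python sketch below models a seg as a hypothetical `Seg` named tuple with closed endpoints and uses `math.inf` for an open end. The table does not spell out overlap or containment, so those two definitions are assumptions based on ordinary closed-interval semantics. Certainty indicators are not modeled, just as the built-in operators ignore them.

```python
import math
from typing import NamedTuple


class Seg(NamedTuple):
    """Hypothetical model of a seg value: the closed interval [lower, upper].

    seg = seg corresponds to ordinary tuple equality here.
    """
    lower: float
    upper: float


def left_of(x: Seg, y: Seg) -> bool:                # x << y: b < c
    return x.upper < y.lower


def right_of(x: Seg, y: Seg) -> bool:               # x >> y: a > d
    return x.lower > y.upper


def does_not_extend_right(x: Seg, y: Seg) -> bool:  # x &< y: b <= d
    return x.upper <= y.upper


def does_not_extend_left(x: Seg, y: Seg) -> bool:   # x &> y: a >= c
    return x.lower >= y.lower


def overlaps(x: Seg, y: Seg) -> bool:               # x && y (assumed: shared endpoints count)
    return x.lower <= y.upper and y.lower <= x.upper


def contains(x: Seg, y: Seg) -> bool:               # x @> y (assumed closed containment)
    return x.lower <= y.lower and y.upper <= x.upper


def contained_by(x: Seg, y: Seg) -> bool:           # x <@ y
    return contains(y, x)


# '50 ..' modeled with an infinite upper bound.
at_least_50 = Seg(50.0, math.inf)
print(contains(at_least_50, Seg(60.0, 70.0)))   # True
print(left_of(Seg(1.0, 2.0), Seg(3.0, 4.0)))    # True
print(overlaps(Seg(1.0, 3.0), Seg(3.0, 5.0)))   # True
```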

In addition to the above operators, the usual comparison operators shown in Table 9.1 are available for type seg. When comparing [a, b] with [c, d], these operators first compare a with c and, if those are equal, compare b with d. In most cases this yields a reasonably good sort order, which is useful if you want to use ORDER BY with this type.
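The ordering rule can be sketched as a Python comparison function over plain `(lower, upper)` tuples, using the standard `functools.cmp_to_key`. This is a simplified model of the rule as stated: `seg_cmp` is a hypothetical name, and certainty indicators are not modeled.

```python
from functools import cmp_to_key


def seg_cmp(x, y):
    """Compare [a, b] with [c, d]: a against c first, then b against d on a tie."""
    (a, b), (c, d) = x, y
    if a != c:
        return -1 if a < c else 1
    if b != d:
        return -1 if b < d else 1
    return 0


segs = [(2.0, 5.0), (1.0, 9.0), (2.0, 3.0), (1.0, 1.5)]
ordered = sorted(segs, key=cmp_to_key(seg_cmp))
print(ordered)  # [(1.0, 1.5), (1.0, 9.0), (2.0, 3.0), (2.0, 5.0)]
```

Because this rule is exactly lexicographic order, `sorted(segs)` on plain tuples gives the same result; the explicit comparator only makes the lower-bound-then-upper-bound rule visible.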

5. Notes

The mechanism that converts (+-) to a regular range is not completely accurate in determining the number of significant digits for the boundaries. For example, it adds an extra digit to the lower boundary if the resulting interval includes a power of ten:

test=> select '10(+-)1'::seg as seg;
     seg
--------------
9.0 .. 1.1e1
(1 row)

The performance of an R-tree index can depend heavily on the initial order of the input values. It can be very helpful to sort the input table on the seg column.