Skip to main content
Version: 1.0.16

xml2

The xml2 module provides XPath querying and XSLT functionality.

1. Function Descriptions

Table C.34 shows the functions provided by this module. These functions provide direct XML parsing and XPath querying.

Table C.34. xml2 Functions

Function/Description
xml_valid ( document text ) → boolean Parses the given document and returns true if the document is well-formed XML.
xpath_string ( document text, query text ) → text Evaluates the XPath query on the provided document and converts the result to text.
xpath_number ( document text, query text ) → real Evaluates the XPath query on the provided document and converts the result to real.
xpath_bool ( document text, query text ) → boolean Evaluates the XPath query on the provided document and converts the result to boolean.
xpath_nodeset ( document text, query text, toptag text, itemtag text ) → text This function evaluates the query on the document and wraps the result in XML tags. If the result is multi-valued, the output looks like this: Value 1 which could be an XML fragmentValue 2.... If toptag or itemtag is an empty string, the corresponding tag is omitted.
xpath_nodeset ( document text, query text, itemtag text ) → text Similar to xpath_nodeset(document, query, toptag, itemtag) but the result omits the toptag.
xpath_nodeset ( document text, query text ) → text Similar to xpath_nodeset(document, query, toptag, itemtag) but the result omits both tags.
xpath_list ( document text, query text, separator text ) → text Evaluates the query on the document and returns multiple values separated by the specified separator, e.g., Value 1,Value2,Value 3
xpath_list ( document text, query text ) → text This is a wrapper around the above function, using , as the separator.

3. xpath_table

xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record

xpath_table is a table function that evaluates a set of XPath queries on each of a set of documents and returns the results as a table. The primary key column from the original document table is returned as the first column of the result, so the result set can be used for joins. Its parameters are described in Table C.35.

Table C.35. xpath_table Parameters

ParameterDescription
keyThe name of the "key" column — this is simply used as the first column in the output table, i.e., it identifies which record each output row comes from (see notes below about multiple values)
documentThe name of the column containing XML documents
relationThe name of the table or view containing the documents
xpathsOne or more XPath expressions, separated by
criteriaThe contents of the WHERE clause. This cannot be omitted, so if you want to process all rows in the relation, you can use true or 1=1

These parameters (except the XPath strings) are simply substituted into a plain SQL SELECT statement, so you have some flexibility — the statement is:

`SELECT , FROM WHERE

Therefore, those parameters can be anything valid at those specific positions. The result from this SELECT needs to return exactly two columns (which it will, unless you try to list multiple columns for the key or document). Note that this simple approach requires you to validate any user-supplied values to avoid SQL injection attacks.

The function must be used in a FROM expression with an AS clause to specify the output columns, for example:

SELECT * FROM

xpath_table('article_id',

'article_xml',

'articles',

'/article/author|/article/pages|/article/title',

'date_entered > ''2003-01-01'' ')

AS t(article_id integer, author text, page_count integer, title text);

The AS clause defines the names and types of the columns in the output table. The first is the "key" column and the rest correspond to the XPath queries. If there are more XPath queries than result columns, the extra queries will be ignored. If there are more result columns than XPath queries, the extra columns will be NULL.

Note: This example defines the page_count result column as an integer. The function internally handles everything as strings, so when you want an integer in the output, it will take the string representation of the XPath result and use the input function to convert it to an integer (or whatever type the AS clause specifies). If it cannot do this, an error will result — for example, if the result is empty — so you may want to stick with text as the column type if you think your data might have any issues.

The calling SELECT statement does not have to be just SELECT * — it can reference output columns by name or join them to other tables. The function produces a virtual table on which you can perform any desired operations (such as aggregation, joins, sorting, etc.). So we could also have:

SELECT t.title, p.fullname, p.email

FROM xpath_table('article_id', 'article_xml', 'articles',

'/article/title|/article/author/@id',

'xpath_string(article_xml,''/article/@date'') > ''2003-03-20''
'
)

AS t(article_id integer, title text, author_id integer),

tblPeopleInfo AS p

WHERE t.author_id = p.person_id;

As a more complex example. Of course, for convenience you can also wrap all of this in a view.

3.1. Multi-valued Results

The xpath_table function assumes that the result of each XPath query may be multi-valued, so the number of rows returned by the function may differ from the number of input documents. The first returned row contains the first result from each query, the second row contains the second result from each query, and so on. If one of the queries has fewer values than the others, null values will be returned for it.

In some cases, a user will know that a given XPath query will return only a single result (perhaps a unique document identifier) — if used together with an XPath query that returns multiple values, the single-valued result will only appear in the first row of the result. The solution for this situation is to use the key column as part of a join against a simpler XPath query. An example:

test=## CREATE TABLE test (

test(## id int PRIMARY KEY,

test(## xml text

test(## );

CREATE TABLE

test=## INSERT INTO test VALUES (1, '<doc num="C1">

test'## <line num="L1"><a>1</a><b>2</b><c>3</c></line>

test'## <line num="L2"><a>11</a><b>22</b><c>33</c></line>

test'## </doc>');

INSERT 0 1

test=## INSERT INTO test VALUES (2, '<doc num="C2">

<line num="L1"><a>111</a><b>222</b><c>333</c></line>

<line num="L2"><a>111</a><b>222</b><c>333</c></line>

</doc>');

INSERT 0 1

test=## SELECT * FROM

xpath_table('id','xml','test','/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','true')

test-## AS t(id int, doc_num varchar(10), line_num varchar(10), val1 int, val2 int,val3 int)

test-## WHERE id = 1 ORDER BY doc_num, line_num ;

id | doc_num | line_num | val1 | val2 | val3

----+---------+----------+------+------+------

1 | C1 | L1 | 1 | 2 | 3

1 | | L2 | 11 | 22 | 33

(2 rows)


To get doc_num on every row, the solution is to use two calls to xpath_table and join the results:


test=## SELECT t.*,i.doc_num FROM

test-## xpath_table('id', 'xml', 'test',

test(## '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',

test(## 'true')

test-## AS t(id int, line_num varchar(10), val1 int, val2 int, val3 int),

test-## xpath_table('id', 'xml', 'test', '/doc/@num', 'true')

test-## AS i(id int, doc_num varchar(10))

test-## WHERE i.id=t.id AND i.id=1

test-## ORDER BY doc_num, line_num;

id | line_num | val1 | val2 | val3 | doc_num

----+----------+------+------+------+---------

1 | L1 | 1 | 2 | 3 | C1

1 | L2 | 11 | 22 | 33 | C1

(2 rows)

4. XSLT Functions

If libxslt is installed, the following functions are available:

4.1. xslt_process

xslt_process(text document, text stylesheet, text paramlist) returns text

This function applies the XSL stylesheet to the document and returns the transformed result. paramlist is a list of parameter assignments used in the transformation, specified in the form a=1,b=2. Note that parameter parsing is very naive: parameter values cannot contain commas!

There is also a two-argument version of xslt_process that does not pass any parameters to the transformation.