xml2
The xml2 module provides XPath querying and XSLT functionality.
1. Function Descriptions
Table C.34 shows the functions provided by this module. These functions provide direct XML parsing and XPath querying.
Table C.34. xml2 Functions
| Function/Description |
|---|
| xml_valid ( document text ) → boolean Parses the given document and returns true if the document is well-formed XML. |
| xpath_string ( document text, query text ) → text Evaluates the XPath query on the provided document and converts the result to text. |
| xpath_number ( document text, query text ) → real Evaluates the XPath query on the provided document and converts the result to real. |
| xpath_bool ( document text, query text ) → boolean Evaluates the XPath query on the provided document and converts the result to boolean. |
| xpath_nodeset ( document text, query text, toptag text, itemtag text ) → text This function evaluates the query on the document and wraps the result in XML tags. If the result is multi-valued, the output looks like this: |
| xpath_nodeset ( document text, query text, itemtag text ) → text Similar to xpath_nodeset(document, query, toptag, itemtag) but the result omits the toptag. |
| xpath_nodeset ( document text, query text ) → text Similar to xpath_nodeset(document, query, toptag, itemtag) but the result omits both tags. |
| xpath_list ( document text, query text, separator text ) → text Evaluates the query on the document and returns multiple values separated by the specified separator, e.g., Value 1,Value2,Value 3 |
| xpath_list ( document text, query text ) → text This is a wrapper around the above function, using , as the separator. |
3. xpath_table
xpath_table(text key, text document, text relation, text xpaths, text criteria) returns setof record
xpath_table is a table function that evaluates a set of XPath queries on each of a set of documents and returns the results as a table. The primary key column from the original document table is returned as the first column of the result, so the result set can be used for joins. Its parameters are described in Table C.35.
Table C.35. xpath_table Parameters
| Parameter | Description |
|---|---|
| key | The name of the "key" column — this is simply used as the first column in the output table, i.e., it identifies which record each output row comes from (see notes below about multiple values) |
| document | The name of the column containing XML documents |
| relation | The name of the table or view containing the documents |
| xpaths | One or more XPath expressions, separated by |
| criteria | The contents of the WHERE clause. This cannot be omitted, so if you want to process all rows in the relation, you can use true or 1=1 |
These parameters (except the XPath strings) are simply substituted into a plain SQL SELECT statement, so you have some flexibility — the statement is:
`SELECT
Therefore, those parameters can be anything valid at those specific positions. The result from this SELECT needs to return exactly two columns (which it will, unless you try to list multiple columns for the key or document). Note that this simple approach requires you to validate any user-supplied values to avoid SQL injection attacks.
The function must be used in a FROM expression with an AS clause to specify the output columns, for example:
SELECT * FROM
xpath_table('article_id',
'article_xml',
'articles',
'/article/author|/article/pages|/article/title',
'date_entered > ''2003-01-01'' ')
AS t(article_id integer, author text, page_count integer, title text);
The AS clause defines the names and types of the columns in the output table. The first is the "key" column and the rest correspond to the XPath queries. If there are more XPath queries than result columns, the extra queries will be ignored. If there are more result columns than XPath queries, the extra columns will be NULL.
Note: This example defines the page_count result column as an integer. The function internally handles everything as strings, so when you want an integer in the output, it will take the string representation of the XPath result and use the input function to convert it to an integer (or whatever type the AS clause specifies). If it cannot do this, an error will result — for example, if the result is empty — so you may want to stick with text as the column type if you think your data might have any issues.
The calling SELECT statement does not have to be just SELECT * — it can reference output columns by name or join them to other tables. The function produces a virtual table on which you can perform any desired operations (such as aggregation, joins, sorting, etc.). So we could also have:
SELECT t.title, p.fullname, p.email
FROM xpath_table('article_id', 'article_xml', 'articles',
'/article/title|/article/author/@id',
'xpath_string(article_xml,''/article/@date'') > ''2003-03-20''
'
)
AS t(article_id integer, title text, author_id integer),
tblPeopleInfo AS p
WHERE t.author_id = p.person_id;
As a more complex example. Of course, for convenience you can also wrap all of this in a view.
3.1. Multi-valued Results
The xpath_table function assumes that the result of each XPath query may be multi-valued, so the number of rows returned by the function may differ from the number of input documents. The first returned row contains the first result from each query, the second row contains the second result from each query, and so on. If one of the queries has fewer values than the others, null values will be returned for it.
In some cases, a user will know that a given XPath query will return only a single result (perhaps a unique document identifier) — if used together with an XPath query that returns multiple values, the single-valued result will only appear in the first row of the result. The solution for this situation is to use the key column as part of a join against a simpler XPath query. An example:
test=## CREATE TABLE test (
test(## id int PRIMARY KEY,
test(## xml text
test(## );
CREATE TABLE
test=## INSERT INTO test VALUES (1, '<doc num="C1">
test'## <line num="L1"><a>1</a><b>2</b><c>3</c></line>
test'## <line num="L2"><a>11</a><b>22</b><c>33</c></line>
test'## </doc>');
INSERT 0 1
test=## INSERT INTO test VALUES (2, '<doc num="C2">
<line num="L1"><a>111</a><b>222</b><c>333</c></line>
<line num="L2"><a>111</a><b>222</b><c>333</c></line>
</doc>');
INSERT 0 1
test=## SELECT * FROM
xpath_table('id','xml','test','/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','true')
test-## AS t(id int, doc_num varchar(10), line_num varchar(10), val1 int, val2 int,val3 int)
test-## WHERE id = 1 ORDER BY doc_num, line_num ;
id | doc_num | line_num | val1 | val2 | val3
----+---------+----------+------+------+------
1 | C1 | L1 | 1 | 2 | 3
1 | | L2 | 11 | 22 | 33
(2 rows)
To get doc_num on every row, the solution is to use two calls to xpath_table and join the results:
test=## SELECT t.*,i.doc_num FROM
test-## xpath_table('id', 'xml', 'test',
test(## '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c',
test(## 'true')
test-## AS t(id int, line_num varchar(10), val1 int, val2 int, val3 int),
test-## xpath_table('id', 'xml', 'test', '/doc/@num', 'true')
test-## AS i(id int, doc_num varchar(10))
test-## WHERE i.id=t.id AND i.id=1
test-## ORDER BY doc_num, line_num;
id | line_num | val1 | val2 | val3 | doc_num
----+----------+------+------+------+---------
1 | L1 | 1 | 2 | 3 | C1
1 | L2 | 11 | 22 | 33 | C1
(2 rows)
4. XSLT Functions
If libxslt is installed, the following functions are available:
4.1. xslt_process
xslt_process(text document, text stylesheet, text paramlist) returns text
This function applies the XSL stylesheet to the document and returns the transformed result. paramlist is a list of parameter assignments used in the transformation, specified in the form a=1,b=2. Note that parameter parsing is very naive: parameter values cannot contain commas!
There is also a two-argument version of xslt_process that does not pass any parameters to the transformation.