XPath/Printable version
The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at
https://en.wikibooks.org/wiki/XPath
Basic Syntax
Basic XPath Syntax
Expressions that start with a forward slash "/" are called absolute expressions. They start at the root of the document. All other expressions are relative to the current position within an XML document.
Expressions are built from a sequence of step expressions of the form
step[predicate]/step[predicate]/step[predicate]
You can think of the predicate as a filter or conditional expression that serves like a WHERE clause in SQL.
Sample XML file
Many of the examples use a "books" example such as the following:
http://raw.github.com/dmccreary/learn-xquery/master/data/books.xml
In general the books file has the following structure:
<books>
  <book>
    <title>XQuery</title>
    <format>wikibook</format>
  </book>
</books>
Basic XPath Expressions
The root document node:
/
Note that the forward slash selects the document node itself, not the books element inside it.
The root node that contains all the books:
/books
All book elements:
/books/book
//book
The first version is an absolute path that names every step. The second uses the // shorthand to match book elements at any level of the file.
Note that the first expression is typically faster on unindexed XML, while indexed native XML databases can often resolve the second faster.
A count of the number of books:
count(//book)
All the book titles:
//book/title
The second book in the collection:
//book[2]
The title of the second book:
//book[2]/title
The third author of the second book
//book[2]/author[3]
All books with the format "wikibook":
//book[format='wikibook']
Get a list of all the publishers
//publisher
Get a distinct list of the publishers (duplicates removed; distinct-values() requires XPath 2.0 or later):
distinct-values(//publisher)
Books that have at least one price over 30
//book[list-price > 30]
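Several of the expressions above can be tried from Python's standard library. The following is a minimal sketch, assuming a small inline sample document (the second book is made up for illustration and is not taken from the books.xml file). Note that xml.etree.ElementTree implements only a subset of XPath: paths are evaluated relative to an element, so /books/book becomes ./book and //book becomes .//book, and functions such as count() are replaced here by Python's len().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the books file described above.
SAMPLE = """
<books>
  <book>
    <title>XQuery</title>
    <format>wikibook</format>
  </book>
  <book>
    <title>XPath</title>
    <format>print</format>
  </book>
</books>
"""

root = ET.fromstring(SAMPLE)  # root is the <books> element

all_books = root.findall("./book")                        # like /books/book
any_level = root.findall(".//book")                       # like //book
book_count = len(any_level)                               # like count(//book)
titles = [t.text for t in root.findall(".//book/title")]  # like //book/title
second_title = root.find(".//book[2]/title").text         # like //book[2]/title
wikibooks = root.findall(".//book[format='wikibook']")    # like //book[format='wikibook']

print(book_count, titles, second_title, len(wikibooks))
```

A full XPath engine would accept the expressions exactly as written in this chapter; the rewritten paths are only a concession to the standard library's limited support.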
XPath abbreviations
. represents the current (context) node
.. represents the parent of the current node
@ represents the attribute delimiter
$ represents the variable delimiter
[n] selects the n-th node matched by the step it follows
ancestor::div represents the set of all ancestor div elements
normalize-space(firstname)="Paul" matches Paul regardless of leading, trailing, or repeated whitespace
boolean(string($myvar)) returns false for empty strings
/ represents the absolute path of the root node
@* represents all attributes of the current node
-Return all values using a union of attributes, node names, and text values:
@*|node()|text()
-Return all of a node's siblings using a union of the preceding-sibling and following-sibling axes:
preceding-sibling::node() | following-sibling::node()
-Return the following siblings of a specific type (here, every h3 that follows a div under the same parent)
//div/following-sibling::h3
-Check string value of current node
[. = "Matthew Bob"]
-Node identity can be checked with the count() function: the union of two node-sets drops duplicates, so if two expressions select the same single node, the count of their union is 1. For example, the following query returns true because both operands select the same books element:
count(/bk:books | /bk:books/bk:book[1]/parent::*) = 1
CSS Equivalents
:disabled Equivalent
//*[@disabled]
represents :disabled
:checked Equivalent
//*[@checked]
represents :checked
:selected Equivalent
//*[@selected]
represents :selected
:text Equivalent
//*[@type="text"]
represents :text
:contains Equivalent
//*[contains(text(),"you")]
represents :contains("you")
Attribute Contains Equivalent
//p[contains(@me,"you")]
represents p[me*="you"]
Starts with Equivalent
//p[starts-with(@me,"you")]
represents p[me^="you"]
Hyphen-prefix Equivalent
//p[@me="you" or starts-with(@me,concat("you",'-'))]
represents p[me|="you"]
Ends with Equivalent
//p[substring(@me,string-length(@me)-2)="you"]
represents p[me$="you"]
Whitespace-separated Word Equivalent
//p[contains(concat(" ",@me, " ")," you ")]
represents p[me~="you"]
Negation Attribute Equivalent
//p[not(@me="you")]
represents p[me!="you"] (a jQuery extension, which also matches p elements that lack the attribute)
ID Equivalent
//p[@id="me"]
represents p#me
:not Equivalent
//p[not(@id="me")]
represents p:not(#me)
Class Equivalent
//p[contains(concat(" ", @class, " "), " me ")]
represents p.me
Descendant Equivalent
//div//p
represents div p
Child Equivalent
//div/p
represents div > p
Adjacent Sibling Equivalent
//h1/following-sibling::*[1][self::div]
represents h1 + div
General Sibling Equivalent
//h1/following-sibling::div
represents h1 ~ div
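The padding trick in the class and ends-with equivalents is easier to see outside XPath. The sketch below is a rough Python emulation, not an XPath engine; has_class and ends_with are hypothetical helper names introduced only for illustration. It shows why a bare contains() test gives false positives on class lists, and how the 1-based substring() start position is derived for a suffix test.

```python
def has_class(class_attr: str, name: str) -> bool:
    """Mimic contains(concat(' ', @class, ' '), ' name ')."""
    return f" {name} " in f" {class_attr} "


def ends_with(value: str, suffix: str) -> bool:
    """Mimic substring(value, string-length(value) - string-length(suffix) + 1) = suffix."""
    start = len(value) - len(suffix) + 1      # XPath positions are 1-based
    return value[max(start, 1) - 1:] == suffix


# A bare contains(@class, 'me') matches inside other class names.
naive = "me" in "home menu"

print(naive, has_class("home menu", "me"), has_class("big me", "me"))
print(ends_with("thank you", "you"), ends_with("yo", "you"))
```

For the three-character suffix "you", the start position works out to string-length(@me) - 2, which is the form used in the ends-with entry above.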
CSS3 / jQuery Equivalents
hasChildNodes query
//*[count(child::node() ) > 0]
represents hasChildNodes
:root query
/*[1]
represents :root
:first-child query
//*[1]
represents :first-child
:last-child query
//*[last()]
represents :last-child
:only-child query
//*[last()=1]
represents :only-child
:empty query
//*[not(*) and not(text())]
represents :empty
:nth-child(n) query
//*[position() = n]
represents :nth-child(n)
:nth-child(odd) query
//*[(position() mod 2)=1]
represents :nth-child(odd)
:nth-child(even) query
//*[(position() mod 2)=0]
represents :nth-child(even)
:nth-last-child(n) query
//*[position() = last() - n + 1]
represents :nth-last-child(n)
:nth-of-type(n) query
//p[n]
represents :nth-of-type(n)
:nth-last-of-type(n) query
//p[position() = last() - n + 1]
represents :nth-last-of-type(n)
:first-of-type query
//p[1]
represents :first-of-type
:last-of-type query
//p[last()]
represents :last-of-type
:only-of-type query
//p[last()=1]
represents :only-of-type
-moz-any/-webkit-any query
//*[local-name()='h1' or local-name()='h4']//*
represents :-moz-any(h1,h4) *
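Positional selectors come down to comparing each node's position() with last(), the number of nodes selected by the step. As a hedged illustration, the sketch below emulates those two functions in Python over a made-up list; the sample document and the pick helper are assumptions for illustration and not part of XPath. It treats the n-th child from the end as the one at position last() - n + 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical four-item list used only for this illustration.
root = ET.fromstring("<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>")
children = list(root)   # element children, in document order
last = len(children)    # the value last() would have for each child


def pick(test):
    """Return the text of children whose 1-based position passes the test."""
    return [c.text for pos, c in enumerate(children, start=1) if test(pos)]


n = 2
odd = pick(lambda pos: pos % 2 == 1)              # [(position() mod 2) = 1]
even = pick(lambda pos: pos % 2 == 0)             # [(position() mod 2) = 0]
nth = pick(lambda pos: pos == n)                  # [position() = n]
nth_last = pick(lambda pos: pos == last - n + 1)  # [position() = last() - n + 1]
only = pick(lambda pos: last == 1)                # [last() = 1]

print(odd, even, nth, nth_last, only)
```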
DOM Equivalents
//mytag
represents getElementsByTagName
//*[contains(concat(" ", @class, " "), concat(" ", $myclass, " "))]
represents getElementsByClassName
child::node()
represents childNodes
preceding-sibling::node()[1]
represents previousSibling
following-sibling::node()[1]
represents nextSibling
./following-sibling::*
represents generalSibling
Conditional Logic Equivalents
-A predicate is like a SQL where clause
-A pipe is like a SQL union clause
-An axis is like a SQL t1.col = t2.col
-Use a predicate with a boolean variable check as an if statement
-Use a pipe with a tagname search as a range checker
//h2 | //h3 | //h4
-Use a pipe with a negation predicate variable check as an else statement
//var[1] | //var[not //var]
-Use a repeating axis to skip levels in the tree to retrieve nodes at every other branch
child::*/child::*
-Use variables to store individual checks within complex and/or conditional tests
-Use variables within loops to store iteration dependent variables and separate the logic from the output
-Use string-length to test the existence of functions
-Use separate tests like (a and c) or (b and c) instead of nested conditions like ((a or b) and c)
-Use string(.), local-name(.),string-length(concat(., '') ), number(.), and boolean(.)[boolean(.)] to test node values, names and existence
-Use //node()[local-name(.) = $myvar] to test for the existence of form values
-Use //node()[local-name(.) = $myvar][boolean(node())] to skip empty form values
-Always skip empty nodes when debugging
-Use local-name(.)[boolean(.)] to test for empty tags in the context node
-Use boolean(@*[not aa or not bb]) to filter known attributes
-Use {boolean(string($myvar))} to test interpolated variables
-Use boolean(following::*[1] or following::. or following::self::*) to test closing tag failures
-Use count(//*[count(*) = 1]) to count nodes with only one child
-Use (table1 | table2)[col=val]/* to do a join
-Use *[not(@*)] to return nodes that have no attributes
-Use number(.) - number(.) to suppress number values
-Use substring(., 0, string-length(.) ) to suppress string values
-Use substring('0', 1, not($myvar) ) to set an undefined variable to zero
-Use normalize-space($myvar) to remove tab characters in lists
-Use translate($myvar, $ABCvar, $abcvar) with variables storing A-Z and a-z to ignore case for node queries
-Use string-length instead of contains when checking for list position to make logic data independent
-Use use-attribute-sets to share boilerplate arguments with multiple elements, such as table or list rows
-Use * or self::* whenever selecting to return a nodeset or single node
-Use string(.) to get a node from an XML file called with the document function
-Use different delimiters between each list item to make substring-before/after logic more readable
-Use string concatenation of node/class names with counter/node values to generate ID attributes
-Use a variable to store the previous index value with a comma to make substring-after work with recursion
-Use a predicate with the boolean value of the node as a guard operator
$myvar[$myvar]
-Use XSL nodes to store local values and test XML nodes in predicates with a pipe to emulate a default operator $dynamic[$var] | $default[not($var)]
-Use the child axis instead of slash for grouping child nodes
child::*[self::boy or self::girl]
-Avoid (*) because it walks the tree before testing child nodes
-Use node() instead of . when searching all elements using //
-Use a predicate check of the generate-id of the node itself and a node variable to do intersect and except set operations
-Use concat with a node and a dummy param to check for node existence
-Use string(number(.)) = 'NaN' to detect node values that are not numeric
-Use not($a=$b) instead of $a !=$b when comparing variables that contain more than one node
-Use newlines after parentheses to avoid leaving one open ended
-Use predicates to check if form field names exist in the XML doc
-Use qname to get the namespace binding of a tag
-Use id() to get generate-id values instead of variable interpolation
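Two of the string tips in this list, collapsing whitespace with normalize-space() and folding case with translate() and a pair of A-Z and a-z variables, can be illustrated outside an XPath engine. The sketch below is a rough Python emulation under stated assumptions: the helper names are hypothetical, normalize_space relies on Python's definition of whitespace (slightly broader than XPath's), and xpath_translate covers only the case where both character lists have the same length.

```python
import string

UPPER = string.ascii_uppercase  # plays the role of $ABCvar
LOWER = string.ascii_lowercase  # plays the role of $abcvar


def normalize_space(value: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(value.split())


def xpath_translate(value: str, src: str, dst: str) -> str:
    """Replace each character of src with the character at the same index in dst."""
    return value.translate(str.maketrans(src, dst))


cleaned = normalize_space("  Paul \t  Smith\n")
folded = xpath_translate("Matthew BOB", UPPER, LOWER)

# Case-insensitive comparison: fold both sides before testing equality.
same = xpath_translate("Matthew Bob", UPPER, LOWER) == folded

print(repr(cleaned), repr(folded), same)
```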
SQL Equivalents
-XPath cannot do join-like queries, but can do union, intersection, subset, and difference like SQL
-XPath supports set operations like SQL using variations of the Union operation and the count function:
a UNION b: $a | $b
b UNION c: $b | $c
a INTERSECTION b: $a[count(.|$b) = count($b)]
a INTERSECTION c: $a[count(.|$c) = count($c)]
(Intersection takes the union of $b with any node in $a and returns the set of nodes in $a that are also in $b)
a DIFFERENCE b: $a[count(.|$b) != count($b)]
a DIFFERENCE c: $a[count(.|$c) != count($c)]
(Difference keeps each node in $a whose union with $b or $c is larger than $b or $c alone, and so returns the set of nodes in $a that are not in $b or $c)
a SYM DIFFERENCE b: $a[count(. | $b) != count($b)] | $b[count(. | $a) != count($a)]
(Symmetrical difference takes the union of the differences from both sides and returns the set of nodes that are in exactly one of $a and $b)
a SUBSET OF b: count($b | $a) = count($b) and count($b) > count($a)
b SUBSET OF a: count($b | $a) = count($a) and count($a) > count($b)
(Subset means that the union of $a with $b returns the same set of nodes as the larger set; the strict size comparison makes this a proper subset test)
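The count-based set operations rely on one property of the union operator: it merges by node identity and drops duplicates. The sketch below emulates that in Python to show why the counts behave as described. It is an illustration under stated assumptions, not an XPath engine: the sample document, the lists a and b standing in for $a and $b, and the union helper are all made up here, and the helper keeps insertion order rather than document order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<r><x/><y/><z/></r>")
x, y, z = list(root)
a = [x, y]  # stands in for $a
b = [y, z]  # stands in for $b


def union(left, right):
    """Like $left | $right: merge by node identity, dropping duplicates."""
    seen, out = set(), []
    for node in left + right:
        if id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out


# $a[count(.|$b) = count($b)]: a node already in $b leaves the count unchanged.
intersection = [n for n in a if len(union([n], b)) == len(b)]
# $a[count(.|$b) != count($b)]: a node that enlarges $b is not in $b.
only_in_a = [n for n in a if len(union([n], b)) != len(b)]
only_in_b = [n for n in b if len(union([n], a)) != len(a)]
sym_difference = union(only_in_a, only_in_b)
# count($b | $a) = count($b) and count($b) > count($a)
a_subset_of_b = len(union(b, a)) == len(b) and len(b) > len(a)

print([n.tag for n in intersection], [n.tag for n in only_in_a],
      [n.tag for n in sym_difference], a_subset_of_b)
```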
XPath can be embedded in an xpointer to make a smart url:
http://www.abcpub.co.uk/sitemap.xml#xpointer(//url)