What Is XPath? Querying XML and HTML Documents
XPath is a query language for selecting nodes in XML and HTML — like CSS selectors but more powerful. Plain explanation with practical examples for scraping and testing.
Short answer
XPath (XML Path Language) is a query language for selecting nodes in XML or HTML documents. It's like CSS selectors but more expressive — you can navigate parents, siblings, axes, and apply complex predicates. Browsers, scrapers, and test frameworks all support XPath natively.
Basic syntax
| Expression | Selects |
|---|---|
| /html/body/div | <div> children of <body>, following the exact path from the root |
| //div | Any <div> in the document |
| //a[@href] | Any link with an href attribute |
| //a[@class="btn"] | Links whose class attribute is exactly "btn" |
| //a[contains(@class,"btn")] | Links whose class attribute contains the substring "btn" |
| //ul/li[1] | First <li> in each <ul> |
| //ul/li[last()] | Last <li> in each <ul> |
| //div[text()="Hello"] | Div with a text node that is exactly "Hello" |
| //*[@id="main"] | Any element with id="main" |
| //input[@type="email"] | Email-type inputs |
| //button[contains(., "Submit")] | Button containing "Submit" anywhere in its text |
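To try a few of these expressions outside the browser, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. Two things are assumptions for illustration: the sample markup and the `texts` helper are made up, and ElementTree implements only a subset of XPath 1.0 (no contains() or text(), and paths are written relative to an element, hence the leading dot). A full XPath engine would accept the table's expressions unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (ElementTree needs valid XML).
PAGE = """
<html>
  <body>
    <div id="main">
      <a class="btn" href="/save">Save</a>
      <a class="btn primary" href="/send">Send</a>
      <a>No link</a>
      <ul>
        <li>first</li>
        <li>second</li>
        <li>third</li>
      </ul>
      <input type="email" name="email"/>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)


def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [el.text for el in root.findall(path)]


print(texts(".//a[@href]"))         # ['Save', 'Send']
print(texts(".//a[@class='btn']"))  # ['Save'] (exact match, so 'btn primary' is skipped)
print(texts(".//ul/li[1]"))         # ['first']
print(texts(".//ul/li[last()]"))    # ['third']
print(root.find(".//*[@id='main']").tag)                    # div
print(root.find(".//input[@type='email']").get("name"))     # email
```

The exact-match result for class="btn" is the reason contains(@class,"btn") exists in full XPath: an element with several classes does not match an equality test on the whole attribute.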
Why XPath beats CSS selectors sometimes
| Feature | CSS Selectors | XPath |
|---|---|---|
| Match by text | No | Yes (text(), contains()) |
| Walk to parent | No (without :has) | Yes (.., parent::) |
| Sibling navigation | Limited (following siblings only) | Full (preceding/following axes) |
| Index from end | :nth-last-child | last(), last()-1 |
| Predicates with logic | Limited (comma for OR, :not()) | and, or, not() |
| Speed in browsers | Faster | Slower |
| Readability | Cleaner | Verbose |
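Three rows of this comparison (walking to a parent, indexing from the end, and combining conditions) can be sketched in Python with the standard library's xml.etree.ElementTree. The form markup and field names below are hypothetical, and ElementTree supports only part of XPath 1.0: it has no `and`/`or` operators, so the sketch chains two predicates, which behaves like AND.

```python
import xml.etree.ElementTree as ET

# Hypothetical form markup, invented for this example.
FORM = """
<form>
  <div class="field"><label>Email</label><input type="email" name="email"/></div>
  <div class="field"><label>Name</label><input type="text" name="name"/></div>
  <div class="field"><label>City</label><input type="text" name="city" required="required"/></div>
</form>
"""

root = ET.fromstring(FORM)

# Walk to parent: ".." steps from the matching <label> up to its wrapping <div>.
wrapper = root.find(".//label[.='Name']/..")
parent_input = wrapper.find("input").get("name")

# Index from end: last()-1 picks the second-to-last <div> under the form.
second_to_last = root.find("./div[last()-1]/input").get("name")

# Chained predicates: both conditions must hold (type is text AND required is set).
required_text = [el.get("name") for el in root.findall(".//input[@type='text'][@required]")]

print(parent_input)     # name
print(second_to_last)   # name
print(required_text)    # ['city']
```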
Where XPath shines
- Test automation: Selenium and Playwright accept XPath; selectors based on visible text tend to survive restyling better than selectors tied to hashed or auto-generated CSS class names
- Web scraping: "find the next sibling of this label" is one XPath expression vs many lines of JS DOM walking
- XML processing: SOAP envelopes, RSS feeds, sitemaps, configuration files — all XPath-native
- Browser DevTools: in the Chrome or Firefox console, $x("//a") evaluates an XPath expression and returns the matching nodes; very useful for debugging
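As an illustration of the XML-processing case, the sketch below pulls URLs out of a sitemap with Python's standard-library xml.etree.ElementTree. The sitemap content and example.com URLs are made up; the `sm` prefix is an arbitrary local name chosen here and mapped to the standard sitemap namespace. ElementTree covers only a subset of XPath 1.0, which is enough for this kind of path-plus-predicate query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap; only the namespace URI is the real sitemap schema.
SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

# Prefix-to-namespace mapping used to resolve "sm:" in the paths below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(SITEMAP)

# Every <loc> under every <url>.
all_locs = [loc.text for loc in root.findall("./sm:url/sm:loc", NS)]

# Only <url> entries that have a <lastmod> child (predicate on a child element).
dated_locs = [loc.text for loc in root.findall("./sm:url[sm:lastmod]/sm:loc", NS)]

print(all_locs)    # ['https://example.com/', 'https://example.com/about']
print(dated_locs)  # ['https://example.com/']
```

Namespaced documents are a common stumbling block: without the prefix mapping, a path such as ./url/loc matches nothing, because the elements live in the sitemap namespace.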
Common test selectors
// Click button labeled "Save"
//button[normalize-space()="Save"]
// Find input following a label "Email"
//label[text()="Email"]/following-sibling::input[1]
// Find row in a table by cell content
//tr[td[contains(., "Order #1234")]]
// Cell to the right of "Total:" label
//td[text()="Total:"]/following-sibling::td[1]
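The two table selectors above can be approximated in Python with the standard library's xml.etree.ElementTree. This is a sketch under stated assumptions: the table data is invented, and because ElementTree has neither contains() nor the following-sibling axis, it uses an exact text match and a parent step (`..`) followed by a positional index instead. In Selenium, Playwright, or any full XPath engine, the original expressions work as written.

```python
import xml.etree.ElementTree as ET

# Hypothetical order table, invented for this example.
TABLE = """
<table>
  <tr><td>Order #1234</td><td>Shipped</td></tr>
  <tr><td>Order #5678</td><td>Pending</td></tr>
  <tr><td>Total:</td><td>$42.00</td></tr>
</table>
"""

root = ET.fromstring(TABLE)

# Row lookup by cell content: a <tr> that has a <td> whose full text is "Order #1234".
row = root.find(".//tr[td='Order #1234']")
status = row.find("td[2]").text

# Cell to the right of the "Total:" label: go up to the row, then take its second cell.
total = root.find(".//td[.='Total:']/../td[2]").text

print(status)  # Shipped
print(total)   # $42.00
```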
Versions
- XPath 1.0 (1999): what browsers support, simple syntax
- XPath 2.0/3.0/3.1: added regex, sequences, and more math functions; supported by XSLT and XQuery processors such as Saxon, not by browsers
For browser/Selenium/Playwright work, you're using XPath 1.0.
JSON equivalent
XPath works on XML and HTML document trees, not JSON. For querying JSON, use JSONPath: same idea, JSON syntax. Try our JSONPath tester.
Related tools
Test patterns against text (XPath 1.0 has no regex functions, so sometimes you need real regex): regex tester. Convert HTML entities encountered in scraped text: HTML encoder/decoder.