|`onopentag(name, attribs, isImplied)`| Opening tag. `attribs` is an object mapping attribute names to values. `isImplied` is `true` when the tag was opened implicitly (HTML mode only). |
|`onopentagname(name)`| Emitted for the tag name as soon as it is available (before attributes are parsed). |
|`onattribute(name, value, quote)`| Attribute. `quote` is `"` / `'` / `null` (unquoted) / `undefined` (no value, e.g. `disabled`). |
|`onclosetag(name, isImplied)`| Closing tag. `isImplied` is `true` when the tag was closed implicitly (HTML mode only). |
|`ontext(data)`| Text content. May fire multiple times for a single text node. |
|`oncomment(data)`| Comment (content between `<!--` and `-->`). |
|`oncdatastart()`| Opening of a CDATA section (`<![CDATA[`). |
|`oncdataend()`| End of a CDATA section (`]]>`). |
|`xmlMode`|`boolean`|`false`| Treat the document as XML. This affects entity decoding, self-closing tags, CDATA handling, and more. Set this to `true` for XML, RSS, Atom and RDF feeds. |
|`decodeEntities`|`boolean`|`true`| Decode HTML entities (e.g. `&amp;` -> `&`). |
|`lowerCaseTags`|`boolean`|`!xmlMode`| Lowercase tag names. |
The `DomHandler` produces a DOM (document object model) that can be manipulated using the [`DomUtils`](https://github.com/fb55/DomUtils) helper.
The `parseDocument` helper parses a string and returns a DOM tree (a [`Document`](https://github.com/fb55/domhandler) node).

```js
import * as htmlparser2 from "htmlparser2";

const dom = htmlparser2.parseDocument(
    `<ul id="fruits">
        <li class="apple">Apple</li>
        <li class="orange">Orange</li>
    </ul>`,
);
```

`parseDocument` accepts an optional second argument with both parser and [DOM handler options](https://github.com/fb55/domhandler):

```js
const dom = htmlparser2.parseDocument(data, {
    // Parser options
    xmlMode: true,

    // domhandler options
    withStartIndices: true, // Add `startIndex` to each node
    withEndIndices: true, // Add `endIndex` to each node
});
```

### Searching the DOM

The [`DomUtils`](https://github.com/fb55/domutils) module (re-exported on the main `htmlparser2` export) provides helpers for finding nodes:
For CSS selector queries, use [`css-select`](https://github.com/fb55/css-select):

```js
import { selectAll, selectOne } from "css-select";

const results = selectAll("ul#fruits > li", dom);
const first = selectOne("li.apple", dom);
```
-
The `DomHandler`, while still bundled with this module, was moved to its [own module](https://github.com/fb55/domhandler).
-
Have a look at that for further information.
Or, if you'd prefer a jQuery-like API, use [`cheerio`](https://github.com/cheeriojs/cheerio).

### Modifying and serializing the DOM
-
## Parsing Feeds
Use `DomUtils` to modify the tree, and [`dom-serializer`](https://github.com/cheeriojs/dom-serializer) (also available as `DomUtils.getOuterHTML`) to serialize it back to HTML:
Other manipulation helpers include `appendChild`, `prependChild`, `append`, `prepend`, and `replaceElement` -- see the [`domutils` docs](https://github.com/fb55/domutils) for the full API.
## Parsing feeds
`htmlparser2` makes it easy to parse RSS, RDF and Atom feeds, by providing a `parseFeed` method:
This returns an object with `type`, `title`, `link`, `description`, `updated`, `author`, and `items` (an array of feed entries), or `null` if the document isn't a recognized feed format.
The `xmlMode` option is enabled by default for `parseFeed`. If you pass custom options, make sure to include `xmlMode: true`.
## Performance
After having some artificial benchmarks for some time, **@AndreasMadsen** published his [`htmlparser-benchmark`](https://github.com/AndreasMadsen/htmlparser-benchmark), which benchmarks HTML parsers based on real-world websites.
## How does this module differ from [node-htmlparser](https://github.com/tautologistics/node-htmlparser)?
-
In 2011, this module started as a fork of the `htmlparser` module.
-
`htmlparser2` was rewritten multiple times and, while it maintains an API that's mostly compatible with `htmlparser`, the projects don't share any code anymore.
-
The parser now provides a callback interface inspired by [sax.js](https://github.com/isaacs/sax-js) (originally targeted at [readabilitySAX](https://github.com/fb55/readabilitysax)).
-
As a result, old handlers won't work anymore.
-
The `DefaultHandler` was renamed to clarify its purpose (to `DomHandler`). The old name is still available when requiring `htmlparser2` and your code should work as expected.
-
The `RssHandler` was replaced with a `getFeed` function that takes a `DomHandler` DOM and returns a feed object. There is a `parseFeed` helper function that can be used to parse a feed from a string.

## Security contact information
To report a security vulnerability, please use the [Tidelift security contact](https://tidelift.com/security).