Commit fca7347

docs: document aggregateContent, make API ref clearer (#1328)
1 parent 79ace79 commit fca7347

5 files changed, +97 -49 lines changed

packages/website/docs/DocSearch-v3.mdx

Lines changed: 1 addition & 3 deletions
@@ -207,7 +207,6 @@ This is useful to limit the scope of the search to one language or one version.

 ```js
 docsearch({
-  // ...
   searchParameters: {
     facetFilters: ['language:en', 'version:1.0.0'],
   },
@@ -220,7 +219,6 @@ docsearch({

 ```jsx
 <DocSearch
-  // ...
   searchParameters={{
     facetFilters: ['language:en', 'version:1.0.0'],
   }}
@@ -254,6 +252,6 @@ By adding this snippet to the `head` of your website, you can hint the browser t
 [11]: /docs/api#container
 [12]: /docs/api
 [13]: /docs/required-configuration#introduce-global-information-as-meta-tags
-[14]: /docs/record-extractor#with-custom-variables
+[14]: /docs/record-extractor#indexing-content-for-faceting
 [16]: https://www.algolia.com/doc/guides/managing-results/refine-results/filtering/#facetfilters
 [15]: https://www.algolia.com/doc/guides/managing-results/refine-results/faceting/
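
For context on the `facetFilters` value shown in this file, here is a minimal sketch of how such an array could be built from plain values. The helper name `buildFacetFilters` is hypothetical and not part of DocSearch. It relies on Algolia's documented behavior that top-level entries are combined with AND, while a nested array is combined with OR.

```js
// Hypothetical helper (not part of DocSearch): turns a plain object of
// facet values into the `facetFilters` array format used in the snippets.
function buildFacetFilters(facets) {
  return Object.entries(facets).map(([name, value]) =>
    Array.isArray(value)
      ? // A nested array is OR-ed: any of these values may match.
        value.map((v) => `${name}:${v}`)
      : // A plain string is AND-ed with the other top-level entries.
        `${name}:${value}`
  );
}

const searchParameters = {
  facetFilters: buildFacetFilters({ language: 'en', version: '1.0.0' }),
};

console.log(JSON.stringify(searchParameters.facetFilters));
// prints ["language:en","version:1.0.0"]
```

Passing an array, for example `{ version: ['latest', 'stable'] }`, yields a nested array, which matches records carrying either value.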

packages/website/docs/migrating-from-legacy.mdx

Lines changed: 1 addition & 1 deletion
@@ -82,5 +82,5 @@ Below are the keys that can be found in the [`legacy` DocSearch configs][14] and
 [27]: https://github.com/micromatch/micromatch
 [28]: https://www.algolia.com/doc/tools/crawler/apis/configuration/actions/#parameter-param-recordextractor
 [29]: /docs/record-extractor
-[30]: /docs/record-extractor#with-custom-variables
+[30]: /docs/record-extractor#introduction
 [31]: /docs/record-extractor#pagerank

packages/website/docs/record-extractor.md

Lines changed: 83 additions & 43 deletions
@@ -41,11 +41,9 @@ recordExtractor: ({ helpers }) => {
 },
 ```

-## Complex extractors
+### Manipulate the DOM with Cheerio

-### Using the Cheerio instance (`$`)
-
-We provide a [`Cheerio instance ($)`][14] for you to retrieve or remove content from the DOM:
+The [`Cheerio instance ($)`](https://cheerio.js.org/) allows you to manipulate the DOM:

 ```js
 recordExtractor: ({ $, helpers }) => {
@@ -54,7 +52,9 @@ recordExtractor: ({ $, helpers }) => {

   return helpers.docsearch({
     recordProps: {
-      lvl0: "header h1",
+      lvl0: {
+        selectors: "header h1",
+      },
       lvl1: "article h2",
       lvl2: "article h3",
       lvl3: "article h4",
@@ -66,7 +66,7 @@ recordExtractor: ({ $, helpers }) => {
 },
 ```

-### Handling fallback DOM selectors
+### Provide fallback selectors

 Fallback selectors can be useful when retrieving content that might not exist in some pages:

@@ -93,18 +93,20 @@ recordExtractor: ({ $, helpers }) => {
 },
 ```

-### With custom variables
+### Provide raw text (`defaultValue`)

-_These selectors also support [`defaultValue`](#with-raw-text-defaultvalue) and [fallback selectors](#with-fallback-dom-selectors)_
+_Only the `lvl0` and [custom variables][13] selectors support this option_

-Custom variables are added to your Algolia records to be used as filters in the frontend (e.g. `version`, `lang`, etc.):
+You might want to structure your search results differently than your website, or provide a `defaultValue` to a potentially non-existent selector:

 ```js
-recordExtractor: ({ helpers }) => {
+recordExtractor: ({ $, helpers }) => {
   return helpers.docsearch({
     recordProps: {
       lvl0: {
-        selectors: "header h1",
+        // It also supports the fallback DOM selectors syntax!
+        selectors: ".exists-probably h1",
+        defaultValue: "myRawTextIfDoesNotExists",
       },
       lvl1: "article h2",
       lvl2: "article h3",
@@ -113,47 +115,30 @@ recordExtractor: ({ helpers }) => {
       lvl5: "article h6",
       content: "main p, main li",
       // The variables below can be used to filter your search
-      foo: ".bar",
       language: {
         // It also supports the fallback DOM selectors syntax!
-        selectors: ".does-not-exists",
+        selectors: ".exists-probably .language",
         // Since custom variables are used for filtering, we allow sending
         // multiple raw values
         defaultValue: ["en", "en-US"],
       },
-      version: {
-        // You can send raw values without `selectors`
-        defaultValue: ["latest", "stable"],
-      },
     },
   });
 },
 ```

-The `version`, `lang` and `foo` attribute of these records will be :
+### Indexing content for faceting

-```json
-foo: "valueFromBarSelector",
-language: ["en", "en-US"],
-version: ["latest", "stable"]
-```
-
-You can now use them to [filter your search in the frontend][16]
+_These selectors also support [`defaultValue`](#provide-raw-text-defaultvalue) and [fallback selectors](#provide-fallback-selectors)_

-### With raw text (`defaultValue`)
-
-_Only the `lvl0` and [custom variables][13] selectors support this option_
-
-You might want to structure your search results differently than your website, or provide a `defaultValue` to a potentially non-existent selector:
+You might want to index content that will be used as filters in your frontend (e.g. `version` or `lang`). You can define any custom variable in the `recordProps` object to add it to your Algolia records:

 ```js
-recordExtractor: ({ $, helpers }) => {
+recordExtractor: ({ helpers }) => {
   return helpers.docsearch({
     recordProps: {
       lvl0: {
-        // It also supports the fallback DOM selectors syntax!
-        selectors: ".exists-probably h1",
-        defaultValue: "myRawTextIfDoesNotExists",
+        selectors: "header h1",
       },
       lvl1: "article h2",
       lvl2: "article h3",
@@ -162,23 +147,38 @@ recordExtractor: ({ $, helpers }) => {
       lvl5: "article h6",
       content: "main p, main li",
       // The variables below can be used to filter your search
+      foo: ".bar",
       language: {
         // It also supports the fallback DOM selectors syntax!
-        selectors: ".exists-probably .language",
+        selectors: ".does-not-exists",
         // Since custom variables are used for filtering, we allow sending
         // multiple raw values
         defaultValue: ["en", "en-US"],
       },
+      version: {
+        // You can send raw values without `selectors`
+        defaultValue: ["latest", "stable"],
+      },
     },
   });
 },
 ```

-### Boosting search results with `pageRank`
+The following `version`, `language` and `foo` attributes will be available in your records:
+
+```json
+foo: "valueFromBarSelector",
+language: ["en", "en-US"],
+version: ["latest", "stable"]
+```
+
+You can now use them to [filter your search in the frontend][16]
+
+### Boost search results with `pageRank`

 _[`pageRank`](#pagerank) used to be an **integer**, it is now a **string**_

-This parameter helps to boost records built from the current `pathsToMatch`. Pages with highest [`pageRank`](#pagerank) will be returned before pages with a lower [`pageRank`](#pagerank). Note that you can pass any numeric value **as a string**, including negative values:
+This parameter allows you to boost records built from the current `pathsToMatch`. Pages with the highest [`pageRank`](#pagerank) will be returned before pages with a lower [`pageRank`](#pagerank). Note that you can pass any numeric value **as a string**, including negative values:

 ```js
 {
@@ -196,7 +196,31 @@ This parameter helps to boost records built from the current `pathsToMatch`. Pag
         content: "article p, article li",
         pageRank: "30",
       },
-      indexHeadings: true,
+    });
+  },
+},
+```
+
+### Reduce the number records
+
+If you encounter the `Extractors returned too many records` error when your page outputs more than 750 records, you can use the `aggregateContent` option to reduce the number of records at the `content` level.
+
+```js
+{
+  indexName: "YOUR_INDEX_NAME",
+  pathsToMatch: ["https://YOUR_WEBSITE_URL/api/**"],
+  recordExtractor: ({ $, helpers }) => {
+    return helpers.docsearch({
+      recordProps: {
+        lvl0: "header h1",
+        lvl1: "article h2",
+        lvl2: "article h3",
+        lvl3: "article h4",
+        lvl4: "article h5",
+        lvl5: "article h6",
+        content: "article p, article li",
+      },
+      aggregateContent: true,
     });
   },
 },
@@ -227,9 +251,9 @@ type Lvl0 = {

 > `type: string` | **optional**

-See the [live example](#boosting-search-results-with-pagerank)
+See the [live example](#boost-search-results-with-pagerank)

-### Custom variables (`[k: string]`)
+### Custom variables

 > `type: string | string[] | CustomVariable` | **optional**

@@ -244,7 +268,24 @@ type CustomVariable =
     };
 ```

-Contains values that can be used as [`facetFilters`][15]
+Custom variables are used to [`filter your search`](/docs/DocSearch-v3#filtering-your-search). You can define them in the [`recordProps`](#indexing-content-for-faceting).
+
+## `helpers.docsearch` API Reference
+
+### `aggregateContent`
+
+> `type: boolean` | default: `true` | **optional**
+
+[This option](#reduce-the-number-records) groups the Algolia records created at the `content` level of the selector into a single record for its matching heading.
+
+### `indexHeadings`
+
+> `type: boolean | { from: number, to: number }` | default: `true` | **optional**
+
+This option tells the crawler whether the `headings` (`lvlX`) should be indexed.
+
+- When `false`, only records for the `content` level will be created.
+- When `{ from, to }` is provided, only records for the `lvlX` to `lvlY` range will be created.

 [1]: /docs/DocSearch-v3
 [2]: https://github.com/algolia/docsearch/
@@ -258,7 +299,6 @@ Contains values that can be used as [`facetFilters`][15]
 [10]: https://www.algolia.com/doc/tools/crawler/apis/configuration/actions/#parameter-param-recordextractor-2
 [11]: https://www.algolia.com/doc/tools/crawler/guides/extracting-data/#extracting-records
 [12]: https://www.algolia.com/doc/tools/crawler/apis/configuration/actions/
-[13]: /docs/record-extractor#with-custom-variables
-[14]: https://cheerio.js.org/
+[13]: /docs/record-extractor#indexing-content-for-faceting
 [15]: https://www.algolia.com/doc/guides/managing-results/refine-results/faceting/
 [16]: /docs/docsearch-v3/#filtering-your-search
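
To make the documented effect of `aggregateContent` concrete, the sketch below groups `content`-level records under their matching heading. It is an illustration only and not the crawler's implementation: the function is a stand-in written for this note, and the `{ type, heading, text }` record shape is a simplified assumption, not the real Algolia record format.

```js
// Illustration only: NOT the crawler's implementation.
// Records use a made-up, simplified shape: { type, heading, text }.
function aggregateContent(records) {
  const byHeading = new Map();
  const result = [];

  for (const record of records) {
    if (record.type !== 'content') {
      // Heading records (lvl0, lvl1, ...) are kept as-is.
      result.push(record);
      continue;
    }

    const merged = byHeading.get(record.heading);
    if (merged) {
      // Fold later content into the record already emitted for this heading.
      merged.text += ' ' + record.text;
    } else {
      const first = { type: 'content', heading: record.heading, text: record.text };
      byHeading.set(record.heading, first);
      result.push(first);
    }
  }

  return result;
}

const records = [
  { type: 'lvl1', heading: 'Install', text: 'Install' },
  { type: 'content', heading: 'Install', text: 'Run the installer.' },
  { type: 'content', heading: 'Install', text: 'Restart your shell.' },
  { type: 'lvl1', heading: 'Usage', text: 'Usage' },
  { type: 'content', heading: 'Usage', text: 'Call the CLI.' },
];

const aggregated = aggregateContent(records);
console.log(records.length, '->', aggregated.length); // prints 5 -> 4
```

Pages with many short paragraphs per heading benefit the most, since each heading then contributes a single `content` record.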

packages/website/docs/required-configuration.mdx

Lines changed: 2 additions & 1 deletion
@@ -44,6 +44,7 @@ new Crawler({
             lvl6: ['article h6', 'main h6', 'h6'],
             content: ['article p, article li', 'main p, main li', 'p, li'],
           },
+          aggregateContent: true,
         });
       },
     },
@@ -202,5 +203,5 @@ Any questions? [Send us an email][9].
 [9]: mailto:DocSearch@algolia.com
 [10]: /docs/DocSearch-v3#filtering-your-search
 [11]: /docs/templates
-[12]: /docs/record-extractor#complex-extractors
+[12]: /docs/record-extractor#introduction
 [13]: /docs/integrations

packages/website/docs/templates.mdx

Lines changed: 10 additions & 1 deletion
@@ -48,7 +48,7 @@ new Crawler({
           recordProps: {
             lvl0: {
               selectors: '.navBreadcrumb h2 span',
-              defaultValue: 'Blog',
+              defaultValue: 'Docs',
             },
             lvl1: '.post h1',
             lvl2: '.post h2',
@@ -60,6 +60,7 @@ new Crawler({
             },
           },
           indexHeadings: true,
+          aggregateContent: true,
         });
       },
     },
@@ -87,6 +88,7 @@ new Crawler({
             },
           },
           indexHeadings: true,
+          aggregateContent: true,
         });
       },
     },
@@ -212,6 +214,7 @@ new Crawler({
             content: 'article p, article li, article td:last-child',
           },
           indexHeadings: true,
+          aggregateContent: true,
         });
       },
     },
@@ -335,6 +338,7 @@ new Crawler({
             content: '.content__default p, .content__default li',
           },
           indexHeadings: true,
+          aggregateContent: true,
         });
       },
     },
@@ -444,6 +448,7 @@ new Crawler({
             content: '.theme-default-content p, .theme-default-content li',
           },
           indexHeadings: true,
+          aggregateContent: true,
         });
       },
     },
@@ -552,6 +557,7 @@ new Crawler({
             content: '.content p, .content li',
           },
           indexHeadings: true,
+          aggregateContent: true,
         });
       },
     },
@@ -676,6 +682,7 @@ new Crawler({
             },
           },
           indexHeadings: { from: 2, to: 6 },
+          aggregateContent: true,
         });
       },
     },
@@ -701,6 +708,7 @@ new Crawler({
             },
           },
           indexHeadings: { from: 2, to: 6 },
+          aggregateContent: true,
         });
       },
     },
@@ -725,6 +733,7 @@ new Crawler({
             },
           },
           indexHeadings: { from: 2, to: 6 },
+          aggregateContent: true,
         });
       },
     },
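
Several templates above pass `indexHeadings: { from: 2, to: 6 }`. As a rough sketch of what that setting expresses, the hypothetical helper below decides whether a heading level would get its own record. The function name is invented for illustration, and treating both bounds as inclusive is an assumption; the crawler's actual logic is not shown in this commit.

```js
// Hypothetical helper, for illustration only.
// Assumption: `from` and `to` are both inclusive.
function shouldIndexHeading(level, indexHeadings = true) {
  if (typeof indexHeadings === 'boolean') {
    // `true` indexes every heading level, `false` indexes none of them
    // (only `content` records would then be created).
    return indexHeadings;
  }
  return level >= indexHeadings.from && level <= indexHeadings.to;
}

const levels = [0, 1, 2, 3, 4, 5, 6];
const indexed = levels.filter((level) =>
  shouldIndexHeading(level, { from: 2, to: 6 })
);

console.log(indexed.map((level) => `lvl${level}`).join(', '));
// prints lvl2, lvl3, lvl4, lvl5, lvl6
```

Under that assumption, `lvl0` and `lvl1` would not produce heading records in those templates.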
