A Node.js demonstration application showcasing full-text search capabilities using Elasticlunr.js for URL pattern matching. This project demonstrates how to efficiently match URLs against a set of rules with wildcard support.
Built in November 2018 as a practical example of implementing a full-text search engine for URL classification and matching.
- Full-text search indexing using Elasticlunr.js
- URL pattern matching with wildcard support
- Best match algorithm for overlapping patterns
- Fast lookup performance with indexed search
- Simple rule-based configuration
- Built-in test cases for validation
graph TD
A[URL Rules] -->|Index| B[Elasticlunr Engine]
C[Input URL] -->|Search| B
B -->|Match Score| D[UrlManager]
D -->|Best Match| E[Service Name]
F[core/urlManager.js] -->|Defines| A
F -->|Provides| C
G[models/UrlManager.js] -->|Implements| D
H[index.js] -->|Initializes| F
sequenceDiagram
participant User
participant UrlManager
participant Elasticlunr
participant Index
User->>UrlManager: Load Rules
UrlManager->>Elasticlunr: Create Index
loop For Each Rule
UrlManager->>Index: Add Document
end
User->>UrlManager: findBestMatchForURL(url)
UrlManager->>Elasticlunr: search(url)
Elasticlunr->>Index: Query
Index-->>Elasticlunr: Match Results
Elasticlunr-->>UrlManager: Best Match
UrlManager-->>User: Service Name
- Node.js (v12 or higher)
- npm
- Clone the repository:
git clone https://github.com/orassayag/full-text-search.git
cd full-text-search
- Navigate to the server directory:
cd server
- Install dependencies:
npm install
npm start
This will run the demo with predefined URL rules and test cases, showing how the search engine matches URLs to their corresponding services.
The application includes example rules for common services:
const setOfRules = {
'www.facebook.com/connect.js': 'Facebook',
'www.google-analytics.com/*': 'Google Analytics',
'www.twitter.com/scripts/v1/index.js': 'Twitter'
};
When you query a URL like www.facebook.com/connect.js, the engine returns Facebook.
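To make the wildcard behaviour concrete, here is a minimal, dependency-free sketch. It does not use Elasticlunr and is not the repository's implementation: it swaps in plain regular expressions, and the helpers `ruleToRegExp` and `matchUrl` are hypothetical names introduced only for illustration. When several rules match, it assumes the rule with the most literal characters is the most specific one.

```javascript
// Example rules copied from the README.
const setOfRules = {
  'www.facebook.com/connect.js': 'Facebook',
  'www.google-analytics.com/*': 'Google Analytics',
  'www.twitter.com/scripts/v1/index.js': 'Twitter'
};

// Hypothetical helper: turn a rule into an anchored RegExp.
// Every regex metacharacter except '*' is escaped; '*' becomes '.*'.
function ruleToRegExp(rule) {
  const escaped = rule.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical helper: return the service name of the most specific
// matching rule, or null when nothing matches. Specificity is approximated
// (an assumption) by the number of non-wildcard characters in the rule.
function matchUrl(rules, url) {
  let best = null;
  let bestLength = -1;
  for (const [rule, service] of Object.entries(rules)) {
    const literalLength = rule.replace(/\*/g, '').length;
    if (ruleToRegExp(rule).test(url) && literalLength > bestLength) {
      best = service;
      bestLength = literalLength;
    }
  }
  return best;
}

console.log(matchUrl(setOfRules, 'www.facebook.com/connect.js'));    // Facebook
console.log(matchUrl(setOfRules, 'www.google-analytics.com/ga.js')); // Google Analytics
console.log(matchUrl(setOfRules, 'www.example.com/app.js'));         // null
```

Unlike a scored full-text search, this approach either matches a rule or it does not, which makes it a useful baseline for checking the demo's results.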
full-text-search/
├── server/
│   ├── core/
│   │   └── urlManager.js      # URL rules and test cases
│   ├── models/
│   │   └── UrlManager.js      # URL matching logic
│   ├── index.js               # Entry point
│   └── package.json
└── README.md
- Indexing Phase:
  - URL rules are loaded and indexed using Elasticlunr
  - Each rule gets a unique ID and is stored as a document
- Search Phase:
  - Input URL is queried against the index
  - Search engine returns relevance scores for all matching rules
  - The rule with the highest score is selected
- Result:
  - Returns the name/tag associated with the best matching rule
  - Supports exact matches and wildcard patterns
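The three phases above can be sketched without any dependencies. The class below is a stand-in for the Elasticlunr index, not the code in models/UrlManager.js: it builds a tiny inverted index and scores each rule by the number of tokens it shares with the input URL, which is far cruder than Elasticlunr's relevance scoring. The class name `UrlManagerSketch`, the tokenizer, and the scoring rule are all assumptions made for illustration; only the method name `findBestMatchForURL` comes from the diagrams in this README.

```javascript
// Illustrative stand-in for an indexed search engine (not Elasticlunr).
class UrlManagerSketch {
  constructor(rules) {
    this.documents = [];            // document id -> { pattern, name }
    this.invertedIndex = new Map(); // token -> Set of document ids

    // Indexing phase: each rule becomes a document with a unique numeric id.
    Object.entries(rules).forEach(([pattern, name], id) => {
      this.documents.push({ pattern, name });
      for (const token of UrlManagerSketch.tokenize(pattern)) {
        if (!this.invertedIndex.has(token)) {
          this.invertedIndex.set(token, new Set());
        }
        this.invertedIndex.get(token).add(id);
      }
    });
  }

  // Split on anything that is not a letter or digit; '*' is dropped here,
  // so a wildcard rule is matched through its remaining tokens.
  static tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  }

  // Search phase: score = number of distinct URL tokens found in the rule.
  // No minimum score is applied, so URLs that only share common tokens
  // such as 'www' or 'com' still produce (weak) results.
  search(url) {
    const scores = new Map();
    for (const token of new Set(UrlManagerSketch.tokenize(url))) {
      for (const id of this.invertedIndex.get(token) || []) {
        scores.set(id, (scores.get(id) || 0) + 1);
      }
    }
    return [...scores.entries()]
      .map(([ref, score]) => ({ ref, score }))
      .sort((a, b) => b.score - a.score);
  }

  // Result: the name attached to the highest-scoring rule, or null.
  findBestMatchForURL(url) {
    const [best] = this.search(url);
    return best ? this.documents[best.ref].name : null;
  }
}

const manager = new UrlManagerSketch({
  'www.facebook.com/connect.js': 'Facebook',
  'www.google-analytics.com/*': 'Google Analytics',
  'www.twitter.com/scripts/v1/index.js': 'Twitter'
});

console.log(manager.findBestMatchForURL('www.google-analytics.com/ga.js')); // Google Analytics
console.log(manager.findBestMatchForURL('www.facebook.com/connect.js'));    // Facebook
```

Because every rule here shares tokens such as `www` and `com`, a production setup would likely need a score threshold or rarer-token weighting before trusting the top result.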
To customize URL rules, edit server/core/urlManager.js:
const setOfRules = {
'your.domain.com/path': 'Your Service',
'another.domain.com/*': 'Another Service'
};
- Node.js - JavaScript runtime
- Elasticlunr.js - Lightweight full-text search engine
- ESLint - Code quality and style checking
- Git - Version control
- URL classification and categorization
- Third-party script detection
- Analytics tracking identification
- Content delivery network (CDN) recognition
- API endpoint routing
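For use cases such as third-party script detection, URLs collected from a page usually carry a scheme, a query string, or a fragment, while the example rules in this README are written as bare `host/path` strings. The sketch below shows one possible way to bring an input URL into that shape before looking it up. `normalizeUrl` is a hypothetical helper, not part of the repository, and it assumes that query strings and fragments are irrelevant to matching. It relies only on the `URL` class built into Node.js.

```javascript
// Hypothetical helper: reduce a raw URL to 'host/path' so that it lines up
// with rules such as 'www.facebook.com/connect.js'.
function normalizeUrl(rawUrl) {
  let candidate = rawUrl.trim();
  if (candidate.startsWith('//')) {
    // Protocol-relative URL, e.g. '//cdn.example.com/lib.js'.
    candidate = 'https:' + candidate;
  } else if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(candidate)) {
    // No scheme at all: add one so the URL parser accepts the input.
    candidate = 'https://' + candidate;
  }
  // The URL parser lowercases the host; the path keeps its original case.
  const { host, pathname } = new URL(candidate);
  return host + pathname;
}

console.log(normalizeUrl('https://www.facebook.com/connect.js?v=2'));
// www.facebook.com/connect.js
console.log(normalizeUrl('//www.google-analytics.com/analytics.js#top'));
// www.google-analytics.com/analytics.js
```

The normalized string can then be passed to whatever lookup the application uses.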
Contributions to this project are released to the public under the project's open source license.
Everyone is welcome to contribute. Contributing doesn't just mean submitting pull requests; there are many different ways to get involved, including answering questions and reporting issues.
Please feel free to contact me with any question, comment, pull request, or issue.
See CONTRIBUTING.md for details.
- Or Assayag - Initial work - orassayag
- Or Assayag <orassayag@gmail.com>
- GitHub: https://github.com/orassayag
- StackOverflow: https://stackoverflow.com/users/4442606/or-assayag?tab=profile
- LinkedIn: https://linkedin.com/in/orassayag
This application is released under the MIT license; see the LICENSE file for details.