Loader - Web URL
Basic Introduction
The URL Document Loader implements the Document Loader interface and loads document content from web URLs. The interface itself is described in the Eino: Document Loader guide.
Feature Introduction
The URL Document Loader has the following features:
- Parses HTML web content by default
- Supports custom HTTP client configuration (e.g., a custom proxy or timeout)
- Supports custom content parsers (e.g., extracting only the body or another specific container)
 
Usage
Component Initialization
The URL Document Loader is created with the NewLoader function, which takes the following main configuration parameters:
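import (
    "context"

    "github.com/cloudwego/eino-ext/components/document/loader/url"
)

func main() {
    ctx := context.Background()

    // parser, httpClient and requestBuilder stand for your own values;
    // any of them can be omitted to use the default behavior.
    loader, err := url.NewLoader(ctx, &url.LoaderConfig{
        Parser:         parser,
        Client:         httpClient,
        RequestBuilder: requestBuilder,
    })
    if err != nil {
        panic(err)
    }
    _ = loader
}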
Explanation of configuration parameters:
- Parser: Document parser; defaults to the HTML parser, which extracts the main content of the web page
- Client: HTTP client, which can be customized with a timeout, proxy, and other settings
- RequestBuilder: Request builder used to customize the request method, headers, etc.
Loading Documents
Documents are loaded through the Load method:
docs, err := loader.Load(ctx, document.Source{
    URI: "https://example.com/document",
})
Note:
- The URI must be a valid HTTP/HTTPS URL.
- The default request method is GET.
- To use another HTTP method or add custom headers (for example, for authentication), configure RequestBuilder.
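Because the URI must be an HTTP or HTTPS URL, it can be worth rejecting malformed input before calling Load. The helper below, isHTTPURL, is a hypothetical pre-check built on the standard net/url package; it is not part of the loader and checks only the scheme and host, not whether the page is reachable.

```go
package main

import (
    "fmt"
    "net/url"
)

// isHTTPURL (hypothetical) reports whether raw is an absolute http or https
// URL with a non-empty host. url.Parse lowercases the scheme, so "HTTPS://"
// is accepted as well.
func isHTTPURL(raw string) bool {
    u, err := url.Parse(raw)
    if err != nil {
        return false
    }
    return (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

func main() {
    for _, raw := range []string{
        "https://example.com/document", // valid
        "ftp://example.com/file.txt",   // wrong scheme
        "example.com/document",         // missing scheme
    } {
        fmt.Println(raw, isHTTPURL(raw))
    }
}
```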
 
Complete Usage Example
Basic Usage
package main

import (
    "context"
    "fmt"

    "github.com/cloudwego/eino-ext/components/document/loader/url"
    "github.com/cloudwego/eino/components/document"
)

func main() {
    ctx := context.Background()

    // Initialize the loader with the default configuration
    loader, err := url.NewLoader(ctx, nil)
    if err != nil {
        panic(err)
    }

    // Load documents from the given URL
    docs, err := loader.Load(ctx, document.Source{
        URI: "https://example.com/article",
    })
    if err != nil {
        panic(err)
    }

    // Print the extracted content of each document
    for _, doc := range docs {
        fmt.Println(doc.Content)
    }
}
Custom Configuration Example
package main

import (
    "context"
    "fmt"
    "net/http"
    "time"

    "github.com/cloudwego/eino-ext/components/document/loader/url"
    "github.com/cloudwego/eino/components/document"
)

func main() {
    ctx := context.Background()

    // Custom HTTP client with a 10-second timeout
    client := &http.Client{
        Timeout: 10 * time.Second,
    }

    // Custom request builder: a GET request with a custom User-Agent header
    requestBuilder := func(ctx context.Context, src document.Source, opts ...document.LoaderOption) (*http.Request, error) {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, src.URI, nil)
        if err != nil {
            return nil, err
        }
        // Set (rather than Add) so that exactly one User-Agent value is sent
        req.Header.Set("User-Agent", "MyBot/1.0")
        return req, nil
    }

    // Initialize the loader; Parser is omitted, so the default HTML parser is used
    loader, err := url.NewLoader(ctx, &url.LoaderConfig{
        Client:         client,
        RequestBuilder: requestBuilder,
    })
    if err != nil {
        panic(err)
    }

    // Load documents
    docs, err := loader.Load(ctx, document.Source{
        URI: "https://example.com/article",
    })
    if err != nil {
        panic(err)
    }

    // Print the extracted content of each document
    for _, doc := range docs {
        fmt.Println(doc.Content)
    }
}
Last modified May 7, 2025: docs: update eino chatmodel interface (#1324) (6a1bee15cb)