v0.1 · Backend Ready · Frontend Coming Soon

Web Scraping,
Self-Hosted.
No Limits.

A high-performance Go package and self-hosted interface for scraping, crawling, and structured data extraction. Built for developers who want full control.

100% Written in Go
MIT Open Source
0 External Dependencies
Self-Hosted Your Data, Your Rules

Simple API, Powerful Results

Drop into any Go project. Start scraping in minutes with a clean, expressive API.

example_single_url.go
package main

import (
    "fmt"
    "time"
    "github.com/harshitbansal184507/CrawlScraper/pkg/scraper"
)

func main() {
    s := scraper.New(scraper.DefaultConfig())
    start := time.Now()

    result, err := s.ScrapeURL("https://example.com")
    if err != nil {
        fmt.Println("scrape failed:", err)
        return
    }
    fmt.Println("Time taken for scraping:", time.Since(start))

    if result.Status == "success" {
        fmt.Printf("Title: %s\n", result.Data.Title)
        fmt.Printf("Paragraphs: %d\n", len(result.Data.Paragraphs))
        fmt.Printf("Images: %d\n", result.Data.Images)
    }
}

Everything You Need to Scrape the Web

Built on Go's concurrency model for fast, reliable data extraction at scale.

Concurrent Scraping

Leverage Go's goroutines for blazing-fast parallel scraping. Scrape hundreds of pages simultaneously.
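As a rough illustration of the pattern (not CrawlScraper's actual internals), here is how goroutines and a `sync.WaitGroup` fan out over a list of URLs and collect results in input order. `fetchTitle` is a hypothetical stand-in for a real scrape call such as `s.ScrapeURL`, so the sketch runs offline:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchTitle is a hypothetical stand-in for a real scrape call;
// it fabricates a result so this sketch runs without a network.
func fetchTitle(url string) string {
	return "title of " + url
}

// scrapeAll fetches every URL in its own goroutine and collects
// the results in input order.
func scrapeAll(urls []string) []string {
	results := make([]string, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			// Each goroutine writes to a distinct index, so no lock is needed.
			results[i] = fetchTitle(u)
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	pages := scrapeAll([]string{"https://example.com/a", "https://example.com/b"})
	for _, p := range pages {
		fmt.Println(p)
	}
}
```

For hundreds of pages you would typically bound concurrency, e.g. with a buffered-channel semaphore, rather than spawning one goroutine per URL unchecked.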


Custom Headers

Set custom HTTP headers, user agents, cookies, and authentication for any target site or API.
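CrawlScraper's own config API may expose this differently; the sketch below shows the underlying `net/http` pattern for attaching a custom user agent, auth header, and cookie to a request. The token and cookie values are placeholders:

```go
package main

import (
	"fmt"
	"net/http"
)

// buildRequest attaches custom headers, a user agent, an auth header,
// and a cookie to a GET request using the standard net/http API.
func buildRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "CrawlScraper/0.1")
	req.Header.Set("Authorization", "Bearer YOUR_TOKEN")              // placeholder token
	req.AddCookie(&http.Cookie{Name: "session", Value: "YOUR_VALUE"}) // placeholder cookie
	return req, nil
}

func main() {
	req, err := buildRequest("https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("User-Agent"))
}
```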


Self-Hosted Interface (Coming Soon)

A full visual UI for managing scraping jobs, viewing results, and scheduling tasks — running on your own server.

// installation

Up & Running in Minutes

A single Go module, dead simple to integrate.

STEP 01

Fork & Clone

Fork the repo and clone your fork locally.

git clone https://github.com/YOUR_USERNAME/CrawlScraper.git
cd CrawlScraper
STEP 02

Install Dependencies

Download all Go module dependencies.

go mod download
STEP 03

Run the Example

Test with the example file. Change the URL as needed.

go run examples/example_single_url.go

Help Build CrawlScraper 🕷️

Thanks for your interest in contributing! Every bug fix, feature, and doc improvement makes a difference.

🐛 Found a Bug?

Check existing issues first. If it's new, open one with what happened, what you expected, and a code snippet to reproduce it.

✨ Feature Ideas?

Open an issue to discuss it first. Get feedback from the community before starting, then submit a PR when ready.

📝 Docs?

Fix typos, add examples, clarify confusing parts. Documentation PRs are always welcome and highly appreciated.

Commit Message Convention

fix(client): handle timeouts in HTTP client
feat(parser): add support for custom headers
docs(readme): update README with installation steps
test(scraper): add unit tests for URL validation

Ready to Crawl?

Star the project, try it out, and become a part of the community.

★ Star on GitHub
Open an Issue