# Web Scraping

> Learn how to detect and prevent unauthorized web scraping

## Overview

This tutorial walks through implementing Fingerprint to prevent web scraping, where bots attempt to extract proprietary or sensitive data from your website automatically.

You'll begin with a starter app that includes a mock flight search page and a basic querying flow. From there, you'll add the JavaScript agent to identify each visitor and use server-side logic with Fingerprint data to detect and block automated scraping attempts.

By the end, you'll have a sample app that blocks bot-driven data scraping and can be customized to fit your use case and access control policies.

This tutorial uses plain JavaScript on the frontend and a Node.js server with SQLite on the backend. For language- or framework-specific setups, see the quickstarts.

> Estimated time: \< 15 minutes

<iframe className="w-full aspect-video rounded-md" src="https://www.youtube.com/embed/h90CSEpso1M" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen />

<Note>
  This tutorial requires the Bot Detection Smart Signal, which is only available on paid plans.
</Note>

## Prerequisites

Before you begin, make sure you have the following:

* A copy of the [starter repository](https://github.com/fingerprintjs/use-case-tutorials) (clone with Git or download as a ZIP)
* [Node.js](https://nodejs.org/) (v20 or later) and npm installed
* Your favorite code editor
* Basic knowledge of JavaScript

## 1. Create a Fingerprint account and get your API keys

1. [Sign up](https://dashboard.fingerprint.com/signup) for a free Fingerprint trial, or log in if you already have an account.
2. After signing in, go to the [**API keys**](https://dashboard.fingerprint.com/api-keys) page in the dashboard.
3. Save your **public API key**, which you'll use to initialize the JavaScript agent.
4. Create and securely store a **secret API key** for your server. Never expose it on the client side. You'll use this key on the backend to retrieve full visitor information through the Fingerprint Server API.

## 2. Set up your project

1. Clone or download the [starter repository](https://github.com/fingerprintjs/use-case-tutorials) and open it in your editor.

```bash Terminal theme={"theme":"github-dark-dimmed"}
git clone https://github.com/fingerprintjs/use-case-tutorials.git
```

2. This tutorial uses the `web-scraping` folder. The project is organized as follows:

<Tree>
  <Tree.Folder name="public" defaultOpen>
    <Tree.File name="index.html - Flight search page" />

    <Tree.File name="index.js - Front-end logic for flight search" />
  </Tree.Folder>

  <Tree.Folder name="server" defaultOpen>
    <Tree.File name="db.js - SQLite database connection" />

    <Tree.File name="flights.js - Flight search and bot detection" />

    <Tree.File name="server.js - Serves static files and search endpoint" />
  </Tree.Folder>

  <Tree.File name=".env.example - Example environment variables" />
</Tree>

3. Install dependencies:

```bash Terminal theme={"theme":"github-dark-dimmed"}
npm install
```

4. Copy or rename `.env.example` to `.env`, then add your Fingerprint API keys:

```bash .env theme={"theme":"github-dark-dimmed"}
FP_PUBLIC_API_KEY=your-public-key
FP_SECRET_API_KEY=your-secret-key
```

5. Start the server:

```bash Terminal theme={"theme":"github-dark-dimmed"}
npm run dev
```

6. Visit [http://localhost:3000](http://localhost:3000) to view the mock flight search page from the starter app. Try a sample query (for example, SFO to MIA) and click **Search**.
7. Next, run the same kind of search with the included headless bot test script, `test-bot.js`. While the app is running, run the command below and observe that the automated search request returns all results. By default, the server does not distinguish between bots and real users.

```bash Terminal theme={"theme":"github-dark-dimmed"}
node test-bot.js
```

## 3. Add Fingerprint to the frontend

In this step, you'll load the JavaScript agent when the page loads and trigger identification when the user clicks **Search**. The JavaScript agent returns both a `visitor_id` and an `event_id`. Instead of relying on the `visitor_id` from the browser, you'll send the `event_id` to your server along with the search inputs. Your server will then call the [Fingerprint Events API](/reference/server-api-v4-get-event) to securely retrieve the full identification details, including bot detection and other signals.

1. At the top of `public/index.js`, load the JavaScript agent:

```javascript public/index.js theme={"theme":"github-dark-dimmed"}
const fpPromise = import(`https://fpjscdn.net/v4/${window.FP_PUBLIC_API_KEY}`).then((Fingerprint) =>
  Fingerprint.start({ region: "us" }),
);
```

2. Make sure to change `region` to match your workspace region (e.g., `eu` for Europe, `ap` for Asia, `us` for Global (default)).
3. Near the bottom of `public/index.js`, the **Search** button already has an event handler for submitting the query. Inside this handler, request visitor identification using `get()` and include the returned `event_id` when sending the search to the server:

```javascript public/index.js theme={"theme":"github-dark-dimmed"}
searchBtn.addEventListener("click", async () => {
  // ...

  const fp = await fpPromise;
  const { event_id: eventId } = await fp.get();

  try {
    const res = await fetch("/api/fetch-flights", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ from, to, departDate, passengers, eventId }),
    });
    const data = await res.json();

    // ...
  } catch (err) {
    // ...
  }
});
```

The `get()` method sends signals collected from the browser to Fingerprint servers, where they are analyzed to identify the visitor and determine if they are a bot. The returned `event_id` acts as a reference to this specific identification event, which your server can later use to fetch the full visitor details.
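Loading the agent or calling `get()` can fail, for example when a network error or a content blocker interrupts the request. The following is a minimal sketch of one way to handle that, not part of the starter app: `getEventIdOrNull` is a hypothetical helper that takes the agent promise and resolves to `null` instead of throwing, so the click handler can decide how to respond.

```javascript
// Hypothetical helper (not part of the starter app): resolve to the event ID,
// or to null if the agent fails to load or identification fails.
async function getEventIdOrNull(agentPromise) {
  try {
    const agent = await agentPromise;
    const { event_id: eventId } = await agent.get();
    return eventId ?? null;
  } catch (err) {
    console.error("Fingerprint identification failed:", err);
    return null;
  }
}
```

In the **Search** handler, you could call `await getEventIdOrNull(fpPromise)` and show an error message when the result is `null`. The server in this tutorial rejects requests without an `eventId`.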

For lower latency in production, use [Sealed Client Results](/docs/sealed-client-results) to return full identification details as an encrypted payload from the `get()` method.

## 4. Receive and use the event ID to get visitor insights

Next, pass the `eventId` through to your flight search logic, initialize the [Fingerprint Node Server SDK](/reference/node-server-sdk), and fetch the full visitor identification event so you can access the Bot Detection [Smart Signal](https://fingerprint.com/products/smart-signals/).

1. In the backend, the `server/server.js` file already defines API routes for the app. The `/api/fetch-flights` route passes the request body to the `fetchFlights` function defined in `server/flights.js`. Because the frontend now sends `eventId` in the payload, that value will be available in the body when `fetchFlights` runs.

```javascript server/server.js theme={"theme":"github-dark-dimmed"}
app.post("/api/fetch-flights", async (req, reply) => {
  const result = await fetchFlights(req.body);
  return reply.send(result);
});
```

2. The `server/flights.js` file contains the logic for handling flight searches. Start by importing and initializing the Fingerprint Node Server SDK there, and load your environment variables with `dotenv`.

```javascript server/flights.js theme={"theme":"github-dark-dimmed"}
import { db } from "./db.js";
import { config } from "dotenv";
import { FingerprintServerApiClient, Region } from "@fingerprint/node-sdk";

config();

const fpServerApiClient = new FingerprintServerApiClient({
  apiKey: process.env.FP_SECRET_API_KEY,
  region: Region.Global,
});
```

3. Make sure to change `region` to match your workspace region (e.g., `EU` for Europe, `AP` for Asia, `Global` for Global (default)).
4. Update the `fetchFlights` function to accept the `eventId` and use it to fetch the full identification event details from Fingerprint:

```javascript server/flights.js theme={"theme":"github-dark-dimmed"}
export async function fetchFlights({ from, to, departDate, eventId }) {
  if (!from || !to || !departDate || !eventId) {
    console.error("Missing required fields.");
    return { success: false, message: "Missing required fields." };
  }

  const event = await fpServerApiClient.getEvent(eventId);

  // ...
}
```

Using the `eventId`, the `getEvent` method retrieves the full data for the visitor identification event. The returned object contains the visitor ID, IP address, device, and browser details, as well as Smart Signals, including bot detection, browser tampering detection, VPN detection, and more.
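The lookup can also fail, for instance when the client sends an event ID that does not exist or the request to Fingerprint errors out. As a minimal sketch, assuming you want such requests to fail closed, a hypothetical wrapper (`getEventOrNull`, not part of the SDK) could catch the error and return `null`:

```javascript
// Hypothetical wrapper (not part of the SDK): return the event, or null if
// the lookup fails. `client` is any object exposing getEvent(eventId),
// such as the fpServerApiClient created earlier.
async function getEventOrNull(client, eventId) {
  try {
    return await client.getEvent(eventId);
  } catch (err) {
    // Covers unknown or malformed event IDs as well as network errors.
    console.error("Could not retrieve identification event:", err?.message ?? err);
    return null;
  }
}
```

Inside `fetchFlights`, you could then call `await getEventOrNull(fpServerApiClient, eventId)` and return an error response when the result is `null`, rather than letting the exception propagate to the route handler.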

You can see a full example of the event structure and test it with your own device in the [demo playground](https://demo.fingerprint.com/playground).

For additional checks to ensure the validity of the data coming from your frontend, view [how to protect from client-side tampering and replay attacks](/docs/protecting-from-client-side-tampering).
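One such check is freshness: an event ID captured earlier and replayed later should not be accepted. The sketch below is illustrative only. It assumes you have already read the event's creation time as Unix milliseconds from the event payload (check the Events API reference for the exact field), and the 5-second window and 1-second clock-skew tolerance are hypothetical values.

```javascript
// Hypothetical values for illustration; tune them for your own latency.
const MAX_EVENT_AGE_MS = 5000;
const CLOCK_SKEW_MS = 1000;

// Returns true only if the event was created within the allowed window.
// `eventTimestampMs` is assumed to be the event's creation time in Unix ms.
function isEventFresh(eventTimestampMs, nowMs = Date.now(), maxAgeMs = MAX_EVENT_AGE_MS) {
  if (!Number.isFinite(eventTimestampMs)) return false; // missing or invalid timestamp
  const ageMs = nowMs - eventTimestampMs;
  // Tolerate a small negative age caused by clock skew between servers.
  return ageMs >= -CLOCK_SKEW_MS && ageMs <= maxAgeMs;
}
```

A request whose event fails this check can be treated the same way as a request with a missing `eventId`.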

## 5. Block content scraping bots

Web scraping relies on automated requests, so rejecting bots outright helps protect proprietary data. Fingerprint returns `not_detected` if no bot activity is found, `good` for known bots, like search engines, and `bad` for other automation tools. Any visitor identification that does not return `not_detected` can be blocked from retrieving flight data.

1. Continuing in the `fetchFlights` function in `server/flights.js`, check the bot signal returned in the `event` object and block bots:

```javascript server/flights.js theme={"theme":"github-dark-dimmed"}
export async function fetchFlights({ from, to, departDate, eventId }) {
  // ...

  const event = await fpServerApiClient.getEvent(eventId);

  const botDetected = event.bot !== "not_detected";

  if (botDetected) {
    console.error("Bot detected.");
    return { flights: [] };
  }

  // ...
}
```
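If you later want to treat known bots differently from other automation, one option is to make the policy explicit. The following is a sketch under that assumption, using only the three result values described above. The `botAction` helper and the action names are hypothetical, and unknown values fail closed.

```javascript
// Hypothetical policy table: one action per documented bot result.
const BOT_POLICY = new Map([
  ["not_detected", "allow"],
  ["good", "block"], // known bots such as search engines; change to "allow" if desired
  ["bad", "block"],
]);

function botAction(botResult) {
  // Missing or unexpected values are blocked rather than allowed.
  return BOT_POLICY.get(botResult) ?? "block";
}
```

With this table, `botAction(event.bot) !== "allow"` behaves the same as the `botDetected` check in this tutorial, and changing a single entry changes the policy.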

You can also add [Suspect Score](/docs/suspect-score) as a secondary defense layer. The Suspect Score is a weighted representation of all Smart Signals present in the identification payload, helping to identify suspicious or automated activity. You likely wouldn't block searches based only on a high Suspect Score, but you can use it to change how you respond, such as by adding a rate limit or requiring additional verification.

2. To keep the example simple, this tutorial blocks the search outright. Below the bot detection check, add a condition that reads the Suspect Score from the `event` object and returns no flights if it exceeds a chosen threshold (for example, 20):

```javascript server/flights.js theme={"theme":"github-dark-dimmed"}
export async function fetchFlights({ from, to, departDate, eventId }) {
  // ...

  const botDetected = event.bot !== "not_detected";

  if (botDetected) {
    console.error("Bot detected.");
    return { flights: [] };
  }

  const suspectScore = event.suspect_score || 0;

  if (suspectScore > 20) {
    console.error(`High Suspect Score detected: ${suspectScore}`);
    return { flights: [] };
  }

  // ...
}
```
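Rather than a single cutoff, the score can also drive a graded response. The sketch below shows one possible shape. The `actionForSuspectScore` helper, the action names, and the lower threshold of 10 are hypothetical; only the threshold of 20 comes from the example above.

```javascript
// Hypothetical thresholds for illustration; tune them against your own traffic.
const CHALLENGE_THRESHOLD = 10;
const BLOCK_THRESHOLD = 20;

function actionForSuspectScore(score) {
  // Treat a missing score as 0, matching the check above.
  const value = Number.isFinite(score) ? score : 0;
  if (value > BLOCK_THRESHOLD) return "block";
  // Middle tier: for example, apply a rate limit or require extra verification.
  if (value > CHALLENGE_THRESHOLD) return "challenge";
  return "allow";
}
```

How each action is enforced (an empty result, a CAPTCHA, a stricter rate limit) depends on your application.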

Smart Signals allow you to protect your site and prevent web scraping. You can extend this by analyzing additional signals, adjusting rate limits, varying responses for suspicious traffic, and more.
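As one illustration of the rate-limiting idea, here is a minimal fixed-window limiter kept in memory. It is a sketch, not production code: `createRateLimiter` is a hypothetical helper, the state is lost on restart and not shared across server instances, and the choice of key (for example, the visitor ID read from the event) is an assumption.

```javascript
// Hypothetical helper: fixed-window rate limiter held in process memory.
// Returns a function that reports whether `key` may make another request.
function createRateLimiter({ limit, windowMs }) {
  const windows = new Map(); // key -> { start, count }

  return function isAllowed(key, nowMs = Date.now()) {
    const entry = windows.get(key);
    // Start a new window on first use or once the previous window has elapsed.
    if (!entry || nowMs - entry.start >= windowMs) {
      windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Example: at most 5 searches per visitor per minute.
const allowSearch = createRateLimiter({ limit: 5, windowMs: 60_000 });
```

Keying the limiter on the visitor ID rather than the IP address means the limit follows the same browser across network changes.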

<Info>
  This is a minimal example to show how to implement Fingerprint. In a real application, make sure
  to implement proper security practices, error handling, and access controls that align with your
  production standards.
</Info>

## 6. Test your implementation

Now that everything is wired up, you can test the full protected search flow.

1. Start your server if it isn't already running and open [http://localhost:3000](http://localhost:3000):

```bash Terminal theme={"theme":"github-dark-dimmed"}
npm run dev
```

2. Try a normal search (for example, SFO to JFK). You should get matching flights returned.
3. Next, run the included headless bot test script while the app is running. The automated search requests should return no results because the bot's access is blocked.

```bash Terminal theme={"theme":"github-dark-dimmed"}
node test-bot.js
```

*Note: If you encounter errors launching the automated browser, make sure you have the testing browser installed:*

```bash Terminal theme={"theme":"github-dark-dimmed"}
npx puppeteer browsers install chrome
```

## Next steps

You now have a working flight search flow that blocks scraping bots with Fingerprint. From here, you can expand the logic with more [Smart Signals](/docs/smart-signals-reference), fine-tune rules based on your business policies, or layer in additional defenses, such as rate limiting or dynamic content obfuscation.

To dive deeper, explore the other use case tutorials for more step-by-step examples.

Check out these related resources:

* [Node SDK Reference](/reference/node-server-sdk)
* [Vue frontend quickstart](/docs/vue-quickstart)
* [React frontend quickstart](/docs/react-quickstart)
* [API reference for the Events endpoint](/reference/server-api-v4-get-event)
* [Use case tutorial: Detecting new account fraud](/docs/new-account-fraud-use-case-tutorial)
* [Low-latency identification with Sealed Client Results](/docs/sealed-client-results)
