Segmenter

Free API for tokenizing text and segmenting long text into chunks.

What is a Segmenter?

A segmenter is a crucial component that converts text into tokens or chunks, which are the basic units of data that an embedding/reranker model or LLM processes. Tokens can represent whole words, parts of words, or even individual characters.

Chunking long documents, lightning fast!

You can also use the Segmenter API to cut long documents into smaller chunks, making them easier to process with embedding or reranker models. The chunker relies on common structural cues and a set of rules and heuristics that perform well across diverse types of content, e.g. Markdown, HTML, LaTeX, and CJK languages.

Maximum length of each chunk: 1000

Maximum number of characters in each chunk. In practice a chunk can be shorter than this value when the text offers a good boundary before the limit.
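To make the "shorter than the maximum" behavior concrete, here is a minimal sketch of boundary-aware chunking. It is not the Segmenter's actual rule set, which is not described here. The function name `chunk_text`, the parameter name `max_chunk_length`, and the boundary priority (blank line, line break, sentence end, whitespace) are all assumptions chosen for illustration.

```python
import re


def chunk_text(text: str, max_chunk_length: int = 1000) -> list[str]:
    """Greedy chunker: cut at the last strong boundary inside each window.

    Illustrative only; the boundary rules below are assumptions,
    not the heuristics used by the Segmenter API.
    """
    if max_chunk_length <= 0:
        raise ValueError("max_chunk_length must be positive")

    # Boundaries in priority order: blank line, line break,
    # sentence end (Latin and CJK punctuation), any whitespace.
    boundary_patterns = [
        r"\n\s*\n",
        r"\n",
        r"(?<=[.!?。！？])\s*",
        r"\s+",
    ]

    chunks: list[str] = []
    rest = text
    while len(rest) > max_chunk_length:
        window = rest[:max_chunk_length]
        cut = None
        for pattern in boundary_patterns:
            ends = [m.end() for m in re.finditer(pattern, window)]
            if ends:
                cut = ends[-1]  # last boundary of this kind in the window
                break
        if cut is None:
            cut = max_chunk_length  # no boundary found: hard cut at the limit
        chunks.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        chunks.append(rest)
    return chunks
```

With a 50-character limit, a 65-character text containing a blank line at position 38 is cut right after that blank line, so the first chunk holds 40 characters instead of 50. Joining the chunks always reproduces the input.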

Segmenter API is free!

By providing your API key, you can access a higher rate limit, and your key won't be charged.

Rate Limit

Rate limits are tracked in two ways: RPM (requests per minute) and TPM (tokens per minute). Limits are enforced per IP, and a request is rejected as soon as either threshold, RPM or TPM, is reached.
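A client can mirror the "whichever threshold is hit first" rule locally to avoid sending requests that would be rejected. The sketch below assumes a sliding 60-second window; how the server actually windows its counters is not stated here, and the class name `DualRateLimiter` is hypothetical.

```python
from collections import deque


class DualRateLimiter:
    """Client-side limiter that enforces both an RPM and a TPM budget.

    Assumes a sliding window; the server-side accounting may differ.
    """

    def __init__(self, rpm: int, tpm: int, window_seconds: float = 60.0):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window_seconds
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)
        self._tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Return True and record the request if both budgets allow it."""
        self._evict(now)
        if len(self._events) + 1 > self.rpm:
            return False  # request budget exhausted first
        if self._tokens_in_window + tokens > self.tpm:
            return False  # token budget exhausted first
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

In real use, pass `time.monotonic()` as `now`. Taking the clock as an argument keeps the limiter deterministic and easy to test.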

Product	API Endpoint	Description	w/o API Key	w/ API Key	w/ Premium API Key	Average Latency	Token Usage Counting	Allowed Request
Embedding API	`https://api.jina.ai/v1/embeddings`	Convert text/images to fixed-length vectors		500 RPM & 1,000,000 TPM	2,000 RPM & 5,000,000 TPM	depends on the input size	Count the number of tokens in the input request.	POST
Reranker API	`https://api.jina.ai/v1/rerank`	Rank documents by relevance to a query		500 RPM & 1,000,000 TPM	2,000 RPM & 5,000,000 TPM	depends on the input size	Count the number of tokens in the input request.	POST
Reader API	`https://r.jina.ai`	Convert URL to LLM-friendly text	20 RPM	200 RPM	1,000 RPM	4.6s	Count the number of tokens in the output response.	GET/POST
Reader API	`https://s.jina.ai`	Search the web and convert results to LLM-friendly text		40 RPM	100 RPM	8.7s	Count the number of tokens in the output response.	GET/POST
Reader API	`https://g.jina.ai`	Grounding a statement with web knowledge		10 RPM	30 RPM	22.7s	Count the total number of tokens in the whole process.	GET/POST
Classifier API (Zero-shot)	`https://api.jina.ai/v1/classify`	Classify inputs using zero-shot classification		200 RPM & 500,000 TPM	1,000 RPM & 3,000,000 TPM	depends on the input size	Tokens counted as: input_tokens + label_tokens	POST
Classifier API (Few-shot)	`https://api.jina.ai/v1/classify`	Classify inputs using a trained few-shot classifier		20 RPM & 200,000 TPM	60 RPM & 1,000,000 TPM	depends on the input size	Tokens counted as: input_tokens	POST
Classifier API	`https://api.jina.ai/v1/train`	Train a classifier using labeled examples		20 RPM & 200,000 TPM	60 RPM & 1,000,000 TPM	depends on the input size	Tokens counted as: input_tokens × num_iters	POST
Segmenter API	`https://segment.jina.ai`	Tokenize and segment long text	20 RPM	200 RPM	1,000 RPM	0.3s	Tokens are not counted as usage.	GET/POST
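The token-counting formulas in the last rows of the table can be written out as a small helper. This is only a restatement of the table's arithmetic; the function name `counted_tokens` and the `kind` labels are hypothetical and are not part of any API.

```python
def counted_tokens(kind: str, input_tokens: int,
                   label_tokens: int = 0, num_iters: int = 1) -> int:
    """Tokens counted as usage, following the formulas in the table above.

    The `kind` strings are illustrative labels, not API identifiers.
    """
    if kind == "classify_zero_shot":
        return input_tokens + label_tokens
    if kind == "classify_few_shot":
        return input_tokens
    if kind == "train":
        return input_tokens * num_iters
    if kind == "segment":
        return 0  # Segmenter tokens are not counted as usage
    raise ValueError(f"unknown kind: {kind!r}")
```

For example, a zero-shot classification with 120 input tokens and 8 label tokens counts as 128 tokens, while a training run over 120 input tokens with 10 iterations counts as 1,200.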

Segmenter API

The Segmenter API helps keep LLM input within context limits. It lets developers count tokens and extract the relevant text segments before sending them to a model, which keeps data processing efficient and costs predictable.

https://segment.jina.ai/?content=

Change the 'content' query parameter to see the live result.
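When building the GET form of the URL in code, the text has to be percent-encoded before it goes into the `content` parameter. A minimal sketch using only the standard library follows; the helper names `build_segment_url` and `content_from_url` are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://segment.jina.ai/"


def build_segment_url(content: str, base_url: str = BASE_URL) -> str:
    """Return the GET URL with `content` percent-encoded as UTF-8."""
    query = urlencode({"content": content}, quote_via=quote)
    return f"{base_url}?{query}"


def content_from_url(url: str) -> str:
    """Decode the `content` parameter back out of a URL (round-trip check)."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)["content"][0]
```

Passing `quote_via=quote` encodes spaces as `%20` rather than `+`, so `build_segment_url("hello world")` yields `https://segment.jina.ai/?content=hello%20world`. Long inputs are usually better sent in a POST body, since URLs have practical length limits.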

Return the tokens

Return the tokens and their corresponding ids in the response. Toggle to see the result visualization.

Return the chunks

Chunk the input into semantically meaningful segments, using common structural cues to handle a wide variety of text types and edge cases.

Return the first N tokens

Return the first N tokens of the given content. Boundary exclusive. Cannot be used with 'tail'.

Return the last N tokens

Return the last N tokens of the given content. Boundary exclusive. Cannot be used with 'head'.

Segmenter

Choose the tokenizer to use.

cl100k_base
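The options above can be collected into a single request body. The sketch below is an assumption-laden illustration: only `content`, `head`, and `tail` are named on this page, while the field names `tokenizer`, `return_tokens`, `return_chunks`, and `max_chunk_length` are guesses at how the toggles map to JSON fields and should be checked against the API reference. The function `build_segment_payload` is hypothetical.

```python
from typing import Optional


def build_segment_payload(
    content: str,
    *,
    tokenizer: str = "cl100k_base",
    return_tokens: bool = False,
    return_chunks: bool = False,
    max_chunk_length: Optional[int] = None,
    head: Optional[int] = None,
    tail: Optional[int] = None,
) -> dict:
    """Assemble a JSON-serializable request body for the Segmenter.

    Field names other than 'content', 'head', and 'tail' are assumptions.
    """
    # The page states that 'head' and 'tail' cannot be combined.
    if head is not None and tail is not None:
        raise ValueError("'head' and 'tail' cannot be used together")

    payload: dict = {"content": content, "tokenizer": tokenizer}
    if return_tokens:
        payload["return_tokens"] = True
    if return_chunks:
        payload["return_chunks"] = True
    if max_chunk_length is not None:
        payload["max_chunk_length"] = max_chunk_length
    if head is not None:
        payload["head"] = head
    if tail is not None:
        payload["tail"] = tail
    return payload
```

Validating the `head`/`tail` conflict on the client side gives a clear error before any request is sent.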

Request

curl -X POST 'https://segment.jina.ai/' \
  -H "Content-Type: application/json" \
  -d @- <<EOFEOF
  {
    "content": "\n  Jina AI: Your Search Foundation, Supercharged! 🚀\n  Ihrer Suchgrundlage, aufgeladen! 🚀\n  您的搜索底座，从此不同！🚀\n  検索ベース,もう二度と同じことはありません！🚀\n"
  }
EOFEOF
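The same POST request can be issued from Python with the standard library. This is a sketch rather than an official client: the `Authorization: Bearer` header is an assumption about how an API key would be supplied, the response is returned as parsed JSON without assuming its fields, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

SEGMENT_URL = "https://segment.jina.ai/"


def build_segment_request(content: str, api_key: Optional[str] = None,
                          url: str = SEGMENT_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"content": content}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the key is passed as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def segment(content: str, api_key: Optional[str] = None, timeout: float = 30.0):
    """Send the request and return the decoded JSON response."""
    request = build_segment_request(content, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it possible to inspect or test the request without network access. Calling `segment("Jina AI: Your Search Foundation, Supercharged!")` performs the actual HTTP call.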
