Newsroom

Accelerate search AI, one word at a time.

Featured

October 22, 2024 • 16 minutes read

Jina Classifier API for High Performance Zero-Shot and Few-Shot Classification

New Classifier API offers zero-shot and few-shot classification for text and images. Start classifying content instantly or train it with your own examples.

September 18, 2024 • 10 minutes read

Jina Embeddings v3: A Frontier Multilingual Embedding Model

jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token input length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.

September 11, 2024 • 13 minutes read

Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown

Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.

Latest

October 29, 2024 • 11 minutes read

Beyond CLIP: How Jina-CLIP Advances Multimodal Search

October 25, 2024 • 19 minutes read

Finding Optimal Breakpoints in Long Documents Using Small Language Models

October 15, 2024 • 9 minutes read

Fact-Checking with New Grounding API in Jina Reader

Academic Publications

September 18, 2024

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

September 07, 2024

Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models

August 30, 2024

Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever

Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

Jina CLIP: Your CLIP Model Is Also Your Text Retriever

February 26, 2024

Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings

October 30, 2023

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models

8 publications in total.

All

October 29, 2024 • 11 minutes read

Beyond CLIP: How Jina-CLIP Advances Multimodal Search

Learn how Jina-CLIP enhances OpenAI's CLIP with better retrieval accuracy and more diverse results through unified text-image embeddings.

October 25, 2024 • 19 minutes read

Finding Optimal Breakpoints in Long Documents Using Small Language Models

We trained three small language models to better segment long documents into chunks, and here are the key lessons we learned.

October 22, 2024 • 16 minutes read

Jina Classifier API for High Performance Zero-Shot and Few-Shot Classification

New Classifier API offers zero-shot and few-shot classification for text and images. Start classifying content instantly or train it with your own examples.

October 15, 2024 • 9 minutes read

Fact-Checking with New Grounding API in Jina Reader

With the new g.jina.ai, you can easily ground statements to reduce LLM hallucinations or improve the integrity of human-written content.

October 09, 2024 • 13 minutes read

Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning

Multilingual models often face a "language gap," where similar phrases in different languages don't align. We show how contrastive learning can bridge this gap, enhancing cross-language performance.

October 03, 2024 • 9 minutes read

What Late Chunking Really Is & What It’s Not: Part II

Part 2 of our exploration of Late Chunking, a deep dive into why it is the best method for chunk embeddings and improving search/RAG performance.

September 27, 2024 • 15 minutes read

Migration From Jina Embeddings v2 to v3

We collected some tips to help you migrate from Jina Embeddings v2 to v3.

September 18, 2024 • 10 minutes read

Jina Embeddings v3: A Frontier Multilingual Embedding Model

jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token input length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.

September 11, 2024 • 13 minutes read

Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown

Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.

August 30, 2024 • 10 minutes read

Jina ColBERT v2: Multilingual Late Interaction Retriever for Embedding and Reranking

Jina ColBERT v2 supports 89 languages with superior retrieval performance, user-controlled output dimensions, and an 8192-token input length.

August 26, 2024 • 13 minutes read

The What and Why of Text-Image Modality Gap in CLIP Models

You can't just use a CLIP model to retrieve text and images and sort the results by score. Why? Because of the modality gap. What is it, and where does it come from?

August 22, 2024 • 8 minutes read

Late Chunking in Long-Context Embedding Models

Chunking long documents while preserving contextual information is challenging. We introduce "Late Chunking," which leverages long-context embedding models to generate contextual chunk embeddings for better retrieval.

Offices

Berlin, Germany (HQ)

Prinzessinnenstraße 19-20, 10969 Berlin, Germany

Beijing, China

Level 5, Building 6, No.48 Haidian West St. Beijing Haidian, China

Shenzhen, China

402, Floor 4, Fu'an Technology Building, Shenzhen Nanshan, China

Jina AI GmbH © 2020-2024.