Newsroom

Accelerate search AI, one word at a time.

Featured
Abstract artistic portrait using a montage of colorful squares and scattered text.
October 22, 2024 • 16 minutes read
Jina Classifier API for High Performance Zero-Shot and Few-Shot Classification
New Classifier API offers zero-shot and few-shot classification for text and images. Start classifying content instantly or train it with your own examples.
Jina AI
September 18, 2024 • 10 minutes read
Jina Embeddings v3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token input length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
Jina AI
Dynamic image showing the characters "V3" formed by bright green dots varying in size on a black background.
September 11, 2024 • 13 minutes read
Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.
Jina AI
Technical screenshot displaying "READER-LM-0.5B/1.5B" with HTML source code for Jina's search grounding feature.
Academic Publications
arXiv • September 18, 2024
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
arXiv • September 07, 2024
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models
arXiv • August 30, 2024
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever
arXiv • June 21, 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
ICML 2024 • May 30, 2024
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
arXiv • February 26, 2024
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings
arXiv • October 30, 2023
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
EMNLP 2023 • July 20, 2023
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models
8 publications in total.
All

October 29, 2024 • 11 minutes read
Beyond CLIP: How Jina-CLIP Advances Multimodal Search
Learn how Jina-CLIP enhances OpenAI's CLIP with better retrieval accuracy and more diverse results through unified text-image embeddings.
Bo Wang
Alex C-G
Abstract digital landscape with wave-like green and pink dunes against a dark background, conveying a tranquil atmosphere.
October 25, 2024 • 19 minutes read
Finding Optimal Breakpoints in Long Documents Using Small Language Models
We trained three small language models to better segment long documents into chunks, and here are the key lessons we learned.
Alex C-G
Andrei Ungureanu
A pattern of yellow file icons on a blue background with one icon displaying a smiley face creating an emotive contrast.
October 22, 2024 • 16 minutes read
Jina Classifier API for High Performance Zero-Shot and Few-Shot Classification
New Classifier API offers zero-shot and few-shot classification for text and images. Start classifying content instantly or train it with your own examples.
Jina AI
Abstract artistic portrait using a montage of colorful squares and scattered text.
October 15, 2024 • 9 minutes read
Fact-Checking with New Grounding API in Jina Reader
With the new g.jina.ai, you can easily ground statements to reduce LLM hallucinations or improve the integrity of human-written content.
Jina AI
Jina developer interface showing "Jina AI was founded in 2020" with controls labeled true and false, and web address on top.
October 09, 2024 • 13 minutes read
Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning
Multilingual models often face a "language gap," where similar phrases in different languages don't align. We show how contrastive learning can bridge this gap, enhancing cross-language performance.
Bo Wang
Scott Martens
Alex C-G
Neon green squares form intricate patterns on a black digital background, creating a dynamic, abstract design.
October 03, 2024 • 9 minutes read
What Late Chunking Really Is & What It’s Not: Part II
Part 2 of our exploration of Late Chunking, a deep dive into why it is the best method for chunk embeddings and improving search/RAG performance.
Han Xiao
Slide depicting the "Late Chunking" process, with flow charts and a model highlighting the transition from a "Long Document"
September 27, 2024 • 15 minutes read
Migration From Jina Embeddings v2 to v3
We collected some tips to help you migrate from Jina Embeddings v2 to v3.
Alex C-G
Scott Martens
A digital upgrade theme with "V3" and a white "2", set against a green and black binary code background, with "Upgrade" centr
September 18, 2024 • 10 minutes read
Jina Embeddings v3: A Frontier Multilingual Embedding Model
jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and an 8192-token input length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.
Jina AI
Dynamic image showing the characters "V3" formed by bright green dots varying in size on a black background.
September 11, 2024 • 13 minutes read
Reader-LM: Small Language Models for Cleaning and Converting HTML to Markdown
Reader-LM-0.5B and Reader-LM-1.5B are two novel small language models inspired by Jina Reader, designed to convert raw, noisy HTML from the open web into clean markdown.
Jina AI
Technical screenshot displaying "READER-LM-0.5B/1.5B" with HTML source code for Jina's search grounding feature.
August 30, 2024 • 10 minutes read
Jina ColBERT v2: Multilingual Late Interaction Retriever for Embedding and Reranking
Jina ColBERT v2 supports 89 languages with superior retrieval performance, user-controlled output dimensions, and an 8192-token input length.
Jina AI
Dark-themed coding interface displaying English and Japanese characters with "JINA COLBERT V2" highlighted in the center.
August 26, 2024 • 13 minutes read
The What and Why of Text-Image Modality Gap in CLIP Models
You can't just use a CLIP model to retrieve text and images and sort the results by score. Why? Because of the modality gap. What is it, and where does it come from?
Bo Wang
Scott Martens
Futuristic black image with "modality gap" in 3D purple letters, additional text, and a dynamic glass sphere effect.
August 22, 2024 • 8 minutes read
Late Chunking in Long-Context Embedding Models
Chunking long documents while preserving contextual information is challenging. We introduce "Late Chunking", a method that leverages long-context embedding models to generate contextual chunk embeddings for better retrieval.
Michael Günther
Han Xiao
Diagram illustrating the 'Late Chunking' and 'Long Document Model' processes in machine learning on a black background.
Offices
Berlin, Germany (HQ)
Prinzessinnenstraße 19-20, 10969 Berlin, Germany
Beijing, China
Level 5, Building 6, No.48 Haidian West St. Beijing Haidian, China
Shenzhen, China
402, Floor 4, Fu'an Technology Building, Shenzhen Nanshan, China
Jina AI GmbH © 2020-2024.