The SubRip file format is described on the Matroska multimedia container format website as “perhaps the most basic of all subtitle formats.” SubRip (SubRip Text) files are named with the extension .srt, and contain formatted lines of plain text in groups separated by a blank line. Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000). The fractional separator used is the comma, since the program was written in France.
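To make the timecode layout concrete, here is a minimal sketch that converts between the `HH:MM:SS,mmm` form described above and a millisecond count. It uses only the Python standard library; the names `timecode_to_ms` and `ms_to_timecode` are hypothetical helpers introduced for illustration and are not part of pysrt or LangChain.

```python
import re

# Matches the SubRip timecode layout: two-digit hours, minutes, and seconds,
# a comma as the fractional separator, and three-digit milliseconds.
TIMECODE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}),(\d{3})$")


def timecode_to_ms(timecode: str) -> int:
    """Convert an SRT timecode such as '00:01:02,500' to milliseconds."""
    match = TIMECODE_RE.match(timecode.strip())
    if match is None:
        raise ValueError(f"not a valid SRT timecode: {timecode!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def ms_to_timecode(total_ms: int) -> str:
    """Format a non-negative millisecond count as an SRT timecode."""
    if total_ms < 0:
        raise ValueError("SRT timecodes cannot be negative")
    seconds, millis = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
```

Note that a period as the fractional separator (for example `00:01:02.500`) is rejected by this sketch, since the format described above uses a comma.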

How to load data from subtitle (.srt) files

Please download the example .srt file from here.

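%pip install --upgrade --quiet pysrt

from langchain_community.document_loaders import SRTLoader

# Placeholder path: the original path is missing from this page, so point this at the .srt file you downloaded.
loader = SRTLoader("path/to/example.srt")

docs = loader.load()

# Preview the first 100 characters of the loaded document.
docs[0].page_content[:100]

'<i>Corruption discovered\nat the core of the Banking Clan!</i> <i>Reunited, Rush Clovis\nand Senator A'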
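The sample output is consistent with the text of each subtitle being joined by single spaces, with line breaks inside a subtitle preserved. The sketch below imitates that behavior using only the standard library, to illustrate the blank-line-separated group structure described earlier. It is an assumption-based illustration, not SRTLoader's actual implementation (which relies on pysrt); `srt_to_text` and `SAMPLE_SRT` are hypothetical names.

```python
import re

# A tiny, made-up SRT document: index line, timecode range, then text lines,
# with groups separated by a blank line.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
<i>First line
of the first subtitle</i>

2
00:00:04,000 --> 00:00:06,000
Second subtitle
"""


def srt_to_text(srt: str) -> str:
    """Join the text of every subtitle group with single spaces."""
    # Groups are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", srt.strip())
    texts = []
    for block in blocks:
        lines = block.splitlines()
        # lines[0] is the sequence number and lines[1] the timecode range;
        # skip groups that carry no text at all.
        if len(lines) < 3:
            continue
        texts.append("\n".join(lines[2:]))
    return " ".join(texts)


print(repr(srt_to_text(SAMPLE_SRT)))
```

Markup such as `<i>` is left untouched here, matching the tags visible in the sample output above.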