# ocr

Trending repositories for this topic (7)

PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

ai4science, chineseocr, document-parsing, document-translation, kie, ocr, paddleocr-vl, pdf-extractor-rag, pdf-parser, pdf2markdown, pp-ocr, pp-structure, rag

opendatalab/MinerU

Other · Python

71.2k · appeared 4 times

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

ai4science, document-analysis, docx, extract-data, layout-analysis, ocr, parser, pdf, pdf-converter, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, pdf-parser, pptx, python, xlsx

paperless-ngx/paperless-ngx

Other · Python

41.1k · appeared 3 times

A community-supported supercharged document management system: scan, index and archive all your documents

angular, archiving, django, dms, document-management, document-management-system, hacktoberfest, machine-learning, ocr, optical-character-recognition, pdf

ShareX/ShareX

Other · C#

37.0k

ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.

avalonia, capture, color-picker, csharp, dropbox, file-sharing, file-upload, ftp, gif, gif-recorder, image-annotation, ocr, productivity, region-capture, screen-capture, screen-recorder, screenshot, share, sharex, url-shortener

Convert PDF contents into a form AI can read! The most accurate open-source PDF parser: opendataloader-pdf

opendataloader-project/opendataloader-pdf · AI · Java

15.8k · appeared 6 times

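opendataloader-pdf is an open-source tool that converts the contents of PDF files (text, tables, images, formulas, and so on) into formats that AI can easily understand (Markdown, JSON, HTML). In a benchmark using 200 real PDFs, its overall accuracy…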

a11y, accessibility, ai, bounding-box, document-parsing, eaa, html, json, markdown, ocr, ocr-recognition, pdf, pdf-accessibility, pdf-converter, pdf-extraction, pdf-parser, pdf-ua, rag, tables, tagged-pdf

run-llama/liteparse

Other · Rust

8.1k · appeared 3 times

A fast, helpful, and open-source document parser

document-ocr, document-processing, ocr, ocr-recognition, pdf, pdf-parser, text-extraction

Digitize complex tables, handwriting, and formulas in one go! A state-of-the-art OCR model supporting 90 languages: chandra

datalab-to/chandra · AI · Python

7.8k · appeared 3 times

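Chandra OCR 2 is an AI OCR (optical character recognition) model that reads the text in images and PDFs and converts it to Markdown, HTML, or JSON while preserving layout information such as tables, formulas, and handwriting. It supports more than 90 languages and…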

ai, ocr