DeepSeek-OCR PDF Parser by Jatevo LLM Inference
Upload a PDF to extract text and convert to Markdown using DeepSeek-OCR.
Each page is processed sequentially and combined into a single markdown document.
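The page-by-page flow described above can be sketched roughly as follows. This is only an illustration: `ocr_page` is a hypothetical stand-in for whatever call actually runs DeepSeek-OCR on one page, and the per-page heading and `---` separator are assumptions, not the app's confirmed output format.

```python
from typing import Callable, Iterable


def combine_pages(pages: Iterable[bytes], ocr_page: Callable[[bytes], str]) -> str:
    """Run OCR on each page in order and join the results into one Markdown string.

    `ocr_page` is a hypothetical callable that takes one rendered page and
    returns its Markdown; the real app's OCR call may look different.
    """
    sections = []
    for number, page in enumerate(pages, start=1):
        markdown = ocr_page(page).strip()
        # Assumed layout: one heading per page, pages separated by a rule.
        sections.append(f"## Page {number}\n\n{markdown}")
    return "\n\n---\n\n".join(sections)
```

Because pages are handled one at a time, total runtime grows linearly with page count, which is why the smaller models are suggested for long PDFs.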
Features
- Image Embedding - Charts, graphs, and figures embedded directly in markdown
- Text Extraction - All text content from images and charts extracted
- Table Support - Tables converted to markdown format
- Object Detection - Locate specific elements in documents
- Multiple Models - Choose speed vs. accuracy trade-off
Model Sizes
- Tiny: Fastest, lower accuracy (512×512) - Best for large PDFs (30+ pages)
- Small: Fast, good accuracy (640×640) - Good for 15-30 pages
- Base: Balanced performance (1024×1024) - Good for 10-20 pages
- Large: Best accuracy, slower (1280×1280) - Best for <10 pages
- Gundam (Recommended): Optimized for documents (1024 base, 640 image, crop mode)
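As a rough sketch, the model sizes listed above could be captured in a small lookup table. The field names (`base_size`, `image_size`, `crop_mode`) and the `suggest_preset` helper are hypothetical. Setting `crop_mode` to `False` for the non-Gundam presets is an assumption, and the page-count cut-offs are one possible reading of the overlapping ranges in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    base_size: int    # resolution of the full-page view, in pixels
    image_size: int   # resolution of each image tile, in pixels
    crop_mode: bool   # whether the page is cropped into tiles (assumed False unless stated)


# Resolutions come from the list above; field names are illustrative only.
PRESETS = {
    "tiny": Preset(512, 512, False),
    "small": Preset(640, 640, False),
    "base": Preset(1024, 1024, False),
    "large": Preset(1280, 1280, False),
    "gundam": Preset(1024, 640, True),
}


def suggest_preset(page_count: int) -> str:
    """Pick a preset by page count (assumed cut-offs: 30, 15, 10)."""
    if page_count >= 30:
        return "tiny"
    if page_count >= 15:
        return "small"
    if page_count >= 10:
        return "base"
    return "large"
```

For example, `suggest_preset(40)` returns `"tiny"` and `suggest_preset(5)` returns `"large"`. Gundam is left out of the helper because the list recommends it by document type rather than by page count.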
Tips
- Enable "Embed Images" to include charts/figures (recommended)
- Use Tiny or Small model for large PDFs (20+ pages)
- Processing time: ~2-5 seconds per page depending on model
- Maximum recommended: 50 pages at once
- Image embedding increases file size (~1-2MB per page with images)
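The figures in these tips allow a back-of-the-envelope estimate before uploading. The helper below is a hypothetical sketch that simply multiplies the quoted ranges (2-5 seconds per page, 1-2 MB per page with images) and flags jobs above the 50-page recommendation. Actual numbers will depend on the model and the document.

```python
def estimate_job(pages: int, pages_with_images: int = 0) -> dict:
    """Estimate processing time and added file size from the ranges in the tips."""
    return {
        # ~2-5 seconds per page, as (low, high) in seconds
        "seconds": (2 * pages, 5 * pages),
        # ~1-2 MB per page that contains embedded images, as (low, high) in MB
        "embedded_mb": (1 * pages_with_images, 2 * pages_with_images),
        # 50 pages is the recommended maximum per run
        "over_recommended_limit": pages > 50,
    }
```

A 20-page PDF with images on 5 pages would come out at roughly 40-100 seconds and 5-10 MB of embedded image data.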
Model Size
Use Tiny/Small for large PDFs (20+ pages)
Task Type
Plain text only (faster)
Include charts/figures in output
Processing Status
Watch the progress bar for real-time updates.
Note: Image embedding provides both:
- Visual image (embedded as base64)
- Extracted text content (OCR'd from image)
You get the best of both worlds!
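To illustrate what "embedded as base64" means in practice, the sketch below builds a Markdown image that carries its own data as a data URI, followed by the text extracted from it. The function name, the default MIME type, and the alt text are assumptions for illustration. The app's real output layout may differ.

```python
import base64


def embed_image(image_bytes: bytes, extracted_text: str,
                mime: str = "image/png", alt: str = "figure") -> str:
    """Return Markdown with an inline base64 image followed by its OCR text."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # A data URI keeps the image inside the Markdown file itself,
    # so no separate image files need to travel with the document.
    return f"![{alt}](data:{mime};base64,{encoded})\n\n{extracted_text.strip()}"
```

Base64 encoding inflates the data by about a third, which contributes to the larger file sizes mentioned in the tips.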
Markdown Output Preview
Upload a PDF and click 'Process PDF' to see results here.
The output will include both images and extracted text.