Advanced Command Line Parameters
SGLang Acceleration Parameter Optimization
Memory Optimization Parameters
Tip
SGLang acceleration mode currently supports running on Turing-architecture graphics cards with a minimum of 8GB VRAM, but graphics cards with less than 24GB of VRAM may still run into insufficient-memory issues. You can optimize memory usage with the following parameters:
- If you encounter insufficient VRAM when using a single graphics card, you may need to reduce the KV cache size with `--mem-fraction-static 0.5`. If VRAM issues persist, try lowering it further to `0.4` or below.
- If you have two or more graphics cards, you can try tensor parallelism (TP) mode to pool the available VRAM: `--tp-size 2` (see the sketch after this list).
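For example, a minimal sketch using the standalone server on the same port used elsewhere on this page (the values are starting points, not rules, and should be adjusted to your hardware):

```bash
# Single card with limited VRAM: shrink the KV cache pool
mineru-sglang-server --port 30000 --mem-fraction-static 0.5

# Two or more cards: pool VRAM via tensor parallelism instead
mineru-sglang-server --port 30000 --tp-size 2
```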
Performance Optimization Parameters
Tip
If SGLang-accelerated VLM inference already works for you and you want to further improve inference speed, you can try the following parameters:
- If you have multiple graphics cards, you can use SGLang's multi-card data-parallel mode to increase throughput: `--dp-size 2`
- You can also enable `torch.compile` to accelerate inference by approximately 15%: `--enable-torch-compile` (both flags are combined in the sketch below).
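A sketch combining both flags on the standalone server (this assumes at least two visible cards; the ~15% gain will vary by model and hardware):

```bash
# Two-card data parallelism plus torch.compile
mineru-sglang-server --port 30000 --dp-size 2 --enable-torch-compile
```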
Parameter Passing Instructions
Tip
- All officially supported SGLang parameters can be passed to MinerU as command line arguments; this applies to the following commands: `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api` (see the sketch after this list).
- If you want to learn more about `sglang` parameter usage, please refer to the SGLang official documentation.
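As an illustrative sketch, the same SGLang flag can be appended to any of these entry points; the paths and ports below are placeholders, and SGLang flags only take effect when an SGLang-backed VLM backend is actually in use:

```bash
# The flag is forwarded to the underlying SGLang engine by each entry point
mineru -p <input_path> -o <output_path> --mem-fraction-static 0.5
mineru-sglang-server --port 30000 --mem-fraction-static 0.5
mineru-api --host 127.0.0.1 --port 8000 --mem-fraction-static 0.5
```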
GPU Device Selection and Configuration
CUDA_VISIBLE_DEVICES Basic Usage
Tip
- In any situation, you can specify the visible GPU devices by adding the `CUDA_VISIBLE_DEVICES` environment variable at the beginning of the command line, for example: `CUDA_VISIBLE_DEVICES=1 mineru -p <input_path> -o <output_path>`
- This method is effective for all command line calls, including `mineru`, `mineru-sglang-server`, `mineru-gradio`, and `mineru-api`, and applies to both the `pipeline` and `vlm` backends.
Common Device Configuration Examples
Tip
Here are some common `CUDA_VISIBLE_DEVICES` setting examples:
```bash
CUDA_VISIBLE_DEVICES=1       # Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1     # Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"   # Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3   # Devices 0, 2, 3 will be visible; device 1 is masked
CUDA_VISIBLE_DEVICES=""      # No GPU will be visible
```
Practical Application Scenarios
Tip
Here are some possible usage scenarios:
- If you have multiple graphics cards and need to start `sglang-server` on cards 0 and 1 with multi-card parallelism, you can use the following command:

    ```bash
    CUDA_VISIBLE_DEVICES=0,1 mineru-sglang-server --port 30000 --dp-size 2
    ```
- If you have multiple GPUs and need to use GPUs 0-3 to start `sglang-server` with multi-GPU data parallelism and tensor parallelism, you can use the following command:

    ```bash
    CUDA_VISIBLE_DEVICES=0,1,2,3 mineru-sglang-server --port 30000 --dp-size 2 --tp-size 2
    ```
- If you have multiple graphics cards and need to start two `fastapi` services on cards 0 and 1, each listening on a different port, you can use the following commands:

    ```bash
    # In terminal 1
    CUDA_VISIBLE_DEVICES=0 mineru-api --host 127.0.0.1 --port 8000

    # In terminal 2
    CUDA_VISIBLE_DEVICES=1 mineru-api --host 127.0.0.1 --port 8001
    ```
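Once both services are running, a quick sanity check might look like the sketch below; this assumes the default FastAPI interactive docs route has not been disabled, so adjust the URL if your deployment differs:

```bash
# Each command should print 200 if the corresponding instance is listening
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/docs
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8001/docs
```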