Endpoint operations

Runpod’s endpoint operations allow you to control the complete lifecycle of your Serverless workloads. This guide demonstrates how to submit, monitor, manage, and retrieve results from jobs running on your Serverless endpoints.

Operation overview

/run: Submit an asynchronous job that processes in the background while you receive an immediate job ID.
/runsync: Submit a synchronous job and wait for the complete results in a single response.
/status: Check the current status, execution details, and results of a previously submitted job.
/stream: Receive incremental results from a job as they become available.
/cancel: Stop a job that is in progress or waiting in the queue.
/retry: Requeue a failed or timed-out job using the same job ID and input parameters.
/purge-queue: Clear all pending jobs from the queue without affecting jobs already in progress.
/health: Monitor the operational status of your endpoint, including worker and job statistics.

Submitting jobs

Runpod offers two primary methods for submitting jobs, each suited for different use cases.

Asynchronous jobs (`/run`)

Use asynchronous jobs for longer-running tasks that don’t require immediate results. The request returns immediately with a job ID while the job is processed in the background, which suits operations that need significant processing time and workflows that manage multiple jobs concurrently.

Payload limit: 10 MB
Job availability: Results are available for 30 minutes after completion
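
Because the payload limit applies to each request, it can help to check the serialized size on the client before submitting. The following is a minimal sketch, not part of any Runpod SDK: the helper names are hypothetical, and it assumes that the limit applies to the UTF-8 encoded JSON body and that 1 MB means 1024 * 1024 bytes, neither of which this page specifies.

```python
import json

# Assumption: 1 MB = 1024 * 1024 bytes; this page does not define the unit.
ASYNC_PAYLOAD_LIMIT_BYTES = 10 * 1024 * 1024


def payload_size_bytes(input_data):
    """Return the size in bytes of the JSON body built from input_data."""
    body = json.dumps({"input": input_data})
    return len(body.encode("utf-8"))


def fits_payload_limit(input_data, limit_bytes=ASYNC_PAYLOAD_LIMIT_BYTES):
    """Return True if the serialized request body is within limit_bytes."""
    return payload_size_bytes(input_data) <= limit_bytes
```

Calling `fits_payload_limit(input_data)` before submitting lets you fail fast or move large inputs elsewhere. Pass a different `limit_bytes` to check against another operation's limit.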

cURL
Python
Response

# Submit an asynchronous job
curl -X POST https://api.runpod.ai/v2/{endpoint_id}/run \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${API_KEY}" \
    -d '{"input": {"prompt": "Your prompt"}}'

import requests

def submit_async_job(endpoint_id, api_key, input_data):
    # Submit an asynchronous job to a Runpod endpoint.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    api_key: Your Runpod API key
    #    input_data: Dictionary containing the job input
        
    # Returns:
    #    Dictionary containing job ID and status
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/run"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    payload = {"input": input_data}
    
    response = requests.post(url, headers=headers, json=payload)
    return response.json()

# Example usage
if __name__ == "__main__":
    endpoint_id = "your-endpoint-id"
    api_key = "your-api-key"
    input_data = {"prompt": "Your prompt"}
    
    result = submit_async_job(endpoint_id, api_key, input_data)
    print(f"Job ID: {result['id']}")
    print(f"Status: {result['status']}")

{
  "id": "eaebd6e7-6a92-4bb8-a911-f996ac5ea99d",
  "status": "IN_QUEUE"
}

Synchronous jobs (`/runsync`)

Use synchronous jobs for shorter tasks where you need immediate results. A synchronous request waits for the job to complete and returns the full result in a single response. This eliminates the need for status polling and works best for quick operations (under 30 seconds).

Payload limit: 20 MB
Job availability: Results are available for 60 seconds after completion

cURL
Python
Response

# Submit a synchronous job
curl -X POST https://api.runpod.ai/v2/{endpoint_id}/runsync \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${API_KEY}" \
    -d '{"input": {"prompt": "Your prompt"}}'

import requests

def submit_sync_job(endpoint_id, api_key, input_data):
    # Submit a synchronous job to a Runpod endpoint.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    api_key: Your Runpod API key
    #    input_data: Dictionary containing the job input
        
    # Returns:
    #    Dictionary containing complete job results
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    payload = {"input": input_data}
    
    response = requests.post(url, headers=headers, json=payload)
    return response.json()

{
  "delayTime": 824,      // Time in queue (ms)
  "executionTime": 3391, // Processing time (ms)
  "id": "sync-79164ff4-d212-44bc-9fe3-389e199a5c15",
  "output": [
    {
      "image": "https://image.url",
      "seed": 46578
    }
  ],
  "status": "COMPLETED"
}
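
The timing fields in a response like this one can be combined to see how long a request spent on the platform. The sketch below uses only field names that appear in the sample response. The helper name is hypothetical, and treating missing timing fields as zero is an assumption.

```python
def summarize_result(result):
    """Summarize a job response shaped like the sample above."""
    delay_ms = result.get("delayTime", 0)          # time in queue (ms)
    execution_ms = result.get("executionTime", 0)  # processing time (ms)
    return {
        "completed": result.get("status") == "COMPLETED",
        "total_ms": delay_ms + execution_ms,
        "output": result.get("output"),
    }
```

For the sample response, `total_ms` is 824 + 3391 = 4215. Checking `completed` before reading `output` avoids treating an unfinished or failed job as a result.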

Monitoring jobs

Checking job status (`/status`)

For asynchronous jobs, you can check the status at any time using the job ID. The status endpoint provides:

Current job state (IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED, etc.).
Execution statistics (queue delay, processing time).
Job output (if completed).

cURL
Python
Response

# Check job status
curl -X GET https://api.runpod.ai/v2/{endpoint_id}/status/{job_id} \
    -H "Authorization: Bearer ${API_KEY}"

import requests

def check_job_status(endpoint_id, job_id, api_key):
    # Check the status of a Runpod job.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    job_id: The ID of the job to check
    #    api_key: Your Runpod API key
        
    # Returns:
    #    Dictionary containing job status and results (if complete)
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    
    response = requests.get(url, headers=headers)
    return response.json()

{
  "delayTime": 31618,     // Time in queue (ms)
  "executionTime": 1437,  // Processing time (ms)
  "id": "60902e6c-08a1-426e-9cb9-9eaec90f5e2b-u1",
  "output": {
    "input_tokens": 22,
    "output_tokens": 16,
    "text": ["Hello! How can I assist you today?\nUSER: I'm having"]
  },
  "status": "COMPLETED"
}
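
A common way to use the status operation is to poll until the job reaches a final state, waiting longer between checks each time. The sketch below is one possible approach, not a prescribed client. `poll_until_done` and its parameters are hypothetical names, the set of final states is assembled from the statuses mentioned in this guide, and the status lookup is passed in as a function so the loop can be exercised without network access.

```python
import time

# Statuses named in this guide that indicate the job will not change further.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def poll_until_done(fetch_status, initial_delay=1.0, max_delay=30.0,
                    max_attempts=20, sleep=time.sleep):
    """Call fetch_status() until the job is finished, doubling the wait each time.

    fetch_status: zero-argument callable returning the parsed status response.
    Raises TimeoutError if the job is not finished after max_attempts checks.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        result = fetch_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff with a cap
    raise TimeoutError(f"Job not finished after {max_attempts} status checks")
```

With the `check_job_status` function shown earlier, a call could look like `poll_until_done(lambda: check_job_status(endpoint_id, job_id, api_key))`.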

You can also use the /status operation to set the time-to-live (TTL) for an individual job by appending a ttl query parameter, in milliseconds, to the status request. For example, https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}?ttl=6000 sets the TTL for the job to 6 seconds. Use this when you want the system to remove a job result sooner than the default retention time.
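
As a small illustration of the TTL parameter, the sketch below builds the status URL with an optional `ttl` query string. The helper name `status_url` is hypothetical, and the value is interpreted as milliseconds, as in the example above.

```python
from urllib.parse import urlencode


def status_url(endpoint_id, job_id, ttl_ms=None):
    """Build the /status URL, optionally appending a TTL in milliseconds."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    if ttl_ms is not None:
        url += "?" + urlencode({"ttl": int(ttl_ms)})
    return url
```

The resulting URL can be passed to `requests.get` with the same Authorization header used in the status example.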

Streaming results (`/stream`)

For jobs that generate output incrementally or for very large outputs, use the stream endpoint to receive partial results as they become available. This is especially useful for:

Text generation tasks where you want to display output as it’s created
Long-running jobs where you want to show progress
Large outputs that benefit from incremental processing

cURL
Python
Response

# Stream job results
curl -X GET https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id} \
    -H "Authorization: Bearer ${API_KEY}"

import requests

def stream_job_results(endpoint_id, job_id, api_key):
    # Stream results from a Runpod job.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    job_id: The ID of the job to stream
    #    api_key: Your Runpod API key
        
    # Returns:
    #    List of partial results
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    
    response = requests.get(url, headers=headers)
    return response.json()

[
  {
    "metrics": {
      "avg_gen_throughput": 0,
      "avg_prompt_throughput": 0,
      "cpu_kv_cache_usage": 0,
      "gpu_kv_cache_usage": 0.0016722408026755853,
      "input_tokens": 0,
      "output_tokens": 1,
      "pending": 0,
      "running": 1,
      "scenario": "stream",
      "stream_index": 2,
      "swapped": 0
    },
    "output": {
      "input_tokens": 0,
      "output_tokens": 1,
      "text": [" How"]
    }
  },
  {
    // Additional stream chunks...
  }
]

The maximum size for a single streamed payload chunk is 1 MB. Larger outputs will be split across multiple chunks.
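
When the chunks carry text, as in the sample response above, the pieces can be joined on the client in the order received. The sketch below assumes each chunk has an `output` object with a `text` list, which matches the sample but depends on what your worker emits. `join_stream_text` is a hypothetical helper name.

```python
def join_stream_text(chunks):
    """Concatenate the text fragments from a list of stream chunks.

    Assumes chunks shaped like the sample response; chunks without
    an "output" or "text" entry are skipped.
    """
    parts = []
    for chunk in chunks:
        output = chunk.get("output") or {}
        parts.extend(output.get("text", []))
    return "".join(parts)
```

For a worker that streams a different structure, only the two lookups inside the loop would need to change.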

Endpoint health monitoring (`/health`)

The health endpoint provides a quick overview of your endpoint’s operational status. Use it to monitor worker availability, track job queue status, identify potential bottlenecks, and determine if scaling adjustments are needed.

cURL
Python
Response

# Check endpoint health
curl -X GET https://api.runpod.ai/v2/{endpoint_id}/health \
    -H "Authorization: Bearer ${API_KEY}"

import requests

def check_endpoint_health(endpoint_id, api_key):
    # Check the health of a Runpod endpoint.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    api_key: Your Runpod API key
        
    # Returns:
    #    Dictionary containing endpoint health information
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/health"
    headers = {"Authorization": f"Bearer {api_key}"}
    
    response = requests.get(url, headers=headers)
    return response.json()

{
  "jobs": {
    "completed": 1,
    "failed": 5,
    "inProgress": 0,
    "inQueue": 2,
    "retried": 0
  },
  "workers": {
    "idle": 0,
    "running": 0
  }
}
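
One way to act on a health response is to reduce it to a few signals. The sketch below reads only fields shown in the sample response. The helper name, the failure-rate calculation, and the `queued_without_workers` flag are illustrative assumptions rather than Runpod-defined metrics, and a raised flag may simply mean workers are still starting.

```python
def summarize_health(health):
    """Derive simple signals from a /health response shaped like the sample."""
    jobs = health.get("jobs", {})
    workers = health.get("workers", {})

    queued = jobs.get("inQueue", 0)
    failed = jobs.get("failed", 0)
    finished = jobs.get("completed", 0) + failed
    active_workers = workers.get("idle", 0) + workers.get("running", 0)

    return {
        "queued": queued,
        "active_workers": active_workers,
        # Share of finished jobs that failed (0.0 when nothing has finished).
        "failure_rate": failed / finished if finished else 0.0,
        # Heuristic: jobs are waiting but no worker is idle or running.
        "queued_without_workers": queued > 0 and active_workers == 0,
    }
```

For the sample response, two jobs are queued with no idle or running workers, and five of the six finished jobs failed, so both signals would merit a closer look.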

Managing jobs

Cancelling jobs (`/cancel`)

Cancel jobs that are no longer needed or are taking too long to complete. This operation stops jobs that are in progress, removes jobs from the queue if they have not yet started, and returns immediately with the job’s CANCELLED status.

cURL
Python
Response

# Cancel a job
curl -X POST https://api.runpod.ai/v2/{endpoint_id}/cancel/{job_id} \
    -H "Authorization: Bearer ${API_KEY}"

import requests

def cancel_job(endpoint_id, job_id, api_key):
    # Cancel a Runpod job.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    job_id: The ID of the job to cancel
    #    api_key: Your Runpod API key
        
    # Returns:
    #    Dictionary containing job status after cancellation
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/cancel/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    
    response = requests.post(url, headers=headers)
    return response.json()

{
  "id": "724907fe-7bcc-4e42-998d-52cb93e1421f-u1",
  "status": "CANCELLED"
}

Retrying failed jobs (`/retry`)

Retry jobs that have failed or timed out without having to submit a new job request. This operation maintains the same job ID for tracking and requeues the job with the original input parameters, removing the previous output (if any). It can only be used for jobs with a FAILED or TIMED_OUT status.

cURL
Python
Response

# Retry a failed job
curl -X POST https://api.runpod.ai/v2/{endpoint_id}/retry/{job_id} \
    -H "Authorization: Bearer ${API_KEY}"

import requests

def retry_job(endpoint_id, job_id, api_key):
    # Retry a failed or timed-out Runpod job.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    job_id: The ID of the job to retry
    #    api_key: Your Runpod API key
        
    # Returns:
    #    Dictionary containing job ID and new status
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/retry/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    
    response = requests.post(url, headers=headers)
    return response.json()

{
  "id": "60902e6c-08a1-426e-9cb9-9eaec90f5e2b-u1",
  "status": "IN_QUEUE"
}

Job results expire after a set period:

Asynchronous jobs (/run): Results available for 30 minutes
Synchronous jobs (/runsync): Results available for 1 minute

Once expired, jobs cannot be retried.
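
The two retry conditions, a retryable status and an unexpired result, can be checked on the client before calling /retry. The sketch below is an approximation under stated assumptions: the helper name is hypothetical, your application records when the job finished (`finished_at`), and the retention window is measured from that time.

```python
from datetime import datetime, timedelta, timezone

RETRYABLE_STATUSES = {"FAILED", "TIMED_OUT"}

# Retention windows from this guide, keyed by the operation used to submit.
RETENTION = {
    "run": timedelta(minutes=30),
    "runsync": timedelta(minutes=1),
}


def is_retryable(status, finished_at, submitted_via, now=None):
    """Return True if the job has a retryable status and has not expired.

    finished_at: timezone-aware datetime tracked by your application (assumption).
    submitted_via: "run" or "runsync".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if status not in RETRYABLE_STATUSES:
        return False
    return now - finished_at <= RETENTION[submitted_via]
```

The server remains the source of truth. This check only avoids sending retry requests that are likely to be rejected.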

Purging the queue (`/purge-queue`)

Clear all pending jobs from the queue when you need to reset or cancel multiple jobs at once. This is useful for error recovery, clearing outdated requests, resetting after configuration changes, and managing resource allocation.

cURL
Python
Response

# Purge the job queue
curl -X POST https://api.runpod.ai/v2/{endpoint_id}/purge-queue \
    -H "Authorization: Bearer ${API_KEY}"

import requests

def purge_queue(endpoint_id, api_key):
    # Purge all queued jobs for a Runpod endpoint.
    
    # Args:
    #    endpoint_id: Your Runpod endpoint ID
    #    api_key: Your Runpod API key
        
    # Returns:
    #    Dictionary with number of removed jobs and status
    
    url = f"https://api.runpod.ai/v2/{endpoint_id}/purge-queue"
    headers = {"Authorization": f"Bearer {api_key}"}
    
    response = requests.post(url, headers=headers)
    return response.json()

{
  "removed": 2,
  "status": "completed"
}

The purge-queue operation only affects jobs waiting in the queue. Jobs already in progress will continue to run.

Rate limits and quotas

Runpod enforces rate limits to ensure fair platform usage. These limits apply per endpoint and operation:

Operation	Method	Rate Limit	Concurrent Limit
`/run`	POST	1000 requests per 10 seconds	200 concurrent
`/runsync`	POST	2000 requests per 10 seconds	400 concurrent
`/status`, `/status-sync`, `/stream`	GET/POST	2000 requests per 10 seconds	400 concurrent
`/cancel`	POST	100 requests per 10 seconds	20 concurrent
`/purge-queue`	POST	2 requests per 10 seconds	N/A
`/openai/*`	POST	2000 requests per 10 seconds	400 concurrent
`/requests`	GET	10 requests per 10 seconds	2 concurrent
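
To stay under a per-operation limit such as "2 requests per 10 seconds", a client can throttle itself before sending. The sketch below is a simple sliding-window limiter. The class name is hypothetical, and it is an assumption that counting requests over the trailing window approximates how the server measures the limit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any trailing window_seconds."""

    def __init__(self, max_requests, window_seconds=10.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self._sent = deque()    # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record the request if it fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

For example, `SlidingWindowLimiter(2)` could guard calls to /purge-queue, and `SlidingWindowLimiter(100)` calls to /cancel. When `try_acquire()` returns False, wait before trying again.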

Requests receive a 429 (Too Many Requests) status when both of the following conditions are true:

Queue size exceeds 50 jobs
Queue size exceeds endpoint.WorkersMax * 500

Exceeding the rate limits in the table above also results in HTTP 429 (Too Many Requests) responses. Implement retry logic with exponential backoff in your applications to handle rate limiting gracefully.
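
The sketch below shows one way to retry on HTTP 429 with exponential backoff and jitter, plus a small function that spells out the queue-size condition. All names, default delays, and the jitter range are illustrative assumptions. `send` is any zero-argument callable that returns an object with a `status_code` attribute, such as a `requests` response.

```python
import random
import time


def queue_limit_exceeded(queue_size, workers_max):
    """Return True when both queue-size conditions listed above hold."""
    return queue_size > 50 and queue_size > workers_max * 500


def send_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() and retry while it returns HTTP 429.

    The wait doubles after each attempt (capped at max_delay) and is scaled
    by a random factor between 0.5 and 1.0 so clients do not retry in lockstep.
    Returns the last response, which may still be a 429.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay * (0.5 + rng() / 2))
    return response
```

A call could look like `send_with_backoff(lambda: requests.post(url, headers=headers, json=payload))`. Because the final response is returned even when it is still a 429, the caller decides whether to give up or queue the work for later.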

Best practices

Use asynchronous endpoints for jobs that take more than a few seconds to complete.
Implement polling with backoff when checking status of asynchronous jobs.
Set appropriate timeouts in your client applications.
Monitor endpoint health regularly to detect issues early.
Implement error handling for all API calls.
Use webhooks for notification-based workflows instead of polling. See Send requests for implementation details.
Cancel unneeded jobs to free up resources and reduce costs.
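
As an illustration of the error-handling recommendation, the sketch below turns an HTTP status code and parsed body into either a usable result or a single exception type. The exception class and function are hypothetical names, not part of any Runpod SDK, and treating a FAILED job as an error is a design choice for this example.

```python
class RunpodRequestError(Exception):
    """Hypothetical exception type for this sketch."""

    def __init__(self, status_code, message):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


def parse_job_response(status_code, body):
    """Return body for a successful call, or raise RunpodRequestError."""
    if status_code == 429:
        raise RunpodRequestError(status_code, "rate limited; retry with backoff")
    if status_code >= 400:
        raise RunpodRequestError(status_code, str(body))
    if isinstance(body, dict) and body.get("status") == "FAILED":
        raise RunpodRequestError(status_code, f"job {body.get('id')} failed")
    return body
```

With `requests`, this could be called as `parse_job_response(response.status_code, response.json())`. Passing `timeout=` to `requests.get` or `requests.post` covers the client-side timeout recommendation in the same place.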

Troubleshooting

Issue	Possible Causes	Solutions
Job stuck in queue	No available workers, max workers limit reached	Increase max workers, check endpoint health
Timeout errors	Job takes longer than execution timeout	Increase timeout in job policy, optimize job processing
Failed jobs	Worker errors, input validation issues	Check endpoint logs, verify input format, retry with fixed input
Rate limiting	Too many requests in short time	Implement backoff strategy, batch requests when possible
Missing results	Results expired	Retrieve results within expiration window (30 min for async, 1 min for sync)
