
REC-003: API Endpoint Discovery

Category: Reconnaissance
Frameworks: ATLAS: Discover ML Artifacts

Enumerate AI service endpoints. Model serving frameworks expose predictable paths for health, models, and inference.

Technique

# Common AI service endpoints
/v1/models # OpenAI-compatible
/api/generate # Ollama
/api/tags # Ollama model list
/health # Model server health
/v1/embeddings # Embedding service
/collections # Qdrant vector DB
/api/2.0/mlflow/* # MLflow tracking
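A minimal probe over these paths can be sketched as follows. The path list mirrors the table above; the HTTP layer is injected as a callable so the sketch stays transport-agnostic (any target address used with it is a placeholder).

```python
# Sketch: enumerate common AI service paths on a target base URL.
# The path list mirrors the technique table above.
from urllib.parse import urljoin

AI_PATHS = [
    "/v1/models",       # OpenAI-compatible
    "/api/generate",    # Ollama
    "/api/tags",        # Ollama model list
    "/health",          # model server health
    "/v1/embeddings",   # embedding service
    "/collections",     # Qdrant vector DB
    "/api/2.0/mlflow/registered-models/list",  # MLflow registry
]

def candidate_urls(base):
    """Build the full URL for each known AI framework path."""
    return [urljoin(base, p) for p in AI_PATHS]

def probe(base, fetch):
    """Probe each path with an injected fetch(url) -> status callable,
    keeping the HTTP client (urllib, requests, httpx) pluggable."""
    return {url: fetch(url) for url in candidate_urls(base)}
```

In practice `fetch` would wrap something like `urllib.request.urlopen` with a short timeout; a 200 on any of these paths identifies the serving framework immediately.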

Key Concepts

  • AI serving frameworks follow predictable URL patterns because they implement standardized APIs (OpenAI-compatible, REST conventions). This makes endpoint enumeration trivial with a short wordlist tailored to ML infrastructure.
  • Health and model listing endpoints frequently lack authentication because they are designed for internal monitoring and orchestration. A single unauthenticated /v1/models response reveals model names, versions, and capabilities.
  • Discovering the serving framework (Ollama, vLLM, TGI, MLflow) immediately narrows the CVE search space. Each framework has known vulnerabilities, and version information is often included in response headers or health endpoints.
  • Vector database endpoints like Qdrant's /collections expose the RAG architecture, including collection names, vector dimensions, and document counts, which directly enables RAG pipeline attacks.
  • MLflow tracking endpoints can expose experiment metadata, model artifacts, and registered model versions, providing a roadmap of the ML development pipeline.
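The framework-identification idea above can be sketched as a signature map from a distinctive path plus a response fragment to a framework guess. The signatures here are illustrative assumptions, not an exhaustive fingerprint set.

```python
# Sketch: map distinctive (path, response fragment) pairs to a framework guess.
# Signatures are illustrative examples, not a complete fingerprint database.
SIGNATURES = [
    ("/api/tags",       '"models"',           "Ollama"),
    ("/v1/models",      '"object"',           "OpenAI-compatible (vLLM, TGI, ...)"),
    ("/collections",    '"collections"',      "Qdrant"),
    ("/api/2.0/mlflow", '"registered_models"', "MLflow"),
]

def fingerprint(path, body):
    """Return the first framework whose path prefix and body fragment both match."""
    for sig_path, fragment, framework in SIGNATURES:
        if path.startswith(sig_path) and fragment in body:
            return framework
    return None
```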

Detection

  • Monitor web server access logs for sequential requests to known AI framework paths, especially when hitting multiple framework-specific endpoints in rapid succession.
  • Alert on unauthenticated access to model management, health, and administrative endpoints from external IP ranges.
  • Deploy honeypot endpoints that mimic common AI service paths to detect scanning activity early.
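The first detection bullet can be approximated with a single pass over parsed access-log events, flagging any source IP that hits several distinct framework-specific paths within a short window. The event shape, 60-second window, and threshold of three paths are assumptions for the sketch.

```python
# Sketch: flag IPs that hit several distinct AI framework paths in a window.
# Assumes (timestamp_seconds, ip, path) tuples parsed from access logs;
# the 60-second window and threshold of 3 distinct paths are arbitrary choices.
from collections import defaultdict

AI_PATHS = {"/v1/models", "/api/generate", "/api/tags",
            "/health", "/v1/embeddings", "/collections"}

def flag_scanners(events, window=60, threshold=3):
    hits = defaultdict(list)   # ip -> [(ts, path), ...]
    flagged = set()
    for ts, ip, path in events:
        if path not in AI_PATHS:
            continue
        hits[ip].append((ts, path))
        recent = {p for t, p in hits[ip] if ts - t <= window}
        if len(recent) >= threshold:
            flagged.add(ip)
    return flagged
```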

Mitigation

  • Place all AI service endpoints behind authentication, including health checks and model listing endpoints. Internal-only services should not be exposed to the public internet.
  • Use a reverse proxy or API gateway to normalize and restrict the exposed endpoint surface, hiding framework-specific paths from external clients.
  • Disable or restrict administrative and debugging endpoints (MLflow UI, Ollama API, Qdrant dashboard) in production deployments, and segment them onto an internal network.
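As an illustration of the reverse-proxy mitigation, a minimal nginx sketch might expose only the inference route and deny everything else. The upstream address and auth subrequest are placeholders; adapt to the actual gateway in use.

```nginx
# Sketch: expose only the chat completion route; deny framework-specific paths.
# Upstream address and auth endpoint are placeholders.
location = /v1/chat/completions {
    auth_request /auth;            # delegate authentication to an internal endpoint
    proxy_pass http://127.0.0.1:8000;
}

location / {
    deny all;                      # /api/tags, /collections, etc. never reach the backend
}
```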

Example Output

Ollama Model Enumeration

$ curl -s http://10.10.14.50:11434/api/tags | jq .
{
  "models": [
    {
      "name": "llama3:8b",
      "model": "llama3:8b",
      "modified_at": "2024-09-12T08:41:22.138Z",
      "size": 4661211808,
      "digest": "a2af76f05e2dc7f3adbc596ee21e498e3a6b2e2e4b5b391d2e8e0f1bc3a2f7e9d1",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": ["llama"],
        "parameter_size": "8.0B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "mistral:7b",
      "model": "mistral:7b",
      "modified_at": "2024-09-10T14:23:05.441Z",
      "size": 4108916384,
      "digest": "d4c5b2e9f71a3c8e90b2d1f4a6e8c3b5d7f9a1c3e5",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "mistral",
        "families": ["mistral"],
        "parameter_size": "7.2B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "nomic-embed-text:latest",
      "model": "nomic-embed-text:latest",
      "modified_at": "2024-09-08T19:55:41.207Z",
      "size": 274302450,
      "digest": "f1b2c3d4e5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c0d1",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "nomic-bert",
        "families": ["nomic-bert"],
        "parameter_size": "137M",
        "quantization_level": "F16"
      }
    }
  ]
}

The presence of an embedding model (nomic-embed-text) alongside chat models strongly suggests a RAG pipeline. The embedding model generates vectors that are likely stored in a nearby vector database.
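This inference can be mechanized: given the /api/tags JSON, separating embedding families from chat families is enough to flag a likely RAG stack. The embedding-family list below is a small assumed heuristic, not a complete catalog.

```python
# Sketch: flag a likely RAG pipeline from an Ollama /api/tags response.
# The embedding-family set is a small heuristic, not a complete catalog.
import json

EMBEDDING_FAMILIES = {"nomic-bert", "bert"}

def looks_like_rag(tags_json):
    """True when the model list mixes embedding and non-embedding families."""
    models = json.loads(tags_json)["models"]
    families = {m["details"]["family"] for m in models}
    has_embed = bool(families & EMBEDDING_FAMILIES)
    has_chat = bool(families - EMBEDDING_FAMILIES)
    return has_embed and has_chat
```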

Qdrant Collection Enumeration

$ curl -s http://10.10.14.50:6333/collections | jq .
{
  "result": {
    "collections": [
      {
        "name": "documents"
      },
      {
        "name": "user_embeddings"
      }
    ]
  },
  "status": "ok",
  "time": 0.000042
}
# Get details on a specific collection
$ curl -s http://10.10.14.50:6333/collections/documents | jq .
{
  "result": {
    "status": "green",
    "optimizer_status": "ok",
    "vectors_count": 14832,
    "indexed_vectors_count": 14832,
    "points_count": 14832,
    "segments_count": 6,
    "config": {
      "params": {
        "vectors": {
          "size": 768,
          "distance": "Cosine"
        },
        "shard_number": 1,
        "replication_factor": 1
      }
    }
  },
  "status": "ok",
  "time": 0.000118
}

The 768-dimension vectors match the output size of nomic-embed-text, corroborating the embedding pipeline inferred from the model list, and the 14,832 points indicate a well-populated document corpus. The user_embeddings collection likely contains user-specific data worth investigating separately.
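The dimension-based inference can be expressed as a lookup from vector size to candidate embedding models. The table below is a small assumed sample; many models share a dimension, so this narrows the field rather than identifying a single model.

```python
# Sketch: narrow candidate embedding models from a collection's vector size.
# The table is a small assumed sample; many models share a dimension,
# so this narrows the field rather than identifying a single model.
DIMENSION_CANDIDATES = {
    384:  ["all-MiniLM-L6-v2"],
    768:  ["nomic-embed-text", "all-mpnet-base-v2"],
    1536: ["text-embedding-ada-002"],
}

def candidate_models(vector_size):
    return DIMENSION_CANDIDATES.get(vector_size, [])
```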

MLflow Model Registry Enumeration

$ curl -s http://10.10.14.50:5000/api/2.0/mlflow/registered-models/list | jq .
{
  "registered_models": [
    {
      "name": "fraud-detection-prod",
      "creation_timestamp": 1709251200000,
      "last_updated_timestamp": 1725984000000,
      "description": "Production fraud detection classifier - DO NOT DELETE",
      "latest_versions": [
        {
          "name": "fraud-detection-prod",
          "version": "12",
          "creation_timestamp": 1725984000000,
          "current_stage": "Production",
          "source": "s3://ml-artifacts-internal/fraud-detection/v12/model",
          "run_id": "a3b8c2d1e4f5a6b7c8d9e0f1a2b3c4d5",
          "status": "READY"
        }
      ]
    },
    {
      "name": "customer-churn-v2",
      "creation_timestamp": 1717200000000,
      "last_updated_timestamp": 1724601600000,
      "description": "",
      "latest_versions": [
        {
          "name": "customer-churn-v2",
          "version": "7",
          "creation_timestamp": 1724601600000,
          "current_stage": "Staging",
          "source": "s3://ml-artifacts-internal/churn-model/v7/model",
          "run_id": "f9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4",
          "status": "READY"
        }
      ]
    }
  ]
}

The source field leaks the internal S3 bucket path and artifact structure. The model names and descriptions reveal business logic, and the run_id values can be used to pull full experiment metadata including parameters, metrics, and training data paths via /api/2.0/mlflow/runs/get.
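A follow-up pass over the registry response can be sketched as URL construction for /api/2.0/mlflow/runs/get, one request per run_id. The base address is a placeholder, and actually fetching the URLs is left to the caller.

```python
# Sketch: build follow-up /runs/get URLs from a registered-models response.
# The base address is a placeholder; fetching the URLs is left to the caller.
import json
from urllib.parse import urlencode

def run_urls(base, registry_json):
    """Yield one runs/get URL per run_id in the registry listing."""
    doc = json.loads(registry_json)
    for model in doc["registered_models"]:
        for version in model["latest_versions"]:
            query = urlencode({"run_id": version["run_id"]})
            yield f"{base}/api/2.0/mlflow/runs/get?{query}"
```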