{"id":2324954,"date":"2026-03-12T16:08:19","date_gmt":"2026-03-12T16:08:19","guid":{"rendered":"https:\/\/celebrity.land\/en\/?p=2324954"},"modified":"2026-03-12T16:08:19","modified_gmt":"2026-03-12T16:08:19","slug":"multimodal-embeddings-at-scale-ai-data-lake-for-media-and-entertainment-workloads","status":"publish","type":"post","link":"https:\/\/celebrity.land\/en\/multimodal-embeddings-at-scale-ai-data-lake-for-media-and-entertainment-workloads\/","title":{"rendered":"Multimodal embeddings at scale: AI data lake for media and entertainment workloads"},"content":{"rendered":"<p><\/p>\n<div id=\"\">\n<p>This post shows you how to build a scalable multimodal video search system that enables natural language search across large video datasets using <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/nova\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Nova<\/a> models and <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/opensearch-service\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon OpenSearch Service<\/a>. You will learn how to move beyond manual tagging and keyword-based searches to enable semantic search that captures the full richness of video content.<\/p>\n<p>We demonstrate this at scale by processing 792,270 videos from two <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/registry.opendata.aws\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Open Data Registry<\/a> datasets: <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/registry.opendata.aws\/multimedia-commons\/\" target=\"_blank\" rel=\"noopener noreferrer\">Multimedia Commons<\/a> (787,479 videos, 37-second average) and <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/registry.opendata.aws\/mevadata\/\" target=\"_blank\" rel=\"noopener noreferrer\">MEVA<\/a> (4,791 videos, 5-minute average). Processing 8,480 hours of video content (30.5M seconds) took 41 hours. 
The first-year total cost is $27,328 (with OpenSearch Service on-demand pricing) or $23,632 (with OpenSearch Service Reserved Instances). It consists of one-time ingestion ($18,088) and annual Amazon OpenSearch Service charges ($9,240 on-demand or $5,544 Reserved).<\/p>\n<p>The ingestion breakdown is as follows:<\/p>\n<ul>\n<li>Amazon Elastic Compute Cloud (Amazon EC2) compute (4\u00d7 c7i.48xlarge spot at $2.57\/hour \u00d7 41 hours): $421<\/li>\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/bedrock\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Bedrock<\/a> Nova Multimodal Embeddings (30.5M seconds \u00d7 $0.00056\/second batch pricing): $17,096<\/li>\n<li>Nova Pro tagging (792K videos \u00d7 600 tokens on average): $571<\/li>\n<\/ul>\n<p>The solution generates audio-visual embeddings using <code>AUDIO_VIDEO_COMBINED<\/code> mode (see <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/nova\/latest\/userguide\/embeddings-schema.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova Multimodal Embeddings API schema<\/a>), stores them in OpenSearch Service, and supports text-to-video, video-to-video, and hybrid search.<\/p>\n<h2>Solution overview<\/h2>\n<p>The architecture consists of two main workflows, ingestion and search, that work together to enable multimodal video search at scale:<\/p>\n<p><strong>Video ingestion pipeline:<\/strong><\/p>\n<p>The ingestion pipeline uses four <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/ec2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon EC2<\/a> c7i.48xlarge instances with 600 parallel workers to process 19,400 videos per hour. The async API has a limit of 30 concurrent jobs per account (see <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/quotas.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Bedrock quotas<\/a>), so the pipeline implements a job queue with polling. 
Workers submit jobs up to the concurrency limit, poll for completion, and submit new jobs as slots become available. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/nova\/latest\/userguide\/nova-embeddings.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Nova Multimodal Embeddings<\/a> handles video processing asynchronously, segmenting videos into 15-second chunks (optimized for capturing scene changes while keeping embedding counts manageable) and generating 1024-dimensional embeddings. We chose 1024 dimensions over 3072 to cut vector storage cost roughly threefold with minimal accuracy impact. Embedding generation cost does not depend on the embedding dimension. <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/nova\/latest\/userguide\/prompting-video-understanding.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Nova Pro<\/a> adds 10-15 descriptive tags per video from a predefined taxonomy.<\/p>\n<p>Note: <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/blogs\/aws\/introducing-amazon-nova-2-lite-a-fast-cost-effective-reasoning-model\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Nova 2 Lite<\/a> offers improved accuracy at lower cost for tagging tasks. We recommend that you consider it for new deployments.<\/p>\n<p>The system stores embeddings in an <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/knn.html\" target=\"_blank\" rel=\"noopener noreferrer\">OpenSearch k-NN index<\/a> for semantic search and metadata tags in a separate text index for keyword matching. 
For search, you can query videos three ways: convert natural language to embeddings for text-to-video search, compare video embeddings directly for video-to-video search, or combine both approaches in hybrid search.<\/p>\n<p><strong>Types of searches enabled by this solution:<\/strong><\/p>\n<ol>\n<li><strong>Text-to-video Search<\/strong> \u2013 Natural language queries converted to embeddings for semantic similarity matching<\/li>\n<li><strong>Video-to-video Search<\/strong> \u2013 Find similar content by comparing video embeddings directly<\/li>\n<li><strong>Hybrid search<\/strong> \u2013 Combines vector similarity (70% weight) with keyword matching (30% weight) for maximum accuracy<\/li>\n<\/ol>\n<h3>Video ingestion pipeline<\/h3>\n<p>The following diagram illustrates the video ingestion and processing pipeline:<\/p>\n<p><\/p>\n<p><em>Figure 1: Video ingestion pipeline showing the flow from S3 video storage through Nova Multimodal Embeddings and Nova Pro to dual OpenSearch indexes<\/em><\/p>\n<p>The video processing workflow is as follows:<\/p>\n<ol>\n<li>Upload videos to <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Simple Storage Service (Amazon S3)<\/a>.<\/li>\n<li>Process videos using <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/APIReference\/API_runtime_StartAsyncInvoke.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova Multimodal Embeddings async API<\/a>, which automatically segments videos and generates embeddings. 
An orchestrator polls for job completion (the async API has a limit of 30 concurrent jobs per account; see <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/userguide\/quotas.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Bedrock quotas<\/a>) and retrieves results from Amazon S3.<\/li>\n<li>Generate descriptive tags using <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/nova\/latest\/userguide\/prompting-video-understanding.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova Pro<\/a> (or <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/ai\/responsible-ai\/nova-2-lite\/overview.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova 2 Lite<\/a> for better accuracy at lower cost) from a predefined taxonomy for enhanced search capabilities.<\/li>\n<li>Index embeddings in an <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/knn.html\" target=\"_blank\" rel=\"noopener noreferrer\">OpenSearch k-NN index<\/a> and tags in a text index.<\/li>\n<\/ol>\n<h3>Video search architecture<\/h3>\n<p>The following diagram shows the complete search architecture:<\/p>\n<p><img decoding=\"async\" style=\"margin: 10px 0px 10px 0px;border: 1px solid #cccccc\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2026\/03\/09\/l-ml-200122.png\"\/><\/p>\n<p><em>Figure 2: Video search architecture demonstrating three search modes \u2013 text-to-video, video-to-video, and hybrid search combining k-NN and BM25<\/em><\/p>\n<p>The search architecture enables three modes:<\/p>\n<ol>\n<li><strong>Text-to-video<\/strong> \u2013 Natural language queries<\/li>\n<li><strong>Video-to-video<\/strong> \u2013 Similar content discovery<\/li>\n<li><strong>Hybrid<\/strong> \u2013 Combined semantic and keyword matching<\/li>\n<\/ol>\n<h2>Prerequisites<\/h2>\n<p>Before you begin, you will 
need:<\/p>\n<ol>\n<li>An <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/free\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS account<\/a> with access to <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/bedrock\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Bedrock<\/a> in <code>us-east-1<\/code> (Nova models are available by default once the caller has the appropriate IAM permissions)<\/li>\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.python.org\/downloads\/\" target=\"_blank\" rel=\"noopener noreferrer\">Python<\/a> 3.9 or later installed<\/li>\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/cli\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Command Line Interface (AWS CLI)<\/a> configured with appropriate credentials<\/li>\n<li>An Amazon OpenSearch Service domain (r6g.large or larger recommended)<\/li>\n<li>An <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/s3\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon S3<\/a> bucket for video storage and embedding outputs<\/li>\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/iam\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Identity and Access Management (IAM)<\/a> permissions for Amazon Bedrock, OpenSearch Service, and Amazon S3<\/li>\n<\/ol>\n<p>The solution uses:<\/p>\n<ol start=\"7\">\n<li>Amazon Bedrock with <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/nova\/latest\/userguide\/nova-embeddings.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova Multimodal Embeddings<\/a> (amazon.nova-2-multimodal-embeddings-v1:0)<\/li>\n<li>Amazon Bedrock with <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/nova\/latest\/userguide\/prompting-video-understanding.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova Pro<\/a> (us.amazon.nova-pro-v1:0) or <a rel=\"nofollow\" target=\"_blank\" 
href=\"https:\/\/docs.aws.amazon.com\/ai\/responsible-ai\/nova-2-lite\/overview.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova 2 Lite<\/a> (us.amazon.nova-2-lite-v1:0) for tagging<\/li>\n<li>Amazon OpenSearch Service 2.11 or later with <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/knn.html\" target=\"_blank\" rel=\"noopener noreferrer\">k-NN plugin<\/a><\/li>\n<li>Amazon S3 for video and embedding storage<\/li>\n<\/ol>\n<h2>Walkthrough<\/h2>\n<h3>Step 1: Create IAM roles and policies<\/h3>\n<p>Create an <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/IAM\/latest\/UserGuide\/id_roles.html\" target=\"_blank\" rel=\"noopener noreferrer\">IAM role<\/a> with permissions to invoke Amazon Bedrock models, write to OpenSearch indexes, and read\/write S3 objects. Replace <code>ACCOUNT_ID<\/code> and <code>DOMAIN_NAME<\/code> with your own values. Because <code>us.amazon.nova-pro-v1:0<\/code> is a cross-Region inference profile rather than a foundation model ID, the policy grants access to both the profile and the underlying foundation model.<\/p>\n<pre><code class=\"lang-json\">{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"bedrock:InvokeModel\",\n        \"bedrock:StartAsyncInvoke\",\n        \"bedrock:GetAsyncInvoke\",\n        \"bedrock:ListAsyncInvokes\"\n      ],\n      \"Resource\": [\n        \"arn:aws:bedrock:us-east-1::foundation-model\/amazon.nova-2-multimodal-embeddings-v1:0\",\n        \"arn:aws:bedrock:*::foundation-model\/amazon.nova-pro-v1:0\",\n        \"arn:aws:bedrock:us-east-1:ACCOUNT_ID:inference-profile\/us.amazon.nova-pro-v1:0\",\n        \"arn:aws:bedrock:us-east-1:ACCOUNT_ID:async-invoke\/*\"\n      ]\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"es:ESHttpPost\",\n        \"es:ESHttpPut\",\n        \"es:ESHttpGet\"\n      ],\n      \"Resource\": \"arn:aws:es:us-east-1:ACCOUNT_ID:domain\/DOMAIN_NAME\/*\"\n    },\n    {\n      \"Effect\": \"Allow\",\n      \"Action\": [\n        \"s3:GetObject\",\n        \"s3:PutObject\"\n      ],\n      \"Resource\": [\n        \"arn:aws:s3:::amzn-s3-demo-video-bucket\/*\",\n        \"arn:aws:s3:::amzn-s3-demo-embedding-bucket\/*\"\n      ]\n    }\n  ]\n}\n<\/code><\/pre>\n<h3>Step 2: Set up OpenSearch Service 
indexes<\/h3>\n<p>Create two <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/createupdateindex.html\" target=\"_blank\" rel=\"noopener noreferrer\">OpenSearch Service indexes<\/a>: one for vector embeddings (<a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/knn.html\" target=\"_blank\" rel=\"noopener noreferrer\">k-NN<\/a>) and one for text metadata. This architecture supports semantic search and hybrid queries.<\/p>\n<pre><code class=\"lang-python\">from opensearchpy import OpenSearch, RequestsHttpConnection\nfrom requests_aws4auth import AWS4Auth\nimport boto3\n\nsession = boto3.Session()\ncredentials = session.get_credentials()\nawsauth = AWS4Auth(\n    credentials.access_key,\n    credentials.secret_key,\n    session.region_name,\n    'es',\n    session_token=credentials.token\n)\n\nopensearch_client = OpenSearch(\n    hosts=[{'host': 'YOUR_OPENSEARCH_ENDPOINT', 'port': 443}],\n    http_auth=awsauth,\n    use_ssl=True,\n    verify_certs=True,\n    connection_class=RequestsHttpConnection\n)\n\n# Create k-Nearest Neighbors (k-NN) index for embeddings\nknn_index_body = {\n    \"settings\": {\n        \"index.knn\": True,\n        \"number_of_shards\": 2,\n        \"number_of_replicas\": 1\n    },\n    \"mappings\": {\n        \"properties\": {\n            \"video_id\": {\"type\": \"keyword\"},\n            \"segment_index\": {\"type\": \"integer\"},\n            \"timestamp\": {\"type\": \"float\"},\n            \"embedding\": {\n                \"type\": \"knn_vector\",\n                \"dimension\": 1024,\n                \"method\": {\n                    \"name\": \"hnsw\",\n                    # \"cosinesimil\" is the k-NN plugin's name for cosine similarity. With the\n                    # faiss engine it needs a recent OpenSearch version; on older domains,\n                    # switch the engine to \"lucene\" or \"nmslib\".\n                    \"space_type\": \"cosinesimil\",\n                    \"engine\": \"faiss\"\n                }\n            },\n            \"s3_uri\": {\"type\": \"keyword\"}\n        }\n    }\n}\n\nopensearch_client.indices.create(\n    
index=\"video-embeddings-knn\",\n    body=knn_index_body\n)\n\n# Create text index for metadata\ntext_index_body = {\n    \"settings\": {\n        \"number_of_shards\": 2,\n        \"number_of_replicas\": 1\n    },\n    \"mappings\": {\n        \"properties\": {\n            \"video_id\": {\"type\": \"keyword\"},\n            \"segment_index\": {\"type\": \"integer\"},\n            \"tags\": {\"type\": \"text\", \"analyzer\": \"standard\"}\n        }\n    }\n}\n\nopensearch_client.indices.create(\n    index=\"video-embeddings-text\",\n    body=text_index_body\n)<\/code><\/pre>\n<h3>Step 3: Process videos with Nova Multimodal Embeddings<\/h3>\n<p>The <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/bedrock\/latest\/APIReference\/API_runtime_StartAsyncInvoke.html\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Bedrock async API<\/a> processes videos and generates embeddings. It segments videos into 15-second chunks and combines audio and visual information.<\/p>\n<pre><code class=\"lang-python\">import boto3\nimport json\nimport time\n\nbedrock = boto3.client('bedrock-runtime', region_name=\"us-east-1\")\n\ndef generate_video_embeddings(video_s3_uri, output_s3_uri):\n    \"\"\"Generate embeddings for a video using Nova MME async API.\"\"\"\n    \n    # Start async job\n    response = bedrock.start_async_invoke(\n        modelId=\"amazon.nova-2-multimodal-embeddings-v1:0\",\n        modelInput={\n            \"taskType\": \"SEGMENTED_EMBEDDING\",\n            \"segmentedEmbeddingParams\": {\n                \"embeddingPurpose\": \"GENERIC_INDEX\",\n                \"embeddingDimension\": 1024,\n                \"video\": {\n                    \"format\": \"mp4\",\n                    \"embeddingMode\": \"AUDIO_VIDEO_COMBINED\",\n                    \"source\": {\"s3Location\": {\"uri\": video_s3_uri}},\n                    \"segmentationConfig\": {\"durationSeconds\": 15}\n                }\n            }\n        },\n        
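outputDataConfig={\"s3OutputDataConfig\": {\"s3Uri\": output_s3_uri}}\n    )\n    \n    # Poll for completion\n    invocation_arn = response[\"invocationArn\"]\n    while True:\n        job = bedrock.get_async_invoke(invocationArn=invocation_arn)\n        if job[\"status\"] == \"Completed\":\n            # read_embeddings_from_s3 is a helper that is not shown in this post\n            return read_embeddings_from_s3(job[\"outputDataConfig\"][\"s3OutputDataConfig\"][\"s3Uri\"])\n        elif job[\"status\"] in [\"Failed\", \"Expired\"]:\n            raise RuntimeError(f\"Job failed: {job.get('failureMessage')}\")\n        time.sleep(10)\n\ndef manage_concurrent_jobs(bedrock_client, video_queue, max_concurrent=30):\n    \"\"\"Keep up to max_concurrent async jobs in flight, within quota limits.\"\"\"\n    # NOTE: the original listing is cut off after the inner while condition.\n    # Everything from that condition onward is a reconstruction from the prose above.\n    # Assumptions: video_queue is a list of (video_s3_uri, output_s3_uri) tuples, and\n    # start_embedding_job is a hypothetical helper (not shown) that wraps the\n    # start_async_invoke call above and returns the invocation ARN.\n    active_jobs = {}\n    finished_jobs = {}\n    \n    while video_queue or active_jobs:\n        # Submit new jobs up to limit (uses same start_async_invoke call as above)\n        while len(active_jobs) &lt; max_concurrent and video_queue:\n            video_s3_uri, output_s3_uri = video_queue.pop(0)\n            arn = start_embedding_job(bedrock_client, video_s3_uri, output_s3_uri)\n            active_jobs[arn] = video_s3_uri\n        \n        # Poll in-flight jobs; a finished job frees a slot for the next video\n        for arn, video_s3_uri in list(active_jobs.items()):\n            job = bedrock_client.get_async_invoke(invocationArn=arn)\n            if job[\"status\"] != \"InProgress\":\n                finished_jobs[video_s3_uri] = job\n                del active_jobs[arn]\n        \n        if active_jobs:\n            time.sleep(10)\n    \n    return finished_jobs\n<\/code><\/pre>\n<h3>Step 4: Generate metadata tags with Nova Pro or Nova 2 Lite<\/h3>\n<p>Generate descriptive tags for videos using Nova Pro (or <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/ai\/responsible-ai\/nova-2-lite\/overview.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nova 2 Lite<\/a> for better accuracy at lower cost) to enable hybrid search that combines semantic and keyword matching.<\/p>\n<pre><code class=\"lang-python\">VALID_TAGS = [\n    \"person\", \"vehicle\", \"animal\", \"building\", \"nature\", \"indoor\", \"outdoor\",\n    \"walking\", \"running\", \"sitting\", \"standing\", \"talking\", \"driving\",\n    \"day\", \"night\", \"sunny\", \"cloudy\", \"urban\", \"rural\", \"beach\", \"forest\",\n    \"sports\", \"music\", \"food\", \"technology\", \"crowd\", \"solo\"\n]\n\ndef generate_tags(video_s3_uri, sample_frame_count=3):\n    \"\"\"Generate descriptive tags using Nova Pro or Nova 2 Lite.\"\"\"\n    \n    prompt = f\"\"\"Analyze this video and select 10-15 tags from this predefined list that best describe the content:\n{', 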
'.join(VALID_TAGS)}\n\nOnly return tags from this list as a comma-separated list. Do not invent new tags.\"\"\"\n    \n    response = bedrock.converse(\n        modelId=\"us.amazon.nova-pro-v1:0\",  # Or use us.amazon.nova-2-lite-v1:0\n        messages=[{\n            \"role\": \"user\",\n            \"content\": [{\n                \"video\": {\n                    \"format\": \"mp4\",\n                    \"source\": {\"s3Location\": {\"uri\": video_s3_uri}}\n                }\n            }, {\n                \"text\": prompt\n            }]\n        }]\n    )\n    \n    # Parse tags from response and validate against taxonomy\n    tags_text = response['output']['message']['content'][0]['text']\n    tags = [tag.strip().lower() for tag in tags_text.split(',')]\n    \n    # Filter to only valid tags from our taxonomy\n    valid_tags = [tag for tag in tags if tag in VALID_TAGS]\n    \n    return valid_tags\n<\/code><\/pre>\n<h3>Step 5: Index embeddings and tags in OpenSearch Service<\/h3>\n<p>Store the generated embeddings and tags in OpenSearch Service using <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/indexing.html\" target=\"_blank\" rel=\"noopener noreferrer\">bulk indexing<\/a> for efficiency.<\/p>\n<pre><code class=\"lang-python\">from opensearchpy import helpers\n\ndef index_video_data(video_id, s3_uri, embeddings, tags):\n    \"\"\"Index embeddings and tags in OpenSearch.\"\"\"\n    \n    # Prepare bulk actions for k-NN index\n    knn_actions = []\n    for idx, emb in enumerate(embeddings):\n        doc_id = f\"{video_id}_{idx}\"\n        knn_actions.append({\n            \"_index\": \"video-embeddings-knn\",\n            \"_id\": doc_id,\n            \"_source\": {\n                \"video_id\": video_id,\n                \"segment_index\": idx,\n                \"timestamp\": emb['start_time'],\n                \"embedding\": emb['embedding'],\n                \"s3_uri\": s3_uri\n 
           }\n        })\n    \n    # Bulk index embeddings\n    helpers.bulk(opensearch_client, knn_actions)\n    \n    # Prepare bulk actions for text index\n    text_actions = []\n    for idx in range(len(embeddings)):\n        doc_id = f\"{video_id}_{idx}\"\n        text_actions.append({\n            \"_index\": \"video-embeddings-text\",\n            \"_id\": doc_id,\n            \"_source\": {\n                \"video_id\": video_id,\n                \"segment_index\": idx,\n                \"tags\": \" \".join(tags)\n            }\n        })\n    \n    # Bulk index tags\n    helpers.bulk(opensearch_client, text_actions)\n    \n    print(f\"Indexed {len(embeddings)} segments for video {video_id}\")\n<\/code><\/pre>\n<h3>Step 6: Implement search functionality<\/h3>\n<p>After ingestion completes, search the indexed videos three ways. The implementation targets low-latency queries.<\/p>\n<h4>Initialize OpenSearch Service client for search<\/h4>\n<p>First, create the OpenSearch Service client for search operations:<\/p>\n<pre><code class=\"lang-python\">from opensearchpy import OpenSearch, RequestsHttpConnection\nfrom requests_aws4auth import AWS4Auth\nimport boto3\n\ndef create_opensearch_client():\n    \"\"\"Create OpenSearch client with AWS authentication.\"\"\"\n    session = boto3.Session(region_name=\"us-east-1\")\n    credentials = session.get_credentials()\n    awsauth = AWS4Auth(\n        credentials.access_key,\n        credentials.secret_key,\n        'us-east-1',\n        'es',\n        session_token=credentials.token\n    )\n    \n    return OpenSearch(\n        hosts=[{'host': 'YOUR_OPENSEARCH_ENDPOINT', 'port': 443}],\n        http_auth=awsauth,\n        use_ssl=True,\n        verify_certs=True,\n        connection_class=RequestsHttpConnection,\n        timeout=30\n    )\n\n# Create client\nopensearch_client = create_opensearch_client()\n<\/code><\/pre>\n<h4>Text-to-video semantic search<\/h4>\n<p>Convert natural language queries to embeddings 
using the sync API, then perform a k-NN similarity search:<\/p>\n<pre><code class=\"lang-python\">def search_text_to_video(query_text, opensearch_client, k=10):\n    \"\"\"Search videos using natural language query converted to embedding.\"\"\"\n    \n    bedrock_client = boto3.client('bedrock-runtime', region_name=\"us-east-1\")\n    \n    # Use SINGLE_EMBEDDING task type for text-to-embedding conversion\n    # VIDEO_RETRIEVAL purpose optimizes embeddings for searching video content\n    request_body = {\n        \"taskType\": \"SINGLE_EMBEDDING\",\n        \"singleEmbeddingParams\": {\n            \"embeddingPurpose\": \"VIDEO_RETRIEVAL\",\n            \"embeddingDimension\": 1024,\n            \"text\": {\n                \"truncationMode\": \"END\",\n                \"value\": query_text\n            }\n        }\n    }\n    \n    response = bedrock_client.invoke_model(\n        modelId='amazon.nova-2-multimodal-embeddings-v1:0',\n        body=json.dumps(request_body),\n        accept=\"application\/json\",\n        contentType=\"application\/json\"\n    )\n    \n    response_body = json.loads(response['body'].read())\n    # Response structure: {\"embeddings\": [{\"embeddingType\": \"TEXT\", \"embedding\": [...]}]}\n    query_embedding = response_body['embeddings'][0]['embedding']\n    \n    # Perform k-NN search against video embeddings\n    search_body = {\n        \"query\": {\n            \"knn\": {\n                \"embedding\": {\n                    \"vector\": query_embedding,\n                    \"k\": k\n                }\n            }\n        },\n        \"size\": k,\n        \"_source\": [\"video_id\", \"segment_index\", \"timestamp\", \"s3_uri\"]\n    }\n    \n    response = opensearch_client.search(\n        index=\"video-embeddings-knn\",\n        body=search_body\n    )\n    \n    # Extract results\n    return [{'score': hit['_score'], \n             'video_id': hit['_source']['video_id'],\n             'segment_index': 
hit['_source']['segment_index'],\n             'timestamp': hit['_source'].get('timestamp', 0)} \n            for hit in response['hits']['hits']]\n<\/code><\/pre>\n<h4>Text search with BM25 (keyword matching)<\/h4>\n<p>Use the OpenSearch BM25 scoring for keyword matching on tags without generating embeddings:<\/p>\n<pre><code class=\"lang-python\">def search_text_bm25(search_term, opensearch_client, k=10):\n    \"\"\"Search videos using BM25 keyword matching on tags field.\"\"\"\n    \n    # Search text index using match query on tags\n    search_body = {\n        \"query\": {\n            \"match\": {\n                \"tags\": search_term\n            }\n        },\n        \"size\": k,\n        \"_source\": [\"video_id\", \"segment_index\", \"tags\"]\n    }\n    \n    response = opensearch_client.search(\n        index=\"video-embeddings-text\",\n        body=search_body\n    )\n    \n    return response['hits']['hits']  # Extract results (same pattern as above)\n<\/code><\/pre>\n<h4>Video-to-video search<\/h4>\n<p>Retrieve an existing video\u2019s embedding from OpenSearch Service and search for similar content\u2014no Amazon Bedrock API call needed:<\/p>\n<pre><code class=\"lang-python\">def search_video_to_video(query_video_id, query_segment_index, opensearch_client, k=10):\n    \"\"\"Find similar videos using a reference video segment.\"\"\"\n    \n    # Get the embedding from the reference video segment\n    sample_query = {\n        \"query\": {\n            \"bool\": {\n                \"must\": [\n                    {\"term\": {\"video_id\": query_video_id}},\n                    {\"term\": {\"segment_index\": query_segment_index}}\n                ]\n            }\n        },\n        \"_source\": [\"video_id\", \"segment_index\", \"embedding\"]\n    }\n    \n    sample_response = opensearch_client.search(\n        index=\"video-embeddings-knn\",\n        body=sample_query\n    )\n    \n    if not sample_response['hits']['hits']:\n        return []\n  
  \n    sample_doc = sample_response['hits']['hits'][0]['_source']\n    query_embedding = sample_doc.get('embedding')\n    \n    # Perform k-NN search with the embedding\n    search_body = {\n        \"query\": {\n            \"knn\": {\n                \"embedding\": {\n                    \"vector\": query_embedding,\n                    \"k\": k\n                }\n            }\n        },\n        \"size\": k,\n        \"_source\": [\"video_id\", \"segment_index\", \"timestamp\"]\n    }\n    \n    response = opensearch_client.search(\n        index=\"video-embeddings-knn\",\n        body=search_body\n    )\n    \n    return response['hits']['hits']  # Extract results as needed\n<\/code><\/pre>\n<h4>Hybrid search<\/h4>\n<p>Combine semantic k-NN and BM25 keyword matching by retrieving results from both indexes and merging with weighted scoring:<\/p>\n<pre><code class=\"lang-python\">def search_hybrid(query_text, opensearch_client, k=10, vector_weight=0.7, text_weight=0.3):\n    \"\"\"Hybrid search combining k-NN semantic search and BM25 text matching.\"\"\"\n    \n    # Generate query embedding (use same code as search_text_to_video above)\n    query_embedding = generate_query_embedding(query_text)  # See text-to-video example\n    \n    # Get k-NN results (same query as search_text_to_video)\n    knn_response = opensearch_client.search(\n        index=\"video-embeddings-knn\",\n        body={\"query\": {\"knn\": {\"embedding\": {\"vector\": query_embedding, \"k\": 20}}}, \"size\": 20}\n    )\n    \n    # Get BM25 text results (same query as search_text_bm25)\n    text_response = opensearch_client.search(\n        index=\"video-embeddings-text\",\n        body={\"query\": {\"match\": {\"tags\": query_text}}, \"size\": 20}\n    )\n    \n    # Combine results with weighted scoring\n    knn_hits = knn_response['hits']['hits']\n    text_hits = text_response['hits']['hits']\n    \n    combined = {}\n    \n    for hit in knn_hits:\n        vid = 
hit['_source']['video_id']\n        seg = hit['_source']['segment_index']\n        key = f\"{vid}_{seg}\"\n        combined[key] = {\n            'video_id': vid,\n            'segment_index': seg,\n            'tags': hit['_source'].get('tags', ''),\n            'vector_score': hit['_score'],\n            'text_score': 0,\n            'combined_score': hit['_score'] * vector_weight\n        }\n    \n    for hit in text_hits:\n        vid = hit['_source']['video_id']\n        seg = hit['_source']['segment_index']\n        key = f\"{vid}_{seg}\"\n        if key in combined:\n            combined[key]['text_score'] = hit['_score']\n            combined[key]['combined_score'] += hit['_score'] * text_weight\n        else:\n            combined[key] = {\n                'video_id': vid,\n                'segment_index': seg,\n                'tags': hit['_source'].get('tags', ''),\n                'vector_score': 0,\n                'text_score': hit['_score'],\n                'combined_score': hit['_score'] * text_weight\n            }\n    \n    # Sort by combined score and return top k\n    sorted_results = sorted(combined.values(), key=lambda x: x['combined_score'], reverse=True)[:k]\n    \n    return sorted_results\n\n# Usage example - search with natural language query\nquery = \"person walking on beach at sunset\"\nhybrid_results = search_hybrid(query, opensearch_client, k=10)\n\nfor r in hybrid_results:\n    print(f\"Combined: {r['combined_score']:.4f} (Vector: {r['vector_score']:.4f}, Text: {r['text_score']:.4f})\")\n    print(f\"  Video: {r['video_id']}, Segment: {r['segment_index']}\")\n    print(f\"  Tags: {r['tags']}\\n\")\n<\/code><\/pre>\n<h3>Search performance at scale<\/h3>\n<p>After indexing all 792,218 videos, we measured search performance across all three methods.<\/p>\n<p><strong>The measured query latencies at 792,218 videos are as follows:<\/strong><\/p>\n<ul>\n<li>Semantic k-NN search: ~76ms (using <a rel=\"nofollow\" target=\"_blank\" 
href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/choose-the-k-nn-algorithm-for-your-billion-scale-use-case-with-opensearch\/\" target=\"_blank\" rel=\"noopener noreferrer\">HNSW<\/a> logarithmic scaling)<\/li>\n<li><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.opensearch.org\/latest\/im-plugin\/similarity\/\" target=\"_blank\" rel=\"noopener noreferrer\">BM25<\/a> text search: ~30ms<\/li>\n<li>Hybrid search: ~106ms<\/li>\n<\/ul>\n<p>With embeddings generated and indexed for all 792,218 videos, the storage requirements are as follows:<\/p>\n<ul>\n<li>k-NN index: 28.8 GB for 792K videos<\/li>\n<li>Text index: 1.0 GB for 792K videos<\/li>\n<li>Total: 29.8 GB (manageable on modern OpenSearch clusters)<\/li>\n<\/ul>\n<p>The Hierarchical Navigable Small World (HNSW) algorithm used for k-NN search provides approximately logarithmic search time complexity, which means search times grow slowly as the dataset increases. All three search methods maintain sub-200 ms response times even at 792K video scale, meeting production requirements for interactive search applications.<\/p>\n<h2>Things to know<\/h2>\n<h3>Performance and cost considerations<\/h3>\n<p>Video processing time depends on video length. In our testing, a 45-second video took approximately 70 seconds to process using the async API. The processing includes automatic segmentation, embedding generation for each segment, and output to Amazon S3. Search operations scale efficiently: our testing shows that even at 792K videos, semantic search completes in under 80 ms, text search in about 30 ms, and hybrid search in under 110 ms. Use 1024-dimensional embeddings instead of 3072 to reduce storage costs while maintaining accuracy. Nova Multimodal Embeddings charges per second of video input ($0.00056\/second batch), so video duration, not embedding dimension or segmentation, determines processing cost. The async API is more cost-effective than processing frames individually. 
For OpenSearch Service, using r6g instances provides better price-performance than earlier instance types, and you can implement tiering to move cold data to Amazon S3 for additional savings.<\/p>\n<h3>Scaling to production<\/h3>\n<p>For production deployments with large video libraries, consider using <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/batch\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Batch<\/a> to process videos in parallel across multiple compute instances. You can partition your video dataset and assign subsets to different workers. Monitor <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/managedomains-cloudwatchmetrics.html\" target=\"_blank\" rel=\"noopener noreferrer\">OpenSearch Service cluster health<\/a> and scale data nodes as your index grows. The two-index architecture scales well because k-NN and text searches can be optimized independently.<\/p>\n<h3>Search accuracy tuning<\/h3>\n<p>Tune hybrid search weights based on your use case. The default 0.7\/0.3 split (vector\/text) favors semantic similarity for most scenarios. If you have high-quality metadata tags, increasing the text weight to 0.5 can improve results. 
We recommend that you test different configurations with your specific content to find the right balance.<\/p>\n<h2>Cleanup<\/h2>\n<p>To avoid ongoing charges, delete the resources that you created:<\/p>\n<ol>\n<li>Delete the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/opensearch-service\/latest\/developerguide\/delete-domain.html\" target=\"_blank\" rel=\"noopener noreferrer\">OpenSearch Service domain<\/a> from the Amazon OpenSearch Service console<\/li>\n<li>Empty and delete the <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/userguide\/delete-bucket.html\" target=\"_blank\" rel=\"noopener noreferrer\">S3 buckets<\/a> used for videos and embeddings<\/li>\n<li>Delete any <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/docs.aws.amazon.com\/IAM\/latest\/UserGuide\/id_roles_manage_delete.html\" target=\"_blank\" rel=\"noopener noreferrer\">IAM roles<\/a> created specifically for this solution<\/li>\n<\/ol>\n<p>Note that Amazon Bedrock charges are based on usage, so no cleanup is needed for the Amazon Bedrock models themselves.<\/p>\n<h2>Conclusion<\/h2>\n<p>This walkthrough covered building a multimodal video search system for natural language queries across video content. The solution uses Amazon Bedrock Nova models to generate embeddings that capture both audio and visual information, stores them efficiently in OpenSearch Service using a two-index architecture, and provides three search modes for different use cases. The async processing approach scales to handle large video libraries, and the hybrid search capability combines semantic and keyword-based matching to improve result relevance. 
You can extend this foundation by adding features like video-to-video similarity search, implementing caching for frequently searched queries, or integrating with AWS Batch for parallel processing of large datasets.<\/p>\n<p>To learn more about the technologies used in this solution, see <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/blogs\/aws\/amazon-nova-multimodal-embeddings-now-available-in-amazon-bedrock\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon Nova Multimodal Embeddings<\/a> and <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/hybrid-search-with-amazon-opensearch-service\/\" target=\"_blank\" rel=\"noopener noreferrer\">Hybrid Search with Amazon OpenSearch Service<\/a>.<\/p>\n<hr\/>\n<h2>About the authors<\/h2>\n<footer>\n<div class=\"blog-author-box\">\n<div class=\"blog-author-image\">\n          <img decoding=\"async\" loading=\"lazy\" class=\"alignnone wp-image-125849 size-thumbnail\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2026\/03\/09\/l-ml-200123-100x133.png\" alt=\"\" width=\"100\" height=\"133\"\/>\n         <\/div>\n<h3 class=\"lb-h4\">Hammad Ausaf<\/h3>\n<p>Hammad is a Principal Solutions Architect in Media and Entertainment. He is a passionate builder and strives to provide the best solutions to AWS customers.<\/p>\n<\/p><\/div>\n<div class=\"blog-author-box\">\n<div class=\"blog-author-image\">\n          <img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-thumbnail wp-image-125850\" src=\"https:\/\/d2908q01vomqb2.cloudfront.net\/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59\/2026\/03\/09\/l-ml-200124-100x90.png\" alt=\"\" width=\"100\" height=\"90\"\/>\n         <\/div>\n<h3 class=\"lb-h4\">Rajat Jain<\/h3>\n<p>Rajat is a Technical Account Manager in Media and Entertainment. 
He is a GenAI\/ML enthusiast and loves to build new solutions.<\/p>\n<\/p><\/div>\n<\/footer>\n<p>       <!-- '\"` -->\n      <\/div>\n<p><em> \u2018 The preceding article may include information circulated by third parties \u2019 <\/em><\/p>\n<p><em> \u2018 Some details of this article were extracted from the following source aws.amazon.com \u2019 <\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post shows you how to build a scalable multimodal video search system that enables natural language search across large video datasets using Amazon Nova models and Amazon OpenSearch Service. You will learn how to move beyond manual tagging and keyword-based searches to enable semantic search that captures the full richness of video content. We [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2324955,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_jetpack_memberships_contains_paid_content":false,"jnews-multi-image_gallery":[],"jnews_single_post":[],"jnews_primary_category":[],"jnews_social_meta":[],"footnotes":""},"categories":[25172],"tags":[],"class_list":["post-2324954","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-entertainment"],"jetpack_featured_media_url":"https:\/\/celebrity.land\/en\/wp-content\/uploads\/2026\/03\/Multimodal-embeddings-at-scale-AI-data-lake-for-media-and.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/posts\/2324954","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/comments?post=2324954"}],"version-hi
story":[{"count":1,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/posts\/2324954\/revisions"}],"predecessor-version":[{"id":2324956,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/posts\/2324954\/revisions\/2324956"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/media\/2324955"}],"wp:attachment":[{"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/media?parent=2324954"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/categories?post=2324954"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/celebrity.land\/en\/wp-json\/wp\/v2\/tags?post=2324954"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}