Document Layout skill

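The Document Layout skill analyzes a document using the Azure Document Intelligence layout model in Foundry Tools to detect its structure and characteristics, and produces a syntactical representation of the document in Markdown or text format. The skill supports both text and image extraction. Extracted images include location metadata that preserves each image's position within the document. Keeping images close to their related content is useful in retrieval-augmented generation (RAG) and multimodal search scenarios.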

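For transactions that exceed 20 documents per indexer per day, this skill requires that you attach a billable Microsoft Foundry resource to your skillset. Execution of built-in skills is charged at the existing Foundry Tools Standard price.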

This article is the reference documentation for the Document Layout skill. For usage information, see "How to chunk and vectorize by document layout".

Tip

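This skill is commonly used with content that has structure and images, such as PDFs. The multimodal tutorials demonstrate image verbalization with two different data chunking strategies.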

Limitations

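This skill has the following limitations:

The skill isn't suitable for large documents that require more than 5 minutes of processing in the Azure Document Intelligence layout model. The skill times out, but charges still apply to the Foundry resource attached to the skillset for billing purposes. Optimize documents to stay within the processing limits and avoid unnecessary costs.
Because this skill calls the Azure Document Intelligence layout model, all documented service behaviors for different document types apply to its output. For example, Word (DOCX) and PDF files can produce different results because images are handled differently. If you need consistent image behavior across DOCX and PDF, consider converting documents to PDF, or review the multimodal search documentation for alternative approaches.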

Supported regions

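The Document Layout skill calls v4.0 (2024-11-30) of the Azure Document Intelligence REST API.

Supported regions vary by modality and by how the skill connects to the Azure Document Intelligence layout model. Currently, the implemented version of the layout model doesn't support 21Vianet regions.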

Approach	Requirement
Import data wizard	Create an Azure AI Search service and an Azure AI multi-service account in one of the following regions: East US, West Europe, or North Central US.
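Programmatic, using a Microsoft Foundry resource key for billing	Create an Azure AI Search service and a Microsoft Foundry resource in the same region. The region must support both Azure AI Search and Azure Document Intelligence.
Programmatic, using Microsoft Entra ID authentication (preview) for billing	No same-region requirement. Create an Azure AI Search service and a Microsoft Foundry resource in any region where each service is available.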

Supported file formats

This skill recognizes the following file formats:

.PDF
.JPEG
.JPG
.PNG
.BMP
.TIFF
.DOCX
.XLSX
.PPTX
.HTML

Supported languages

For printed text, see the Azure Document Intelligence layout model supported languages.

@odata.type

Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill

Data limits

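For PDF and TIFF, up to 2,000 pages can be processed (with a Free tier subscription, only the first two pages are processed).
Even though the file size limit for document analysis is 500 MB for the Azure Document Intelligence paid (S0) tier and 4 MB for the Azure Document Intelligence free (F0) tier, indexing is also subject to the indexer limits of your search service tier.
Image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
If your PDFs are password-locked, remove the lock before running the indexer.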
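The numeric limits above can be checked before a file is sent to the indexer. The following is a minimal sketch, not part of the skill: the helper names are hypothetical, it assumes 1 MB means 1,024 x 1,024 bytes, and it doesn't cover the separate indexer limits of the search service tier.

```python
# Limits quoted from the list above. Helper names are hypothetical, and
# treating 1 MB as 1024 * 1024 bytes is an assumption of this sketch.
MAX_BYTES_BY_TIER = {"S0": 500 * 1024 * 1024, "F0": 4 * 1024 * 1024}
MIN_IMAGE_SIDE = 50
MAX_IMAGE_SIDE = 10_000


def file_size_ok(size_bytes, tier):
    """Return True if the file fits the Document Intelligence tier limit."""
    return size_bytes <= MAX_BYTES_BY_TIER[tier]


def image_dimensions_ok(width, height):
    """Return True if both sides fall within 50 to 10,000 pixels."""
    return all(MIN_IMAGE_SIDE <= side <= MAX_IMAGE_SIDE for side in (width, height))


print(file_size_ok(3_000_000, "F0"))
print(image_dimensions_ok(40, 800))
```

A check like this only filters out files that are certain to be rejected. A file that passes can still exceed the indexer limits of the search service tier.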

Skill parameters

Parameters are case-sensitive.

Parameter name	Allowed values	Description
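`outputMode`	`oneToMany`	Controls the cardinality of the output produced by the skill.
`markdownHeaderDepth`	`h1`, `h2`, `h3`, `h4`, `h5`, `h6` (default)	Only applies if `outputFormat` is set to `markdown`. This parameter describes the deepest nesting level that should be considered. For example, if `markdownHeaderDepth` is `h3`, any deeper sections, such as `h4`, are rolled into `h3`.
`outputFormat`	`markdown` (default), `text`	Controls the format of the output generated by the skill.
`extractionOptions`	`["images"]`, `["images", "locationMetadata"]`, `["locationMetadata"]`	Identifies any extra content extracted from the document. Define an array of enums that correspond to the content to include in the output. For example, if `extractionOptions` is `["images", "locationMetadata"]`, the output includes images and location metadata, which provides page location information about where the content was extracted, such as a page number or section. This parameter applies to both output formats.
`chunkingProperties`	See the following table.	Only applies if `outputFormat` is set to `text`. Options that encapsulate how to chunk text content while recomputing other metadata.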

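`chunkingProperties` parameter	Allowed values	Description
`unit`	`characters`	Controls the cardinality of the chunk unit. Chunk length is measured in characters, as opposed to words or tokens.
`maximumLength`	An integer between 300 and 50000.	The maximum chunk length in characters, as measured by String.Length.
`overlapLength`	An integer no greater than half of `maximumLength`.	The length of the overlap between two text chunks.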
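To illustrate how `maximumLength` and `overlapLength` interact, here is a minimal sketch of fixed-size character chunking with overlap. It isn't the skill's actual implementation, which also respects page bounds and recomputes metadata. The function names are hypothetical. Python's `len` counts code points, whereas String.Length counts UTF-16 code units, so lengths can differ for some characters.

```python
def validate_chunking(unit, maximum_length, overlap_length):
    """Check the constraints from the table above (hypothetical helper)."""
    if unit != "characters":
        raise ValueError("unit must be 'characters'")
    if not 300 <= maximum_length <= 50000:
        raise ValueError("maximumLength must be between 300 and 50000")
    if overlap_length < 0 or overlap_length * 2 > maximum_length:
        raise ValueError("overlapLength must be at most half of maximumLength")


def chunk_text(text, maximum_length, overlap_length):
    """Split text into chunks of at most maximum_length characters,
    where each chunk repeats the last overlap_length characters of the
    previous one."""
    step = maximum_length - overlap_length
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + maximum_length])
        if start + maximum_length >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


validate_chunking("characters", 2000, 200)
sample = "".join(chr(97 + i % 26) for i in range(700))
for chunk in chunk_text(sample, 300, 100):
    print(len(chunk), chunk[:10])
```

Because the overlap can't exceed half of the maximum length, each step advances by at least half a chunk, so any character appears in at most two consecutive chunks.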

Skill inputs

Input name	Description
`file_data`	The file from which content should be extracted.

The `file_data` input must be an object defined as follows:

{
  "$type": "file",
  "data": "BASE64 encoded string of the file"
}

Alternatively, it can be defined as follows:

{
  "$type": "file",
  "url": "URL to download file",
  "sasToken": "OPTIONAL: SAS token for authentication if the URL provided is for a file in blob storage"
}

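The file reference object can be generated in one of the following ways:

Set the allowSkillsetToReadFileData parameter in your indexer definition to true. This setting creates the path /document/file_data, an object that represents the original file data downloaded from your blob data source. This parameter only applies to files in Azure Blob storage.
Have a custom skill return a JSON object definition that provides $type, data, or url and sasToken. The $type parameter must be set to file, and data must be the base64-encoded byte array of the file content. The url parameter must be a valid URL from which the file can be downloaded.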
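As an illustration of the second option, the following sketch builds both forms of the file reference object shown earlier. The function names are hypothetical, and the surrounding custom skill request and response envelope is omitted.

```python
import base64
import json


def file_reference_from_bytes(content):
    """Build the inline form: base64-encoded file content in "data"."""
    return {"$type": "file", "data": base64.b64encode(content).decode("ascii")}


def file_reference_from_url(url, sas_token=None):
    """Build the URL form; the SAS token is optional."""
    reference = {"$type": "file", "url": url}
    if sas_token is not None:
        reference["sasToken"] = sas_token
    return reference


print(json.dumps(file_reference_from_bytes(b"Hello, World!")))
print(json.dumps(file_reference_from_url("https://example.com/report.pdf")))
```

The URL in the usage lines is a placeholder, not a real file location.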

Skill outputs

Output name	Description
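`markdown_document`	Only applies if `outputFormat` is set to `markdown`. A collection of "sections" objects that represent each individual section in the Markdown document.
`text_sections`	Only applies if `outputFormat` is set to `text`. A collection of text chunk objects that represent the text within the bounds of a page (factoring in any chunking configured), including the section headers themselves. Each text chunk object includes `locationMetadata` if applicable.
`normalized_images`	Only applies if `outputFormat` is set to `text` and `extractionOptions` includes `images`. A collection of images extracted from the document, including `locationMetadata` if applicable.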

Sample definition for markdown output mode

{
  "skills": [
    {
      "description": "Analyze a document",
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "context": "/document",
      "outputMode": "oneToMany", 
      "markdownHeaderDepth": "h3", 
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data"
        }
      ],
      "outputs": [
        {
          "name": "markdown_document", 
          "targetName": "markdown_document" 
        }
      ]
    }
  ]
}

Sample output for markdown output mode

{
  "markdown_document": [
    { 
      "content": "Hi this is Jim \r\nHi this is Joe", 
      "sections": { 
        "h1": "Foo", 
        "h2": "Bar", 
        "h3": "" 
      },
      "ordinal_position": 0
    }, 
    { 
      "content": "Hi this is Lance",
      "sections": { 
         "h1": "Foo", 
         "h2": "Bar", 
         "h3": "Boo" 
      },
      "ordinal_position": 1
    } 
  ] 
}

The value of markdownHeaderDepth controls the number of keys in the "sections" dictionary. In the example skill definition, markdownHeaderDepth is "h3", so the "sections" dictionary has three keys: h1, h2, and h3.
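As a sketch of how the "sections" dictionary might be consumed downstream, the following code turns each section's header keys into an ordered header path, skipping empty levels. The `header_path` helper is hypothetical; the data mirrors the sample output for markdown output mode.

```python
def header_path(section):
    """Return the non-empty headers of one section, ordered h1, h2, ..."""
    headers = section["sections"]
    ordered_keys = sorted(headers, key=lambda key: int(key[1:]))
    return [headers[key] for key in ordered_keys if headers[key]]


markdown_document = [
    {
        "content": "Hi this is Jim \r\nHi this is Joe",
        "sections": {"h1": "Foo", "h2": "Bar", "h3": ""},
        "ordinal_position": 0,
    },
    {
        "content": "Hi this is Lance",
        "sections": {"h1": "Foo", "h2": "Bar", "h3": "Boo"},
        "ordinal_position": 1,
    },
]

for section in markdown_document:
    print(section["ordinal_position"], " > ".join(header_path(section)))
```

A header path like this can be stored next to each chunk so that search results show where in the document the content came from.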

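Example for text output mode with image and metadata extraction

This example shows how to output text content in fixed-size chunks and extract images and location metadata from the document.

Sample definition for text output mode with image and metadata extraction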

{
  "skills": [
    {
      "description": "Analyze a document",
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text",
      "extractionOptions": ["images", "locationMetadata"],
      "chunkingProperties": {     
          "unit": "characters",
          "maximumLength": 2000, 
          "overlapLength": 200
      },
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data"
        }
      ],
      "outputs": [
        { 
          "name": "text_sections", 
          "targetName": "text_sections" 
        }, 
        { 
          "name": "normalized_images", 
          "targetName": "normalized_images" 
        } 
      ]
    }
  ]
}

Sample output for text output mode with image and metadata extraction

{
  "text_sections": [
      {
        "id": "1_7e6ef1f0-d2c0-479c-b11c-5d3c0fc88f56",
        "content": "the effects of analyzers using Analyze Text (REST). For more information about analyzers, see Analyzers for text processing.During indexing, an indexer only checks field names and types. There's no validation step that ensures incoming content is correct for the corresponding search field in the index.Create an indexerWhen you're ready to create an indexer on a remote search service, you need a search client. A search client can be the Azure portal, a REST client, or code that instantiates an indexer client. We recommend the Azure portal or REST APIs for early development and proof-of-concept testing.Azure portal1. Sign in to the Azure portal 2, then find your search service.2. On the search service Overview page, choose from two options:· Import data wizard: The wizard is unique in that it creates all of the required elements. Other approaches require a predefined data source and index.All services > Azure Al services | Al Search >demo-search-svc Search serviceSearchAdd indexImport dataImport and vectorize dataOverviewActivity logEssentialsAccess control (IAM)Get startedPropertiesUsageMonitoring· Add indexer: A visual editor for specifying an indexer definition.",
        "locationMetadata": {
          "pageNumber": 1,
          "ordinalPosition": 0,
          "boundingPolygons": "[[{\"x\":1.5548,\"y\":0.4036},{\"x\":6.9691,\"y\":0.4033},{\"x\":6.9691,\"y\":0.8577},{\"x\":1.5548,\"y\":0.8581}],[{\"x\":1.181,\"y\":1.0627},{\"x\":7.1393,\"y\":1.0626},{\"x\":7.1393,\"y\":1.7363},{\"x\":1.181,\"y\":1.7365}],[{\"x\":1.1923,\"y\":2.1466},{\"x\":3.4585,\"y\":2.1496},{\"x\":3.4582,\"y\":2.4251},{\"x\":1.1919,\"y\":2.4221}],[{\"x\":1.1813,\"y\":2.6518},{\"x\":7.2464,\"y\":2.6375},{\"x\":7.2486,\"y\":3.5913},{\"x\":1.1835,\"y\":3.6056}],[{\"x\":1.3349,\"y\":3.9489},{\"x\":2.1237,\"y\":3.9508},{\"x\":2.1233,\"y\":4.1128},{\"x\":1.3346,\"y\":4.111}],[{\"x\":1.5705,\"y\":4.5322},{\"x\":5.801,\"y\":4.5326},{\"x\":5.801,\"y\":4.7311},{\"x\":1.5704,\"y\":4.7307}]]"
        },
        "sections": []
      },
      {
        "id": "2_25134f52-04c3-415a-ab3d-80729bd58e67",
        "content": "All services > Azure Al services | Al Search >demo-search-svc | Indexers Search serviceSearch0«Add indexerRefreshDelete:selected: TagsFilter by name ...:selected: Diagnose and solve problemsSearch managementStatusNameIndexesIndexers*Data sourcesRun the indexerBy default, an indexer runs immediately when you create it on the search service. You can override this behavior by setting disabled to true in the indexer definition. Indexer execution is the moment of truth where you find out if there are problems with connections, field mappings, or skillset construction.There are several ways to run an indexer:· Run on indexer creation or update (default).. Run on demand when there are no changes to the definition, or precede with reset for full indexing. For more information, see Run or reset indexers.· Schedule indexer processing to invoke execution at regular intervals.Scheduled execution is usually implemented when you have a need for incremental indexing so that you can pick up the latest changes. As such, scheduling has a dependency on change detection.Indexers are one of the few subsystems that make overt outbound calls to other Azure resources. In terms of Azure roles, indexers don't have separate identities; a connection from the search engine to another Azure resource is made using the system or user- assigned managed identity of a search service. If the indexer connects to an Azure resource on a virtual network, you should create a shared private link for that connection. For more information about secure connections, see Security in Azure Al Search.Check results",
        "locationMetadata": {
          "pageNumber": 2,
          "ordinalPosition": 1,
          "boundingPolygons": "[[{\"x\":2.2041,\"y\":0.4109},{\"x\":4.3967,\"y\":0.4131},{\"x\":4.3966,\"y\":0.5505},{\"x\":2.204,\"y\":0.5482}],[{\"x\":2.5042,\"y\":0.6422},{\"x\":4.8539,\"y\":0.6506},{\"x\":4.8527,\"y\":0.993},{\"x\":2.5029,\"y\":0.9845}],[{\"x\":2.3705,\"y\":1.1496},{\"x\":2.6859,\"y\":1.15},{\"x\":2.6858,\"y\":1.2612},{\"x\":2.3704,\"y\":1.2608}],[{\"x\":3.7418,\"y\":1.1709},{\"x\":3.8082,\"y\":1.171},{\"x\":3.8081,\"y\":1.2508},{\"x\":3.7417,\"y\":1.2507}],[{\"x\":3.9692,\"y\":1.1445},{\"x\":4.0541,\"y\":1.1445},{\"x\":4.0542,\"y\":1.2621},{\"x\":3.9692,\"y\":1.2622}],[{\"x\":4.5326,\"y\":1.2263},{\"x\":5.1065,\"y\":1.229},{\"x\":5.106,\"y\":1.346},{\"x\":4.5321,\"y\":1.3433}],[{\"x\":5.5508,\"y\":1.2267},{\"x\":5.8992,\"y\":1.2268},{\"x\":5.8991,\"y\":1.3408},{\"x\":5.5508,\"y\":1.3408}]]"
        },
        "sections": []
       }
    ],
    "normalized_images": [ 
        { 
            "id": "1_550e8400-e29b-41d4-a716-446655440000", 
            "data": "SGVsbG8sIFdvcmxkIQ==", 
            "imagePath": "aHR0cHM6Ly9henNyb2xsaW5nLmJsb2IuY29yZS53aW5kb3dzLm5ldC9tdWx0aW1vZGFsaXR5L0NyZWF0ZUluZGV4ZXJwNnA3LnBkZg2/normalized_images_0.jpg",  
            "locationMetadata": {
              "pageNumber": 1,
              "ordinalPosition": 0,
              "boundingPolygons": "[[{\"x\":2.0834,\"y\":6.2245},{\"x\":7.1818,\"y\":6.2244},{\"x\":7.1816,\"y\":7.9375},{\"x\":2.0831,\"y\":7.9377}]]"
            }
        },
        { 
            "id": "2_123e4567-e89b-12d3-a456-426614174000", 
            "data": "U29tZSBtb3JlIGV4YW1wbGUgdGV4dA==", 
            "imagePath": "aHR0cHM6Ly9henNyb2xsaW5nLmJsb2IuY29yZS53aW5kb3dzLm5ldC9tdWx0aW1vZGFsaXR5L0NyZWF0ZUluZGV4ZXJwNnA3LnBkZg2/normalized_images_1.jpg",  
            "locationMetadata": {
              "pageNumber": 2,
              "ordinalPosition": 1,
              "boundingPolygons": "[[{\"x\":2.0784,\"y\":0.3734},{\"x\":7.1837,\"y\":0.3729},{\"x\":7.183,\"y\":2.8611},{\"x\":2.0775,\"y\":2.8615}]]"
            } 
        }
    ] 
}

Note that "sections" in the sample output above is empty. To populate it, add another skill configured with outputFormat set to markdown.

The skill uses Azure Document Intelligence to compute locationMetadata. For details on how pages and bounding polygon coordinates are defined, see the Azure Document Intelligence layout model.
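In the sample output, boundingPolygons is a string that itself contains JSON, so it needs a second parsing step before the coordinates can be used. The following is a minimal sketch with hypothetical helper names. It decodes the string and computes an enclosing box for each polygon, using the first image from the sample output.

```python
import json


def parse_bounding_polygons(location_metadata):
    """Decode the JSON string into a list of polygons of (x, y) points."""
    polygons = json.loads(location_metadata["boundingPolygons"])
    return [[(point["x"], point["y"]) for point in polygon] for polygon in polygons]


def enclosing_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for one polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# locationMetadata of the first normalized image in the sample output.
location_metadata = {
    "pageNumber": 1,
    "ordinalPosition": 0,
    "boundingPolygons": "[[{\"x\":2.0834,\"y\":6.2245},{\"x\":7.1818,\"y\":6.2244},"
                        "{\"x\":7.1816,\"y\":7.9375},{\"x\":2.0831,\"y\":7.9377}]]",
}

for polygon in parse_bounding_polygons(location_metadata):
    print(location_metadata["pageNumber"], enclosing_box(polygon))
```

The coordinate units depend on the source document type, so interpret the numbers according to the layout model documentation.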

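imagePath represents the relative path of a stored image. If a knowledge store file projection is configured in the skillset, this path matches the relative path of the image stored in the knowledge store.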

Last updated on 2026-05-12