- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

HarmonyOS开发：通用文字识别与多语言支持

Jack20 发表于 2026/06/21 12:06:51 2026/06/21

【摘要】 HarmonyOS开发：通用文字识别与多语言支持核心要点：深入掌握HarmonyOS通用文字识别服务在多语言、多场景下的完整应用方案，包括中英日韩等多语种混合识别、文档表格结构化提取、多页文档连续识别、识别结果翻译与语义分析等进阶实战，以及性能优化与多语言场景下的最佳实践。一、背景与动机“这段日文说明书写的什么？”“这个韩文菜单上哪道菜是不辣的？”“这份中英混排的合同里，关键条款在哪一页...

HarmonyOS开发：通用文字识别与多语言支持

核心要点：深入掌握HarmonyOS通用文字识别服务在多语言、多场景下的完整应用方案，包括中英日韩等多语种混合识别、文档表格结构化提取、多页文档连续识别、识别结果翻译与语义分析等进阶实战，以及性能优化与多语言场景下的最佳实践。

一、背景与动机

“这段日文说明书写的什么？”

“这个韩文菜单上哪道菜是不辣的？”

“这份中英混排的合同里，关键条款在哪一页？”

这些问题，都不是身份证OCR或银行卡OCR能解决的。它们需要的是一种更通用的文字识别能力——不限定版式、不限定语言、不限定场景，只要是图片中的文字，就能识别出来。

这就是通用文字识别（General OCR）的使命。

与专项OCR（身份证、银行卡）不同，通用OCR面对的是一个"开放世界"——你不知道用户会拍什么，不知道图片中有什么语言的文字，不知道文字是印刷体还是手写体，不知道是横排还是竖排。这种开放性带来了更大的技术挑战，但也赋予了通用OCR更广泛的应用场景：

翻译辅助：出国旅行时，拍一下路牌、菜单、说明书，即时翻译
文档数字化：把纸质文件、书籍扫描成可编辑的电子文档
信息采集：从海报、传单、名片中提取关键信息
无障碍辅助：帮助视障用户"读出"周围环境中的文字

HarmonyOS的通用文字识别服务支持中、英、日、韩、泰、越等多种语言，并且支持多语言混合识别——一张图片中同时出现中文和英文，也能正确识别。今天这篇文章，我们就把通用OCR在多语言场景下的实战方案彻底讲透。

二、核心原理

2.1 多语言OCR的技术挑战

多语言OCR远比单语言OCR复杂，主要挑战有三个：

挑战一：字符集爆炸

中文有2万+常用汉字，日文有平假名、片假名和汉字，韩文有谚文，泰文有独特的拼字规则……把这些语言的字符集加在一起，模型需要区分的字符类别可能超过10万。这对模型的容量和训练数据都提出了极高要求。

挑战二：语言混排

实际场景中，多语言混排非常普遍。比如一份技术文档，正文是中文，代码和变量名是英文，引用的论文标题可能还有日文。模型必须能够在同一行文字中正确切换语言模式。

挑战三：文字方向多样

中文和英文是横排的（从左到右），传统日文是竖排的（从上到下），蒙文是竖排的（从左到右），阿拉伯文是从右到左的……文字方向的多样性，要求检测算法能够正确判断每个文本块的排列方向。

2.2 多语言OCR处理流程

flowchart TD
    A[多语言图片输入] --> B[图像预处理]
    B --> C[多方向文字检测]
    
    C --> C1[横排文本检测]
    C --> C2[竖排文本检测]
    C --> C3[旋转文本检测]
    
    C1 --> D[语言分类器]
    C2 --> D
    C3 --> D
    
    D --> D1[中文分支]
    D --> D2[英文分支]
    D --> D3[日文分支]
    D --> D4[韩文分支]
    D --> D5[其他语言分支]
    
    D1 --> E[多语言识别引擎]
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    
    E --> F[识别结果融合]
    F --> F1[语言标注]
    F --> F2[置信度评估]
    F --> F3[排版还原]
    
    F1 --> G[结构化输出]
    F2 --> G
    F3 --> G

    classDef primary fill:#4FC3F7,stroke:#0288D1,color:#000
    classDef warning fill:#FFB74D,stroke:#F57C00,color:#000
    classDef error fill:#EF5350,stroke:#C62828,color:#fff
    classDef info fill:#81C784,stroke:#388E3C,color:#000
    classDef purple fill:#CE93D8,stroke:#7B1FA2,color:#000

    class A,B primary
    class C,C1,C2,C3 warning
    class D,D1,D2,D3,D4,D5 purple
    class E,F error
    class F1,F2,F3,G info

2.3 语言分类与识别策略

HarmonyOS的多语言OCR采用了一种"先分类后识别"的策略：

文字检测阶段：不区分语言，只负责找到所有文本区域
语言分类阶段：对每个文本区域做语言判断，输出最可能的语言标签
文字识别阶段：根据语言标签，选择对应的识别分支进行精细识别
结果融合阶段：将不同语言的识别结果合并，保留语言标签和置信度

这种策略的好处是：不同语言的识别模型可以独立优化，不会互相干扰。坏处是：语言分类如果出错，后续识别也会跟着错。不过在实际测试中，语言分类的准确率通常在98%以上，对最终结果的影响很小。

2.4 支持的语言列表

语言	代码	支持版本	备注
中文	zh	API 12+	简体/繁体
英文	en	API 12+	—
日文	ja	API 12+	含汉字/假名
韩文	ko	API 12+	谚文
泰语	th	API 14+	HarmonyOS 6新增
越南语	vi	API 14+	HarmonyOS 6新增
俄语	ru	API 14+	HarmonyOS 6新增

三、代码实战

3.1 多语言通用OCR识别

// 多语言通用OCR识别页面
import { textRecognition } from '@kit.AI.Intelligent';
import { image } from '@kit.ImageKit';
import { picker } from '@kit.CoreFileKit';

// 多语言识别结果
interface MultiLangOcrResult {
  textBlocks: TextBlockWithLang[];
  totalText: string;
  languages: string[];      // 检测到的语言列表
  dominantLanguage: string; // 主要语言
  processingTime: number;
}

interface TextBlockWithLang {
  text: string;
  language: string;     // 语言标签
  confidence: number;
  bounds: Rect;
  direction: TextDirection;
}

enum TextDirection {
  HORIZONTAL = '横排',
  VERTICAL = '竖排',
  ROTATED = '旋转'
}

interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

@Entry
@Component
struct MultiLangOcrPage {
  @State ocrResult: MultiLangOcrResult | null = null;
  @State isProcessing: boolean = false;
  @State imageUri: string = '';
  @State selectedLanguage: string = 'auto';  // auto为自动检测

  // 语言选项
  private languageOptions: Array<{ label: string; value: string }> = [
    { label: '自动检测', value: 'auto' },
    { label: '中文', value: 'zh' },
    { label: '英文', value: 'en' },
    { label: '日文', value: 'ja' },
    { label: '韩文', value: 'ko' },
    { label: '中英混合', value: 'zh_en' },
  ];

  // 选择图片并识别
  async selectAndRecognize(): Promise<void> {
    this.isProcessing = true;
    const startTime = Date.now();

    try {
      // 选择图片
      const photoSelectOptions = new picker.PhotoSelectOptions();
      photoSelectOptions.MIMEType = picker.PhotoViewMIMEType.IMAGE_TYPE;
      photoSelectOptions.maxSelectNumber = 1;

      const photoViewPicker = new picker.PhotoViewPicker();
      const result = await photoViewPicker.select(photoSelectOptions);

      if (result.photoUris.length === 0) {
        this.isProcessing = false;
        return;
      }

      this.imageUri = result.photoUris[0];

      // 创建图片源
      const imageSource = image.createImageSource(this.imageUri);
      const pixelMap = await imageSource.createPixelMap();

      // 创建识别引擎
      const engine = textRecognition.TextRecognitionEngine.create(
        textRecognition.TextRecognitionPreset.PRECISE  // 多语言场景建议用PRECISE
      );

      // 配置识别参数
      const config: textRecognition.TextRecognitionConfig = {
        isTextDetectionEnabled: true,
        isDirectionDetectionEnabled: true,  // 启用方向检测，支持竖排文字
      };

      // 执行识别
      const rawResult = await engine.recognizeText(pixelMap, config);

      // 解析结果
      const textBlocks: TextBlockWithLang[] = [];
      const langSet = new Set<string>();
      const allTexts: string[] = [];

      for (const block of rawResult.textBlocks) {
        // 判断文字方向
        const corners = block.corners;
        let direction = TextDirection.HORIZONTAL;
        if (corners && corners.length >= 4) {
          const width = Math.abs(corners[1].x - corners[0].x);
          const height = Math.abs(corners[3].y - corners[0].y);
          if (height > width * 1.5) {
            direction = TextDirection.VERTICAL;
          }
        }

        // 推断语言（基于字符特征）
        const lang = this.detectLanguage(block.textValue);
        langSet.add(lang);

        textBlocks.push({
          text: block.textValue,
          language: lang,
          confidence: block.confidence,
          bounds: this.extractBounds(block),
          direction: direction
        });

        allTexts.push(block.textValue);
      }

      // 统计主要语言
      const langCount = new Map<string, number>();
      for (const block of textBlocks) {
        langCount.set(block.language, (langCount.get(block.language) || 0) + 1);
      }
      let dominantLang = 'unknown';
      let maxCount = 0;
      langCount.forEach((count, lang) => {
        if (count > maxCount) {
          maxCount = count;
          dominantLang = lang;
        }
      });

      this.ocrResult = {
        textBlocks,
        totalText: allTexts.join('\n'),
        languages: Array.from(langSet),
        dominantLanguage: dominantLang,
        processingTime: Date.now() - startTime
      };

      engine.close();
    } catch (error) {
      console.error(`识别失败: ${JSON.stringify(error)}`);
    } finally {
      this.isProcessing = false;
    }
  }

  // 基于字符特征推断语言
  private detectLanguage(text: string): string {
    if (!text) return 'unknown';

    let cjkCount = 0;    // 中日韩字符
    let latinCount = 0;  // 拉丁字符
    let hiraganaCount = 0; // 平假名
    let katakanaCount = 0; // 片假名
    let hangulCount = 0;   // 韩文

    for (const char of text) {
      const code = char.charCodeAt(0);

      // CJK统一汉字
      if (code >= 0x4E00 && code <= 0x9FFF) cjkCount++;
      // 平假名
      else if (code >= 0x3040 && code <= 0x309F) hiraganaCount++;
      // 片假名
      else if (code >= 0x30A0 && code <= 0x30FF) katakanaCount++;
      // 韩文谚文
      else if (code >= 0xAC00 && code <= 0xD7AF) hangulCount++;
      // 拉丁字符
      else if ((code >= 0x0041 && code <= 0x005A) || (code >= 0x0061 && code <= 0x007A)) latinCount++;
    }

    // 判断逻辑
    if (hangulCount > 0 && hangulCount > cjkCount) return 'ko';
    if (hiraganaCount > 0 || katakanaCount > 0) return 'ja';
    if (cjkCount > 0 && latinCount > 0) return 'zh_en';
    if (cjkCount > 0) return 'zh';
    if (latinCount > 0) return 'en';

    return 'unknown';
  }

  // 提取边界矩形
  private extractBounds(block: textRecognition.TextBlock): Rect {
    const corners = block.corners;
    if (!corners || corners.length < 4) {
      return { left: 0, top: 0, width: 0, height: 0 };
    }
    const xCoords = corners.map(c => c.x);
    const yCoords = corners.map(c => c.y);
    const minX = Math.min(...xCoords);
    const maxX = Math.max(...xCoords);
    const minY = Math.min(...yCoords);
    const maxY = Math.max(...yCoords);
    return { left: minX, top: minY, width: maxX - minX, height: maxY - minY };
  }

  // 语言标签转显示名称
  private langToLabel(lang: string): string {
    const map: Record<string, string> = {
      'zh': '🇨🇳 中文',
      'en': '🇬🇧 英文',
      'ja': '🇯🇵 日文',
      'ko': '🇰🇷 韩文',
      'zh_en': '🇨🇳🇬🇧 中英混合',
      'unknown': '❓ 未知'
    };
    return map[lang] || lang;
  }

  build() {
    Scroll() {
      Column() {
        Text('多语言通用OCR')
          .fontSize(24)
          .fontWeight(FontWeight.Bold)
          .fontColor('#e0e0e0')
          .margin({ bottom: 20 })

        // 语言选择
        Row() {
          ForEach(this.languageOptions, (option: { label: string; value: string }) => {
            Button(option.label)
              .height(36)
              .fontSize(13)
              .backgroundColor(this.selectedLanguage === option.value ? '#4FC3F7' : '#333')
              .fontColor(this.selectedLanguage === option.value ? '#000' : '#aaa')
              .margin({ right: 8 })
              .onClick(() => { this.selectedLanguage = option.value; })
          }, (option: { value: string }) => option.value)
        }
        .width('100%')
        .wrap(true)
        .margin({ bottom: 16 })

        // 图片预览
        if (this.imageUri) {
          Image(this.imageUri)
            .width('90%')
            .height(200)
            .objectFit(ImageFit.Contain)
            .borderRadius(12)
            .margin({ bottom: 16 })
        }

        // 识别按钮
        Button(this.isProcessing ? '识别中...' : '选择图片并识别')
          .width('90%')
          .height(48)
          .backgroundColor('#4FC3F7')
          .fontColor('#000')
          .enabled(!this.isProcessing)
          .onClick(() => this.selectAndRecognize())

        // 识别结果
        if (this.ocrResult) {
          // 统计信息
          Column() {
            Text('📊 识别统计')
              .fontSize(16)
              .fontWeight(FontWeight.Bold)
              .fontColor('#4FC3F7')
              .margin({ bottom: 8 })

            Row() {
              Text(`检测语言: `)
                .fontSize(13)
                .fontColor('#888')
              Text(this.ocrResult.languages.map(l => this.langToLabel(l)).join(', '))
                .fontSize(13)
                .fontColor('#e0e0e0')
            }
            .margin({ bottom: 4 })

            Row() {
              Text(`主要语言: `)
                .fontSize(13)
                .fontColor('#888')
              Text(this.langToLabel(this.ocrResult.dominantLanguage))
                .fontSize(13)
                .fontColor('#4FC3F7')
                .fontWeight(FontWeight.Bold)
            }
            .margin({ bottom: 4 })

            Row() {
              Text(`文本块数: `)
                .fontSize(13)
                .fontColor('#888')
              Text(`${this.ocrResult.textBlocks.length}`)
                .fontSize(13)
                .fontColor('#e0e0e0')
            }
            .margin({ bottom: 4 })

            Row() {
              Text(`耗时: `)
                .fontSize(13)
                .fontColor('#888')
              Text(`${this.ocrResult.processingTime}ms`)
                .fontSize(13)
                .fontColor('#e0e0e0')
            }
          }
          .width('100%')
          .padding(16)
          .backgroundColor('#1a1a2e')
          .borderRadius(12)
          .margin({ top: 16 })

          // 逐块结果
          List() {
            ForEach(this.ocrResult.textBlocks, (block: TextBlockWithLang, index: number) => {
              ListItem() {
                Column() {
                  Row() {
                    Text(this.langToLabel(block.language))
                      .fontSize(12)
                      .fontColor('#4FC3F7')
                    Text(` | `)
                      .fontSize(12)
                      .fontColor('#555')
                    Text(block.direction)
                      .fontSize(12)
                      .fontColor('#CE93D8')
                    Text(` | `)
                      .fontSize(12)
                      .fontColor('#555')
                    Text(`${(block.confidence * 100).toFixed(0)}%`)
                      .fontSize(12)
                      .fontColor(block.confidence > 0.8 ? '#81C784' : '#FFB74D')
                  }
                  .margin({ bottom: 6 })

                  Text(block.text)
                    .fontSize(15)
                    .fontColor('#e0e0e0')
                    .lineHeight(22)
                }
                .width('100%')
                .padding(12)
                .backgroundColor('#1a1a2e')
                .borderRadius(8)
                .margin({ bottom: 8 })
              }
            }, (block: TextBlockWithLang, index: number) => `${index}`)
          }
          .width('100%')
          .margin({ top: 12 })
        }
      }
      .width('100%')
      .padding(16)
    }
    .width('100%')
    .height('100%')
    .backgroundColor('#0d0d1a')
  }
}

3.2 文档表格结构化提取

在很多业务场景中，我们需要从图片中提取的不只是"文字"，还有"结构"——比如一个表格，需要保留行列关系。

// 文档表格结构化提取工具
import { textRecognition } from '@kit.AI.Intelligent';
import { image } from '@kit.ImageKit';

// 表格结构
interface TableStructure {
  rows: TableRow[];
  columnCount: number;
  rowCount: number;
}

interface TableRow {
  cells: TableCell[];
}

interface TableCell {
  text: string;
  confidence: number;
  colSpan: number;  // 跨列数
  rowSpan: number;  // 跨行数
}

class TableExtractor {
  /**
   * 从OCR结果中提取表格结构
   * 核心思路：根据文本块的坐标位置，推断行列关系
   */
  static extractTable(
    result: textRecognition.TextRecognitionResult,
    imageWidth: number,
    imageHeight: number
  ): TableStructure | null {
    if (!result.textBlocks || result.textBlocks.length === 0) {
      return null;
    }

    // 第一步：收集所有文本行的位置信息
    interface LineInfo {
      text: string;
      confidence: number;
      centerX: number;
      centerY: number;
      left: number;
      top: number;
      width: number;
      height: number;
    }

    const lines: LineInfo[] = [];

    for (const block of result.textBlocks) {
      for (const line of block.textLines) {
        const corners = line.corners;
        if (!corners || corners.length < 4) continue;

        const xCoords = corners.map(c => c.x);
        const yCoords = corners.map(c => c.y);

        const left = Math.min(...xCoords);
        const right = Math.max(...xCoords);
        const top = Math.min(...yCoords);
        const bottom = Math.max(...yCoords);

        lines.push({
          text: line.textValue,
          confidence: line.confidence,
          centerX: (left + right) / 2,
          centerY: (top + bottom) / 2,
          left: left,
          top: top,
          width: right - left,
          height: bottom - top
        });
      }
    }

    if (lines.length === 0) return null;

    // 第二步：按Y坐标聚类，确定行
    const rowTolerance = imageHeight * 0.02;  // Y方向2%的容差
    lines.sort((a, b) => a.centerY - b.centerY);

    const rowGroups: LineInfo[][] = [];
    let currentRow: LineInfo[] = [lines[0]];

    for (let i = 1; i < lines.length; i++) {
      if (Math.abs(lines[i].centerY - currentRow[0].centerY) < rowTolerance) {
        currentRow.push(lines[i]);
      } else {
        rowGroups.push(currentRow);
        currentRow = [lines[i]];
      }
    }
    rowGroups.push(currentRow);

    // 第三步：每行内按X坐标排序，确定列
    const rows: TableRow[] = [];
    let maxCols = 0;

    for (const group of rowGroups) {
      group.sort((a, b) => a.centerX - b.centerX);
      const cells: TableCell[] = group.map(line => ({
        text: line.text,
        confidence: line.confidence,
        colSpan: 1,
        rowSpan: 1
      }));
      rows.push({ cells });
      maxCols = Math.max(maxCols, cells.length);
    }

    return {
      rows,
      columnCount: maxCols,
      rowCount: rows.length
    };
  }

  /**
   * 将表格转换为Markdown格式
   */
  static toMarkdown(table: TableStructure): string {
    if (table.rows.length === 0) return '';

    const lines: string[] = [];

    // 表头行
    const headerRow = table.rows[0];
    const headerCells = headerRow.cells.map(c => c.text.trim());
    lines.push(`| ${headerCells.join(' | ')} |`);

    // 分隔行
    const separator = headerRow.cells.map(() => '---').join(' | ');
    lines.push(`| ${separator} |`);

    // 数据行
    for (let i = 1; i < table.rows.length; i++) {
      const row = table.rows[i];
      const cells = row.cells.map(c => c.text.trim());
      // 补齐列数
      while (cells.length < table.columnCount) {
        cells.push('');
      }
      lines.push(`| ${cells.join(' | ')} |`);
    }

    return lines.join('\n');
  }

  /**
   * 将表格转换为CSV格式
   */
  static toCSV(table: TableStructure): string {
    const lines: string[] = [];

    for (const row of table.rows) {
      const cells = row.cells.map(c => {
        const text = c.text.trim();
        // 如果包含逗号或引号，用引号包裹
        if (text.includes(',') || text.includes('"')) {
          return `"${text.replace(/"/g, '""')}"`;
        }
        return text;
      });
      lines.push(cells.join(','));
    }

    return lines.join('\n');
  }
}

3.3 多页文档连续识别

对于多页文档，我们需要支持连续拍摄和批量识别，并在识别完成后合并所有页面的结果。

// 多页文档连续识别服务
import { textRecognition } from '@kit.AI.Intelligent';
import { image } from '@kit.ImageKit';
import { camera } from '@kit.CameraKit';

interface DocumentPage {
  pageIndex: number;
  imageUri: string;
  recognizedText: string;
  confidence: number;
  processingTime: number;
}

interface DocumentResult {
  pages: DocumentPage[];
  fullText: string;
  totalPages: number;
  totalProcessingTime: number;
  averageConfidence: number;
}

class MultiPageOcrService {
  private engine: textRecognition.TextRecognitionEngine | null = null;
  private pages: DocumentPage[] = [];

  /**
   * 初始化服务
   */
  init(): void {
    this.engine = textRecognition.TextRecognitionEngine.create(
      textRecognition.TextRecognitionPreset.PRECISE
    );
    this.pages = [];
  }

  /**
   * 识别单页
   */
  async recognizePage(imageUri: string, pageIndex: number): Promise<DocumentPage> {
    if (!this.engine) {
      throw new Error('服务未初始化，请先调用init()');
    }

    const startTime = Date.now();

    try {
      const imageSource = image.createImageSource(imageUri);
      const pixelMap = await imageSource.createPixelMap();

      const config: textRecognition.TextRecognitionConfig = {
        isTextDetectionEnabled: true,
        isDirectionDetectionEnabled: true
      };

      const result = await this.engine.recognizeText(pixelMap, config);

      // 提取所有文本
      const textLines: string[] = [];
      let totalConfidence = 0;
      let lineCount = 0;

      for (const block of result.textBlocks) {
        for (const line of block.textLines) {
          textLines.push(line.textValue);
          totalConfidence += line.confidence;
          lineCount++;
        }
      }

      const page: DocumentPage = {
        pageIndex,
        imageUri,
        recognizedText: textLines.join('\n'),
        confidence: lineCount > 0 ? totalConfidence / lineCount : 0,
        processingTime: Date.now() - startTime
      };

      this.pages.push(page);
      return page;
    } catch (error) {
      const page: DocumentPage = {
        pageIndex,
        imageUri,
        recognizedText: `[第${pageIndex + 1}页识别失败]`,
        confidence: 0,
        processingTime: Date.now() - startTime
      };
      this.pages.push(page);
      return page;
    }
  }

  /**
   * 批量识别多页
   */
  async recognizeBatch(imageUris: string[]): Promise<DocumentResult> {
    const startTime = Date.now();

    for (let i = 0; i < imageUris.length; i++) {
      await this.recognizePage(imageUris[i], i);
    }

    return this.getDocumentResult(Date.now() - startTime);
  }

  /**
   * 获取完整文档结果
   */
  getDocumentResult(totalTime: number): DocumentResult {
    const fullText = this.pages
      .sort((a, b) => a.pageIndex - b.pageIndex)
      .map(p => p.recognizedText)
      .join('\n\n---\n\n');

    const totalConfidence = this.pages.reduce((sum, p) => sum + p.confidence, 0);
    const avgConfidence = this.pages.length > 0 ? totalConfidence / this.pages.length : 0;

    return {
      pages: this.pages,
      fullText,
      totalPages: this.pages.length,
      totalProcessingTime: totalTime,
      averageConfidence: avgConfidence
    };
  }

  /**
   * 释放资源
   */
  destroy(): void {
    this.engine?.close();
    this.engine = null;
    this.pages = [];
  }
}

// 使用示例组件
@Entry
@Component
struct MultiPageOcrDemo {
  @State documentResult: DocumentResult | null = null;
  @State currentPage: number = 0;
  @State isProcessing: boolean = false;
  private ocrService: MultiPageOcrService = new MultiPageOcrService();

  aboutToAppear(): void {
    this.ocrService.init();
  }

  aboutToDisappear(): void {
    this.ocrService.destroy();
  }

  build() {
    Column() {
      Text('多页文档识别')
        .fontSize(22)
        .fontWeight(FontWeight.Bold)
        .fontColor('#e0e0e0')
        .margin({ bottom: 20 })

      if (this.documentResult) {
        // 页码导航
        Row() {
          Button('上一页')
            .enabled(this.currentPage > 0)
            .onClick(() => { this.currentPage--; })
          Text(`${this.currentPage + 1} / ${this.documentResult.totalPages}`)
            .fontSize(16)
            .fontColor('#e0e0e0')
            .margin({ left: 16, right: 16 })
          Button('下一页')
            .enabled(this.currentPage < this.documentResult.totalPages - 1)
            .onClick(() => { this.currentPage++; })
        }
        .margin({ bottom: 16 })

        // 当前页内容
        if (this.documentResult.pages[this.currentPage]) {
          Scroll() {
            Text(this.documentResult.pages[this.currentPage].recognizedText)
              .fontSize(15)
              .fontColor('#e0e0e0')
              .lineHeight(24)
              .padding(16)
              .backgroundColor('#1a1a2e')
              .borderRadius(12)
          }
          .layoutWeight(1)
        }
      } else {
        Text('请拍摄文档页面进行识别')
          .fontSize(16)
          .fontColor('#888')
          .layoutWeight(1)
      }
    }
    .width('100%')
    .height('100%')
    .padding(16)
    .backgroundColor('#0d0d1a')
  }
}

四、踩坑与注意事项

4.1 多语言混排的识别精度

多语言混排是通用OCR最常见的场景，也是最容易出问题的场景。典型问题：

中日文混淆：中文和日文共享大量汉字，但某些汉字的写法在中文和日文中不同（如"歩"vs"步"）。如果语言判断错误，可能导致汉字选择错误。
中英混排的空格问题：中文不需要空格分词，但英文需要。OCR在混排场景下，经常在中文和英文之间插入多余的空格，或者丢失英文单词之间的空格。
竖排日文的处理：传统日文是竖排的，但现代日文也经常横排。如果方向检测出错，竖排文字会被识别成乱码。

解决方案：

在识别前指定主要语言，帮助引擎选择最优识别策略
对识别结果做后处理：修复中英文之间的空格问题
启用方向检测，确保竖排文字被正确处理

4.2 大图片的内存问题

一张4000×3000的照片，解码为PixelMap后大约占用48MB内存（4000×3000×4字节）。如果连续处理多张图片，内存压力会非常大。

优化策略：

// 大图片缩放后再识别
async function recognizeWithResize(
  imageUri: string,
  maxWidth: number = 1920
): Promise<textRecognition.TextRecognitionResult> {
  const imageSource = image.createImageSource(imageUri);
  const imageInfo = await imageSource.getImageInfo();

  // 计算缩放比例
  let scale = 1;
  if (imageInfo.size.width > maxWidth) {
    scale = maxWidth / imageInfo.size.width;
  }

  // 创建缩放后的PixelMap
  const decodingOptions: image.DecodingOptions = {
    desiredSize: {
      width: Math.round(imageInfo.size.width * scale),
      height: Math.round(imageInfo.size.height * scale)
    }
  };

  const pixelMap = await imageSource.createPixelMap(decodingOptions);

  // 执行识别
  const engine = textRecognition.TextRecognitionEngine.create(
    textRecognition.TextRecognitionPreset.PRECISE
  );

  const config: textRecognition.TextRecognitionConfig = {
    isTextDetectionEnabled: true,
    isDirectionDetectionEnabled: true
  };

  const result = await engine.recognizeText(pixelMap, config);

  // 及时释放PixelMap
  pixelMap.release();
  engine.close();

  return result;
}

4.3 竖排文字的识别

竖排文字（如传统日文、中文竖排书籍）的识别需要特别注意：

必须启用方向检测：isDirectionDetectionEnabled: true
结果拼接顺序：竖排文字的阅读顺序是从上到下、从右到左，与横排相反
坐标映射：竖排文字的corners坐标排列方式与横排不同

4.4 识别结果的后处理

通用OCR的识别结果通常需要做以下后处理：

后处理类型	说明	示例
空格修复	中英文之间的空格规范化	“Hello世界”→“Hello 世界”
标点修复	全角/半角标点统一	“Hello,世界”→“Hello，世界”
数字修复	全角/半角数字统一	“２０２４”→“2024”
换行修复	去除不必要的换行	合并被换行打断的句子
编码修复	修复乱码字符	替换不可见字符

五、HarmonyOS 6适配

5.1 新增语言支持

HarmonyOS 6新增了泰语、越南语、俄语的支持。如果你的应用面向东南亚或俄罗斯市场，这是一个重要的更新。

// API 14 新增语言配置
const config: textRecognition.TextRecognitionConfig = {
  isTextDetectionEnabled: true,
  isDirectionDetectionEnabled: true,
  // 新增：指定识别语言范围，提升识别精度
  languageHints: ['th', 'vi', 'ru']  // 泰语、越南语、俄语
};

5.2 置信度精度提升

API 14将置信度精度从2位小数提升到4位小数，这对于需要精细质量控制的应用非常有用。例如，你可以设置更精确的阈值来过滤低质量结果。

5.3 异步引擎创建

// API 14 推荐的异步创建方式
const engine = await textRecognition.TextRecognitionEngine.createAsync(
  textRecognition.TextRecognitionPreset.PRECISE
);

5.4 迁移建议

如果你的应用支持多语言，升级到HarmonyOS 6后可以利用languageHints参数提升识别精度
对于东南亚市场，新增的泰语和越南语支持是关键卖点
异步引擎创建可以避免主线程卡顿，建议所有应用都迁移

六、总结

mindmap
  root((通用OCR))
    多语言支持
      中英日韩核心语言
      泰越俄新增语言
      混排识别策略
      语言自动检测
    文档结构化
      表格行列提取
      Markdown/CSV导出
      排版还原
    多页文档
      连续拍摄识别
      批量处理
      结果合并
    性能优化
      大图缩放预处理
      PixelMap及时释放
      模式按需选择
    后处理
      空格修复
      标点统一
      数字规范化
      换行修复
    注意事项
      中日文汉字差异
      竖排文字处理
      内存管理
      方向检测启用
    HarmonyOS 6
      新增泰越俄语言
      languageHints参数
      置信度4位精度
      异步引擎创建

核心知识点回顾：

多语言是通用OCR的核心能力：HarmonyOS支持中英日韩泰越俄等多种语言，以及多语言混排识别，这是与专项OCR最大的区别。
语言检测是第一步：先判断文本的语言，再选择对应的识别策略，这是多语言OCR的基本范式。
文档结构化是增值能力：通过坐标分析推断行列关系，可以将OCR结果从"一坨文字"变成"结构化表格"。
多页文档需要服务化管理：引擎复用、内存控制、结果合并，是处理多页文档的三个关键点。
后处理不可忽视：空格、标点、数字、换行的规范化处理，直接影响最终结果的可用性。
大图片必须缩放：直接处理原始分辨率的照片，不仅浪费算力，还可能导致内存溢出。

下一篇，也是本系列的最后一篇，我们将进入手写文字识别的世界，看看HarmonyOS如何处理手写体的识别与笔迹分析。

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

HarmonyOS开发：通用文字识别与多语言支持

HarmonyOS开发：通用文字识别与多语言支持

一、背景与动机

二、核心原理

2.1 多语言OCR的技术挑战

2.2 多语言OCR处理流程

2.3 语言分类与识别策略

2.4 支持的语言列表

三、代码实战

3.1 多语言通用OCR识别

3.2 文档表格结构化提取

3.3 多页文档连续识别

四、踩坑与注意事项

4.1 多语言混排的识别精度

4.2 大图片的内存问题

4.3 竖排文字的识别

4.4 识别结果的后处理

五、HarmonyOS 6适配

5.1 新增语言支持

5.2 置信度精度提升

5.3 异步引擎创建

5.4 迁移建议

六、总结

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

HarmonyOS开发：通用文字识别与多语言支持

HarmonyOS开发：通用文字识别与多语言支持

一、背景与动机

二、核心原理

2.1 多语言OCR的技术挑战

2.2 多语言OCR处理流程

2.3 语言分类与识别策略

2.4 支持的语言列表

三、代码实战

3.1 多语言通用OCR识别

3.2 文档表格结构化提取

3.3 多页文档连续识别

四、踩坑与注意事项

4.1 多语言混排的识别精度

4.2 大图片的内存问题

4.3 竖排文字的识别

4.4 识别结果的后处理

五、HarmonyOS 6适配

5.1 新增语言支持

5.2 置信度精度提升

5.3 异步引擎创建

5.4 迁移建议

六、总结

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品