- 微信
- 微博
  
  分享文章到微博
- 复制链接
  
  复制链接到剪贴板

HarmonyOS APP开发：内容推荐与特征工程

Jack20 发表于 2026/06/21 14:09:41 2026/06/21

【摘要】 HarmonyOS APP开发：内容推荐与特征工程核心要点：特征工程是推荐系统的"地基"，决定了推荐效果的上限。本文从内容推荐的核心原理出发，深入讲解标签体系构建、TF-IDF特征提取、向量化表示与相似度匹配，并在HarmonyOS端侧实现完整的内容推荐管线，涵盖特征提取、向量索引、相似检索与推荐生成全流程。项目说明开发语言ArkTS关键能力特征提取、向量计算、标签匹配、倒排索引一、背景...

HarmonyOS APP开发：内容推荐与特征工程

核心要点：特征工程是推荐系统的"地基"，决定了推荐效果的上限。本文从内容推荐的核心原理出发，深入讲解标签体系构建、TF-IDF特征提取、向量化表示与相似度匹配，并在HarmonyOS端侧实现完整的内容推荐管线，涵盖特征提取、向量索引、相似检索与推荐生成全流程。

项目	说明
开发语言	ArkTS
关键能力	特征提取、向量计算、标签匹配、倒排索引

一、背景与动机

你有没有这样的体验：刚看完一篇关于"鸿蒙开发"的文章，首页就推荐了更多ArkTS教程？这不是巧合，这是内容推荐在工作。

和协同过滤"靠人推荐"不同，内容推荐是"靠物推荐"——分析你喜欢的物品有什么特征，然后找具有相似特征的其他物品。这种方法的优势非常明显：

第一，没有冷启动问题。 新物品只要有内容特征，立刻可以被推荐，不需要等用户产生行为数据。

第二，可解释性强。 “因为你喜欢科技类文章，所以推荐这篇”——用户能理解为什么被推荐。

第三，端侧友好。 不需要全局用户行为数据，单个设备就能独立运行。

但内容推荐也有短板：信息茧房。如果只推荐相似内容，用户永远看不到"意外之喜"。所以特征工程的质量，直接决定了内容推荐能否突破这个瓶颈。

二、核心原理

2.1 内容推荐的整体架构

flowchart TB
    classDef primary fill:#4A90D9,stroke:#2C5F8A,color:#fff,font-weight:bold
    classDef warning fill:#F5A623,stroke:#C7841A,color:#fff,font-weight:bold
    classDef error fill:#D0021B,stroke:#9B0214,color:#fff,font-weight:bold
    classDef info fill:#7B68EE,stroke:#5B48CE,color:#fff,font-weight:bold
    classDef purple fill:#9B59B6,stroke:#7D3C98,color:#fff,font-weight:bold

    A[原始内容数据]:::primary --> B[特征提取]:::info
    B --> C[标签体系映射]:::info
    B --> D[TF-IDF向量化]:::warning
    B --> E[语义向量表示]:::purple

    C --> F[倒排索引构建]:::primary
    D --> G[向量相似度计算]:::warning
    E --> G

    F --> H[候选集召回]:::error
    G --> H

    H --> I[排序与多样性调整]:::purple
    I --> J[推荐结果输出]:::primary

    K[用户画像]:::info --> H
    K --> I

2.2 特征工程的层次

特征工程不是简单的"打标签"，它有多个层次：

层次	特征类型	示例	提取方法
L1 基础特征	结构化属性	分类、作者、发布时间	直接读取
L2 标签特征	关键词标签	#科技 #AI #鸿蒙	人工标注/自动提取
L3 统计特征	词频统计	TF-IDF向量	统计计算
L4 语义特征	语义向量	Word2Vec/BERT嵌入	模型推理

端侧推荐通常使用L1+L2+L3，L4需要模型推理，HarmonyOS 6的MindSpore可以支持。

2.3 TF-IDF原理

TF-IDF（词频-逆文档频率）是内容推荐的经典算法：

TF（Term Frequency）：词在文档中出现的频率，衡量词对文档的重要性
IDF（Inverse Document Frequency）：词在所有文档中的稀有程度，越稀有越重要

$TF\text{-}IDF(t,d) = TF(t,d) \times IDF(t) = \frac{f_{t,d}}{\sum_{t'}f_{t',d}} \times \log\frac{N}{1 + n_t}$

其中：

$f_{t,d}$ ：词t在文档d中出现的次数
$N$ ：文档总数
$n_t$ ：包含词t的文档数

2.4 向量相似度检索

将物品和用户都表示为向量后，推荐就变成了"最近邻搜索"问题：

方法	时间复杂度	适用场景
暴力搜索	O(N×D)	物品数<1000
倒排索引	O(K)	标签特征匹配
LSH近似搜索	O(1)	高维向量检索
乘积量化	O(N/M)	超大规模检索

端侧推荐物品量通常在千级别，暴力搜索或倒排索引即可满足需求。

三、代码实战

3.1 特征提取与TF-IDF计算

先实现核心的特征提取和TF-IDF计算模块。

// FeatureExtractor.ets - 特征提取与TF-IDF计算

/**
 * 文档数据结构
 */
export interface Document {
  id: string
  title: string
  content: string
  tags: string[]       // 预标注标签
  category: string     // 分类
  publishTime: number  // 发布时间
}

/**
 * TF-IDF向量（稀疏表示）
 */
export interface SparseVector {
  docId: string
  values: Map<string, number>  // term -> tfidf值
  norm: number                  // 向量模长（预计算）
}

/**
 * 特征提取器
 * 负责从原始内容中提取特征并计算TF-IDF
 */
export class FeatureExtractor {
  // 文档集合
  private documents: Map<string, Document> = new Map()
  // TF-IDF向量集合
  private tfidfVectors: Map<string, SparseVector> = new Map()
  // 文档频率：每个term出现在多少个文档中
  private documentFrequency: Map<string, number> = new Map()
  // 标签倒排索引
  private tagInvertedIndex: Map<string, Set<string>> = new Map()
  // 是否已构建索引
  private isIndexed: boolean = false

  // 中文停用词表（简化版）
  private static readonly STOP_WORDS: Set<string> = new Set([
    '的', '了', '在', '是', '我', '有', '和', '就', '不', '人',
    '都', '一', '一个', '上', '也', '很', '到', '说', '要', '去',
    '你', '会', '着', '没有', '看', '好', '自己', '这', '他', '她',
    'the', 'a', 'an', 'is', 'are', 'was', 'were', 'be', 'have', 'has',
    'do', 'does', 'did', 'will', 'would', 'could', 'should', 'may', 'might',
  ])

  /**
   * 加载文档集合
   */
  loadDocuments(docs: Document[]): void {
    this.documents.clear()
    this.tfidfVectors.clear()
    this.documentFrequency.clear()
    this.tagInvertedIndex.clear()
    this.isIndexed = false

    for (const doc of docs) {
      this.documents.set(doc.id, doc)

      // 构建标签倒排索引
      for (const tag of doc.tags) {
        if (!this.tagInvertedIndex.has(tag)) {
          this.tagInvertedIndex.set(tag, new Set())
        }
        this.tagInvertedIndex.get(tag)!.add(doc.id)
      }
    }

    console.info(`[FeatureExtractor] 加载 ${docs.length} 篇文档`)
  }

  /**
   * 构建TF-IDF索引
   * 这是特征提取的核心步骤
   */
  buildIndex(): void {
    if (this.documents.size === 0) {
      console.warn('[FeatureExtractor] 无文档数据')
      return
    }

    const totalDocs = this.documents.size

    // 第一步：计算每个文档的词频（TF）
    const termFrequencies: Map<string, Map<string, number>> = new Map()

    this.documents.forEach((doc, docId) => {
      const terms = this.tokenize(doc.title + ' ' + doc.content)
      const tf: Map<string, number> = new Map()

      for (const term of terms) {
        tf.set(term, (tf.get(term) || 0) + 1)
      }

      // 归一化TF
      const maxTf = Math.max(...Array.from(tf.values()))
      tf.forEach((count, term) => {
        tf.set(term, count / maxTf)
      })

      termFrequencies.set(docId, tf)

      // 更新文档频率
      tf.forEach((_, term) => {
        this.documentFrequency.set(term, (this.documentFrequency.get(term) || 0) + 1)
      })
    })

    // 第二步：计算TF-IDF向量
    this.tfidfVectors.clear()

    termFrequencies.forEach((tf, docId) => {
      const vector: SparseVector = {
        docId,
        values: new Map(),
        norm: 0,
      }

      let normSquared = 0

      tf.forEach((tfValue, term) => {
        const df = this.documentFrequency.get(term) || 1
        const idf = Math.log(totalDocs / (1 + df))
        const tfidf = tfValue * idf

        // 过滤掉极低权重的term，节省存储
        if (tfidf > 0.01) {
          vector.values.set(term, tfidf)
          normSquared += tfidf * tfidf
        }
      })

      vector.norm = Math.sqrt(normSquared)
      this.tfidfVectors.set(docId, vector)
    })

    this.isIndexed = true
    console.info(`[FeatureExtractor] TF-IDF索引构建完成，${this.tfidfVectors.size} 个向量`)
  }

  /**
   * 分词器（简化版）
   * 实际项目中应使用专业分词库
   */
  private tokenize(text: string): string[] {
    // 转小写
    const lowerText = text.toLowerCase()
    // 按空格和标点分割
    const tokens = lowerText.split(/[\s,.;:!?，。；：！？、\n\r\t]+/)
    // 过滤停用词和短词
    return tokens.filter(t => t.length > 1 && !FeatureExtractor.STOP_WORDS.has(t))
  }

  /**
   * 计算两个文档的余弦相似度
   */
  cosineSimilarity(docIdA: string, docIdB: string): number {
    const vecA = this.tfidfVectors.get(docIdA)
    const vecB = this.tfidfVectors.get(docIdB)

    if (!vecA || !vecB || vecA.norm === 0 || vecB.norm === 0) return 0

    let dotProduct = 0
    vecA.values.forEach((valueA, term) => {
      const valueB = vecB.values.get(term)
      if (valueB !== undefined) {
        dotProduct += valueA * valueB
      }
    })

    return dotProduct / (vecA.norm * vecB.norm)
  }

  /**
   * 基于标签匹配查找相似文档
   * 比TF-IDF计算更快，适合端侧快速召回
   */
  findByTags(tags: string[], excludeDocIds: string[] = [], topK: number = 20): Array<{
    docId: string; score: number; matchedTags: string[]
  }> {
    const scores: Map<string, { score: number; matchedTags: string[] }> = new Map()
    const excludeSet = new Set(excludeDocIds)

    for (const tag of tags) {
      const docIds = this.tagInvertedIndex.get(tag)
      if (!docIds) continue

      for (const docId of docIds) {
        if (excludeSet.has(docId)) continue

        const existing = scores.get(docId)
        if (existing) {
          existing.score += 1
          existing.matchedTags.push(tag)
        } else {
          scores.set(docId, { score: 1, matchedTags: [tag] })
        }
      }
    }

    // 按匹配标签数排序
    const results = Array.from(scores.entries())
      .map(([docId, data]) => ({ docId, ...data }))
      .sort((a, b) => b.score - a.score)

    return results.slice(0, topK)
  }

  /**
   * 基于TF-IDF向量查找相似文档
   */
  findBySimilarity(queryDocId: string, excludeDocIds: string[] = [], topK: number = 20): Array<{
    docId: string; similarity: number
  }> {
    if (!this.isIndexed) return []

    const results: Array<{ docId: string; similarity: number }> = []
    const excludeSet = new Set(excludeDocIds)

    this.tfidfVectors.forEach((_, docId) => {
      if (docId === queryDocId || excludeSet.has(docId)) return

      const sim = this.cosineSimilarity(queryDocId, docId)
      if (sim > 0.01) {
        results.push({ docId, similarity: sim })
      }
    })

    results.sort((a, b) => b.similarity - a.similarity)
    return results.slice(0, topK)
  }

  /**
   * 从查询文本生成TF-IDF向量
   * 用于"搜索式"推荐
   */
  queryByText(queryText: string, topK: number = 20): Array<{
    docId: string; similarity: number
  }> {
    if (!this.isIndexed || this.documents.size === 0) return []

    const totalDocs = this.documents.size
    const terms = this.tokenize(queryText)
    const queryTf: Map<string, number> = new Map()

    for (const term of terms) {
      queryTf.set(term, (queryTf.get(term) || 0) + 1)
    }

    // 归一化
    const maxTf = Math.max(...Array.from(queryTf.values()), 1)
    let queryNorm = 0
    const queryVector: Map<string, number> = new Map()

    queryTf.forEach((count, term) => {
      const tf = count / maxTf
      const df = this.documentFrequency.get(term) || 1
      const idf = Math.log(totalDocs / (1 + df))
      const tfidf = tf * idf
      queryVector.set(term, tfidf)
      queryNorm += tfidf * tfidf
    })
    queryNorm = Math.sqrt(queryNorm)

    if (queryNorm === 0) return []

    // 计算与所有文档的相似度
    const results: Array<{ docId: string; similarity: number }> = []

    this.tfidfVectors.forEach((docVec) => {
      let dotProduct = 0
      queryVector.forEach((qVal, term) => {
        const dVal = docVec.values.get(term)
        if (dVal !== undefined) {
          dotProduct += qVal * dVal
        }
      })

      const similarity = dotProduct / (queryNorm * docVec.norm)
      if (similarity > 0.01) {
        results.push({ docId: docVec.docId, similarity })
      }
    })

    results.sort((a, b) => b.similarity - a.similarity)
    return results.slice(0, topK)
  }

  /**
   * 获取文档的标签集合
   */
  getDocumentTags(docId: string): string[] {
    return this.documents.get(docId)?.tags || []
  }

  /**
   * 获取标签倒排索引统计
   */
  getTagStats(): Map<string, number> {
    const stats: Map<string, number> = new Map()
    this.tagInvertedIndex.forEach((docIds, tag) => {
      stats.set(tag, docIds.size)
    })
    return stats
  }
}

3.2 用户画像与兴趣建模

用户画像的构建是内容推荐的关键环节。

// UserProfileBuilder.ets - 用户画像构建器

import { FeatureExtractor, Document } from './FeatureExtractor'

/**
 * 用户兴趣标签
 */
export interface InterestTag {
  tag: string
  weight: number      // 兴趣权重
  lastActive: number  // 最后活跃时间
  source: string      // 来源：click/view/collect
}

/**
 * 用户画像
 */
export interface UserProfile {
  userId: string
  interestTags: Map<string, InterestTag>  // 兴趣标签映射
  readHistory: string[]                    // 阅读历史
  preferredCategories: Map<string, number> // 偏好分类
  readingPatterns: ReadingPattern          // 阅读模式
  lastUpdateTime: number
}

/**
 * 阅读模式统计
 */
export interface ReadingPattern {
  avgReadDuration: number   // 平均阅读时长（秒）
  peakHour: number          // 活跃高峰时段（0-23）
  preferredLength: string   // 偏好文章长度：short/medium/long
  dailyReadCount: number    // 日均阅读数
}

/**
 * 用户画像构建器
 * 从用户行为数据中提取兴趣特征
 */
export class UserProfileBuilder {
  private featureExtractor: FeatureExtractor
  private profileCache: Map<string, UserProfile> = new Map()

  // 行为类型权重
  private static readonly BEHAVIOR_WEIGHTS: Record<string, number> = {
    view: 1.0,
    click: 2.0,
    read_complete: 3.0,
    collect: 4.0,
    share: 3.5,
  }

  // 兴趣衰减半衰期（毫秒）
  private static readonly INTEREST_HALFLIFE = 7 * 24 * 3600 * 1000 // 7天

  constructor(featureExtractor: FeatureExtractor) {
    this.featureExtractor = featureExtractor
  }

  /**
   * 从行为记录构建用户画像
   */
  buildProfile(
    userId: string,
    behaviors: Array<{
      docId: string
      behaviorType: string
      timestamp: number
      duration?: number
    }>
  ): UserProfile {
    // 检查缓存
    const cached = this.profileCache.get(userId)
    if (cached && Date.now() - cached.lastUpdateTime < 3600000) {
      return cached
    }

    const interestTags: Map<string, InterestTag> = new Map()
    const readHistory: string[] = []
    const categoryCount: Map<string, number> = new Map()
    let totalDuration = 0
    let durationCount = 0
    const hourCounts: number[] = new Array(24).fill(0)

    for (const behavior of behaviors) {
      const docTags = this.featureExtractor.getDocumentTags(behavior.docId)
      const weight = UserProfileBuilder.BEHAVIOR_WEIGHTS[behavior.behaviorType] || 1.0

      // 时间衰减
      const timeDecay = this.calculateTimeDecay(behavior.timestamp)

      // 累加标签兴趣权重
      for (const tag of docTags) {
        const existing = interestTags.get(tag)
        const newWeight = weight * timeDecay

        if (existing) {
          existing.weight += newWeight
          existing.lastActive = Math.max(existing.lastActive, behavior.timestamp)
          // 保留最高权重来源
          if (weight > UserProfileBuilder.BEHAVIOR_WEIGHTS[existing.source]) {
            existing.source = behavior.behaviorType
          }
        } else {
          interestTags.set(tag, {
            tag,
            weight: newWeight,
            lastActive: behavior.timestamp,
            source: behavior.behaviorType,
          })
        }
      }

      readHistory.push(behavior.docId)

      // 统计分类偏好
      // （简化处理，实际需要从文档获取分类）

      // 统计阅读时长
      if (behavior.duration) {
        totalDuration += behavior.duration
        durationCount++
      }

      // 统计活跃时段
      const hour = new Date(behavior.timestamp).getHours()
      hourCounts[hour]++
    }

    // 归一化兴趣权重（0-1范围）
    const maxWeight = Math.max(
      ...Array.from(interestTags.values()).map(t => t.weight),
      1
    )
    interestTags.forEach(tag => {
      tag.weight = tag.weight / maxWeight
    })

    // 计算活跃高峰时段
    let peakHour = 0
    let maxHourCount = 0
    hourCounts.forEach((count, hour) => {
      if (count > maxHourCount) {
        maxHourCount = count
        peakHour = hour
      }
    })

    const profile: UserProfile = {
      userId,
      interestTags,
      readHistory,
      preferredCategories: categoryCount,
      readingPatterns: {
        avgReadDuration: durationCount > 0 ? totalDuration / durationCount : 0,
        peakHour,
        preferredLength: 'medium',
        dailyReadCount: behaviors.length > 0 ? Math.ceil(behaviors.length / 7) : 0,
      },
      lastUpdateTime: Date.now(),
    }

    this.profileCache.set(userId, profile)
    console.info(`[ProfileBuilder] 构建用户画像: ${interestTags.size} 个兴趣标签`)

    return profile
  }

  /**
   * 计算时间衰减因子
   */
  private calculateTimeDecay(timestamp: number): number {
    const hoursPassed = (Date.now() - timestamp) / (1000 * 3600)
    return Math.pow(0.5, hoursPassed / (UserProfileBuilder.INTEREST_HALFLIFE / 3600000))
  }

  /**
   * 获取用户TopN兴趣标签
   */
  getTopInterests(profile: UserProfile, topN: number = 10): InterestTag[] {
    return Array.from(profile.interestTags.values())
      .sort((a, b) => b.weight - a.weight)
      .slice(0, topN)
  }

  /**
   * 计算用户与文档的匹配度
   */
  calculateMatchScore(profile: UserProfile, docId: string): {
    score: number
    matchedTags: string[]
  } {
    const docTags = this.featureExtractor.getDocumentTags(docId)
    let score = 0
    const matchedTags: string[] = []

    for (const tag of docTags) {
      const interest = profile.interestTags.get(tag)
      if (interest) {
        score += interest.weight
        matchedTags.push(tag)
      }
    }

    // 归一化
    if (matchedTags.length > 0) {
      score = score / matchedTags.length
    }

    return { score, matchedTags }
  }

  /**
   * 导出用户画像为JSON
   */
  exportProfile(profile: UserProfile): string {
    const obj: Record<string, Object> = {
      userId: profile.userId,
      interestTags: Object.fromEntries(profile.interestTags),
      readHistory: profile.readHistory,
      lastUpdateTime: profile.lastUpdateTime,
    }
    return JSON.stringify(obj)
  }
}

3.3 内容推荐页面完整实现

将特征提取、用户画像和推荐逻辑整合到UI中。

// ContentRecommendationPage.ets - 内容推荐页面
import { FeatureExtractor, Document } from './FeatureExtractor'
import { UserProfileBuilder, UserProfile, InterestTag } from './UserProfileBuilder'

@Entry
@Component
struct ContentRecommendationPage {
  private featureExtractor: FeatureExtractor = new FeatureExtractor()
  private profileBuilder: UserProfileBuilder = new UserProfileBuilder(this.featureExtractor)

  @State recommendResults: Array<{
    docId: string; score: number; matchedTags: string[]; method: string
  }> = []
  @State userProfile: UserProfile | null = null
  @State topInterests: InterestTag[] = []
  @State isBuildingIndex: boolean = false
  @State searchQuery: string = ''
  @State activeTab: number = 0 // 0-兴趣推荐 1-相似推荐 2-搜索推荐

  // 模拟文档数据
  private mockDocuments: Document[] = [
    {
      id: 'doc_001', title: 'HarmonyOS ArkTS开发入门',
      content: 'ArkTS是HarmonyOS的应用开发语言 基于TypeScript扩展 适合移动应用开发',
      tags: ['HarmonyOS', 'ArkTS', '移动开发', '入门教程'],
      category: '开发', publishTime: Date.now() - 3600000,
    },
    {
      id: 'doc_002', title: '鸿蒙分布式架构深度解析',
      content: 'HarmonyOS分布式软总线技术实现多设备协同 分布式数据管理 分布式任务调度',
      tags: ['HarmonyOS', '分布式', '架构', '深度解析'],
      category: '架构', publishTime: Date.now() - 7200000,
    },
    {
      id: 'doc_003', title: '推荐系统算法实践指南',
      content: '协同过滤 内容推荐 深度学习推荐系统 TF-IDF算法实现 推荐效果评估',
      tags: ['推荐系统', '算法', '机器学习', '实践指南'],
      category: '算法', publishTime: Date.now() - 86400000,
    },
    {
      id: 'doc_004', title: 'ArkUI声明式开发范式',
      content: 'ArkUI采用声明式开发范式 组件化开发 状态管理V2 响应式更新机制',
      tags: ['ArkUI', 'HarmonyOS', '声明式', '组件化'],
      category: '开发', publishTime: Date.now() - 1800000,
    },
    {
      id: 'doc_005', title: '端侧机器学习模型部署',
      content: 'MindSpore Lite端侧推理 模型量化压缩 端侧AI应用开发 推理性能优化',
      tags: ['端侧AI', 'MindSpore', '模型部署', '性能优化'],
      category: 'AI', publishTime: Date.now() - 43200000,
    },
    {
      id: 'doc_006', title: 'Flutter跨平台开发对比',
      content: 'Flutter与React Native对比 跨平台开发框架选择 性能对比分析',
      tags: ['Flutter', '跨平台', 'React Native', '对比'],
      category: '开发', publishTime: Date.now() - 172800000,
    },
    {
      id: 'doc_007', title: 'HarmonyOS数据持久化方案',
      content: '关系型数据库 Preferences数据存储 分布式数据同步 数据加密方案',
      tags: ['HarmonyOS', '数据持久化', '数据库', '加密'],
      category: '开发', publishTime: Date.now() - 10800000,
    },
    {
      id: 'doc_008', title: '深度学习推荐模型演进',
      content: '从DeepFM到DIN 推荐系统深度学习模型 Attention机制 特征交叉',
      tags: ['推荐系统', '深度学习', 'DeepFM', 'Attention'],
      category: '算法', publishTime: Date.now() - 259200000,
    },
  ]

  // 模拟用户行为
  private mockBehaviors = [
    { docId: 'doc_001', behaviorType: 'read_complete', timestamp: Date.now() - 3600000, duration: 300 },
    { docId: 'doc_002', behaviorType: 'click', timestamp: Date.now() - 7200000, duration: 120 },
    { docId: 'doc_004', behaviorType: 'collect', timestamp: Date.now() - 1800000 },
    { docId: 'doc_007', behaviorType: 'view', timestamp: Date.now() - 10800000, duration: 60 },
    { docId: 'doc_001', behaviorType: 'share', timestamp: Date.now() - 600000 },
  ]

  aboutToAppear(): void {
    this.initEngine()
  }

  private initEngine(): void {
    this.isBuildingIndex = true

    setTimeout(() => {
      // 加载文档
      this.featureExtractor.loadDocuments(this.mockDocuments)
      // 构建TF-IDF索引
      this.featureExtractor.buildIndex()
      // 构建用户画像
      this.userProfile = this.profileBuilder.buildProfile('user_001', this.mockBehaviors)
      this.topInterests = this.profileBuilder.getTopInterests(this.userProfile, 8)

      this.isBuildingIndex = false
    }, 500)
  }

  /**
   * 生成推荐
   */
  private generateRecommendations(): void {
    if (!this.userProfile) return

    const readHistory = this.userProfile.readHistory

    switch (this.activeTab) {
      case 0: // 兴趣推荐
        this.recommendResults = this.generateInterestBased(readHistory)
        break
      case 1: // 相似推荐
        this.recommendResults = this.generateSimilarityBased(readHistory)
        break
      case 2: // 搜索推荐
        if (this.searchQuery.trim()) {
          this.recommendResults = this.generateSearchBased()
        }
        break
    }
  }

  /**
   * 基于兴趣标签的推荐
   */
  private generateInterestBased(excludeIds: string[]): Array<{
    docId: string; score: number; matchedTags: string[]; method: string
  }> {
    if (!this.userProfile) return []

    const topTags = this.topInterests.map(t => t.tag)
    const tagResults = this.featureExtractor.findByTags(topTags, excludeIds, 20)

    return tagResults.map(r => ({
      docId: r.docId,
      score: r.score,
      matchedTags: r.matchedTags,
      method: '兴趣匹配',
    }))
  }

  /**
   * 基于TF-IDF相似度的推荐
   */
  private generateSimilarityBased(excludeIds: string[]): Array<{
    docId: string; score: number; matchedTags: string[]; method: string
  }> {
    if (!this.userProfile || this.userProfile.readHistory.length === 0) return []

    // 以最近阅读的文档为查询
    const latestDocId = this.userProfile.readHistory[this.userProfile.readHistory.length - 1]
    const simResults = this.featureExtractor.findBySimilarity(latestDocId, excludeIds, 10)

    return simResults.map(r => ({
      docId: r.docId,
      score: r.similarity,
      matchedTags: this.featureExtractor.getDocumentTags(r.docId),
      method: '内容相似',
    }))
  }

  /**
   * 基于搜索的推荐
   */
  private generateSearchBased(): Array<{
    docId: string; score: number; matchedTags: string[]; method: string
  }> {
    const results = this.featureExtractor.queryByText(this.searchQuery, 10)
    return results.map(r => ({
      docId: r.docId,
      score: r.similarity,
      matchedTags: this.featureExtractor.getDocumentTags(r.docId),
      method: '搜索匹配',
    }))
  }

  build() {
    Navigation() {
      Column() {
        // 用户兴趣标签展示
        this.InterestTagBar()

        // 推荐模式切换
        this.ModeTabBar()

        // 搜索框（搜索模式）
        if (this.activeTab === 2) {
          this.SearchBar()
        }

        // 推荐结果
        if (this.isBuildingIndex) {
          this.LoadingView()
        } else {
          this.RecommendListView()
        }
      }
      .width('100%')
      .height('100%')
      .backgroundColor('#0f0f1a')
    }
    .title('内容推荐')
    .titleMode(NavigationTitleMode.Mini)
    .navBarStyle(NavigationBarStyle.Constant)
  }

  // ======== 子组件 ========

  @Builder
  InterestTagBar() {
    Column() {
      Text('你的兴趣画像')
        .fontSize(13)
        .fontColor('#888888')
        .margin({ bottom: 8 })

      Flex({ wrap: FlexWrap.Wrap, justifyContent: FlexAlign.Start }) {
        ForEach(this.topInterests, (interest: InterestTag) => {
          Text(`${interest.tag} ${(interest.weight * 100).toFixed(0)}%`)
            .fontSize(12)
            .fontColor('#E0E0E0')
            .padding({ left: 10, right: 10, top: 4, bottom: 4 })
            .borderRadius(12)
            .backgroundColor(`rgba(123,104,238,${0.1 + interest.weight * 0.3})`)
            .margin({ right: 6, bottom: 6 })
        })
      }
    }
    .width('100%')
    .padding({ left: 16, right: 16, top: 12, bottom: 8 })
  }

  @Builder
  ModeTabBar() {
    Row() {
      ForEach(['兴趣推荐', '相似推荐', '搜索推荐'], (tab: string, index: number) => {
        Text(tab)
          .fontSize(14)
          .fontColor(this.activeTab === index ? '#7B68EE' : '#999999')
          .fontWeight(this.activeTab === index ? FontWeight.Bold : FontWeight.Normal)
          .padding({ left: 14, right: 14, top: 6, bottom: 6 })
          .borderRadius(14)
          .backgroundColor(this.activeTab === index ? 'rgba(123,104,238,0.15)' : 'transparent')
          .onClick(() => {
            this.activeTab = index
            this.generateRecommendations()
          })
      })
    }
    .width('100%')
    .justifyContent(FlexAlign.Center)
    .padding({ top: 8, bottom: 8 })
  }

  @Builder
  SearchBar() {
    Row() {
      TextInput({ placeholder: '输入关键词搜索...' })
        .fontSize(14)
        .fontColor('#E0E0E0')
        .placeholderColor('#666666')
        .backgroundColor('rgba(255,255,255,0.08)')
        .borderRadius(20)
        .padding({ left: 16, right: 16 })
        .layoutWeight(1)
        .onChange((value: string) => {
          this.searchQuery = value
        })

      Button('搜索')
        .fontSize(14)
        .fontColor('#FFFFFF')
        .backgroundColor('#7B68EE')
        .borderRadius(20)
        .padding({ left: 16, right: 16, top: 6, bottom: 6 })
        .margin({ left: 8 })
        .onClick(() => this.generateRecommendations())
    }
    .width('100%')
    .padding({ left: 16, right: 16, top: 4, bottom: 8 })
  }

  @Builder
  LoadingView() {
    Column() {
      LoadingProgress()
        .width(40)
        .height(40)
        .color('#7B68EE')
      Text('正在构建特征索引...')
        .fontSize(13)
        .fontColor('#999999')
        .margin({ top: 8 })
    }
    .width('100%')
    .height('50%')
    .justifyContent(HorizontalAlign.Center)
  }

  @Builder
  RecommendListView() {
    if (this.recommendResults.length === 0) {
      Column() {
        Text('🔍')
          .fontSize(40)
        Text('切换推荐模式查看结果')
          .fontSize(14)
          .fontColor('#999999')
          .margin({ top: 8 })
      }
      .width('100%')
      .height('50%')
      .justifyContent(HorizontalAlign.Center)
    } else {
      List({ space: 10 }) {
        ForEach(this.recommendResults, (result: {
          docId: string; score: number; matchedTags: string[]; method: string
        }, index: number) => {
          ListItem() {
            this.ResultCard(result, index + 1)
          }
        })
      }
      .width('100%')
      .layoutWeight(1)
      .padding({ left: 16, right: 16, top: 8 })
    }
  }

  @Builder
  ResultCard(result: {
    docId: string; score: number; matchedTags: string[]; method: string
  }, rank: number) {
    Column() {
      Row() {
        Text(`#${rank}`)
          .fontSize(14)
          .fontWeight(FontWeight.Bold)
          .fontColor(rank <= 3 ? '#F5A623' : '#666666')
          .width(32)

        Column() {
          Text(result.docId)
            .fontSize(15)
            .fontColor('#E0E0E0')
            .fontWeight(FontWeight.Medium)

          Text(`来源: ${result.method}`)
            .fontSize(11)
            .fontColor('#7B68EE')
            .margin({ top: 2 })
        }
        .layoutWeight(1)
        .alignItems(HorizontalAlign.Start)

        Text(result.score.toFixed(3))
          .fontSize(14)
          .fontColor('#F5A623')
          .fontWeight(FontWeight.Medium)
      }

      // 匹配标签
      if (result.matchedTags.length > 0) {
        Flex({ wrap: FlexWrap.Wrap, justifyContent: FlexAlign.Start }) {
          ForEach(result.matchedTags.slice(0, 5), (tag: string) => {
            Text(tag)
              .fontSize(11)
              .fontColor('#7B68EE')
              .padding({ left: 6, right: 6, top: 2, bottom: 2 })
              .borderRadius(8)
              .backgroundColor('rgba(123,104,238,0.1)')
              .margin({ right: 4, top: 6 })
          })
        }
        .margin({ top: 4 })
      }
    }
    .width('100%')
    .padding(14)
    .borderRadius(12)
    .backgroundColor('rgba(255,255,255,0.06)')
    .backdropBlur(20)
  }
}

四、踩坑与注意事项

4.1 中文分词问题

坑：ArkTS没有内置中文分词库，简单的空格分词对中文完全无效。

解：

短期方案：依赖预标注标签，不依赖内容文本分词
中期方案：使用基于词典的正向最大匹配分词
长期方案：HarmonyOS 6的MindSpore加载轻量分词模型

// 简易正向最大匹配分词
class SimpleChineseTokenizer {
  private dictionary: Set<string> = new Set()

  loadDictionary(words: string[]): void {
    words.forEach(w => this.dictionary.add(w))
  }

  tokenize(text: string, maxLen: number = 4): string[] {
    const result: string[] = []
    let i = 0
    while (i < text.length) {
      let matched = false
      for (let len = maxLen; len > 1; len--) {
        const word = text.substring(i, i + len)
        if (this.dictionary.has(word)) {
          result.push(word)
          i += len
          matched = true
          break
        }
      }
      if (!matched) {
        // 单字作为fallback
        i++
      }
    }
    return result
  }
}

4.2 TF-IDF索引构建耗时

坑：文档数量超过500篇时，TF-IDF索引构建可能需要数秒，阻塞UI线程。

解：

使用Worker线程进行后台计算
增量构建：新文档加入时只更新相关向量
缓存构建结果到本地存储

// 增量更新TF-IDF
addDocument(doc: Document): void {
  this.documents.set(doc.id, doc)
  // 只重新计算受影响的部分
  // 1. 更新文档频率
  const terms = this.tokenize(doc.title + ' ' + doc.content)
  for (const term of new Set(terms)) {
    this.documentFrequency.set(term, (this.documentFrequency.get(term) || 0) + 1)
  }
  // 2. 计算新文档的TF-IDF向量
  this.computeSingleTFIDF(doc)
  // 3. 标记需要重新归一化
}

4.3 特征维度爆炸

坑：TF-IDF的term空间可能非常大（数万维），导致向量存储和计算开销巨大。

解：

过滤低频词（出现次数<3的term）
过滤高频词（出现在>50%文档的term）
截断每个文档向量只保留TopN=100个最高权重term
使用特征哈希（Feature Hashing）降维

4.4 标签体系的一致性

坑：不同来源的标签命名不统一，如"AI"和"人工智能"、“鸿蒙"和"HarmonyOS”，导致匹配失败。

解：

建立标签同义词映射表
标签归一化处理

// 标签同义词映射
const TAG_SYNONYMS: Map<string, string> = new Map([
  ['鸿蒙', 'HarmonyOS'],
  ['AI', '人工智能'],
  ['ML', '机器学习'],
  ['DL', '深度学习'],
  ['NLP', '自然语言处理'],
])

function normalizeTag(tag: string): string {
  return TAG_SYNONYMS.get(tag) || tag
}

4.5 推荐结果的多样性

坑：纯内容推荐容易导致"信息茧房"，用户只看到同一主题的内容。

解：

**MMR（Maximal Marginal Relevance）**算法：在相关性和多样性之间取平衡
分类配额：限制同一分类的推荐数量
随机探索：以5-10%的概率插入随机内容

// MMR多样性调整
function diversifyWithMMR(
  candidates: RecommendationItem[],
  lambda: number = 0.7,
  topK: number = 20
): RecommendationItem[] {
  const selected: RecommendationItem[] = []
  const remaining = [...candidates]

  // 选择第一个（得分最高的）
  if (remaining.length > 0) {
    selected.push(remaining.shift()!)
  }

  while (selected.length < topK && remaining.length > 0) {
    let bestIdx = 0
    let bestMMR = -Infinity

    for (let i = 0; i < remaining.length; i++) {
      const relevance = remaining[i].score
      // 计算与已选集合的最大相似度
      const maxSim = Math.max(
        ...selected.map(s => computeSimilarity(remaining[i], s))
      )
      const mmr = lambda * relevance - (1 - lambda) * maxSim

      if (mmr > bestMMR) {
        bestMMR = mmr
        bestIdx = i
      }
    }

    selected.push(remaining.splice(bestIdx, 1)[0])
  }

  return selected
}

五、HarmonyOS 6适配

5.1 版本差异

特性	HarmonyOS 5.0	HarmonyOS 6
文本处理	手动分词	新增NLP基础能力
向量计算	手动实现	新增Vector API
端侧ML	无	MindSpore Lite
数据存储	RDB/Preferences	新增向量数据库

5.2 迁移指南

1. 使用NLP基础能力

HarmonyOS 6新增了基础NLP能力，可以辅助分词和关键词提取：

// HarmonyOS 6 NLP能力（概念代码）
import { nlp } from '@kit.AiKit'

async function extractKeywords(text: string): Promise<string[]> {
  const result = await nlp.extractKeywords(text, { topK: 10 })
  return result.keywords
}

2. 向量数据库

HarmonyOS 6支持向量数据库，可以直接存储和检索特征向量：

// HarmonyOS 6 向量数据库（概念代码）
import { vectorStore } from '@kit.ArkData'

const collection = await vectorStore.createCollection({
  name: 'item_features',
  dimension: 128,  // 向量维度
  metric: 'cosine', // 相似度度量
})

// 插入向量
await collection.insert({
  id: 'doc_001',
  vector: new Float32Array([...]),
  metadata: { category: '开发', tags: ['ArkTS'] },
})

// 向量检索
const results = await collection.search(queryVector, { topK: 20 })

3. MindSpore语义向量

使用MindSpore在端侧生成语义向量，替代TF-IDF：

// HarmonyOS 6 端侧语义向量（概念代码）
import { mindSpore } from '@kit.MindSporeKit'

async function getSemanticVector(text: string): Promise<Float32Array> {
  const model = await mindSpore.loadModel('text_embedding.ms')
  const input = encodeText(text)
  const output = await model.predict(input)
  return new Float32Array(output)
}

六、总结

本文完整实现了HarmonyOS端侧的内容推荐系统，从特征提取到推荐生成全流程覆盖：

模块	核心功能	关键技术
特征提取器	TF-IDF计算、标签索引	稀疏向量、倒排索引
用户画像	兴趣标签提取、阅读模式分析	时间衰减、权重归一化
推荐生成	兴趣匹配、相似推荐、搜索推荐	多策略融合、多样性调整

核心要点回顾：

🏷️ 标签体系是端侧推荐的基石，比TF-IDF更轻量、更可控
📊 TF-IDF适合文本相似度，但需注意中文分词和维度爆炸问题
👤 用户画像需要时间衰减，7天半衰期是常用的起点
🎯 兴趣匹配 + 相似推荐 + 搜索推荐三管齐下，覆盖不同场景
🌈 MMR算法解决信息茧房，在相关性和多样性间取平衡
📱 HarmonyOS 6的NLP和向量数据库将大幅简化特征工程

下一篇我们将深入实时推荐与流式计算，讲解如何在用户交互过程中实时更新推荐结果。

【声明】本内容来自华为云开发者社区博主，不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源（华为云社区）、文章链接、文章作者等基本信息，否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容，欢迎发送邮件进行举报，并提供相关证据，一经查实，本社区将立刻删除涉嫌侵权内容，举报邮箱： cloudbbs@huaweicloud.com

点赞
收藏
关注作者

0/1000

抱歉，系统识别当前为高风险访问，暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称，即可参与社区互动！

*长度不超过10个汉字或20个英文字符，设置后3个月内不可修改。

确认取消

加入云驻计划，成为创作者

华为云周边好礼
免费体验产品
特殊身份标识
线下官方门票
内部专家零距离
与10000+优质创作者共同成长

立即加入

HarmonyOS APP开发：内容推荐与特征工程

HarmonyOS APP开发：内容推荐与特征工程

一、背景与动机

二、核心原理

2.1 内容推荐的整体架构

2.2 特征工程的层次

2.3 TF-IDF原理

2.4 向量相似度检索

三、代码实战

3.1 特征提取与TF-IDF计算

3.2 用户画像与兴趣建模

3.3 内容推荐页面完整实现

四、踩坑与注意事项

4.1 中文分词问题

4.2 TF-IDF索引构建耗时

4.3 特征维度爆炸

4.4 标签体系的一致性

4.5 推荐结果的多样性

五、HarmonyOS 6适配

5.1 版本差异

5.2 迁移指南

六、总结

全部回复

设置昵称

关于作者

目录

加入云驻计划，成为创作者

HarmonyOS APP开发：内容推荐与特征工程

HarmonyOS APP开发：内容推荐与特征工程

一、背景与动机

二、核心原理

2.1 内容推荐的整体架构

2.2 特征工程的层次

2.3 TF-IDF原理

2.4 向量相似度检索

三、代码实战

3.1 特征提取与TF-IDF计算

3.2 用户画像与兴趣建模

3.3 内容推荐页面完整实现

四、踩坑与注意事项

4.1 中文分词问题

4.2 TF-IDF索引构建耗时

4.3 特征维度爆炸

4.4 标签体系的一致性

4.5 推荐结果的多样性

五、HarmonyOS 6适配

5.1 版本差异

5.2 迁移指南

六、总结

全部回复

设置昵称

关于作者

目录

热门推荐查看更多

相关文章

加入云驻计划，成为创作者

相关产品