HuffmanTree

举报
tianchou 发表于 2023/11/14 19:28:09 2023/11/14
【摘要】 HuffmanTree的常用操作及例题

title: 【HuffmanTree】
date: 2023-09-28 20:15:27
tags: 数据结构

哈夫曼树

定义

通过==最小堆(最小堆存放树的根结点)==来实现,每次拿出两个权值==最小的二叉树==进行合并,合并后的新树插入最小堆

注意

  1. 哈夫曼 属于 ==树==,也是链式存储
  2. 构建HuffmanTree之前必须先构建MinHeap

特点

image-20230916123759336

  1. ==权值的个数即为叶子节点的个数==
  2. HuffmanTree编码的**码字均在叶结点**上

哈夫曼树是一种带权路径长度最短的树,在一个度为m的哈夫曼树中,其叶结点个数为n,则非叶结点的个数为(n-1) / (m-1)
答:叶结点即度为0的结点有n个;假设度为m的结点个数为x,则x+n=mx+1;也就是x=n-1/m-1;
若n-1不能被整除,即所给数据不能直接构造最优m叉树,这时需要加一些不影响建树的数据,可以添0;添加的个数为(m-1)-((n-1)%(m-1))。所以最终x应该为⌈n-1/m-1⌉ ,即向上取整;

联想:信息论进行m元Huffman编码进行压缩,每m个压缩一次后减少了m-1个,最后一次压缩可能需要补零。q+t=k(m-1)+m

操作

对象

typedef struct node 
{
	int weight;
    struct node* left,right;
}*HuffmanTree;

typedef struct HeapNode 
{
	HuffmanTree data[Maxsize];		//Attention!!
    int size;
}* MinHeap;

	HuffmanTree T=CreateHuffman();

哈夫曼树的创建(初始化)

HuffmanTree CreateHuffman()
{
	Huffman T=(Huffman)malloc(sizeof(struct TreeNode));
	T->left=T->right=NULL;
	T->weight=0;
	return T;
}

哈夫曼树的建立

HuffmanTree BuildHuffman(MinHeap H)
{
    HuffmanTree T;
    for(int i=1;i<H->size;i++)		/*做H->Size-1次合并*/
    {
        T=malloc(sizeof(struct node));		/*建立新结点*/
        T->left=Delete(H);			/*从最小堆中删除一个结点,作为新T的左子结点*/
        T->right=Delete(H);			/*再从最小堆中删除一个结点,作为新T的右子结点*/
        T->weight=T->left->weight+T->right->weight;		/*计算新权值*/
        Insert(H,T);		/*将新T插入最小堆*/
    }
    T=Delete(H);
    return T;
}

需要调用的MinHeap函数

最小堆CreatHeap
MinHeap CreatHeap()
{
    MinHeap H=(MinHeap)malloc(sizeof(struct HeapNode));
	H->data[0]=(Huffman)malloc(sizeof(struct TreeNode));
	H->data[0]->left=H->data[0]->right=NULL;
	H->data[0]->weight=-1;	//哨兵H->data[0]的值最小
	H->size=0;
    return H;
}


最小堆BuildHeap函数
void BuidHeap (MinHeap H)
{    
    HuffmanTree t=(HuffmanTree)malloc(sizeof(struct node));
    t->left=t->right=NULL;
    for(int i=0;i<n;i++)
    {
        cin>>t->weight;
        Insert(H,t);
    }
    return H;
}

最小堆Delete函数
  1. 需要把ElementType改变成HuffmanTree
  2. H->data[child] 改为H->data[child]->weight
  3. t改为t->weight
HuffmanTree Delete(MinHeap H)	//Attention!!!
{
//可有可无
//    if(H->size==0)
//        return;
    
    HuffmanTree min=H->data[1];	//取出根节点最小值,最后return
   /* 用最小堆中最后一个元素从根结点开始向上过滤下层结点 */ 
    HuffmanTree t=H->data[H->size--];
    int parent,child;
    for(parent=1;parent*2<=H->size;parent=child)	//若parent*2>H->size说明parent没有左儿子,也就更没有右儿子
    {
        child=parent*2;		//child指向左右儿子最小的那个,先初始赋值左儿子
        if(child!=H->size&&H->data[child]->weight > H->data[child+1]->weight)	//child!=H->size说明有右儿子
            child++;	
        if(t->weight <= H->data[child]->weight)	break;
        else
            H->data[parent]=H->data[child];
    }
    H->data[parent]=t;
    return min;
}
  1. 将H->Elements[]按==权值==H->Elements[]->weight调整为最小堆
  2. 第11行parent*=2错误,必须是parent=child,作用是parent索引变成儿子索引,向下交换
  3. 第9行~~data[H->size]~~错误,必须是data[H->size--]

最小堆Insert函数
  1. 需要把ElementType改变成HuffmanTree
  2. H->data[i/2] 改为H->data[i/2]->weight
  3. t改为t->weight
void Insert(MinHeap H,HuffmanTree t)
{
    int i;
    if(H->size>=Maxsize)
        return NULL;
    for(i=++H->size;t->weight < H->data[i/2]->weight;i/=2)
        H->data[i]=H->data[i/2];
    H->data[i]=t;
}

WPL的计算

int WPL(HuffmanTree T,int depth)
{
//注意:哈夫曼树没有度为1的节点 
	if(T->left==NULL&&T->right==NULL)	//左右子树都为空
		return depth*T->weight;
	else//递归去左右子树求权重,而且深度加1
		return WPL(T->left,depth+1)+WPL(T->right,depth+1);
}
  1. 初次调用WPL时:int sum=WPL(T,0);使depth的初始值为==0==;(因为==根节点的深度为0==)。这里的深度depth实际是码字长度
  2. 递归出口为根节点T->left==NULL&&T->right==NULL
  3. 递归关系为WPL(T->left,depth+1)+WPL(T->right,depth+1);
  4. 哈夫曼树没有度为1的节点

例题 05-树9 Huffman Codes

In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is an non-empty string of no more than 63 '0’s and '1’s.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

代码长度限制

16 KB

时间限制

400 ms

内存限制

64 MB

[解题思路](数据结构_中国大学MOOC(慕课) (icourse163.org))

分析

image-20220923151214229

image-20220923151404979

image-20220923151427266

代码

#include <iostream>
#include <unordered_map>
#define Maxsize 64
using namespace std;

//构建MinHeap和HeapNode结构体
typedef struct node 
{
	int weight;
    struct node* left, *right;
}*HuffmanTree;
typedef struct HeapNode 
{
	HuffmanTree data[Maxsize];		//Attention!!
    int size;
}* MinHeap;

//定义全局变量
int n,m,min_length;	char c; 	
unordered_map<char,int>cnt;

//函数声明
MinHeap CreatHeap();
HuffmanTree CreateHuffman();
void Insert(MinHeap H,HuffmanTree t);
HuffmanTree Delete(MinHeap H);
int WPL(HuffmanTree T,int depth);
bool judge();
 
int main()
{
	cin>>n;
    
//	建立最小堆 
	MinHeap H=CreatHeap();
    for(int i=0;i<n;i++)
    {
    	HuffmanTree t=CreateHuffman(); 
        cin>>c>>t->weight;
        cnt[c]=t->weight;
        Insert(H,t);
    }
    
//  建立哈夫曼树 
    HuffmanTree T;
    int n=H->size; 	
//  for(int i=1;i<H->size;i++)		 错误,H->size的值会在循环中改变 
    for(int i=1;i<n;i++)			/*做n-1次合并*/
    {
        T=(HuffmanTree)malloc(sizeof(struct node));		
        T->left=Delete(H);			
        T->right=Delete(H);			
        T->weight=T->left->weight+T->right->weight;		/*计算新权值*/
        Insert(H,T);		/*将新T插入最小堆*/
    }
    T=Delete(H);
    
//  计算最短长度 
	min_length=WPL(T,0);
//	判断 
	cin>>m;
	for(int i=0;i<m;i++)	
	{
		if(judge())		printf("Yes\n");
		else			printf("No\n");
	}
	return 0;
}

MinHeap CreatHeap()
{
    MinHeap H=(MinHeap)malloc(sizeof(struct HeapNode));
	H->data[0]=(HuffmanTree)malloc(sizeof(struct node));
	H->data[0]->left=H->data[0]->right=NULL;
	H->data[0]->weight=-1;			//哨兵H->data[0]的值最小
	H->size=0;
    return H;
}

HuffmanTree CreateHuffman()
{
	HuffmanTree T=(HuffmanTree)malloc(sizeof(struct node));
	T->left=T->right=NULL;
	T->weight=0;	//Attention!!
	return T;
}

void Insert(MinHeap H,HuffmanTree t)
{
//    可有可无 
//    if(H->size>=Maxsize)
//        return;
	int i;
    for(i=++H->size;t->weight < H->data[i/2]->weight;i/=2)
        H->data[i]=H->data[i/2];
    H->data[i]=t;
}

HuffmanTree Delete(MinHeap H)	
{
    if(H->size==0)
        return NULL;
    HuffmanTree min=H->data[1];		//取出根节点(weight最小),最后return
    
   /* 用最小堆中最后一个元素从根结点开始向上过滤下层结点 */ 
    HuffmanTree t=H->data[H->size--];
    int parent,child;
    for(parent=1;parent*2<=H->size;parent=child)	//若parent*2>H->size说明parent没有左儿子,也就更没有右儿子
    {
        child=parent*2;		//child指向左右儿子最小的那个,先初始赋值左儿子
        if(child!=H->size&&H->data[child]->weight > H->data[child+1]->weight)	//child!=H->size说明有右儿子
            child++;	
        if(t->weight<=H->data[child]->weight)	break;
        else
            H->data[parent]=H->data[child];
    }
    H->data[parent]=t;
    return min;
}

int WPL(HuffmanTree T,int depth)
{
//注意:哈夫曼树没有度为1的节点 
	if((T->left==NULL)&&(T->right==NULL))//左右子树都为空
		return depth*T->weight;
	else	//递归去左右子树求权重,而且深度加1
		return WPL(T->left,depth+1)+WPL(T->right,depth+1);
}

bool judge()
{
	int len=0;
    bool flag=1;
	string codes; 
	HuffmanTree T=CreateHuffman();      //模拟建树
	for(int i=0;i<n;i++)
	{
		cin>>c>>codes;
		if(codes.length()>=n)		//也可以没有这个判断
			return 0;
		else
		{
			HuffmanTree p = T;
			for(int j=0;j<codes.length();j++)
			{
				if(codes[j]=='0') 
				{
					if(!p->left)
						p->left = CreateHuffman();
					p = p->left;
					
				}
				else if(codes[j] == '1') 
				{
					if(!p->right)
						p->right = CreateHuffman();
					p = p->right;
				}
				if(p->weight) 	//说明已经被访问过了 
					flag = 0;	//不能直接打印,要把后面的读完 
			}
			if(p->left || p->right )		//说明该结点不是叶子节点 
				flag = 0;
			else
				p->weight = 1;
		}
		len += codes.length()*cnt[c];
	}
	if(len!=min_length)
		flag = 0;
	return flag;
}

注意

judge函数模拟建树过程中

  1. 左右移动过程中不能遇到已经访问的节点
  2. 最后赋值的节点必须是叶节点
  3. 节点weight起到标记flag的作用
【声明】本内容来自华为云开发者社区博主,不代表华为云及华为云开发者社区的观点和立场。转载时必须标注文章的来源(华为云社区)、文章链接、文章作者等基本信息,否则作者和本社区有权追究责任。如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱: cloudbbs@huaweicloud.com
  • 点赞
  • 收藏
  • 关注作者

评论(0

0/1000
抱歉,系统识别当前为高风险访问,暂不支持该操作

全部回复

上滑加载中

设置昵称

在此一键设置昵称,即可参与社区互动!

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。

*长度不超过10个汉字或20个英文字符,设置后3个月内不可修改。