HuffmanTree
title: 【HuffmanTree】
date: 2023-09-28 20:15:27
tags: 数据结构
哈夫曼树
定义
通过==最小堆(最小堆存放树的根结点)==来实现,每次拿出两个权值==最小的二叉树==进行合并,合并后的新树插入最小堆
注意
- 哈夫曼树 属于 ==树==,也是链式存储
- 构建
HuffmanTree
之前必须先构建MinHeap
特点
- ==权值的个数即为叶子节点的个数==
- HuffmanTree编码的**码字均在叶结点**上
哈夫曼树是一种带权路径长度最短的树,在一个度为m的哈夫曼树中,其叶结点个数为n,则非叶结点的个数为(n-1) / (m-1)。
答:叶结点即度为0的结点有n个;假设度为m的结点个数为x,则x+n=mx+1;也就是x=n-1/m-1;
若n-1不能被整除,即所给数据不能直接构造最优m叉树,这时需要加一些不影响建树的数据,可以添0;添加的个数为(m-1)-((n-1)%(m-1))。所以最终x应该为⌈n-1/m-1⌉ ,即向上取整;
联想:信息论进行m元Huffman编码进行压缩,每m个压缩一次后减少了m-1个,最后一次压缩可能需要补零。q+t=k(m-1)+m
操作
对象
typedef struct node
{
int weight;
struct node* left,right;
}*HuffmanTree;
typedef struct HeapNode
{
HuffmanTree data[Maxsize]; //Attention!!
int size;
}* MinHeap;
HuffmanTree T=CreateHuffman();
哈夫曼树的创建(初始化)
HuffmanTree CreateHuffman()
{
Huffman T=(Huffman)malloc(sizeof(struct TreeNode));
T->left=T->right=NULL;
T->weight=0;
return T;
}
哈夫曼树的建立
HuffmanTree BuildHuffman(MinHeap H)
{
HuffmanTree T;
for(int i=1;i<H->size;i++) /*做H->Size-1次合并*/
{
T=malloc(sizeof(struct node)); /*建立新结点*/
T->left=Delete(H); /*从最小堆中删除一个结点,作为新T的左子结点*/
T->right=Delete(H); /*再从最小堆中删除一个结点,作为新T的右子结点*/
T->weight=T->left->weight+T->right->weight; /*计算新权值*/
Insert(H,T); /*将新T插入最小堆*/
}
T=Delete(H);
return T;
}
需要调用的MinHeap函数
最小堆CreatHeap
MinHeap CreatHeap()
{
MinHeap H=(MinHeap)malloc(sizeof(struct HeapNode));
H->data[0]=(Huffman)malloc(sizeof(struct TreeNode));
H->data[0]->left=H->data[0]->right=NULL;
H->data[0]->weight=-1; //哨兵H->data[0]的值最小
H->size=0;
return H;
}
最小堆BuildHeap函数
void BuidHeap (MinHeap H)
{
HuffmanTree t=(HuffmanTree)malloc(sizeof(struct node));
t->left=t->right=NULL;
for(int i=0;i<n;i++)
{
cin>>t->weight;
Insert(H,t);
}
return H;
}
最小堆Delete函数
- 需要把
ElementType
改变成HuffmanTree
- 将
H->data[child]
改为H->data[child]->weight
- 将
t
改为t->weight
HuffmanTree Delete(MinHeap H) //Attention!!!
{
//可有可无
// if(H->size==0)
// return;
HuffmanTree min=H->data[1]; //取出根节点最小值,最后return
/* 用最小堆中最后一个元素从根结点开始向上过滤下层结点 */
HuffmanTree t=H->data[H->size--];
int parent,child;
for(parent=1;parent*2<=H->size;parent=child) //若parent*2>H->size说明parent没有左儿子,也就更没有右儿子
{
child=parent*2; //child指向左右儿子最小的那个,先初始赋值左儿子
if(child!=H->size&&H->data[child]->weight > H->data[child+1]->weight) //child!=H->size说明有右儿子
child++;
if(t->weight <= H->data[child]->weight) break;
else
H->data[parent]=H->data[child];
}
H->data[parent]=t;
return min;
}
- 将H->Elements[]按==权值==
H->Elements[]->weight
调整为最小堆- 第11行
parent*=2错误,必须是parent=child
,作用是parent索引变成儿子索引,向下交换- 第9行~~
data[H->size]
~~错误,必须是data[H->size--]
最小堆Insert函数
- 需要把
ElementType
改变成HuffmanTree
- 将
H->data[i/2]
改为H->data[i/2]->weight
- 将
t
改为t->weight
void Insert(MinHeap H,HuffmanTree t)
{
int i;
if(H->size>=Maxsize)
return NULL;
for(i=++H->size;t->weight < H->data[i/2]->weight;i/=2)
H->data[i]=H->data[i/2];
H->data[i]=t;
}
WPL的计算
int WPL(HuffmanTree T,int depth)
{
//注意:哈夫曼树没有度为1的节点
if(T->left==NULL&&T->right==NULL) //左右子树都为空
return depth*T->weight;
else//递归去左右子树求权重,而且深度加1
return WPL(T->left,depth+1)+WPL(T->right,depth+1);
}
- 初次调用WPL时:
int sum=WPL(T,0);
使depth的初始值为==0==;(因为==根节点的深度为0==)。这里的深度depth
实际是码字长度- 递归出口为根节点
T->left==NULL&&T->right==NULL
- 递归关系为
WPL(T->left,depth+1)+WPL(T->right,depth+1);
- 哈夫曼树没有度为1的节点
例题 05-树9 Huffman Codes
In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i]
is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i]
is the frequency of c[i]
and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i]
is the i
-th character and code[i]
is an non-empty string of no more than 63 '0’s and '1’s.
Output Specification:
For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No
代码长度限制
16 KB
时间限制
400 ms
内存限制
64 MB
[解题思路](数据结构_中国大学MOOC(慕课) (icourse163.org))
分析
代码
#include <iostream>
#include <unordered_map>
#define Maxsize 64
using namespace std;
//构建MinHeap和HeapNode结构体
typedef struct node
{
int weight;
struct node* left, *right;
}*HuffmanTree;
typedef struct HeapNode
{
HuffmanTree data[Maxsize]; //Attention!!
int size;
}* MinHeap;
//定义全局变量
int n,m,min_length; char c;
unordered_map<char,int>cnt;
//函数声明
MinHeap CreatHeap();
HuffmanTree CreateHuffman();
void Insert(MinHeap H,HuffmanTree t);
HuffmanTree Delete(MinHeap H);
int WPL(HuffmanTree T,int depth);
bool judge();
int main()
{
cin>>n;
// 建立最小堆
MinHeap H=CreatHeap();
for(int i=0;i<n;i++)
{
HuffmanTree t=CreateHuffman();
cin>>c>>t->weight;
cnt[c]=t->weight;
Insert(H,t);
}
// 建立哈夫曼树
HuffmanTree T;
int n=H->size;
// for(int i=1;i<H->size;i++) 错误,H->size的值会在循环中改变
for(int i=1;i<n;i++) /*做n-1次合并*/
{
T=(HuffmanTree)malloc(sizeof(struct node));
T->left=Delete(H);
T->right=Delete(H);
T->weight=T->left->weight+T->right->weight; /*计算新权值*/
Insert(H,T); /*将新T插入最小堆*/
}
T=Delete(H);
// 计算最短长度
min_length=WPL(T,0);
// 判断
cin>>m;
for(int i=0;i<m;i++)
{
if(judge()) printf("Yes\n");
else printf("No\n");
}
return 0;
}
MinHeap CreatHeap()
{
MinHeap H=(MinHeap)malloc(sizeof(struct HeapNode));
H->data[0]=(HuffmanTree)malloc(sizeof(struct node));
H->data[0]->left=H->data[0]->right=NULL;
H->data[0]->weight=-1; //哨兵H->data[0]的值最小
H->size=0;
return H;
}
HuffmanTree CreateHuffman()
{
HuffmanTree T=(HuffmanTree)malloc(sizeof(struct node));
T->left=T->right=NULL;
T->weight=0; //Attention!!
return T;
}
void Insert(MinHeap H,HuffmanTree t)
{
// 可有可无
// if(H->size>=Maxsize)
// return;
int i;
for(i=++H->size;t->weight < H->data[i/2]->weight;i/=2)
H->data[i]=H->data[i/2];
H->data[i]=t;
}
HuffmanTree Delete(MinHeap H)
{
if(H->size==0)
return NULL;
HuffmanTree min=H->data[1]; //取出根节点(weight最小),最后return
/* 用最小堆中最后一个元素从根结点开始向上过滤下层结点 */
HuffmanTree t=H->data[H->size--];
int parent,child;
for(parent=1;parent*2<=H->size;parent=child) //若parent*2>H->size说明parent没有左儿子,也就更没有右儿子
{
child=parent*2; //child指向左右儿子最小的那个,先初始赋值左儿子
if(child!=H->size&&H->data[child]->weight > H->data[child+1]->weight) //child!=H->size说明有右儿子
child++;
if(t->weight<=H->data[child]->weight) break;
else
H->data[parent]=H->data[child];
}
H->data[parent]=t;
return min;
}
int WPL(HuffmanTree T,int depth)
{
//注意:哈夫曼树没有度为1的节点
if((T->left==NULL)&&(T->right==NULL))//左右子树都为空
return depth*T->weight;
else //递归去左右子树求权重,而且深度加1
return WPL(T->left,depth+1)+WPL(T->right,depth+1);
}
bool judge()
{
int len=0;
bool flag=1;
string codes;
HuffmanTree T=CreateHuffman(); //模拟建树
for(int i=0;i<n;i++)
{
cin>>c>>codes;
if(codes.length()>=n) //也可以没有这个判断
return 0;
else
{
HuffmanTree p = T;
for(int j=0;j<codes.length();j++)
{
if(codes[j]=='0')
{
if(!p->left)
p->left = CreateHuffman();
p = p->left;
}
else if(codes[j] == '1')
{
if(!p->right)
p->right = CreateHuffman();
p = p->right;
}
if(p->weight) //说明已经被访问过了
flag = 0; //不能直接打印,要把后面的读完
}
if(p->left || p->right ) //说明该结点不是叶子节点
flag = 0;
else
p->weight = 1;
}
len += codes.length()*cnt[c];
}
if(len!=min_length)
flag = 0;
return flag;
}
注意
judge函数模拟建树过程中
- 左右移动过程中不能遇到已经访问的节点
- 最后赋值的节点必须是叶节点
- 节点weight起到标记flag的作用
- 点赞
- 收藏
- 关注作者
评论(0)