《DATA STRUCTRUES A Psuedocode Approach with C++》Learn Note

看到此文,是否觉得体内洪荒之力爆发,饥渴难耐想吐槽、情不自禁想捐赠
本文为原创文章,尊重辛勤劳动,可以免费摘要、推荐或聚合,亦可完整转载,但完整转载需要标明原出处,违者必究。

支付宝微  信

Chapter 1 Introduction

1-2 The Abstract Data Type

什么事抽象数据类型(ADT)?

下面是我的理解

描述的是一种抽象的数据,那么这个数据的抽象属性该如何描述呢?

定义一个(抽象的)数据,其中包含数据的存储方式,一些操作方法。但是,对外屏蔽其实现细节。也就是说,对于使用者而言,知道它能做些什么事情,但不需要知道它是如何实现的。即抽象数据类型。

举例来说,C++中的Stack,Queue,Java中的Class即为抽象数据类型的例子。

Chapter 2 Searching

2-1 List Searches

Sequential Search

Sequential Search(顺序搜索)

Sentinel Search(哨兵搜索)

哨兵搜索,相对于顺序搜索,主要是通过在序列尾部追加目标值,进而减少在搜索过程中下标索引的判断次数,以提升搜索性能。

Probability Search(概率搜索)

在这种算法中,尽可能按照元素被搜索的可能性由高到低从头到尾进行排列。当被搜索的元素个数繁多,但是只有少部分元素被频繁搜索时,该算法很奏效。

Ordered List Search (Interval Search)

Binary Search(二分搜索)

如果被搜索元素是有序存储的,那么可以通过检测中间元素与目标值的情况,来判定目标值是在左半部分,还是在有半部分,然后再取其中间值与目标值进行比较,进行上述迭代过程,直到寻找到目标值。

时间复杂度:O(lgn) n是被搜索数组的元素个数

2-3 Hashed List Searches

We call the set of keys that hash to the same location in our list synonyms(同义词).
A collision(冲突) occurs when a hashing algorithm produces ans address for an insertion key and that address is already occupied.
The address produced by the hashing algorithm is known as the home address(内部地址).
The memory that contains all of the home addresses is known aas prime area(基本区域).
Each calculation of an address and test for success is known probe(探测).

Hashing Methods

Direct Method

Advantages: It can be very powerful because it guarantees that there are no synonyms.

举例:有一个小组织,其中有100个员工。员工的编号介于1到100之间,且无重复的。那么可以直接使用员工的编码作为散列表的地址。

Subtraction Method

如果员工的编号从1001开始,不超过1100。那么,可以将员工编号减去1000后直接作为地址。

Modulo-Division Method(除法求余)

Also known as division remainder(除法余数),the modulo-division method divides the key by the array size and uses the remainder for the address.

The list size that is a prime number(素数) produces fewer collisions that other list sizes.

Digit-Extraction Method

举例(从六位数字中抽取某些指定位为地址):

379452 -> 394
121267 -> 112
378845 -> 388

Midsquare Method

举例(将Key——数字平方运算后,取中间部分数字作为地址):

9452 * 9452 = 89340304: address is 3403

Folding Methods

Fold shift
123456789 ->
123 + 456 + 789 => (1)368

Fold boundary
123456789 ->
321 + 456 + 789 => (1)764

Roation Method

该种哈希方法通常配合其他哈希方法使用(如folding methods、pseudorandom method),可以将数据均匀的分布在空间中。

Pseudorandom Method

y = ax + c 

For maximum efficiency, the factors a and c should be prime numbers.

Hashing Algorithm

The first step is to convert the alphanumeric key into a number by adding the American Standard Code for Information Interchange(ASCII) value for each character to an accumulator that will be the address. As each character is added, we rotate the bits int the address to maximize the distribution of the values.

We use fold shift when we add the individual characters to the address. We use rotaion when we rotate the address after each addtion. Finally, we use modulo division when we map the hashed address into the range of available address.

2-4 Collision Resolution

As a rule of thumb, a hashed list should not be allowed to become more than 75% full.

The load factor is assigned the symbol alpha.
a = k / n * 100
k represents the number of filled elements in the list and n represents the total number of elements.

Primary Clustering(一次群集)

在使用线性探测法时,会导致一段连续的被占用的槽。比如,假设当前5、6与7三个槽被占用,那么与当前槽中5、6、7中任意一个同义词(synonym)出现时,其会被存放到8槽中;此时,如果出现一个与5、6、7或8中互为同义词的元素时,会被存放到9槽中。这样,就会导致一段连续的被占用的槽。会导致散列表搜索的效率下降。这种现象即为一次群集

Secondary Clustering(二次群集)

当使用二次探测法时,由于探测序列不是连续,不会出现一整段连续的被占用的槽。但是会出现多个长度相对较小的连续被占用的槽。这种现象即为二次群集,他比一次群集对性能的影响会轻。

Open Addressing

The collisions are resolved by placing the data in a separate overflow area.

Linear Probe

Adavantages

  • Simple to implement
  • Data tend to remain near their home address

Disadvantages

  • Primary Clustering
  • Search algorithm more complex, after data have been deleted

Quadratic Probe

Adavantages

  • Prevent the Primary Clustering, but Sencondary Clustering

Disadavantages

  • Time required to square the probe number
  • Can not generate every address in the list(The solution is to use a list size which is a prime number. In which at least half of the list is reachable)

Pseudorandom Collision Resolution

The last two methods, pseudorandom and key offset, are collectively known as double hashing(双重散列法).

Equation

y = ax + c
y = y % p

Advantages

Easy to implement

Disadvantages

Secondary Clustering(All keys follow only one collision resolution path through the list)

Key Offset

offset = key / listSize
address = ( offset + old address ) modulo listSize

Advantages

Prevent the Secondary Clustering

Linked List Resolution

The advantages of Open Addressing is that each collision resolution increases the probability of future coliisions.

Linked List is an ordered collection of data in which each element contains the data and the next location of the next element.

Linked list resolution uses the separate area to store collisions and chains all synonyms together in a linked list.

It uses two storage areas, the prime area and the overflow area.

The linked list could be stored in LIFO(last in-first out) sequence, it is faster to inserts because the linked list dose not have to be scanned to insert data. The element being inserted to the list is simpliy placed at begining of the linked list and linked to the node in the prime area.


这是一篇原创文章,如果您觉得有价值,可以通过捐赠来支持我的创作~
捐赠者会展示在博客的某个页面,钱将会用在有价值的地方,思考中...


分类: 技术, 数据结构, 随记 | 标签: , , , , | 评论 | Permalink

发表评论

电子邮件地址不会被公开。