Chapter 1 Introduction

1-2 The Abstract Data Type

`下面是我的理解`

Chapter 2 Searching

2-1 List Searches

Sequential Search

Sequential Search（顺序搜索）

Sentinel Search（哨兵搜索）

Probability Search（概率搜索）

Ordered List Search (Interval Search)

Binary Search（二分搜索）

2-3 Hashed List Searches

We call the set of keys that hash to the same location in our list synonyms（同义词）.
A collision（冲突） occurs when a hashing algorithm produces ans address for an insertion key and that address is already occupied.
The address produced by the hashing algorithm is known as the home address（内部地址）.
The memory that contains all of the home addresses is known aas prime area（基本区域）.
Each calculation of an address and test for success is known probe（探测）.

Hashing Methods

Direct Method

Advantages: It can be very powerful because it guarantees that there are no synonyms.

Subtraction Method

Modulo-Division Method（除法求余）

Also known as division remainder（除法余数），the modulo-division method divides the key by the array size and uses the remainder for the address.

The list size that is a prime number（素数） produces fewer collisions that other list sizes.

Digit-Extraction Method

379452 -> 394
121267 -> 112
378845 -> 388

Midsquare Method

9452 * 9452 = 89340304: address is 3403

Folding Methods

Fold shift
123456789 ->
123 + 456 + 789 => (1)368

Fold boundary
123456789 ->
321 + 456 + 789 => (1)764

Roation Method

Pseudorandom Method

`y = ax + c `

For maximum efficiency, the factors a and c should be prime numbers.

Hashing Algorithm

The first step is to convert the alphanumeric key into a number by adding the American Standard Code for Information Interchange(ASCII) value for each character to an accumulator that will be the address. As each character is added, we rotate the bits int the address to maximize the distribution of the values.

We use fold shift when we add the individual characters to the address. We use rotaion when we rotate the address after each addtion. Finally, we use modulo division when we map the hashed address into the range of available address.

2-4 Collision Resolution

As a rule of thumb, a hashed list should not be allowed to become more than 75% full.

The load factor is assigned the symbol alpha.
a = k / n * 100
k represents the number of filled elements in the list and n represents the total number of elements.

Primary Clustering（一次群集）

Secondary Clustering（二次群集）

The collisions are resolved by placing the data in a separate overflow area.

Linear Probe

• Simple to implement
• Data tend to remain near their home address

• Primary Clustering
• Search algorithm more complex, after data have been deleted

• Prevent the Primary Clustering, but Sencondary Clustering

• Time required to square the probe number
• Can not generate every address in the list(The solution is to use a list size which is a prime number. In which at least half of the list is reachable)

Pseudorandom Collision Resolution

The last two methods, pseudorandom and key offset, are collectively known as double hashing(双重散列法).

Equation

y = ax + c
y = y % p

Easy to implement

Secondary Clustering(All keys follow only one collision resolution path through the list)

Key Offset

offset = key / listSize

Prevent the Secondary Clustering

The advantages of Open Addressing is that each collision resolution increases the probability of future coliisions.

Linked List is an ordered collection of data in which each element contains the data and the next location of the next element.

Linked list resolution uses the separate area to store collisions and chains all synonyms together in a linked list.

It uses two storage areas, the prime area and the overflow area.

The linked list could be stored in LIFO(last in-first out) sequence, it is faster to inserts because the linked list dose not have to be scanned to insert data. The element being inserted to the list is simpliy placed at begining of the linked list and linked to the node in the prime area.

Share:

This site uses Akismet to reduce spam. Learn how your comment data is processed.