Introduction
An algorithm is a procedure, or a description of a procedure, for solving a problem: a step-by-step specification of operations to be performed in order to compute an output, process data, or perform a function. An algorithm must always be correct (it must always produce a valid output) and it must be finite (it must terminate after a finite number of steps).
Algorithms are not code. Programs and code in a particular language are implementations of algorithms. The word “algorithm” itself is derived from the Latinization of Muhammad ibn Musa al-Khwarizmi, a Persian mathematician (c. 780 – 850). The concept of algorithms predates modern computers by several thousand years. Euclid’s algorithm for computing the greatest common divisor is 2,300 years old.
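To make the distinction between an algorithm and its implementation concrete, here is one possible implementation of Euclid’s algorithm as a Python function. The algorithm itself is the abstract procedure (repeatedly replace the pair with the smaller number and the remainder); the code below is merely one concrete realization of it.

```python
def gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) with
    (b, a mod b) until the remainder b is zero; a is then
    the greatest common divisor."""
    while b != 0:
        a, b = b, a % b
    return a

# Example: gcd(48, 36) -> 12
```

The same algorithm could just as easily be implemented in C, Java, or any other language; the sequence of steps, not the syntax, is the algorithm.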
Often, to be useful, an algorithm must also be feasible: given its input, it must execute in a reasonable amount of time using a reasonable amount of resources. Depending on the application requirements, our tolerance may be on the order of a few milliseconds to several days. An algorithm that takes years or centuries to execute is certainly not considered feasible.
Deterministic An algorithm is deterministic if, when given a particular input, it will always go through the exact same computational process and produce the same output. Most of the algorithms you’ve used up to this point are deterministic.
Randomized A randomized algorithm is one that involves some form of random input. The random source can be used to make decisions such as random selections, or to generate random state in a program as candidate solutions. There are many types of randomized algorithms, including Monte Carlo algorithms (which may produce an incorrect result with low probability), Las Vegas algorithms (whose results are always correct, but which may fail with a certain probability to produce any result at all), etc.
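As a small sketch of a Monte Carlo algorithm (not taken from the text, just an illustration): the classic estimate of π by sampling random points in the unit square and counting how many land inside the quarter circle. The answer has some error, but the error shrinks as more samples are taken.

```python
import random

def estimate_pi(n):
    """Monte Carlo estimate of pi: sample n uniformly random
    points in the unit square and count the fraction that fall
    inside the quarter circle of radius 1."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle's area is pi/4, so scale the fraction by 4.
    return 4 * inside / n
```

Running `estimate_pi(100_000)` typically yields a value within a few hundredths of π; unlike a deterministic algorithm, two runs (with different random seeds) generally produce different outputs.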
Optimization Many algorithms seek not only to find a solution to a problem, but to find the best, optimal solution. Many of these types of algorithms are heuristics: rather than finding the actual best solution (which may be infeasible), they approximate a solution (Approximation algorithms). Other algorithms simulate biological processes (Genetic algorithms, Ant Colony algorithms, etc.) to search for an optimal solution.
Parallel Most modern processors are multicore, meaning that they have more than one processor on a chip. Many servers have dozens of processors that work together. Multiple processors can be utilized by designing parallel algorithms that split work across multiple processes or threads executing in parallel, improving overall performance.
Distributed Computation can also be distributed among completely separate devices that may be located halfway across the globe. Massive distributed computation networks have been built for research such as simulating protein folding (Folding@Home).
An algorithm is a more abstract, generalization of what you might be used to in a typical programming language. In an actual program, you may have functions/methods, subroutines or procedures, etc. Each one of these pieces of code could be considered an algorithm in and of itself. The combination of these smaller pieces create more complex algorithms, etc. A program is essentially a concrete implementation of a more general, theoretical algorithm.
When a program executes, it expends some amount of resources. For example:
Time The most obvious resource an algorithm takes is time: how long the algorithm takes to finish its computation (measured in seconds, minutes, etc.). Alternatively, time can be measured in how many CPU cycles or floating-point operations a particular piece of hardware takes to execute the algorithm.
Memory The second major resource in a computer is memory. An algorithm requires memory to store the input, output, and possibly extra memory during its execution. How much memory an algorithm uses may be an even more important consideration than time in environments or systems where memory is extremely limited, such as embedded systems.
Power The amount of power a device consumes is an important consideration when you have limited capacity such as a battery in a mobile device. From a consumer’s perspective, a slower phone that offered twice the battery life may be preferable. In certain applications such as wireless sensor networks or autonomous systems, power may be more of a concern than either time or memory.
Bandwidth In computer networks, efficiency is measured by how much data you can transmit from one computer to another, called throughput. Throughput is generally limited by a network’s bandwidth: how much a network connection can transmit under ideal circumstances (no data loss, no retransmission, etc.).
Circuitry When designing hardware, resources are typically measured in the number of gates or wires required to implement the hardware. Fewer gates and wires means you can fit more chips on a silicon die, which results in cheaper hardware. Fewer wires and gates also means faster processing.
Idleness Even when a computer isn’t computing anything, it can still be “costing” you something. Consider purchasing hardware that runs a web server for a small user base. There is a substantial investment in the hardware, which requires maintenance and eventually must be replaced. However, since the user base is small, most of the time it sits idle, consuming power. A better solution may be to use the same hardware to serve multiple virtual machines (VMs). Now several small web servers can be hosted on the same hardware, increasing our utilization of it. In scenarios like this, the lack of work being performed is the resource.
Load Somewhat the opposite of idleness, sometimes an application or service may have occasional periods of high demand. The ability of a system to service such high loads may be considered a resource, even if the capacity to handle them goes unused most of the time.
These are all very important engineering and business considerations when designing systems, code, and algorithms. However, we’ll want to consider the complexity of algorithms in a more abstract manner.
Suppose we have two different programs (or algorithms) A and B. Both of those algorithms are correct, but A uses fewer of the above resources than B. Clearly, algorithm A is the better, more efficient solution. However, how can we better quantify this efficiency?
List Operations
To give a concrete example, consider a typical list ADT. The list could be implemented as an array-based list (where the class owns a static array that is resized/copied when full) or a linked list (with nodes containing elements and linking to the next node in the list). Some operations are "cheap" on one type of list while other operations may be more "expensive."
Consider the problem of inserting a new element into the list at the beginning (index 0). For a linked list this involves creating a new node and shuffling a couple of references. The number of operations in this case is not contingent on the size of the list. In contrast, for an array-based list, if the list contains n elements, each element will need to be shifted over one position in the array in order to make room for the element to be inserted. The number of shifts is proportional to the number of elements in the array, n. Clearly for this operation, a linked list is better (more efficient).
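A minimal sketch of the two insert-at-front strategies (illustrative code, not the text’s own implementation; the `Node` class and function names are assumptions):

```python
def insert_front_array(arr, x):
    """Array-based insert at index 0: every existing element must
    shift right by one position -- n shift operations in total.
    (Python lists do this internally; shown explicitly here.)"""
    arr.append(None)                 # grow by one slot
    for i in range(len(arr) - 1, 0, -1):
        arr[i] = arr[i - 1]          # shift element i-1 into slot i
    arr[0] = x

class Node:
    """A linked-list node holding an element and a next reference."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def insert_front_linked(head, x):
    """Linked-list insert at the head: create one node and rewire
    one reference -- a constant number of operations regardless of n."""
    return Node(x, head)
```

Note that the loop in `insert_front_array` runs n times, while `insert_front_linked` does the same amount of work whether the list holds ten elements or ten million.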
Now consider a different operation: given an index i, retrieve the i-th element in the list. For an array-based list we have the advantage of random access to the array. When we index an element, arr[i], it only takes one memory address computation to "jump" to the memory location containing the i-th element. In contrast, a linked list would require us to start at the head and traverse the list until we reach the i-th node. This requires i traversal operations. In the worst case, retrieving the last element, the n-th element, would require n such operations. A summary of these operations can be found in Table 2.1.
| List Type | Insert at start | Index-based Retrieve |
|---|---|---|
| Array-based List | n | 1 |
| Linked List | 2 | i ≈ n |
Table 2.1: Summary of the Complexity of List Operations
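The retrieval column of Table 2.1 can be sketched in the same style (again illustrative; the `Node` class and helper names are assumptions):

```python
class Node:
    """A linked-list node holding an element and a next reference."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def build_linked(items):
    """Build a linked list from a Python list, returning the head."""
    head = None
    for item in reversed(items):
        head = Node(item, head)
    return head

def get_array(arr, i):
    """Array-based list: one address computation jumps straight
    to element i."""
    return arr[i]

def get_linked(head, i):
    """Linked list: follow i next-references from the head to
    reach the i-th node -- i traversal operations."""
    node = head
    for _ in range(i):
        node = node.next
    return node.data
```

Retrieving the last element of a linked list of n elements makes `get_linked` loop n - 1 times, while `get_array` does the same single step no matter which index is requested.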
Already we have a good understanding of the relative performance of algorithms based on the type of data structure we choose. In this example we saw constant time algorithms and linear algorithms. Constant time algorithms execute a constant number of operations regardless of the size of the input (in this case, the size of the list n). Linear algorithms perform a number of operations linearly related to the size of the input.
In the following examples, we’ll begin to be a little bit more formal about this type of analysis.