program tip

미니멀리스트 예 Haskell quicksort가 "true"quicksort가 아닌 이유는 무엇입니까?

radiobox 2020. 8. 5. 08:04
반응형

미니멀리스트 예 Haskell quicksort가 "true"quicksort가 아닌 이유는 무엇입니까?


Haskell의 웹 사이트는 아래와 같이 매우 매력적인 5 줄 퀵 정렬 기능을 소개 합니다.

quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser = filter (< p) xs
        greater = filter (>= p) xs

또한 "C의 True quicksort" 도 포함합니다 .

// To sort array a[] of size n: qsort(a,0,n-1)

void qsort(int a[], int lo, int hi) 
{
  int h, l, p, t;

  if (lo < hi) {
    l = lo;
    h = hi;
    p = a[hi];

    do {
      while ((l < h) && (a[l] <= p)) 
          l = l+1;
      while ((h > l) && (a[h] >= p))
          h = h-1;
      if (l < h) {
          t = a[l];
          a[l] = a[h];
          a[h] = t;
      }
    } while (l < h);

    a[hi] = a[l];
    a[l] = p;

    qsort( a, lo, l-1 );
    qsort( a, l+1, hi );
  }
}

C 버전 아래의 링크는 '소개에 인용 된 퀵 정렬은 "실제"퀵 정렬이 아니며 c 코드와 같이 더 긴 목록으로 확장되지 않습니다 " 라는 페이지로 연결됩니다 .

위의 Haskell 기능이 실제 퀵 정렬이 아닌 이유는 무엇입니까? 더 긴 목록을 위해 어떻게 확장하지 못합니까?


진정한 퀵 정렬에는 두 가지 아름다운 측면이 있습니다.

  1. 나누고 정복 : 문제를 두 개의 작은 문제로 나눕니다.
  2. 요소를 제자리에 파티션하십시오.

짧은 Haskell 예제는 (1)을 보여 주지만 (2)는 보여주지 않습니다. 기술을 아직 모른다면 (2)가 어떻게 수행되는지 명확하지 않을 수 있습니다!


하스켈의 진정한 인플레 이스 퀵 정렬 :

import qualified Data.Vector.Generic as V 
import qualified Data.Vector.Generic.Mutable as M 

qsort :: (V.Vector v a, Ord a) => v a -> v a
qsort = V.modify go where
    go xs | M.length xs < 2 = return ()
          | otherwise = do
            p <- M.read xs (M.length xs `div` 2)
            j <- M.unstablePartition (< p) xs
            let (l, pr) = M.splitAt j xs 
            k <- M.unstablePartition (== p) pr
            go l; go $ M.drop k pr

다음은 "true"quicksort C 코드를 Haskell로 음역 한 것입니다. 자신을 보호하십시오.

import Control.Monad
import Data.Array.IO
import Data.IORef

qsort :: IOUArray Int Int -> Int -> Int -> IO ()
qsort a lo hi = do
  (h,l,p,t) <- liftM4 (,,,) z z z z

  when (lo < hi) $ do
    l .= lo
    h .= hi
    p .=. (a!hi)

    doWhile (get l .< get h) $ do
      while ((get l .< get h) .&& ((a.!l) .<= get p)) $ do
        modifyIORef l succ
      while ((get h .> get l) .&& ((a.!h) .>= get p)) $ do
        modifyIORef h pred
      b <- get l .< get h
      when b $ do
        t .=. (a.!l)
        lVal <- get l
        hVal <- get h
        writeArray a lVal =<< a!hVal
        writeArray a hVal =<< get t

    lVal <- get l
    writeArray a hi =<< a!lVal
    writeArray a lVal =<< get p

    hi' <- fmap pred (get l)
    qsort a lo hi'
    lo' <- fmap succ (get l)
    qsort a lo' hi

재미 있지 않습니까? 실제로 함수의 끝 부분 let뿐만 아니라 시작 부분 에서이 부분을 크게 잘라내어 where앞의 코드를 다소 아름답게 만드는 모든 도우미를 정의했습니다.

  let z :: IO (IORef Int)
      z = newIORef 0
      (.=) = writeIORef
      ref .=. action = do v <- action; ref .= v
      (!) = readArray
      (.!) a ref = readArray a =<< get ref
      get = readIORef
      (.<) = liftM2 (<)
      (.>) = liftM2 (>)
      (.<=) = liftM2 (<=)
      (.>=) = liftM2 (>=)
      (.&&) = liftM2 (&&)
  -- ...
  where doWhile cond foo = do
          foo
          b <- cond
          when b $ doWhile cond foo
        while cond foo = do
          b <- cond
          when b $ foo >> while cond foo

그리고 여기, 멍청한 테스트가 작동하는지 확인하십시오.

main = do
    a <- (newListArray (0,9) [10,9..1]) :: IO (IOUArray Int Int)
    printArr a
    putStrLn "Sorting..."
    qsort a 0 9
    putStrLn "Sorted."
    printArr a
  where printArr a = mapM_ (\x -> print =<< readArray a x) [0..9]

Haskell에서는 명령형 코드를 자주 작성하지 않으므로이 코드를 정리할 수있는 방법이 많이 있습니다.

그래서 무엇?

You will notice that the above code is very, very long. The heart of it is about as long as the C code, though each line is often a bit more verbose. This is because C secretly does a lot of nasty things that you might take for granted. For example, a[l] = a[h];. This accesses the mutable variables l and h, and then accesses the mutable array a, and then mutates the mutable array a. Holy mutation, batman! In Haskell, mutation and accessing mutable variables is explicit. The "fake" qsort is attractive for various reasons, but chief among them is it does not use mutation; this self-imposed restriction makes it much easier to understand at a glance.


In my opinion, saying that it's "not a true quicksort" overstates the case. I think it's a valid implementation of the Quicksort algorithm, just not a particularly efficient one.


I think the case this argument tries to make is that the reason why quicksort is commonly used is that it's in-place and fairly cache-friendly as a result. Since you don't have those benefits with Haskell lists, its main raison d'être is gone, and you might as well use merge sort, which guarantees O(n log n), whereas with quicksort you either have to use randomization or complicated partitioning schemes to avoid O(n2) run time in the worst case.


Thanks to lazy evaluation, a Haskell program doesn't (almost can't) do what it looks like it does.

Consider this program:

main = putStrLn (show (quicksort [8, 6, 7, 5, 3, 0, 9]))

In an eager language, first quicksort would run, then show, then putStrLn. A function's arguments are computed before that function starts running.

In Haskell, it's the opposite. The function starts running first. The arguments are only computed when the function actually uses them. And a compound argument, like a list, is computed one piece at a time, as each piece of it is used.

So the first thing that happens in this program is that putStrLn starts running.

GHC's implementation of putStrLn works by copying the characters of the argument String into to an output buffer. But when it enters this loop, show has not run yet. Therefore, when it goes to copy the first character from the string, Haskell evaluates the fraction of the show and quicksort calls needed to compute that character. Then putStrLn moves on to the next character. So the execution of all three functions—putStrLn, show, and quicksort— is interleaved. quicksort executes incrementally, leaving a graph of unevaluated thunks as it goes to remember where it left off.

Now this is wildly different from what you might expect if you're familiar with, you know, any other programming language ever. It's not easy to visualize how quicksort actually behaves in Haskell in terms of memory accesses or even the order of comparisons. If you could only observe the behavior, and not the source code, you would not recognize what it's doing as a quicksort.

For example, the C version of quicksort partitions all the data before the first recursive call. In the Haskell version, the first element of the result will be computed (and could even appear on your screen) before the first partition is finished running—indeed before any work at all is done on greater.

P.S. The Haskell code would be more quicksort-like if it did the same number of comparisons as quicksort; the code as written does twice as many comparisons because lesser and greater are specified to be computed independently, doing two linear scans through the list. Of course it's possible in principle for the compiler to be smart enough to eliminate the extra comparisons; or the code could be changed to use Data.List.partition.

P.P.S. The classic example of Haskell algorithms turning out not to behave how you expected is the sieve of Eratosthenes for computing primes.


I believe that the reason most people say that the pretty Haskell Quicksort isn't a "true" Quicksort is the fact that it isn't in-place - clearly, it can't be when using immutable datatypes. But there is also the objection that it isn't "quick": partly because of the expensive ++, and also because there is a space leak - you hang on to the input list while doing the recursive call on the lesser elements, and in some cases - eg when the list is decreasing - this results in quadratic space usage. (You might say that making it run in linear space is the closest you can get to "in-place" using immutable data.) There are neat solutions to both problems, using accumulating parameters, tupling, and fusion; see S7.6.1 of Richard Bird's Introduction to Functional Programming Using Haskell.


There is no clear definition of what is and what isn't a true quicksort.

They are calling it not a true quicksort, because it doesn't sort in-place:

True quicksort in C sorts in-place


It isn't the idea of mutating elements in-place in purely functional settings. The alternative methods in this thread with mutable arrays lost the spirit of purity.

There are at least two steps to optimize the basic version (which is the most expressive version) of quick-sort.

  1. Optimize the concatenation (++), which is a linear operation, by accumulators:

    qsort xs = qsort' xs []
    
    qsort' [] r = r
    qsort' [x] r = x:r
    qsort' (x:xs) r = qpart xs [] [] r where
        qpart [] as bs r = qsort' as (x:qsort' bs r)
        qpart (x':xs') as bs r | x' <= x = qpart xs' (x':as) bs r
                               | x' >  x = qpart xs' as (x':bs) r
    
  2. Optimize to ternary quick sort (3-way partition, mentioned by Bentley and Sedgewick), to handle duplicated elements:

    tsort :: (Ord a) => [a] -> [a]
    tsort [] = []
    tsort (x:xs) = tsort [a | a<-xs, a<x] ++ x:[b | b<-xs, b==x] ++ tsort [c | c<-xs, c>x]
    
  3. Combine 2 and 3, refer to Richard Bird's book:

    psort xs = concat $ pass xs []
    
    pass [] xss = xss
    pass (x:xs) xss = step xs [] [x] [] xss where
        step [] as bs cs xss = pass as (bs:pass cs xss)
        step (x':xs') as bs cs xss | x' <  x = step xs' (x':as) bs cs xss
                                   | x' == x = step xs' as (x':bs) cs xss
                                   | x' >  x = step xs' as bs (x':cs) xss
    

Or alternatively if the duplicated elements are not the majority:

    tqsort xs = tqsort' xs []

    tqsort' []     r = r
    tqsort' (x:xs) r = qpart xs [] [x] [] r where
        qpart [] as bs cs r = tqsort' as (bs ++ tqsort' cs r)
        qpart (x':xs') as bs cs r | x' <  x = qpart xs' (x':as) bs cs r
                                  | x' == x = qpart xs' as (x':bs) cs r
                                  | x' >  x = qpart xs' as bs (x':cs) r

Unfortunately, median-of-three can't be implemented with the same effect, for example:

    qsort [] = []
    qsort [x] = [x]
    qsort [x, y] = [min x y, max x y]
    qsort (x:y:z:rest) = qsort (filter (< m) (s:rest)) ++ [m] ++ qsort (filter (>= m) (l:rest)) where
        xs = [x, y, z]
        [s, m, l] = [minimum xs, median xs, maximum xs] 

because it still performs poorly for the following 4 cases:

  1. [1, 2, 3, 4, ...., n]

  2. [n, n-1, n-2, ..., 1]

  3. [m-1, m-2, ...3, 2, 1, m+1, m+2, ..., n]

  4. [n, 1, n-1, 2, ... ]

All these 4 cases are well handled by imperative median-of-three approach.

Actually, the most suitable sort algorithm for a purely functional setting is still merge-sort, but not quick-sort.

For detail, please visit my ongoing writing at: https://sites.google.com/site/algoxy/dcsort


Ask anybody to write quicksort in Haskell, and you will get essentially the same program--it is obviously quicksort. Here are some advantages and disadvantages:

Pro: It improves on "true" quicksort by being stable, i.e. it preserves sequence order among equal elements.

Pro: It is trivial to generalize to a three-way split (< = >), which avoids quadratic behavior due to some value occurring O(n) times.

Pro: It's easier to read--even if one had to include the definition of filter.

Con: It uses more memory.

Con: It is costly to generalize the pivot choice by further sampling, which could avoid quadratic behavior on certain low-entropy orderings.


Because taking the first element from the list results in very bad runtime. Use median of 3: first, middle, last.

참고URL : https://stackoverflow.com/questions/7717691/why-is-the-minimalist-example-haskell-quicksort-not-a-true-quicksort

반응형