C 언어 프로그래밍에서 까다로운 부분 중의 하나는 획득했던(할당받았던) 메모리를 여하히 해제하느냐이다. 프로그램밍의 사소한 오류로 메모리 부족 현상이 쉽게 일어나기 때문이다.

자바 언어, 파이썬 언어, C# 언어 등은 자동으로 쓰레기 수집(garbage collection)이 이루어지므로 거의 신경쓰지 않아도 되지만, C 언어는 소스 작성하는 프로그래머가 직접해야 한다.

오래 전(10여년 전)에 Hans Boehm 가 만들었던 C/C++ 언어용 쓰레기 수집기(Boehm-Demers-Weiser conservative garbage collector, 줄여사 BoehmGC 러고 부름)를 이용하여 컴파일되고 실행되는 소스이다. 이 쓰레기 수집기를 쓰면, 메모리 해제(free)를 프로그래머가 신경쓰지 않아도 된다. 그렇게 하기 위해서는 헤더 파일 추가 구문

              #include "gc.h"     또는      #include "gc/gc.h"

가 있어야 하고,  함수 malloc(크기) 대신에 매크로 GC_MALLOC(크기) 를, 함수 realloc(포인터, 크기) 대신에 매크로 GC_REALLOC(포인터, 크기) 를 사용해야 한다. 좀 더 좋은 성능을 위해서는 GC_MALLOC_ATOMIC(크기) 과 GC_enable_incremental() 을 사용하기도 한다. GC_MALLOC_ATOMIC 은 포인터를 전혀 갖지 않은 객체에 대하여만 적용된다. 



MinGW 의 gcc 나 Visual C++ 2010 용으로 동작하는지는 아직 확인 되지는 않았으나, Cygwin에는 Hans Boehm 의 쓰레기 수집기가 이미 설치되어 있어서, Cygwin 의  gcc 4.5.3 으로 컴파일할 수 있었다.

 

/*
 * Filename: testGC_01.c 
 *
 *           Test a garbage collection for C language.
 *
 * Compile: gcc -o testGC_01 testGC_01.c -lgc
 *
 * Execute: ./testGC_01
 *
 * See: http://www.hpl.hp.com/personal/Hans_Boehm/gc/simple_example.html
 */


#include "gc.h"
#include <assert.h>
#include <stdio.h>

int main()
{
    int i;

    GC_INIT(); /* Optional on Linux/X86; see below.  */
    for (i = 0; i < 10000000; ++i)
    {
        int **p = (int **) GC_MALLOC(sizeof(int *));
        int *q = (int *) GC_MALLOC_ATOMIC(sizeof(int));
        assert(*p == 0);
        *p = (int *) GC_REALLOC(q, 2 * sizeof(int));
        if (i % 100000 == 0)
            printf("Heap size = %d\n", GC_get_heap_size());
    }

    return 0;
}

 

 

실행 결과:
Heap size = 65536
Heap size = 131072
.................
.................
.................

 

 

* 단일 쓰레드에서 쓰레기 수집  벤치마크 테스트 (파일명: GCBench.c)

// This is adapted from a benchmark written by John Ellis and Pete Kovac
// of Post Communications.
// It was modified by Hans Boehm of Silicon Graphics.
// Translated to C++ 30 May 1997 by William D Clinger of Northeastern Univ.
// Translated to C 15 March 2000 by Hans Boehm, now at HP Labs.
//
//      This is no substitute for real applications.  No actual application
//      is likely to behave in exactly this way.  However, this benchmark was
//      designed to be more representative of real applications than other
//      Java GC benchmarks of which we are aware.
//      It attempts to model those properties of allocation requests that
//      are important to current GC techniques.
//      It is designed to be used either to obtain a single overall performance
//      number, or to give a more detailed estimate of how collector
//      performance varies with object lifetimes.  It prints the time
//      required to allocate and collect balanced binary trees of various
//      sizes.  Smaller trees result in shorter object lifetimes.  Each cycle
//      allocates roughly the same amount of memory.
//      Two data structures are kept around during the entire process, so
//      that the measured performance is representative of applications
//      that maintain some live in-memory data.  One of these is a tree
//      containing many pointers.  The other is a large array containing
//      double precision floating point numbers.  Both should be of comparable
//      size.
//
//      The results are only really meaningful together with a specification
//      of how much memory was used.  It is possible to trade memory for
//      better time performance.  This benchmark should be run in a 32 MB
//      heap, though we don't currently know how to enforce that uniformly.
//
//      Unlike the original Ellis and Kovac benchmark, we do not attempt
//      measure pause times.  This facility should eventually be added back
//      in.  There are several reasons for omitting it for now.  The original
//      implementation depended on assumptions about the thread scheduler
//      that don't hold uniformly.  The results really measure both the
//      scheduler and GC.  Pause time measurements tend to not fit well with
//      current benchmark suites.  As far as we know, none of the current
//      commercial Java implementations seriously attempt to minimize GC pause
//      times.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#ifdef GC
#  include "gc.h"
#endif

#ifdef PROFIL
  extern void init_profiling();
  extern dump_profile();
#endif

//  These macros were a quick hack for the Macintosh.
//
//  #define currentTime() clock()
//  #define elapsedTime(x) ((1000*(x))/CLOCKS_PER_SEC)

#define currentTime() stats_rtclock()
#define elapsedTime(x) (x)

/* Get the current time in milliseconds */

unsigned
stats_rtclock( void )
{
  struct timeval t;
  struct timezone tz;

  if (gettimeofday( &t, &tz ) == -1)
    return 0;
  return (t.tv_sec * 1000 + t.tv_usec / 1000);
}

static const int kStretchTreeDepth    = 18;      // about 16Mb
static const int kLongLivedTreeDepth  = 16;  // about 4Mb
static const int kArraySize  = 500000;  // about 4Mb
static const int kMinTreeDepth = 4;
static const int kMaxTreeDepth = 16;

typedef struct Node0_struct {
        struct Node0_struct * left;
        struct Node0_struct * right;
        int i, j;
} Node0;

#ifdef HOLES
#   define HOLE() GC_NEW(Node0);
#else
#   define HOLE()
#endif

typedef Node0 *Node;

void init_Node(Node me, Node l, Node r) {
    me -> left = l;
    me -> right = r;
}

#ifndef GC
  void destroy_Node(Node me) {
    if (me -> left) {
 destroy_Node(me -> left);
    }
    if (me -> right) {
 destroy_Node(me -> right);
    }
    free(me);
  }
#endif

// Nodes used by a tree of a given size
static int TreeSize(int i) {
        return ((1 << (i + 1)) - 1);
}

// Number of iterations to use for a given tree depth
static int NumIters(int i) {
        return 2 * TreeSize(kStretchTreeDepth) / TreeSize(i);
}

// Build tree top down, assigning to older objects.
static void Populate(int iDepth, Node thisNode) {
        if (iDepth<=0) {
                return;
        } else {
                iDepth--;
#  ifdef GC
                  thisNode->left  = GC_NEW(Node0); HOLE();
                  thisNode->right = GC_NEW(Node0); HOLE();
#  else
                  thisNode->left  = calloc(1, sizeof(Node0));
                  thisNode->right = calloc(1, sizeof(Node0));
#  endif
                Populate (iDepth, thisNode->left);
                Populate (iDepth, thisNode->right);
        }
}

// Build tree bottom-up
static Node MakeTree(int iDepth) {
 Node result;
        if (iDepth<=0) {
#     ifndef GC
  result = calloc(1, sizeof(Node0));
#     else
  result = GC_NEW(Node0); HOLE();
#     endif
     /* result is implicitly initialized in both cases. */
     return result;
        } else {
     Node left = MakeTree(iDepth-1);
     Node right = MakeTree(iDepth-1);
#     ifndef GC
  result = malloc(sizeof(Node0));
#     else
  result = GC_NEW(Node0); HOLE();
#     endif
     init_Node(result, left, right);
     return result;
        }
}

static void PrintDiagnostics() {
#if 0
        long lFreeMemory = Runtime.getRuntime().freeMemory();
        long lTotalMemory = Runtime.getRuntime().totalMemory();

        System.out.print(" Total memory available="
                         + lTotalMemory + " bytes");
        System.out.println("  Free memory=" + lFreeMemory + " bytes");
#endif
}

static void TimeConstruction(int depth) {
        long    tStart, tFinish;
        int     iNumIters = NumIters(depth);
        Node    tempTree;
 int  i;

 printf("Creating %d trees of depth %d\n", iNumIters, depth);
       
        tStart = currentTime();
        for (i = 0; i < iNumIters; ++i) {
#  ifndef GC
                  tempTree = calloc(1, sizeof(Node0));
#  else
                  tempTree = GC_NEW(Node0);
#  endif
                Populate(depth, tempTree);
#  ifndef GC
                  destroy_Node(tempTree);
#  endif
                tempTree = 0;
        }
        tFinish = currentTime();
        printf("\tTop down construction took %d msec\n",
               elapsedTime(tFinish - tStart));
            
        tStart = currentTime();
        for (i = 0; i < iNumIters; ++i) {
                tempTree = MakeTree(depth);
#  ifndef GC
                  destroy_Node(tempTree);
#  endif
                tempTree = 0;
        }
        tFinish = currentTime();
        printf("\tBottom up construction took %d msec\n",
               elapsedTime(tFinish - tStart));

}

int main() {
        Node    root;
        Node    longLivedTree;
        Node    tempTree;
        long    tStart, tFinish;
        long    tElapsed;
   int i, d;
 double  *array;

#ifdef GC
 // GC_full_freq = 30;
 // GC_free_space_divisor = 16;
 // GC_enable_incremental();
#endif
 printf("Garbage Collector Test\n");
  printf(" Live storage will peak at %d bytes.\n\n",
               2 * sizeof(Node0) * TreeSize(kLongLivedTreeDepth) +
               sizeof(double) * kArraySize);
        printf(" Stretching memory with a binary tree of depth %d\n",
               kStretchTreeDepth);
        PrintDiagnostics();
# ifdef PROFIL
     init_profiling();
# endif
      
        tStart = currentTime();
       
        // Stretch the memory space quickly
        tempTree = MakeTree(kStretchTreeDepth);
# ifndef GC
          destroy_Node(tempTree);
# endif
        tempTree = 0;

        // Create a long lived object
        printf(" Creating a long-lived binary tree of depth %d\n",
               kLongLivedTreeDepth);
# ifndef GC
          longLivedTree = calloc(1, sizeof(Node0));
# else
          longLivedTree = GC_NEW(Node0);
# endif
        Populate(kLongLivedTreeDepth, longLivedTree);

        // Create long-lived array, filling half of it
 printf(" Creating a long-lived array of %d doubles\n", kArraySize);
# ifndef GC
          array = malloc(kArraySize * sizeof(double));
# else
#   ifndef NO_PTRFREE
            array = GC_MALLOC_ATOMIC(sizeof(double) * kArraySize);
#   else
            array = GC_MALLOC(sizeof(double) * kArraySize);
#   endif
# endif
        for (i = 0; i < kArraySize/2; ++i) {
                array[i] = 1.0/i;
        }
        PrintDiagnostics();

        for (d = kMinTreeDepth; d <= kMaxTreeDepth; d += 2) {
                TimeConstruction(d);
        }

        if (longLivedTree == 0 || array[1000] != 1.0/1000)
  fprintf(stderr, "Failed\n");
                                // fake reference to LongLivedTree
                                // and array
                                // to keep them from being optimized away

        tFinish = currentTime();
        tElapsed = elapsedTime(tFinish-tStart);
        PrintDiagnostics();
        printf("Completed in %d msec\n", tElapsed);
# ifdef GC
   printf("Completed %d collections\n", GC_gc_no);
   printf("Heap size is %d\n", GC_get_heap_size());
#       endif
# ifdef PROFIL
   dump_profile();
# endif
}

 

* 쓰레기 수집기가 동작하지 않도록 컴파일하기
$ gcc -o GCBench GCBench.c -lgc

* 실행하기
$ ./GCBench
Garbage Collector Test
 Live storage will peak at 8194272 bytes.

 Stretching memory with a binary tree of depth 18
 Creating a long-lived binary tree of depth 16
 Creating a long-lived array of 500000 doubles
Creating 33824 trees of depth 4
        Top down construction took 378 msec
        Bottom up construction took 374 msec
Creating 8256 trees of depth 6
        Top down construction took 379 msec
        Bottom up construction took 357 msec
Creating 2052 trees of depth 8
        Top down construction took 379 msec
        Bottom up construction took 354 msec
Creating 512 trees of depth 10
        Top down construction took 383 msec
        Bottom up construction took 356 msec
Creating 128 trees of depth 12
        Top down construction took 382 msec
        Bottom up construction took 363 msec
Creating 32 trees of depth 14
        Top down construction took 387 msec
        Bottom up construction took 381 msec
Creating 8 trees of depth 16
        Top down construction took 396 msec
        Bottom up construction took 400 msec
Completed in 5525 msec

 

* 쓰레기 수집기가 동작하도록 컴파일하기
$ gcc -DGC -o GCBench GCBench.c -lgc

* 실행하기
$ ./GCBench
Garbage Collector Test
 Live storage will peak at 8194272 bytes.

 Stretching memory with a binary tree of depth 18
 Creating a long-lived binary tree of depth 16
 Creating a long-lived array of 500000 doubles
Creating 33824 trees of depth 4
        Top down construction took 44 msec
        Bottom up construction took 37 msec
Creating 8256 trees of depth 6
        Top down construction took 43 msec
        Bottom up construction took 39 msec
Creating 2052 trees of depth 8
        Top down construction took 38 msec
        Bottom up construction took 38 msec
Creating 512 trees of depth 10
        Top down construction took 38 msec
        Bottom up construction took 38 msec
Creating 128 trees of depth 12
        Top down construction took 44 msec
        Bottom up construction took 38 msec
Creating 32 trees of depth 14
        Top down construction took 38 msec
        Bottom up construction took 39 msec
Creating 8 trees of depth 16
        Top down construction took 38 msec
        Bottom up construction took 52 msec
Completed in 640 msec
Completed 25 collections
Heap size is 28975104

 

* 다중 쓰레드에서 쓰레기 수집  벤치마크 테스트 (파일명: MT_GCBench.c)

// This is adapted from a benchmark written by John Ellis and Pete Kovac
// of Post Communications.
// It was modified by Hans Boehm of Silicon Graphics.
// Translated to C++ 30 May 1997 by William D Clinger of Northeastern Univ.
// Translated to C 15 March 2000 by Hans Boehm, now at HP Labs.
// Adapted to run NTHREADS client threads concurrently.  Each
// thread executes the original benchmark.  12 June 2000  by Hans Boehm.
//
//      This is no substitute for real applications.  No actual application
//      is likely to behave in exactly this way.  However, this benchmark was
//      designed to be more representative of real applications than other
//      Java GC benchmarks of which we are aware.
//      It attempts to model those properties of allocation requests that
//      are important to current GC techniques.
//      It is designed to be used either to obtain a single overall performance
//      number, or to give a more detailed estimate of how collector
//      performance varies with object lifetimes.  It prints the time
//      required to allocate and collect balanced binary trees of various
//      sizes.  Smaller trees result in shorter object lifetimes.  Each cycle
//      allocates roughly the same amount of memory.
//      Two data structures are kept around during the entire process, so
//      that the measured performance is representative of applications
//      that maintain some live in-memory data.  One of these is a tree
//      containing many pointers.  The other is a large array containing
//      double precision floating point numbers.  Both should be of comparable
//      size.
//
//      The results are only really meaningful together with a specification
//      of how much memory was used.  It is possible to trade memory for
//      better time performance.  This benchmark should be run in a 32 MB
//      heap, though we don't currently know how to enforce that uniformly.
//
//      Unlike the original Ellis and Kovac benchmark, we do not attempt
//      measure pause times.  This facility should eventually be added back
//      in.  There are several reasons for omitting it for now.  The original
//      implementation depended on assumptions about the thread scheduler
//      that don't hold uniformly.  The results really measure both the
//      scheduler and GC.  Pause time measurements tend to not fit well with
//      current benchmark suites.  As far as we know, none of the current
//      commercial Java implementations seriously attempt to minimize GC pause
//      times.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>

#ifdef GC
#  ifndef LINUX_THREADS
#     define LINUX_THREADS
#  endif
#  ifndef _REENTRANT
#     define _REENTRANT
#  endif
#  ifdef LOCAL
#    define GC_REDIRECT_TO_LOCAL
#    include "gc_local_alloc.h"
#  endif
#  include "gc.h"
#endif


#ifndef NTHREADS
#   define NTHREADS 1
#endif

#ifdef PROFIL
  extern void init_profiling();
  extern dump_profile();
#endif

//  These macros were a quick hack for the Macintosh.
//
//  #define currentTime() clock()
//  #define elapsedTime(x) ((1000*(x))/CLOCKS_PER_SEC)

#define currentTime() stats_rtclock()
#define elapsedTime(x) (x)

/* Get the current time in milliseconds */

unsigned
stats_rtclock( void )
{
  struct timeval t;
  struct timezone tz;

  if (gettimeofday( &t, &tz ) == -1)
    return 0;
  return (t.tv_sec * 1000 + t.tv_usec / 1000);
}

static const int kStretchTreeDepth    = 18;      // about 16Mb
static const int kLongLivedTreeDepth  = 16;  // about 4Mb
static const int kArraySize  = 500000;  // about 4Mb
static const int kMinTreeDepth = 4;
static const int kMaxTreeDepth = 16;

typedef struct Node0_struct {
        struct Node0_struct * left;
        struct Node0_struct * right;
        int i, j;
} Node0;

#ifdef HOLES
#   define HOLE() GC_NEW(Node0);
#else
#   define HOLE()
#endif

typedef Node0 *Node;

void init_Node(Node me, Node l, Node r) {
    me -> left = l;
    me -> right = r;
}

#ifndef GC
  void destroy_Node(Node me) {
    if (me -> left) {
 destroy_Node(me -> left);
    }
    if (me -> right) {
 destroy_Node(me -> right);
    }
    free(me);
  }
#endif

// Nodes used by a tree of a given size
static int TreeSize(int i) {
        return ((1 << (i + 1)) - 1);
}

// Number of iterations to use for a given tree depth
static int NumIters(int i) {
        return 2 * TreeSize(kStretchTreeDepth) / TreeSize(i);
}

// Build tree top down, assigning to older objects.
static void Populate(int iDepth, Node thisNode) {
        if (iDepth<=0) {
                return;
        } else {
                iDepth--;
#  ifdef GC
                  thisNode->left  = GC_NEW(Node0); HOLE();
                  thisNode->right = GC_NEW(Node0); HOLE();
#  else
                  thisNode->left  = calloc(1, sizeof(Node0));
                  thisNode->right = calloc(1, sizeof(Node0));
#  endif
                Populate (iDepth, thisNode->left);
                Populate (iDepth, thisNode->right);
        }
}

// Build tree bottom-up
static Node MakeTree(int iDepth) {
 Node result;
        if (iDepth<=0) {
#     ifndef GC
  result = calloc(1, sizeof(Node0));
#     else
  result = GC_NEW(Node0); HOLE();
#     endif
     /* result is implicitly initialized in both cases. */
     return result;
        } else {
     Node left = MakeTree(iDepth-1);
     Node right = MakeTree(iDepth-1);
#     ifndef GC
  result = malloc(sizeof(Node0));
#     else
  result = GC_NEW(Node0); HOLE();
#     endif
     init_Node(result, left, right);
     return result;
        }
}

static void PrintDiagnostics() {
#if 0
        long lFreeMemory = Runtime.getRuntime().freeMemory();
        long lTotalMemory = Runtime.getRuntime().totalMemory();

        System.out.print(" Total memory available="
                         + lTotalMemory + " bytes");
        System.out.println("  Free memory=" + lFreeMemory + " bytes");
#endif
}

static void TimeConstruction(int depth) {
        long    tStart, tFinish;
        int     iNumIters = NumIters(depth);
        Node    tempTree;
 int  i;

 printf("0x%x: Creating %d trees of depth %d\n", pthread_self(), iNumIters, depth);
       
        tStart = currentTime();
        for (i = 0; i < iNumIters; ++i) {
#  ifndef GC
                  tempTree = calloc(1, sizeof(Node0));
#  else
                  tempTree = GC_NEW(Node0);
#  endif
                Populate(depth, tempTree);
#  ifndef GC
                  destroy_Node(tempTree);
#  endif
                tempTree = 0;
        }
        tFinish = currentTime();
        printf("\t0x%x: Top down construction took %d msec\n",
               pthread_self(), elapsedTime(tFinish - tStart));
            
        tStart = currentTime();
        for (i = 0; i < iNumIters; ++i) {
                tempTree = MakeTree(depth);
#  ifndef GC
                  destroy_Node(tempTree);
#  endif
                tempTree = 0;
        }
        tFinish = currentTime();
        printf("\t0x%x: Bottom up construction took %d msec\n",
               pthread_self(), elapsedTime(tFinish - tStart));

}

void * run_one_test(void * arg) {
 int d;
        for (d = kMinTreeDepth; d <= kMaxTreeDepth; d += 2) {
                TimeConstruction(d);
        }
}

int main() {
        Node    root;
        Node    longLivedTree;
        Node    tempTree;
        long    tStart, tFinish;
        long    tElapsed;
   int i;
 double  *array;

#ifdef GC
 // GC_full_freq = 30;
 // GC_free_space_divisor = 16;
 // GC_enable_incremental();
#endif
#       if defined(GC) && defined(LOCAL)
   GC_thr_init();
#   endif
 printf("Garbage Collector Test\n");
  printf(" Live storage will peak at %d bytes.\n\n",
               2 * sizeof(Node0) * TreeSize(kLongLivedTreeDepth) +
               sizeof(double) * kArraySize);
        printf(" Stretching memory with a binary tree of depth %d\n",
               kStretchTreeDepth);
        PrintDiagnostics();
# ifdef PROFIL
     init_profiling();
# endif
      
        tStart = currentTime();
       
        // Stretch the memory space quickly
        tempTree = MakeTree(kStretchTreeDepth);
# ifndef GC
          destroy_Node(tempTree);
# endif
        tempTree = 0;

        // Create a long lived object
        printf(" Creating a long-lived binary tree of depth %d\n",
               kLongLivedTreeDepth);
# ifndef GC
          longLivedTree = calloc(1, sizeof(Node0));
# else
          longLivedTree = GC_NEW(Node0);
# endif
        Populate(kLongLivedTreeDepth, longLivedTree);

        // Create long-lived array, filling half of it
 printf(" Creating a long-lived array of %d doubles\n", kArraySize);
# ifndef GC
          array = malloc(kArraySize * sizeof(double));
# else
#   ifndef NO_PTRFREE
            array = GC_MALLOC_ATOMIC(sizeof(double) * kArraySize);
#   else
            array = GC_MALLOC(sizeof(double) * kArraySize);
#   endif
# endif
        for (i = 0; i < kArraySize/2; ++i) {
                array[i] = 1.0/i;
        }

        {
   pthread_t thread[NTHREADS];
   for (i = 1; i < NTHREADS; ++i) {
         int code;

     if ((code = pthread_create(thread+i, 0, run_one_test, 0)) != 0) {
           fprintf(stderr, "Thread creation failed %u\n", code);
       exit(1);
     }
   }
   /* We use the main thread to run one test.  This allows */
   /* profiling to work, for example.    */
   run_one_test(0);
   for (i = 1; i < NTHREADS; ++i) {
         int code;
     if ((code = pthread_join(thread[i], 0)) != 0) {
         fprintf(stderr, "Thread join failed %u\n", code);
           }
    }
        }
        PrintDiagnostics();

        if (longLivedTree == 0 || array[1000] != 1.0/1000)
  fprintf(stderr, "Failed\n");
                                // fake reference to LongLivedTree
                                // and array
                                // to keep them from being optimized away

        tFinish = currentTime();
        tElapsed = elapsedTime(tFinish-tStart);
        PrintDiagnostics();
        printf("Completed in %d msec\n", tElapsed);
# ifdef GC
   printf("Completed %d collections\n", GC_gc_no);
   printf("Heap size is %d\n", GC_get_heap_size());
#       endif
# ifdef PROFIL
   dump_profile();
# endif
}

 

* 쓰레기 수집기가 동작하지 않도록 컴파일하기
$ gcc -o MT_GCBench MT_GCBench.c -lgc

* 실행하기
$ ./MT_GCBench
Garbage Collector Test
 Live storage will peak at 8194272 bytes.

 Stretching memory with a binary tree of depth 18
 Creating a long-lived binary tree of depth 16
 Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
        0x4b0038: Top down construction took 379 msec
        0x4b0038: Bottom up construction took 353 msec
0x4b0038: Creating 8256 trees of depth 6
        0x4b0038: Top down construction took 380 msec
        0x4b0038: Bottom up construction took 355 msec
0x4b0038: Creating 2052 trees of depth 8
        0x4b0038: Top down construction took 379 msec
        0x4b0038: Bottom up construction took 355 msec
0x4b0038: Creating 512 trees of depth 10
        0x4b0038: Top down construction took 382 msec
        0x4b0038: Bottom up construction took 355 msec
0x4b0038: Creating 128 trees of depth 12
        0x4b0038: Top down construction took 386 msec
        0x4b0038: Bottom up construction took 361 msec
0x4b0038: Creating 32 trees of depth 14
        0x4b0038: Top down construction took 383 msec
        0x4b0038: Bottom up construction took 360 msec
0x4b0038: Creating 8 trees of depth 16
        0x4b0038: Top down construction took 389 msec
        0x4b0038: Bottom up construction took 373 msec
Completed in 5423 msec

 

* 쓰레기 수집기가 동작하도록 컴파일하기
$ gcc -DGC -o MT_GCBench MT_GCBench.c -lgc

* 실행하기
$ ./MT_GCBench
Garbage Collector Test
 Live storage will peak at 8194272 bytes.

 Stretching memory with a binary tree of depth 18
 Creating a long-lived binary tree of depth 16
 Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
        0x4b0038: Top down construction took 52 msec
        0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 8256 trees of depth 6
        0x4b0038: Top down construction took 44 msec
        0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 2052 trees of depth 8
        0x4b0038: Top down construction took 38 msec
        0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 512 trees of depth 10
        0x4b0038: Top down construction took 38 msec
        0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 128 trees of depth 12
        0x4b0038: Top down construction took 44 msec
        0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 32 trees of depth 14
        0x4b0038: Top down construction took 38 msec
        0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 8 trees of depth 16
        0x4b0038: Top down construction took 38 msec
        0x4b0038: Bottom up construction took 51 msec
Completed in 651 msec
Completed 25 collections
Heap size is 28975104

 

 

* 다중 쓰레드에서 쓰레기 수집  벤치마크 테스트 2 (파일명: MT_GCBench2.c)

// This is version 2 of the multithreaded GC Bench.
// Heap expansion is handled differently from version 1, in an attempt
// to make scalability measurements more meaningful.  The version with
// N threads now immediately expands the heap to N*32MB.
//
// To run this with BDWGC versions 6 and later with thread local allocation,
// define GC and LOCAL.  Without thread-local allocation, define just GC.
// To run it with the University of Tokyo scalable GC,
// define SGC.  To run it with malloc and explicit deallocation, define
// none of these.  (This should also work for Hoard.)
//
// Note that defining GC or SGC removes the explicit deallocation passes,
// which seems fair.
//
// This is adapted from a benchmark written by John Ellis and Pete Kovac
// of Post Communications.
// It was modified by Hans Boehm of Silicon Graphics.
// Translated to C++ 30 May 1997 by William D Clinger of Northeastern Univ.
// Translated to C 15 March 2000 by Hans Boehm, now at HP Labs.
// Adapted to run NTHREADS client threads concurrently.  Each
// thread executes the original benchmark.  12 June 2000  by Hans Boehm.
// Changed heap expansion rule, and made the number of threads run-time
// configurable.  25 Oct 2000 by Hans Boehm.
//
//      This is no substitute for real applications.  No actual application
//      is likely to behave in exactly this way.  However, this benchmark was
//      designed to be more representative of real applications than other
//      Java GC benchmarks of which we were aware at the time.
//      It still doesn't seem too bad for something this small.
//      It attempts to model those properties of allocation requests that
//      are important to current GC techniques.
//      It is designed to be used either to obtain a single overall performance
//      number, or to give a more detailed estimate of how collector
//      performance varies with object lifetimes.  It prints the time
//      required to allocate and collect balanced binary trees of various
//      sizes.  Smaller trees result in shorter object lifetimes.  Each cycle
//      allocates roughly the same amount of memory.
//      Two data structures are kept around during the entire process, so
//      that the measured performance is representative of applications
//      that maintain some live in-memory data.  One of these is a tree
//      containing many pointers.  The other is a large array containing
//      double precision floating point numbers.  Both should be of comparable
//      size.
//
//      The results are only really meaningful together with a specification
//      of how much memory was used.  This versions of the benchmark tries
//      to preallocate a sufficiently large heap that expansion should not be
//      needed.
//
//      Unlike the original Ellis and Kovac benchmark, we do not attempt
//      measure pause times.  This facility should eventually be added back
//      in.  There are several reasons for omitting it for now.  The original
//      implementation depended on assumptions about the thread scheduler
//      that don't hold uniformly.  The results really measure both the
//      scheduler and GC.  Pause time measurements tend to not fit well with
//      current benchmark suites.  As far as we know, none of the current
//      commercial Java implementations seriously attempt to minimize GC pause
//      times.
//
//      Since this benchmark has recently been more widely used, some
//      anomalous behavious has been uncovered.  The user should be aware
//      of this:
//      1) Nearly all objects are of the same size.  This benchmark is
//         not useful for analyzing fragmentation behavior.  It is unclear
//         whether this is an issue for well-designed allocators.
//      2) Unless HOLES is defined, it tends to drop consecutively allocated
//         memory at the same time.  Many real applications do exhibit this
//         phenomenon, but probably not to this extent.  (Defining HOLES tends
//         to move the benchmark to the opposite extreme.)
//      3) It appears harder to predict object lifetimes than for most real
//         Java programs (see T. Harris, "Dynamic adptive pre-tenuring",
//         ISMM '00).

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>

#ifdef GC
#  ifndef LINUX_THREADS
#     define LINUX_THREADS
#  endif
#  ifndef _REENTRANT
#     define _REENTRANT
#  endif
#  ifdef LOCAL
#    define GC_REDIRECT_TO_LOCAL
#    include "gc_local_alloc.h"
#  endif
#  include "gc.h"
#endif
#ifdef SGC
#  include "sgc.h"
#  define GC
#  define pthread_create GC_pthread_create
#  define pthread_join GC_pthread_join
#endif

#define MAX_NTHREADS 1024

int nthreads = 0;

#ifdef PROFIL
  extern void init_profiling();
  extern dump_profile();
#endif

//  These macros were a quick hack for the Macintosh.
//
//  #define currentTime() clock()
//  #define elapsedTime(x) ((1000*(x))/CLOCKS_PER_SEC)

#define currentTime() stats_rtclock()
#define elapsedTime(x) (x)

/* Get the current time in milliseconds */

unsigned
stats_rtclock( void )
{
  struct timeval t;
  struct timezone tz;

  if (gettimeofday( &t, &tz ) == -1)
    return 0;
  return (t.tv_sec * 1000 + t.tv_usec / 1000);
}

static const int kStretchTreeDepth    = 18;      // about 16Mb
static const int kLongLivedTreeDepth  = 16;  // about 4Mb
static const int kArraySize  = 500000;  // about 4Mb
static const int kMinTreeDepth = 4;
static const int kMaxTreeDepth = 16;

typedef struct Node0_struct {
        struct Node0_struct * left;
        struct Node0_struct * right;
        int i, j;
} Node0;

#ifdef HOLES
#   define HOLE() GC_NEW(Node0);
#else
#   define HOLE()
#endif

typedef Node0 *Node;

void init_Node(Node me, Node l, Node r) {
    me -> left = l;
    me -> right = r;
}

#ifndef GC
  void destroy_Node(Node me) {
    if (me -> left) {
 destroy_Node(me -> left);
    }
    if (me -> right) {
 destroy_Node(me -> right);
    }
    free(me);
  }
#endif

// Nodes used by a tree of a given size
static int TreeSize(int i) {
        return ((1 << (i + 1)) - 1);
}

// Number of iterations to use for a given tree depth
static int NumIters(int i) {
        return 2 * TreeSize(kStretchTreeDepth) / TreeSize(i);
}

// Build tree top down, assigning to older objects.
static void Populate(int iDepth, Node thisNode) {
        if (iDepth<=0) {
                return;
        } else {
                iDepth--;
#  ifdef GC
                  thisNode->left  = GC_NEW(Node0); HOLE();
                  thisNode->right = GC_NEW(Node0); HOLE();
#  else
                  thisNode->left  = calloc(1, sizeof(Node0));
                  thisNode->right = calloc(1, sizeof(Node0));
#  endif
                Populate (iDepth, thisNode->left);
                Populate (iDepth, thisNode->right);
        }
}

// Build tree bottom-up
static Node MakeTree(int iDepth) {
 Node result;
        if (iDepth<=0) {
#     ifndef GC
  result = calloc(1, sizeof(Node0));
#     else
  result = GC_NEW(Node0); HOLE();
#     endif
     /* result is implicitly initialized in both cases. */
     return result;
        } else {
     Node left = MakeTree(iDepth-1);
     Node right = MakeTree(iDepth-1);
#     ifndef GC
  result = malloc(sizeof(Node0));
#     else
  result = GC_NEW(Node0); HOLE();
#     endif
     init_Node(result, left, right);
     return result;
        }
}

static void PrintDiagnostics() {
#if 0
        long lFreeMemory = Runtime.getRuntime().freeMemory();
        long lTotalMemory = Runtime.getRuntime().totalMemory();

        System.out.print(" Total memory available="
                         + lTotalMemory + " bytes");
        System.out.println("  Free memory=" + lFreeMemory + " bytes");
#endif
}

static void TimeConstruction(int depth) {
        long    tStart, tFinish;
        int     iNumIters = NumIters(depth);
        Node    tempTree;
 int  i;

 printf("0x%x: Creating %d trees of depth %d\n", pthread_self(), iNumIters, depth);
       
        tStart = currentTime();
        for (i = 0; i < iNumIters; ++i) {
#  ifndef GC
                  tempTree = calloc(1, sizeof(Node0));
#  else
                  tempTree = GC_NEW(Node0);
#  endif
                Populate(depth, tempTree);
#  ifndef GC
                  destroy_Node(tempTree);
#  endif
                tempTree = 0;
        }
        tFinish = currentTime();
        printf("\t0x%x: Top down construction took %d msec\n",
               pthread_self(), elapsedTime(tFinish - tStart));
            
        tStart = currentTime();
        for (i = 0; i < iNumIters; ++i) {
                tempTree = MakeTree(depth);
#  ifndef GC
                  destroy_Node(tempTree);
#  endif
                tempTree = 0;
        }
        tFinish = currentTime();
        printf("\t0x%x: Bottom up construction took %d msec\n",
               pthread_self(), elapsedTime(tFinish - tStart));

}

void * run_one_test(void * arg) {
 int d, i;
        Node    longLivedTree;
 double  *array;
 /* size_t initial_bytes = GC_get_total_bytes(); */

        // Create a long lived object
        printf(" Creating a long-lived binary tree of depth %d\n",
               kLongLivedTreeDepth);
# ifndef GC
          longLivedTree = calloc(1, sizeof(Node0));
# else
          longLivedTree = GC_NEW(Node0);
# endif
        Populate(kLongLivedTreeDepth, longLivedTree);

        // Create long-lived array, filling half of it
 printf(" Creating a long-lived array of %d doubles\n", kArraySize);
# ifndef GC
          array = malloc(kArraySize * sizeof(double));
# else
#   ifndef NO_PTRFREE
            array = GC_MALLOC_ATOMIC(sizeof(double) * kArraySize);
#   else
            array = GC_MALLOC(sizeof(double) * kArraySize);
#   endif
# endif
        for (i = 0; i < kArraySize/2; ++i) {
                array[i] = 1.0/i;
        }

        for (d = kMinTreeDepth; d <= kMaxTreeDepth; d += 2) {
                TimeConstruction(d);
        }
 /* printf("Allocated %ld bytes before start, %ld after\n",
  initial_bytes, GC_get_total_bytes() - initial_bytes); */
        if (longLivedTree->left -> right == 0 || array[1000] != 1.0/1000)
  fprintf(stderr, "Failed\n");
                                // fake reference to LongLivedTree
                                // and array
                                // to keep them from being optimized away

}

int main(int argc, char **argv) {
        Node    root;
        Node    tempTree[MAX_NTHREADS];
        long    tStart, tFinish;
        long    tElapsed;
   int i;
# ifdef SGC
   SGC_attr_t attr;
# endif

 if (1 == argc) {
     nthreads = 1;
 } else if (2 == argc) {
     nthreads = atoi(argv[1]);
     if (nthreads < 1 || nthreads > MAX_NTHREADS) {
  fprintf(stderr, "Invalid # of threads argument\n");
  exit(1);
     }
 } else {
     fprintf(stderr, "Usage: %s [# of threads]\n");
     exit(1);
 }
#       if defined(SGC)
   /* The University of Tokyo collector needs explicit */
   /* initialization.     */
   SGC_attr_init(&attr);
   SGC_init(nthreads, &attr);
#   endif
#ifdef GC
 // GC_full_freq = 30;
 // GC_free_space_divisor = 16;
 // GC_enable_incremental();
#endif
 printf("Garbage Collector Test\n");
  printf(" Live storage will peak at %d bytes or less .\n\n",
               2 * sizeof(Node0) * nthreads
          * (TreeSize(kLongLivedTreeDepth) + TreeSize(kMaxTreeDepth))
               + sizeof(double) * kArraySize);
        PrintDiagnostics();
       
# ifdef GC
   /* GC_expand_hp fails with empty heap */
   GC_malloc(1);
   GC_expand_hp(32*1024*1024*nthreads);
# endif

# ifdef PROFIL
     init_profiling();
# endif
      
        tStart = currentTime();
        {
   pthread_t thread[MAX_NTHREADS];
   for (i = 1; i < nthreads; ++i) {
         int code;

     if ((code = pthread_create(thread+i, 0, run_one_test, 0)) != 0) {
           fprintf(stderr, "Thread creation failed %u\n", code);
       exit(1);
     }
   }
   /* We use the main thread to run one test.  This allows */
   /* profiling to work, for example.    */
   run_one_test(0);
   for (i = 1; i < nthreads; ++i) {
         int code;
     if ((code = pthread_join(thread[i], 0)) != 0) {
         fprintf(stderr, "Thread join failed %u\n", code);
           }
    }
        }
        PrintDiagnostics();

        tFinish = currentTime();
        tElapsed = elapsedTime(tFinish-tStart);
        PrintDiagnostics();
        printf("Completed in %d msec\n", tElapsed);
# ifdef GC
   printf("Completed %d collections\n", GC_gc_no);
   printf("Heap size is %d\n", GC_get_heap_size());
#       endif
# ifdef PROFIL
   dump_profile();
# endif
        return 0;
}

 

* 쓰레기 수집기가 동작하지 않도록 컴파일하기
$ gcc -o MT_GCBench2 MT_GCBench2.c -lgc

* 실행하기
$ ./MT_GCBench2
Garbage Collector Test
 Live storage will peak at 12388544 bytes or less .

 Creating a long-lived binary tree of depth 16
 Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
        0x4b0038: Top down construction took 396 msec
        0x4b0038: Bottom up construction took 379 msec
0x4b0038: Creating 8256 trees of depth 6
        0x4b0038: Top down construction took 399 msec
        0x4b0038: Bottom up construction took 396 msec
0x4b0038: Creating 2052 trees of depth 8
        0x4b0038: Top down construction took 385 msec
        0x4b0038: Bottom up construction took 368 msec
0x4b0038: Creating 512 trees of depth 10
        0x4b0038: Top down construction took 419 msec
        0x4b0038: Bottom up construction took 380 msec
0x4b0038: Creating 128 trees of depth 12
        0x4b0038: Top down construction took 399 msec
        0x4b0038: Bottom up construction took 360 msec
0x4b0038: Creating 32 trees of depth 14
        0x4b0038: Top down construction took 414 msec
        0x4b0038: Bottom up construction took 358 msec
0x4b0038: Creating 8 trees of depth 16
        0x4b0038: Top down construction took 415 msec
        0x4b0038: Bottom up construction took 402 msec
Completed in 5504 msec

 

* 쓰레기 수집기가 동작하도록 컴파일하기
$ gcc -DGC -o MT_GCBench2 MT_GCBench2.c -lgc

* 실행하기
$ ./MT_GCBench2
Garbage Collector Test
 Live storage will peak at 12388544 bytes or less .

 Creating a long-lived binary tree of depth 16
 Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
        0x4b0038: Top down construction took 54 msec
        0x4b0038: Bottom up construction took 41 msec
0x4b0038: Creating 8256 trees of depth 6
        0x4b0038: Top down construction took 39 msec
        0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 2052 trees of depth 8
        0x4b0038: Top down construction took 39 msec
        0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 512 trees of depth 10
        0x4b0038: Top down construction took 38 msec
        0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 128 trees of depth 12
        0x4b0038: Top down construction took 39 msec
        0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 32 trees of depth 14
        0x4b0038: Top down construction took 39 msec
        0x4b0038: Bottom up construction took 40 msec
0x4b0038: Creating 8 trees of depth 16
        0x4b0038: Top down construction took 40 msec
        0x4b0038: Bottom up construction took 42 msec
Completed in 581 msec
Completed 14 collections
Heap size is 33619968

 

 

 

 

Posted by Scripter
,