C 언어 프로그래밍에서 까다로운 부분 중의 하나는 획득했던(할당받았던) 메모리를 여하히 해제하느냐이다. 프로그램밍의 사소한 오류로 메모리 부족 현상이 쉽게 일어나기 때문이다.
자바 언어, 파이썬 언어, C# 언어 등은 자동으로 쓰레기 수집(garbage collection)이 이루어지므로 거의 신경쓰지 않아도 되지만, C 언어는 소스 작성하는 프로그래머가 직접해야 한다.
오래 전(10여년 전)에 Hans Boehm 가 만들었던 C/C++ 언어용 쓰레기 수집기(Boehm-Demers-Weiser conservative garbage collector, 줄여사 BoehmGC 러고 부름)를 이용하여 컴파일되고 실행되는 소스이다. 이 쓰레기 수집기를 쓰면, 메모리 해제(free)를 프로그래머가 신경쓰지 않아도 된다. 그렇게 하기 위해서는 헤더 파일 추가 구문
#include "gc.h" 또는 #include "gc/gc.h"
가 있어야 하고, 함수 malloc(크기) 대신에 매크로 GC_MALLOC(크기) 를, 함수 realloc(포인터, 크기) 대신에 매크로 GC_REALLOC(포인터, 크기) 를 사용해야 한다. 좀 더 좋은 성능을 위해서는 GC_MALLOC_ATOMIC(크기) 과 GC_enable_incremental() 을 사용하기도 한다. GC_MALLOC_ATOMIC 은 포인터를 전혀 갖지 않은 객체에 대하여만 적용된다.
MinGW 의 gcc 나 Visual C++ 2010 용으로 동작하는지는 아직 확인 되지는 않았으나, Cygwin에는 Hans Boehm 의 쓰레기 수집기가 이미 설치되어 있어서, Cygwin 의 gcc 4.5.3 으로 컴파일할 수 있었다.
/*
* Filename: testGC_01.c
*
* Test a garbage collection for C language.
*
* Compile: gcc -o testGC_01 testGC_01.c -lgc
*
* Execute: ./testGC_01
*
* See: http://www.hpl.hp.com/personal/Hans_Boehm/gc/simple_example.html
*/
#include "gc.h"
#include <assert.h>
#include <stdio.h>
int main()
{
int i;
GC_INIT(); /* Optional on Linux/X86; see below. */
for (i = 0; i < 10000000; ++i)
{
int **p = (int **) GC_MALLOC(sizeof(int *));
int *q = (int *) GC_MALLOC_ATOMIC(sizeof(int));
assert(*p == 0);
*p = (int *) GC_REALLOC(q, 2 * sizeof(int));
if (i % 100000 == 0)
printf("Heap size = %d\n", GC_get_heap_size());
}
return 0;
}
실행 결과:
Heap size = 65536
Heap size = 131072
.................
.................
.................
* 단일 쓰레드에서 쓰레기 수집 벤치마크 테스트 (파일명: GCBench.c)
// This is adapted from a benchmark written by John Ellis and Pete Kovac
// of Post Communications.
// It was modified by Hans Boehm of Silicon Graphics.
// Translated to C++ 30 May 1997 by William D Clinger of Northeastern Univ.
// Translated to C 15 March 2000 by Hans Boehm, now at HP Labs.
//
// This is no substitute for real applications. No actual application
// is likely to behave in exactly this way. However, this benchmark was
// designed to be more representative of real applications than other
// Java GC benchmarks of which we are aware.
// It attempts to model those properties of allocation requests that
// are important to current GC techniques.
// It is designed to be used either to obtain a single overall performance
// number, or to give a more detailed estimate of how collector
// performance varies with object lifetimes. It prints the time
// required to allocate and collect balanced binary trees of various
// sizes. Smaller trees result in shorter object lifetimes. Each cycle
// allocates roughly the same amount of memory.
// Two data structures are kept around during the entire process, so
// that the measured performance is representative of applications
// that maintain some live in-memory data. One of these is a tree
// containing many pointers. The other is a large array containing
// double precision floating point numbers. Both should be of comparable
// size.
//
// The results are only really meaningful together with a specification
// of how much memory was used. It is possible to trade memory for
// better time performance. This benchmark should be run in a 32 MB
// heap, though we don't currently know how to enforce that uniformly.
//
// Unlike the original Ellis and Kovac benchmark, we do not attempt
// measure pause times. This facility should eventually be added back
// in. There are several reasons for omitting it for now. The original
// implementation depended on assumptions about the thread scheduler
// that don't hold uniformly. The results really measure both the
// scheduler and GC. Pause time measurements tend to not fit well with
// current benchmark suites. As far as we know, none of the current
// commercial Java implementations seriously attempt to minimize GC pause
// times.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#ifdef GC
# include "gc.h"
#endif
#ifdef PROFIL
extern void init_profiling();
extern dump_profile();
#endif
// These macros were a quick hack for the Macintosh.
//
// #define currentTime() clock()
// #define elapsedTime(x) ((1000*(x))/CLOCKS_PER_SEC)
#define currentTime() stats_rtclock()
#define elapsedTime(x) (x)
/* Get the current time in milliseconds */
unsigned
stats_rtclock( void )
{
struct timeval t;
struct timezone tz;
if (gettimeofday( &t, &tz ) == -1)
return 0;
return (t.tv_sec * 1000 + t.tv_usec / 1000);
}
static const int kStretchTreeDepth = 18; // about 16Mb
static const int kLongLivedTreeDepth = 16; // about 4Mb
static const int kArraySize = 500000; // about 4Mb
static const int kMinTreeDepth = 4;
static const int kMaxTreeDepth = 16;
typedef struct Node0_struct {
struct Node0_struct * left;
struct Node0_struct * right;
int i, j;
} Node0;
#ifdef HOLES
# define HOLE() GC_NEW(Node0);
#else
# define HOLE()
#endif
typedef Node0 *Node;
void init_Node(Node me, Node l, Node r) {
me -> left = l;
me -> right = r;
}
#ifndef GC
void destroy_Node(Node me) {
if (me -> left) {
destroy_Node(me -> left);
}
if (me -> right) {
destroy_Node(me -> right);
}
free(me);
}
#endif
// Nodes used by a tree of a given size
static int TreeSize(int i) {
return ((1 << (i + 1)) - 1);
}
// Number of iterations to use for a given tree depth
static int NumIters(int i) {
return 2 * TreeSize(kStretchTreeDepth) / TreeSize(i);
}
// Build tree top down, assigning to older objects.
static void Populate(int iDepth, Node thisNode) {
if (iDepth<=0) {
return;
} else {
iDepth--;
# ifdef GC
thisNode->left = GC_NEW(Node0); HOLE();
thisNode->right = GC_NEW(Node0); HOLE();
# else
thisNode->left = calloc(1, sizeof(Node0));
thisNode->right = calloc(1, sizeof(Node0));
# endif
Populate (iDepth, thisNode->left);
Populate (iDepth, thisNode->right);
}
}
// Build tree bottom-up
static Node MakeTree(int iDepth) {
Node result;
if (iDepth<=0) {
# ifndef GC
result = calloc(1, sizeof(Node0));
# else
result = GC_NEW(Node0); HOLE();
# endif
/* result is implicitly initialized in both cases. */
return result;
} else {
Node left = MakeTree(iDepth-1);
Node right = MakeTree(iDepth-1);
# ifndef GC
result = malloc(sizeof(Node0));
# else
result = GC_NEW(Node0); HOLE();
# endif
init_Node(result, left, right);
return result;
}
}
static void PrintDiagnostics() {
#if 0
long lFreeMemory = Runtime.getRuntime().freeMemory();
long lTotalMemory = Runtime.getRuntime().totalMemory();
System.out.print(" Total memory available="
+ lTotalMemory + " bytes");
System.out.println(" Free memory=" + lFreeMemory + " bytes");
#endif
}
static void TimeConstruction(int depth) {
long tStart, tFinish;
int iNumIters = NumIters(depth);
Node tempTree;
int i;
printf("Creating %d trees of depth %d\n", iNumIters, depth);
tStart = currentTime();
for (i = 0; i < iNumIters; ++i) {
# ifndef GC
tempTree = calloc(1, sizeof(Node0));
# else
tempTree = GC_NEW(Node0);
# endif
Populate(depth, tempTree);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
}
tFinish = currentTime();
printf("\tTop down construction took %d msec\n",
elapsedTime(tFinish - tStart));
tStart = currentTime();
for (i = 0; i < iNumIters; ++i) {
tempTree = MakeTree(depth);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
}
tFinish = currentTime();
printf("\tBottom up construction took %d msec\n",
elapsedTime(tFinish - tStart));
}
int main() {
Node root;
Node longLivedTree;
Node tempTree;
long tStart, tFinish;
long tElapsed;
int i, d;
double *array;
#ifdef GC
// GC_full_freq = 30;
// GC_free_space_divisor = 16;
// GC_enable_incremental();
#endif
printf("Garbage Collector Test\n");
printf(" Live storage will peak at %d bytes.\n\n",
2 * sizeof(Node0) * TreeSize(kLongLivedTreeDepth) +
sizeof(double) * kArraySize);
printf(" Stretching memory with a binary tree of depth %d\n",
kStretchTreeDepth);
PrintDiagnostics();
# ifdef PROFIL
init_profiling();
# endif
tStart = currentTime();
// Stretch the memory space quickly
tempTree = MakeTree(kStretchTreeDepth);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
// Create a long lived object
printf(" Creating a long-lived binary tree of depth %d\n",
kLongLivedTreeDepth);
# ifndef GC
longLivedTree = calloc(1, sizeof(Node0));
# else
longLivedTree = GC_NEW(Node0);
# endif
Populate(kLongLivedTreeDepth, longLivedTree);
// Create long-lived array, filling half of it
printf(" Creating a long-lived array of %d doubles\n", kArraySize);
# ifndef GC
array = malloc(kArraySize * sizeof(double));
# else
# ifndef NO_PTRFREE
array = GC_MALLOC_ATOMIC(sizeof(double) * kArraySize);
# else
array = GC_MALLOC(sizeof(double) * kArraySize);
# endif
# endif
for (i = 0; i < kArraySize/2; ++i) {
array[i] = 1.0/i;
}
PrintDiagnostics();
for (d = kMinTreeDepth; d <= kMaxTreeDepth; d += 2) {
TimeConstruction(d);
}
if (longLivedTree == 0 || array[1000] != 1.0/1000)
fprintf(stderr, "Failed\n");
// fake reference to LongLivedTree
// and array
// to keep them from being optimized away
tFinish = currentTime();
tElapsed = elapsedTime(tFinish-tStart);
PrintDiagnostics();
printf("Completed in %d msec\n", tElapsed);
# ifdef GC
printf("Completed %d collections\n", GC_gc_no);
printf("Heap size is %d\n", GC_get_heap_size());
# endif
# ifdef PROFIL
dump_profile();
# endif
}
* 쓰레기 수집기가 동작하지 않도록 컴파일하기
$ gcc -o GCBench GCBench.c -lgc
* 실행하기
$ ./GCBench
Garbage Collector Test
Live storage will peak at 8194272 bytes.
Stretching memory with a binary tree of depth 18
Creating a long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
Creating 33824 trees of depth 4
Top down construction took 378 msec
Bottom up construction took 374 msec
Creating 8256 trees of depth 6
Top down construction took 379 msec
Bottom up construction took 357 msec
Creating 2052 trees of depth 8
Top down construction took 379 msec
Bottom up construction took 354 msec
Creating 512 trees of depth 10
Top down construction took 383 msec
Bottom up construction took 356 msec
Creating 128 trees of depth 12
Top down construction took 382 msec
Bottom up construction took 363 msec
Creating 32 trees of depth 14
Top down construction took 387 msec
Bottom up construction took 381 msec
Creating 8 trees of depth 16
Top down construction took 396 msec
Bottom up construction took 400 msec
Completed in 5525 msec
* 쓰레기 수집기가 동작하도록 컴파일하기
$ gcc -DGC -o GCBench GCBench.c -lgc
* 실행하기
$ ./GCBench
Garbage Collector Test
Live storage will peak at 8194272 bytes.
Stretching memory with a binary tree of depth 18
Creating a long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
Creating 33824 trees of depth 4
Top down construction took 44 msec
Bottom up construction took 37 msec
Creating 8256 trees of depth 6
Top down construction took 43 msec
Bottom up construction took 39 msec
Creating 2052 trees of depth 8
Top down construction took 38 msec
Bottom up construction took 38 msec
Creating 512 trees of depth 10
Top down construction took 38 msec
Bottom up construction took 38 msec
Creating 128 trees of depth 12
Top down construction took 44 msec
Bottom up construction took 38 msec
Creating 32 trees of depth 14
Top down construction took 38 msec
Bottom up construction took 39 msec
Creating 8 trees of depth 16
Top down construction took 38 msec
Bottom up construction took 52 msec
Completed in 640 msec
Completed 25 collections
Heap size is 28975104
* 다중 쓰레드에서 쓰레기 수집 벤치마크 테스트 (파일명: MT_GCBench.c)
// This is adapted from a benchmark written by John Ellis and Pete Kovac
// of Post Communications.
// It was modified by Hans Boehm of Silicon Graphics.
// Translated to C++ 30 May 1997 by William D Clinger of Northeastern Univ.
// Translated to C 15 March 2000 by Hans Boehm, now at HP Labs.
// Adapted to run NTHREADS client threads concurrently. Each
// thread executes the original benchmark. 12 June 2000 by Hans Boehm.
//
// This is no substitute for real applications. No actual application
// is likely to behave in exactly this way. However, this benchmark was
// designed to be more representative of real applications than other
// Java GC benchmarks of which we are aware.
// It attempts to model those properties of allocation requests that
// are important to current GC techniques.
// It is designed to be used either to obtain a single overall performance
// number, or to give a more detailed estimate of how collector
// performance varies with object lifetimes. It prints the time
// required to allocate and collect balanced binary trees of various
// sizes. Smaller trees result in shorter object lifetimes. Each cycle
// allocates roughly the same amount of memory.
// Two data structures are kept around during the entire process, so
// that the measured performance is representative of applications
// that maintain some live in-memory data. One of these is a tree
// containing many pointers. The other is a large array containing
// double precision floating point numbers. Both should be of comparable
// size.
//
// The results are only really meaningful together with a specification
// of how much memory was used. It is possible to trade memory for
// better time performance. This benchmark should be run in a 32 MB
// heap, though we don't currently know how to enforce that uniformly.
//
// Unlike the original Ellis and Kovac benchmark, we do not attempt
// measure pause times. This facility should eventually be added back
// in. There are several reasons for omitting it for now. The original
// implementation depended on assumptions about the thread scheduler
// that don't hold uniformly. The results really measure both the
// scheduler and GC. Pause time measurements tend to not fit well with
// current benchmark suites. As far as we know, none of the current
// commercial Java implementations seriously attempt to minimize GC pause
// times.
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>
#ifdef GC
# ifndef LINUX_THREADS
# define LINUX_THREADS
# endif
# ifndef _REENTRANT
# define _REENTRANT
# endif
# ifdef LOCAL
# define GC_REDIRECT_TO_LOCAL
# include "gc_local_alloc.h"
# endif
# include "gc.h"
#endif
#ifndef NTHREADS
# define NTHREADS 1
#endif
#ifdef PROFIL
extern void init_profiling();
extern dump_profile();
#endif
// These macros were a quick hack for the Macintosh.
//
// #define currentTime() clock()
// #define elapsedTime(x) ((1000*(x))/CLOCKS_PER_SEC)
#define currentTime() stats_rtclock()
#define elapsedTime(x) (x)
/* Get the current time in milliseconds */
unsigned
stats_rtclock( void )
{
struct timeval t;
struct timezone tz;
if (gettimeofday( &t, &tz ) == -1)
return 0;
return (t.tv_sec * 1000 + t.tv_usec / 1000);
}
static const int kStretchTreeDepth = 18; // about 16Mb
static const int kLongLivedTreeDepth = 16; // about 4Mb
static const int kArraySize = 500000; // about 4Mb
static const int kMinTreeDepth = 4;
static const int kMaxTreeDepth = 16;
typedef struct Node0_struct {
struct Node0_struct * left;
struct Node0_struct * right;
int i, j;
} Node0;
#ifdef HOLES
# define HOLE() GC_NEW(Node0);
#else
# define HOLE()
#endif
typedef Node0 *Node;
void init_Node(Node me, Node l, Node r) {
me -> left = l;
me -> right = r;
}
#ifndef GC
void destroy_Node(Node me) {
if (me -> left) {
destroy_Node(me -> left);
}
if (me -> right) {
destroy_Node(me -> right);
}
free(me);
}
#endif
// Nodes used by a tree of a given size
static int TreeSize(int i) {
return ((1 << (i + 1)) - 1);
}
// Number of iterations to use for a given tree depth
static int NumIters(int i) {
return 2 * TreeSize(kStretchTreeDepth) / TreeSize(i);
}
// Build tree top down, assigning to older objects.
static void Populate(int iDepth, Node thisNode) {
if (iDepth<=0) {
return;
} else {
iDepth--;
# ifdef GC
thisNode->left = GC_NEW(Node0); HOLE();
thisNode->right = GC_NEW(Node0); HOLE();
# else
thisNode->left = calloc(1, sizeof(Node0));
thisNode->right = calloc(1, sizeof(Node0));
# endif
Populate (iDepth, thisNode->left);
Populate (iDepth, thisNode->right);
}
}
// Build tree bottom-up
static Node MakeTree(int iDepth) {
Node result;
if (iDepth<=0) {
# ifndef GC
result = calloc(1, sizeof(Node0));
# else
result = GC_NEW(Node0); HOLE();
# endif
/* result is implicitly initialized in both cases. */
return result;
} else {
Node left = MakeTree(iDepth-1);
Node right = MakeTree(iDepth-1);
# ifndef GC
result = malloc(sizeof(Node0));
# else
result = GC_NEW(Node0); HOLE();
# endif
init_Node(result, left, right);
return result;
}
}
static void PrintDiagnostics() {
#if 0
long lFreeMemory = Runtime.getRuntime().freeMemory();
long lTotalMemory = Runtime.getRuntime().totalMemory();
System.out.print(" Total memory available="
+ lTotalMemory + " bytes");
System.out.println(" Free memory=" + lFreeMemory + " bytes");
#endif
}
static void TimeConstruction(int depth) {
long tStart, tFinish;
int iNumIters = NumIters(depth);
Node tempTree;
int i;
printf("0x%x: Creating %d trees of depth %d\n", pthread_self(), iNumIters, depth);
tStart = currentTime();
for (i = 0; i < iNumIters; ++i) {
# ifndef GC
tempTree = calloc(1, sizeof(Node0));
# else
tempTree = GC_NEW(Node0);
# endif
Populate(depth, tempTree);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
}
tFinish = currentTime();
printf("\t0x%x: Top down construction took %d msec\n",
pthread_self(), elapsedTime(tFinish - tStart));
tStart = currentTime();
for (i = 0; i < iNumIters; ++i) {
tempTree = MakeTree(depth);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
}
tFinish = currentTime();
printf("\t0x%x: Bottom up construction took %d msec\n",
pthread_self(), elapsedTime(tFinish - tStart));
}
void * run_one_test(void * arg) {
int d;
for (d = kMinTreeDepth; d <= kMaxTreeDepth; d += 2) {
TimeConstruction(d);
}
}
int main() {
Node root;
Node longLivedTree;
Node tempTree;
long tStart, tFinish;
long tElapsed;
int i;
double *array;
#ifdef GC
// GC_full_freq = 30;
// GC_free_space_divisor = 16;
// GC_enable_incremental();
#endif
# if defined(GC) && defined(LOCAL)
GC_thr_init();
# endif
printf("Garbage Collector Test\n");
printf(" Live storage will peak at %d bytes.\n\n",
2 * sizeof(Node0) * TreeSize(kLongLivedTreeDepth) +
sizeof(double) * kArraySize);
printf(" Stretching memory with a binary tree of depth %d\n",
kStretchTreeDepth);
PrintDiagnostics();
# ifdef PROFIL
init_profiling();
# endif
tStart = currentTime();
// Stretch the memory space quickly
tempTree = MakeTree(kStretchTreeDepth);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
// Create a long lived object
printf(" Creating a long-lived binary tree of depth %d\n",
kLongLivedTreeDepth);
# ifndef GC
longLivedTree = calloc(1, sizeof(Node0));
# else
longLivedTree = GC_NEW(Node0);
# endif
Populate(kLongLivedTreeDepth, longLivedTree);
// Create long-lived array, filling half of it
printf(" Creating a long-lived array of %d doubles\n", kArraySize);
# ifndef GC
array = malloc(kArraySize * sizeof(double));
# else
# ifndef NO_PTRFREE
array = GC_MALLOC_ATOMIC(sizeof(double) * kArraySize);
# else
array = GC_MALLOC(sizeof(double) * kArraySize);
# endif
# endif
for (i = 0; i < kArraySize/2; ++i) {
array[i] = 1.0/i;
}
{
pthread_t thread[NTHREADS];
for (i = 1; i < NTHREADS; ++i) {
int code;
if ((code = pthread_create(thread+i, 0, run_one_test, 0)) != 0) {
fprintf(stderr, "Thread creation failed %u\n", code);
exit(1);
}
}
/* We use the main thread to run one test. This allows */
/* profiling to work, for example. */
run_one_test(0);
for (i = 1; i < NTHREADS; ++i) {
int code;
if ((code = pthread_join(thread[i], 0)) != 0) {
fprintf(stderr, "Thread join failed %u\n", code);
}
}
}
PrintDiagnostics();
if (longLivedTree == 0 || array[1000] != 1.0/1000)
fprintf(stderr, "Failed\n");
// fake reference to LongLivedTree
// and array
// to keep them from being optimized away
tFinish = currentTime();
tElapsed = elapsedTime(tFinish-tStart);
PrintDiagnostics();
printf("Completed in %d msec\n", tElapsed);
# ifdef GC
printf("Completed %d collections\n", GC_gc_no);
printf("Heap size is %d\n", GC_get_heap_size());
# endif
# ifdef PROFIL
dump_profile();
# endif
}
* 쓰레기 수집기가 동작하지 않도록 컴파일하기
$ gcc -o MT_GCBench MT_GCBench.c -lgc
* 실행하기
$ ./MT_GCBench
Garbage Collector Test
Live storage will peak at 8194272 bytes.
Stretching memory with a binary tree of depth 18
Creating a long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
0x4b0038: Top down construction took 379 msec
0x4b0038: Bottom up construction took 353 msec
0x4b0038: Creating 8256 trees of depth 6
0x4b0038: Top down construction took 380 msec
0x4b0038: Bottom up construction took 355 msec
0x4b0038: Creating 2052 trees of depth 8
0x4b0038: Top down construction took 379 msec
0x4b0038: Bottom up construction took 355 msec
0x4b0038: Creating 512 trees of depth 10
0x4b0038: Top down construction took 382 msec
0x4b0038: Bottom up construction took 355 msec
0x4b0038: Creating 128 trees of depth 12
0x4b0038: Top down construction took 386 msec
0x4b0038: Bottom up construction took 361 msec
0x4b0038: Creating 32 trees of depth 14
0x4b0038: Top down construction took 383 msec
0x4b0038: Bottom up construction took 360 msec
0x4b0038: Creating 8 trees of depth 16
0x4b0038: Top down construction took 389 msec
0x4b0038: Bottom up construction took 373 msec
Completed in 5423 msec
* 쓰레기 수집기가 동작하도록 컴파일하기
$ gcc -DGC -o MT_GCBench MT_GCBench.c -lgc
* 실행하기
$ ./MT_GCBench
Garbage Collector Test
Live storage will peak at 8194272 bytes.
Stretching memory with a binary tree of depth 18
Creating a long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
0x4b0038: Top down construction took 52 msec
0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 8256 trees of depth 6
0x4b0038: Top down construction took 44 msec
0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 2052 trees of depth 8
0x4b0038: Top down construction took 38 msec
0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 512 trees of depth 10
0x4b0038: Top down construction took 38 msec
0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 128 trees of depth 12
0x4b0038: Top down construction took 44 msec
0x4b0038: Bottom up construction took 38 msec
0x4b0038: Creating 32 trees of depth 14
0x4b0038: Top down construction took 38 msec
0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 8 trees of depth 16
0x4b0038: Top down construction took 38 msec
0x4b0038: Bottom up construction took 51 msec
Completed in 651 msec
Completed 25 collections
Heap size is 28975104
* 다중 쓰레드에서 쓰레기 수집 벤치마크 테스트 2 (파일명: MT_GCBench2.c)
// This is version 2 of the multithreaded GC Bench.
// Heap expansion is handled differently from version 1, in an attempt
// to make scalability measurements more meaningful. The version with
// N threads now immediately expands the heap to N*32MB.
//
// To run this with BDWGC versions 6 and later with thread local allocation,
// define GC and LOCAL. Without thread-local allocation, define just GC.
// To run it with the University of Tokyo scalable GC,
// define SGC. To run it with malloc and explicit deallocation, define
// none of these. (This should also work for Hoard.)
//
// Note that defining GC or SGC removes the explicit deallocation passes,
// which seems fair.
//
// This is adapted from a benchmark written by John Ellis and Pete Kovac
// of Post Communications.
// It was modified by Hans Boehm of Silicon Graphics.
// Translated to C++ 30 May 1997 by William D Clinger of Northeastern Univ.
// Translated to C 15 March 2000 by Hans Boehm, now at HP Labs.
// Adapted to run NTHREADS client threads concurrently. Each
// thread executes the original benchmark. 12 June 2000 by Hans Boehm.
// Changed heap expansion rule, and made the number of threads run-time
// configurable. 25 Oct 2000 by Hans Boehm.
//
// This is no substitute for real applications. No actual application
// is likely to behave in exactly this way. However, this benchmark was
// designed to be more representative of real applications than other
// Java GC benchmarks of which we were aware at the time.
// It still doesn't seem too bad for something this small.
// It attempts to model those properties of allocation requests that
// are important to current GC techniques.
// It is designed to be used either to obtain a single overall performance
// number, or to give a more detailed estimate of how collector
// performance varies with object lifetimes. It prints the time
// required to allocate and collect balanced binary trees of various
// sizes. Smaller trees result in shorter object lifetimes. Each cycle
// allocates roughly the same amount of memory.
// Two data structures are kept around during the entire process, so
// that the measured performance is representative of applications
// that maintain some live in-memory data. One of these is a tree
// containing many pointers. The other is a large array containing
// double precision floating point numbers. Both should be of comparable
// size.
//
// The results are only really meaningful together with a specification
// of how much memory was used. This versions of the benchmark tries
// to preallocate a sufficiently large heap that expansion should not be
// needed.
//
// Unlike the original Ellis and Kovac benchmark, we do not attempt
// measure pause times. This facility should eventually be added back
// in. There are several reasons for omitting it for now. The original
// implementation depended on assumptions about the thread scheduler
// that don't hold uniformly. The results really measure both the
// scheduler and GC. Pause time measurements tend to not fit well with
// current benchmark suites. As far as we know, none of the current
// commercial Java implementations seriously attempt to minimize GC pause
// times.
//
// Since this benchmark has recently been more widely used, some
// anomalous behavious has been uncovered. The user should be aware
// of this:
// 1) Nearly all objects are of the same size. This benchmark is
// not useful for analyzing fragmentation behavior. It is unclear
// whether this is an issue for well-designed allocators.
// 2) Unless HOLES is defined, it tends to drop consecutively allocated
// memory at the same time. Many real applications do exhibit this
// phenomenon, but probably not to this extent. (Defining HOLES tends
// to move the benchmark to the opposite extreme.)
// 3) It appears harder to predict object lifetimes than for most real
// Java programs (see T. Harris, "Dynamic adptive pre-tenuring",
// ISMM '00).
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pthread.h>
#ifdef GC
# ifndef LINUX_THREADS
# define LINUX_THREADS
# endif
# ifndef _REENTRANT
# define _REENTRANT
# endif
# ifdef LOCAL
# define GC_REDIRECT_TO_LOCAL
# include "gc_local_alloc.h"
# endif
# include "gc.h"
#endif
#ifdef SGC
# include "sgc.h"
# define GC
# define pthread_create GC_pthread_create
# define pthread_join GC_pthread_join
#endif
#define MAX_NTHREADS 1024
int nthreads = 0;
#ifdef PROFIL
extern void init_profiling();
extern dump_profile();
#endif
// These macros were a quick hack for the Macintosh.
//
// #define currentTime() clock()
// #define elapsedTime(x) ((1000*(x))/CLOCKS_PER_SEC)
#define currentTime() stats_rtclock()
#define elapsedTime(x) (x)
/* Get the current time in milliseconds */
unsigned
stats_rtclock( void )
{
struct timeval t;
struct timezone tz;
if (gettimeofday( &t, &tz ) == -1)
return 0;
return (t.tv_sec * 1000 + t.tv_usec / 1000);
}
static const int kStretchTreeDepth = 18; // about 16Mb
static const int kLongLivedTreeDepth = 16; // about 4Mb
static const int kArraySize = 500000; // about 4Mb
static const int kMinTreeDepth = 4;
static const int kMaxTreeDepth = 16;
typedef struct Node0_struct {
struct Node0_struct * left;
struct Node0_struct * right;
int i, j;
} Node0;
#ifdef HOLES
# define HOLE() GC_NEW(Node0);
#else
# define HOLE()
#endif
typedef Node0 *Node;
void init_Node(Node me, Node l, Node r) {
me -> left = l;
me -> right = r;
}
#ifndef GC
void destroy_Node(Node me) {
if (me -> left) {
destroy_Node(me -> left);
}
if (me -> right) {
destroy_Node(me -> right);
}
free(me);
}
#endif
// Nodes used by a tree of a given size
static int TreeSize(int i) {
return ((1 << (i + 1)) - 1);
}
// Number of iterations to use for a given tree depth
static int NumIters(int i) {
return 2 * TreeSize(kStretchTreeDepth) / TreeSize(i);
}
// Build tree top down, assigning to older objects.
static void Populate(int iDepth, Node thisNode) {
if (iDepth<=0) {
return;
} else {
iDepth--;
# ifdef GC
thisNode->left = GC_NEW(Node0); HOLE();
thisNode->right = GC_NEW(Node0); HOLE();
# else
thisNode->left = calloc(1, sizeof(Node0));
thisNode->right = calloc(1, sizeof(Node0));
# endif
Populate (iDepth, thisNode->left);
Populate (iDepth, thisNode->right);
}
}
// Build tree bottom-up
static Node MakeTree(int iDepth) {
Node result;
if (iDepth<=0) {
# ifndef GC
result = calloc(1, sizeof(Node0));
# else
result = GC_NEW(Node0); HOLE();
# endif
/* result is implicitly initialized in both cases. */
return result;
} else {
Node left = MakeTree(iDepth-1);
Node right = MakeTree(iDepth-1);
# ifndef GC
result = malloc(sizeof(Node0));
# else
result = GC_NEW(Node0); HOLE();
# endif
init_Node(result, left, right);
return result;
}
}
static void PrintDiagnostics() {
#if 0
long lFreeMemory = Runtime.getRuntime().freeMemory();
long lTotalMemory = Runtime.getRuntime().totalMemory();
System.out.print(" Total memory available="
+ lTotalMemory + " bytes");
System.out.println(" Free memory=" + lFreeMemory + " bytes");
#endif
}
static void TimeConstruction(int depth) {
long tStart, tFinish;
int iNumIters = NumIters(depth);
Node tempTree;
int i;
printf("0x%x: Creating %d trees of depth %d\n", pthread_self(), iNumIters, depth);
tStart = currentTime();
for (i = 0; i < iNumIters; ++i) {
# ifndef GC
tempTree = calloc(1, sizeof(Node0));
# else
tempTree = GC_NEW(Node0);
# endif
Populate(depth, tempTree);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
}
tFinish = currentTime();
printf("\t0x%x: Top down construction took %d msec\n",
pthread_self(), elapsedTime(tFinish - tStart));
tStart = currentTime();
for (i = 0; i < iNumIters; ++i) {
tempTree = MakeTree(depth);
# ifndef GC
destroy_Node(tempTree);
# endif
tempTree = 0;
}
tFinish = currentTime();
printf("\t0x%x: Bottom up construction took %d msec\n",
pthread_self(), elapsedTime(tFinish - tStart));
}
void * run_one_test(void * arg) {
int d, i;
Node longLivedTree;
double *array;
/* size_t initial_bytes = GC_get_total_bytes(); */
// Create a long lived object
printf(" Creating a long-lived binary tree of depth %d\n",
kLongLivedTreeDepth);
# ifndef GC
longLivedTree = calloc(1, sizeof(Node0));
# else
longLivedTree = GC_NEW(Node0);
# endif
Populate(kLongLivedTreeDepth, longLivedTree);
// Create long-lived array, filling half of it
printf(" Creating a long-lived array of %d doubles\n", kArraySize);
# ifndef GC
array = malloc(kArraySize * sizeof(double));
# else
# ifndef NO_PTRFREE
array = GC_MALLOC_ATOMIC(sizeof(double) * kArraySize);
# else
array = GC_MALLOC(sizeof(double) * kArraySize);
# endif
# endif
for (i = 0; i < kArraySize/2; ++i) {
array[i] = 1.0/i;
}
for (d = kMinTreeDepth; d <= kMaxTreeDepth; d += 2) {
TimeConstruction(d);
}
/* printf("Allocated %ld bytes before start, %ld after\n",
initial_bytes, GC_get_total_bytes() - initial_bytes); */
if (longLivedTree->left -> right == 0 || array[1000] != 1.0/1000)
fprintf(stderr, "Failed\n");
// fake reference to LongLivedTree
// and array
// to keep them from being optimized away
}
int main(int argc, char **argv) {
Node root;
Node tempTree[MAX_NTHREADS];
long tStart, tFinish;
long tElapsed;
int i;
# ifdef SGC
SGC_attr_t attr;
# endif
if (1 == argc) {
nthreads = 1;
} else if (2 == argc) {
nthreads = atoi(argv[1]);
if (nthreads < 1 || nthreads > MAX_NTHREADS) {
fprintf(stderr, "Invalid # of threads argument\n");
exit(1);
}
} else {
fprintf(stderr, "Usage: %s [# of threads]\n");
exit(1);
}
# if defined(SGC)
/* The University of Tokyo collector needs explicit */
/* initialization. */
SGC_attr_init(&attr);
SGC_init(nthreads, &attr);
# endif
#ifdef GC
// GC_full_freq = 30;
// GC_free_space_divisor = 16;
// GC_enable_incremental();
#endif
printf("Garbage Collector Test\n");
printf(" Live storage will peak at %d bytes or less .\n\n",
2 * sizeof(Node0) * nthreads
* (TreeSize(kLongLivedTreeDepth) + TreeSize(kMaxTreeDepth))
+ sizeof(double) * kArraySize);
PrintDiagnostics();
# ifdef GC
/* GC_expand_hp fails with empty heap */
GC_malloc(1);
GC_expand_hp(32*1024*1024*nthreads);
# endif
# ifdef PROFIL
init_profiling();
# endif
tStart = currentTime();
{
pthread_t thread[MAX_NTHREADS];
for (i = 1; i < nthreads; ++i) {
int code;
if ((code = pthread_create(thread+i, 0, run_one_test, 0)) != 0) {
fprintf(stderr, "Thread creation failed %u\n", code);
exit(1);
}
}
/* We use the main thread to run one test. This allows */
/* profiling to work, for example. */
run_one_test(0);
for (i = 1; i < nthreads; ++i) {
int code;
if ((code = pthread_join(thread[i], 0)) != 0) {
fprintf(stderr, "Thread join failed %u\n", code);
}
}
}
PrintDiagnostics();
tFinish = currentTime();
tElapsed = elapsedTime(tFinish-tStart);
PrintDiagnostics();
printf("Completed in %d msec\n", tElapsed);
# ifdef GC
printf("Completed %d collections\n", GC_gc_no);
printf("Heap size is %d\n", GC_get_heap_size());
# endif
# ifdef PROFIL
dump_profile();
# endif
return 0;
}
* 쓰레기 수집기가 동작하지 않도록 컴파일하기
$ gcc -o MT_GCBench2 MT_GCBench2.c -lgc
* 실행하기
$ ./MT_GCBench2
Garbage Collector Test
Live storage will peak at 12388544 bytes or less .
Creating a long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
0x4b0038: Top down construction took 396 msec
0x4b0038: Bottom up construction took 379 msec
0x4b0038: Creating 8256 trees of depth 6
0x4b0038: Top down construction took 399 msec
0x4b0038: Bottom up construction took 396 msec
0x4b0038: Creating 2052 trees of depth 8
0x4b0038: Top down construction took 385 msec
0x4b0038: Bottom up construction took 368 msec
0x4b0038: Creating 512 trees of depth 10
0x4b0038: Top down construction took 419 msec
0x4b0038: Bottom up construction took 380 msec
0x4b0038: Creating 128 trees of depth 12
0x4b0038: Top down construction took 399 msec
0x4b0038: Bottom up construction took 360 msec
0x4b0038: Creating 32 trees of depth 14
0x4b0038: Top down construction took 414 msec
0x4b0038: Bottom up construction took 358 msec
0x4b0038: Creating 8 trees of depth 16
0x4b0038: Top down construction took 415 msec
0x4b0038: Bottom up construction took 402 msec
Completed in 5504 msec
* 쓰레기 수집기가 동작하도록 컴파일하기
$ gcc -DGC -o MT_GCBench2 MT_GCBench2.c -lgc
* 실행하기
$ ./MT_GCBench2
Garbage Collector Test
Live storage will peak at 12388544 bytes or less .
Creating a long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
0x4b0038: Creating 33824 trees of depth 4
0x4b0038: Top down construction took 54 msec
0x4b0038: Bottom up construction took 41 msec
0x4b0038: Creating 8256 trees of depth 6
0x4b0038: Top down construction took 39 msec
0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 2052 trees of depth 8
0x4b0038: Top down construction took 39 msec
0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 512 trees of depth 10
0x4b0038: Top down construction took 38 msec
0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 128 trees of depth 12
0x4b0038: Top down construction took 39 msec
0x4b0038: Bottom up construction took 39 msec
0x4b0038: Creating 32 trees of depth 14
0x4b0038: Top down construction took 39 msec
0x4b0038: Bottom up construction took 40 msec
0x4b0038: Creating 8 trees of depth 16
0x4b0038: Top down construction took 40 msec
0x4b0038: Bottom up construction took 42 msec
Completed in 581 msec
Completed 14 collections
Heap size is 33619968
'프로그래밍 > C' 카테고리의 다른 글
오일러(Euler) phi 함수 구현하기 (0) | 2013.12.14 |
---|---|
long long 타입의 정수를 printf 함수로 출력하기 (0) | 2013.12.03 |
이진 파일을 읽어서 16진수로 보여주는 HexView 소스 with C (0) | 2013.08.05 |
Visual C 2010 으로 컴파일하여 실행해 본 OpenGL 예제: Redbook 의 Teapots (0) | 2013.05.17 |
Visual C 2010 으로 컴파일하여 실행해 본 OpenGL 예제: Redbook 의 Cube (0) | 2013.05.10 |