Compilers, C Libraries and GCC


I like compilers; I think they are great. A long time ago I was involved (admittedly only with the documentation) with the DICE compiler on the Amiga. Now that I'm doing embedded and Linux work professionally, I'm mostly involved with GCC, which for the uninformed is the GNU Compiler Collection.

Writing a C compiler is actually not that difficult; see VBCC for a good optimising, multi-backend compiler that is small enough to read through and understand in a day. C++ compilers, however, are a massive step up from that. This is where GCC stands out as a multi-language system (each language is called a front end) that supports C, C++, ObjC, ObjC++, Fortran, Ada, Java, Pascal and many others. It has a common optimising middle end and back ends for more CPU and operating system combinations than I could list here.

If you care about compilers becoming slower over time, take a look at the bottom of the page.



GCC and newlib for an unsupported Embedded Target

For most operating systems, either embedded or mainstream, a compiler is supplied as part of the package, e.g. Windows CE comes with Embedded Visual Studio. The advantage of this is that EVS knows how to use the CE OS functionality to do exception handling, C++ support, debugging, etc.

Its disadvantage is that you are stuck with that compiler and you don't get to learn how C++ works! The following are my notes on getting GCC 3.x (2.96 is fairly similar) and GCC 4.x to work with Nucleus, although the examples are common enough that they would work with any (threaded, not pollOS) system.

A modified crosstool script to build gcc 3.x for nucleus
A modified crosstool script to build gcc 4.x for nucleus


C++ Support
Adding support for GCC's C++ language features covers a number of OS requirements: stack unwind support, global constructors and destructors, and static object thread safety. There are two implementations of C++ exceptions: the setjmp/longjmp method (--enable-sjlj-exceptions), which works on all platforms but is very slow and crap, and the libunwind method, which is much better and faster but only works on certain platforms.

There are many implications of C++ support: global constructors (are you using them?), exception handling, RTTI (for dynamic_cast) and implementing platform-specific locking in GCC. I would *really* recommend that you don't use C++ exceptions in embedded projects; I did, however, and it is quite a lot of work. For a start, I would recommend reading the code in gcc/unwind.XXX and preferably having a basic understanding of it.

For C++ exception support using libunwind, you need to include a header from GCC that defines the DWARF2 object format used for stack unwinding, #include "unwind-dw2-fde.h", and add #define EH_FRAME_SECTION_ASM_OP ".section\t.eh_frame,\"aw\"", which will add a placeholder in the symbol table for your exception handling section support. Next, add a pointer to the beginning of the .eh_frame section, extern "C" void * __eh_frame_start;. You then need to connect your app to GCC's frame unwind support with some code like the following.

static struct object object;
// exception handling frame information
__register_frame_info ((void *)&__eh_frame_start, &object);


// test code
try
{
	try
	{
		try
		{
			throw (float) 1.2;
		}
		catch (float)
		{
			throw (int) 1;
		}
	}
	catch (int)
	{
		throw;
	}
}
catch (...)
{
	double here;
}
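
The test fragment above is hard to check on a target without a debugger. As a rough sketch (the helper name and the bitmask convention are my own assumptions, not anything GCC provides), the same nested throw/catch sequence can be wrapped in a function that reports which handlers actually ran, so the result can be printed or compared at startup:

```cpp
// Hypothetical helper, not part of GCC or any RTOS API.
// Returns a bitmask recording which handlers ran:
//   bit 0 = catch (float), bit 1 = catch (int), bit 2 = catch (...)
unsigned exception_smoke_test()
{
    unsigned trace = 0;

    try
    {
        try
        {
            try
            {
                throw 1.2f;        // throw a float
            }
            catch (float)
            {
                trace |= 1u;
                throw 1;           // throw a new exception, an int
            }
        }
        catch (int)
        {
            trace |= 2u;
            throw;                 // rethrow the same int
        }
    }
    catch (...)
    {
        trace |= 4u;
    }

    return trace;
}
```

If unwinding is wired up correctly all three handlers should run; a missing bit points at the handler that was never reached. It deliberately avoids the heap so it can run before 'new' is available.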


Global constructors and destructors are those of static C++ objects; the constructors are run before main. We need to simulate that behaviour on our systems, so you will need code like the following to get a handle into the sections that GCC creates, in which all static constructors and destructors are linked as a list of function pointers.
/*
        gas code to access elf sections
*/
#define CTORS_SECTION_ASM_OP ".section\t.ctors,\"aw\""
#define DTORS_SECTION_ASM_OP ".section\t.dtors,\"aw\""

/*
        create a __CTOR_LIST__ from the section .ctors
*/
asm (CTORS_SECTION_ASM_OP);     /* cc1 doesn't know that we are switching! */
static void (* __CTOR_LIST__[1]) (void) __attribute__ ((section (".ctors"))) = 
{ (void (*) (void)) -1 };
asm (".previous"); /* go back */

/*
        create a __DTOR_LIST__ from the section .dtors
*/
asm (DTORS_SECTION_ASM_OP);     /* cc1 doesn't know that we are switching! */
static void (* __DTOR_LIST__[1]) (void) __attribute__ ((section (".dtors"))) = 
{ (void (*) (void)) -1 };
asm (".previous"); /* go back */


Now that we have a handle, we need to run through the list of function pointers and call them. Note that there is no standard way of doing this (forwards, backwards, etc.), which is why your global constructors should never rely on the order of execution. Here is some example code to demonstrate.
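/* function pointer type for the entries in the ctor/dtor lists */
typedef void (*func_ptr) (void);

/*
run global destructors
*/
static void __attribute__((used))
do_global_dtors_aux (void)
{
	static func_ptr *p = __DTOR_LIST__ + 1;
	static bool completed = 0;
	func_ptr f;

	if (completed)
		return;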

	while ((f = *p))
	{
		p++;
		f ();
	}

	// exception handling shutdown 
	__deregister_frame_info ((void *)&__eh_frame_start);

	completed = 1;
}

/*
run global constructors
*/
static void __attribute__((used)) 
do_global_ctors_aux (void)
{
	static func_ptr *p = __CTOR_LIST__ + 1;
	static int completed = 0;

	if (completed)
		return;
			
	/*
		mjf - now run through and find end marker (null)
		and then execute the function pointers backwards, just like linux does :-)
	*/
	while ((*p) && (*p != (func_ptr)-1))
	{
		p++;
	}

	// there is a trailing null, so go back one
	p--;

	// run all the functions in the list,
	// right back to the start of list (the -1 marker, 0xffffffff)
	while (*p != (func_ptr)-1)
	{
		(*(p)) ();
		p--;
	}

	static struct object object;

	// exception handling frame information
	__register_frame_info ((void *)&__eh_frame_start, &object);

	completed = 1;
}
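
To see the backwards walk in isolation, here is a minimal hosted sketch. The table, the two fake constructors and the function name are all made up for illustration; on a real target the table comes from the linker, not from an array in your source.

```cpp
#include <cstddef>
#include <stdint.h>

typedef void (*func_ptr)(void);

// Marker value at the head of the list (all bits set, i.e. -1).
static const func_ptr LIST_MARKER = (func_ptr)(intptr_t)-1;

// Records the order in which the fake constructors ran (illustration only).
static int g_order[4];
static int g_ran = 0;

static void fake_ctor_a(void) { g_order[g_ran++] = 1; }
static void fake_ctor_b(void) { g_order[g_ran++] = 2; }

// Hypothetical stand-in for the linker-built .ctors table:
// the -1 marker, the entries, then a terminating null.
static func_ptr fake_ctor_list[] = {
    LIST_MARKER, fake_ctor_a, fake_ctor_b, NULL
};

// Walk forward to the terminating null, then call the entries backwards
// until the marker at the head of the list is reached.
// Returns the number of functions called.
int run_ctor_list_backwards(func_ptr *list)
{
    func_ptr *p = list + 1;
    int called = 0;

    while (*p)
        p++;
    p--;                           // step back over the terminating null

    while (*p != LIST_MARKER)
    {
        (*p)();
        called++;
        p--;
    }
    return called;
}
```

Calling run_ctor_list_backwards(fake_ctor_list) runs fake_ctor_b first and fake_ctor_a second, which is the reverse of the order they appear in the table.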


OK, so now you have to decide where in your OS startup you're going to call this. Note that these constructors are going to require a working 'new' call, so you had better have your heap set up; then just call 'do_global_ctors_aux();' someplace before main().


Threading/Reentrancy Support
Adding support for GCC's and newlib's threading and reentrancy features for pre-emptive operation requires implementing normal and recursive locking functions.

If you're going to use newlib in a multithreaded / pre-emptive environment, then it is quite important when building newlib to pass '--enable-newlib-multithread=yes' on the configure line and to add '-DREENTRANT_SYSCALLS_PROVIDED -D__DYNAMIC_REENT__' to your makefiles, to make newlib aware of the environment that you're operating in.

As an aside, if you have a hardware floating point unit and your compiler supports it, adding '--enable-newlib-hw-fp' will make newlib use its non-integer-only (hardware floating point) math routines.

You may also want to add '-DHAVE_STAMFORD -DWANT_PRINTF_LONG_LONG' depending on your requirements.

The following code is a critical element that swaps the reentrant structure pointer when a task swap occurs.

// Function to add multithread support to newlib
	struct _reent * __getreent (void)
	{
		NU_HISR *HisrPtr;
		NU_TASK *TaskPtr;
		
		if ((HisrPtr = TCC_Current_HISR_Pointer()) == NULL)
		{
			// Running in normal task mode
			if ((TaskPtr = TCC_Current_Task_Pointer()) == NULL)
			{
				// No valid tasks are running currently return global space
				return _impure_ptr;
			}
			return TaskPtr->_impure_ptr;
		}
		return HisrPtr->_impure_ptr;
	}
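
The idea behind __getreent() is that library state such as errno is looked up through the current task rather than through one global. A small hosted sketch of that lookup follows; FakeReent, FakeTask and the accessor names are assumptions made up for illustration and are not newlib or Nucleus types.

```cpp
#include <cstddef>

// Hypothetical stand-ins: FakeReent plays the role of newlib's struct _reent,
// FakeTask the role of an RTOS task control block.
struct FakeReent { int err; };
struct FakeTask  { FakeReent reent; };

static FakeReent  g_global_reent = { 0 };    // fallback used before tasking starts
static FakeTask  *g_current_task = NULL;     // set by the (imaginary) scheduler

// Same shape as __getreent: per-task state if a task is running,
// otherwise the global fallback.
FakeReent *fake_getreent(void)
{
    if (g_current_task == NULL)
        return &g_global_reent;
    return &g_current_task->reent;
}

// errno-style accessor: every use goes through the lookup,
// so each task sees its own value.
int &fake_errno(void)
{
    return fake_getreent()->err;
}
```

Because the accessor is evaluated on every use, nothing has to be copied on a task switch; the scheduler only changes which task is current.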


Threading and locks: you can either patch GCC to support your platform, or be a parasite and use an existing target to enable your platform-specific threads/locking system for your RTOS (this is required for C++ exceptions). It is *much* easier to just use an existing threading configuration, so add '--enable-threads=rtems' to your GCC configuration. Here is my example code; note that this is targeted towards the Nucleus OS (hence the NU_XX calls), but if your OS supports MUTEX or SEMAPHORE primitives it should map over fairly easily.

I should note here that GCC 4.x requires an implementation of *recursive* locks, not shown here, but that's not difficult.
// will be 2, if task switching started
extern "C" INT INC_Initialize_State;

/* avoid dependency on rtems specific headers */
typedef void *__gthread_key_t;
typedef int   __gthread_once_t;
typedef void *__gthread_mutex_t;

/* 
mutex support 
*/

unsigned int rtems_gxx_mutex_lock_calls =0, 
			rtems_gxx_mutex_trylock_calls =0, 
			rtems_gxx_mutex_unlock_calls =0, 
			rtems_gxx_mutex_init_calls =0;


extern "C" void rtems_gxx_mutex_init(__gthread_mutex_t *mutex)
{
	rtems_gxx_mutex_init_calls++;
	
	*mutex = malloc(sizeof(NU_SEMAPHORE));
	
	NU_Create_Semaphore((NU_SEMAPHORE *)(*mutex), "GCC Sema", 1, NU_PRIORITY);
}

extern "C" int rtems_gxx_mutex_lock(__gthread_mutex_t *mutex)
{
	rtems_gxx_mutex_lock_calls++;
	int Status = -1;

	if (INC_Initialize_State == 2)
	{
		if (NU_Obtain_Semaphore((NU_SEMAPHORE *)(*mutex), NU_SUSPEND) == NU_SUCCESS)
		{
			Status = 0;
		}
		else
		{
			Status = 0;
		}
	}
		
	return Status;
}

extern "C" int rtems_gxx_mutex_trylock(__gthread_mutex_t *mutex)
{
	rtems_gxx_mutex_trylock_calls++;
	int Status = -1;

	if (INC_Initialize_State == 2)
	{
		// Don't currently own the lock, see whether it can be obtained
		if (NU_Obtain_Semaphore((NU_SEMAPHORE *)(*mutex),NU_NO_SUSPEND) == NU_SUCCESS)
		{
			Status = 0;
		}
		else
		{
			// lock not obtained, the caller must be told so
			Status = -1;
		}
	}
	
	return Status;
}

extern "C" int rtems_gxx_mutex_unlock(__gthread_mutex_t *mutex)
{
	rtems_gxx_mutex_unlock_calls++;
	int Status = -1;

	if (INC_Initialize_State == 2)
	{
		if (NU_Release_Semaphore((NU_SEMAPHORE *)(*mutex)) == NU_SUCCESS)
		{
			Status = 0;
		}
		else
		{
			Status = 0;
		}
	}
		
	return Status;
}
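
GCC 4.x needs recursive locks, which the semaphore wrappers above do not provide. One common way to build them is to put an owner and a nesting count around a non-recursive semaphore. The sketch below shows only that bookkeeping; FakeSemaphore and the integer task ids are hypothetical stand-ins for the RTOS primitives, and it uses try-lock semantics so it can run anywhere.

```cpp
// Hypothetical stand-in for a non-recursive RTOS semaphore with a
// "no suspend" obtain; this is not the Nucleus API.
struct FakeSemaphore
{
    bool taken;
    bool try_obtain() { if (taken) return false; taken = true; return true; }
    void release()    { taken = false; }
};

// Recursive lock: remembers the owning task and a nesting count.
struct RecursiveLock
{
    FakeSemaphore sem;
    int owner;    // task id of the holder, -1 when free (ids assumed >= 0)
    int count;    // nesting depth for the holder
};

void recursive_init(RecursiveLock *l)
{
    l->sem.taken = false;
    l->owner = -1;
    l->count = 0;
}

// Returns 0 on success, -1 if another task holds the lock.
int recursive_trylock(RecursiveLock *l, int task)
{
    if (l->owner == task)          // already ours: just nest deeper
    {
        l->count++;
        return 0;
    }
    if (!l->sem.try_obtain())
        return -1;
    l->owner = task;
    l->count = 1;
    return 0;
}

// Returns 0 on success, -1 if the caller does not hold the lock.
int recursive_unlock(RecursiveLock *l, int task)
{
    if (l->owner != task)
        return -1;
    if (--l->count == 0)           // outermost unlock frees the semaphore
    {
        l->owner = -1;
        l->sem.release();
    }
    return 0;
}
```

On a real pre-emptive target the semaphore would be the OS one, the task id would come from the OS's current-task call, and the read of the owner field has to be atomic for this to be safe.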


File I/O locking functions for newlib: note that the SCX_XX calls allow recursive calls, and the NU_XX (Nucleus) ones don't. This is important to how things work. See the following code for examples of locking functions.
//
// file i/o locking functions for newlib
//

/* avoid dependency on newlib specific headers */
typedef void * _LOCK_T;
typedef void * _LOCK_RECURSIVE_T;

//#define __LOCK_INIT(class,lock) class _LOCK_T lock;
//#define __LOCK_INIT_RECURSIVE(class,lock) class _LOCK_RECURSIVE_T lock;
/*
typedef struct stub_critical_section
{
	NU_SEMAPHORE CriticalSection; // Locks the 
	int		count;
	NU_TASK * thread;
		
	critical_section()
	{
		CriticalSection.sm_id = 0;
		thread = NULL;
		count = 0;
	};
} CRITICAL_SECTION;
*/

// will be 2, if task switching started
// extern "C" INT INC_Initialize_State;


extern "C" void __local_lock_init(_LOCK_T lock)
{
	if (INC_Initialize_State == 2)
	{
		// only first time
		if (*((void **)lock) == 0)
		{
			*((void **)lock) = malloc(sizeof(CRITICAL_SECTION));
		
			SCXInitializeCriticalSection(*(CRITICAL_SECTION **)lock);
		}
	}
}
extern "C" void __local_lock_init_recursive(_LOCK_RECURSIVE_T lock)
{
	if (INC_Initialize_State == 2)
	{
		// only first time
		if (*((void **)lock) == 0)
		{
			*((void **)lock) = malloc(sizeof(CRITICAL_SECTION));
						
			SCXInitializeCriticalSection(*(CRITICAL_SECTION **)lock);
		}
	}
}

extern "C" void __local_lock_close(_LOCK_T lock)
{
	if (INC_Initialize_State == 2)
	{
		SCXDeleteCriticalSection(*(CRITICAL_SECTION **)lock);
				
		// free when no longer held
		if ( (*(CRITICAL_SECTION **)lock)->count == 0)
		{
			free(*((void **)lock));
			*((void **)lock) = 0;
		}
	}
}
extern "C" void __local_lock_close_recursive(_LOCK_RECURSIVE_T lock)
{
	if (INC_Initialize_State == 2)
	{
		SCXDeleteCriticalSection(*(CRITICAL_SECTION **)lock);
				
		// free when no longer held
		if ( (*(CRITICAL_SECTION **)lock)->count == 0)
		{
			free(*((void **)lock));
			*((void **)lock) = 0;
		}
	}
}

extern "C" void __local_lock_acquire(_LOCK_T lock)
{
	if (INC_Initialize_State == 2)
	{
		SCXEnterCriticalSection(*(CRITICAL_SECTION **)lock);
	}
}
extern "C" void __local_lock_acquire_recursive(_LOCK_RECURSIVE_T lock)
{
	if (INC_Initialize_State == 2)
	{
		SCXEnterCriticalSection(*(CRITICAL_SECTION **)lock);
	}
}

/*
not used by newlib
extern "C" int __lock_try_acquire(_LOCK_T lock)
{
}
extern "C" int __lock_try_acquire_recursive(_LOCK_T lock)
{
}
*/

extern "C" void __local_lock_release(_LOCK_T lock)
{
	if (INC_Initialize_State == 2)
	{
		SCXLeaveCriticalSection(*(CRITICAL_SECTION **)lock);
	}
}
extern "C" void __local_lock_release_recursive(_LOCK_RECURSIVE_T lock)
{
	if (INC_Initialize_State == 2)
	{
		SCXLeaveCriticalSection(*(CRITICAL_SECTION **)lock);
	}
}


Well, that's about it; now you have to go do some work for yourself. For more information, try lurking on the crossgcc/gcc and newlib mailing lists; a lot of good information passes by there.


Compilers getting slow... so what?

So what if compilers are getting slower? GCC is far slower in 'runtime', as is Visual C++. I know; I have used them for years. With GCC I started with egcs (gcc 2.91) on an 8MB Amiga with a Motorola m68k (68020). It was sodding slow at the time, much slower than DICE on the Amiga, but much quicker than current compilers.

Anyhow, here is a little benchmark to show that CPU speeds have stomped all over any flabby code in compilers, ignoring that you may have to swap and become disk bound.
/* ------------------------------------------------------------------ */
/* $VER: benchmark.h 0.1.1 (started 13.05.2000)                       */
/*                                                                    */
/* CPU based absolute benchmarks                                      */
/*                                                                    */
/* (C) Copyright 2000 Matthew J Fletcher - All Rights Reserved.       */
/* amimjf@connectfree.co.uk - www.amimjf.connectfree.co.uk            */
/* ------------------------------------------------------------------ */

/* NOTES:

This should produce about the same asm source on all compilers, there is
no real opportunity for optimisation, i had to defeat gcc by using the _count
as the sqrt input, its optimiser is much too smart !!

17-03-2001 - added CLOCKS_PER_SEC for linux gcc tests, to give proper results
in all conditions (was 60).

17-03-2002 (yes freaky) - upped tests to 100000000 (8 zeros), because new
processors were getting too quick for the old one, i manually recalculated
the previous results, they are linear anyhow.

11-10-2007 - new OS, compiler and hardware.
*/

#include <math.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void brkfunc(int);
int main (void);

long int _count=0; // force the issue
clock_t start, end;
float total;


int main (void) {
double isroot=0;

	printf("\nMatthews Crap Benchmark 0.1.1\n\n");
	printf("If you get bored, ctrl-c will stop benchmarker\n");
	printf("Some results...\n");
	printf("Compiled with.. 'gcc benchmark.c -o bench -lm' (without optimisation !!)\n\n");
	printf("MacOS 8.5 PPC 603ev @ 225mhz                 = 88.33 secs - (Metrowerks CodeWarrior v2.1)\n");
	printf("AmigaOS 3.0 68k 020 @ 14mhz                  = 2745.21 secs - (amigaos egcs2.91 / ixemul)\n");
	printf("Win NT4 IntelPII @ 400mhz                    = 119.23 secs - (borlandC++ 4)\n");
	printf("Linux 2.4.2 AMD K6II @ 500mhz                = 59.72 secs - (gcc2.95.2)\n");
	printf("Linux 2.4.8 AMD Duron @ 900mhz               = 13.64 secs - (gcc3.0.1)\n");
	printf("MacOS X 10.1.3 (ppc-darwin) G4 PPC @ 867mhz  = 23.46 secs - (gcc2.95.2)\n");
	printf("Linux 2.6.22 AMD Dual Core 3800 @ 2GHz       = 1.66 secs - (gcc4.2.1)\n");
	printf("Linux 2.6.31 AMD Quad Core X4 B50 @ 3.2GHZ   = 0.85 secs - (gcc4.4.1)\n");

	printf("\n");

	start = clock(); // the beginning

    // multitasking OS should be shut down here, but unfortunately
    // the two highest selling commercial OS don't support POSIX.1,
    // this means it cannot be done in an ANSI way. It would probably
    // break on anything other than gcc/egcs anyway.
    // forbid();

    signal(SIGINT, brkfunc);

    do {
       _count++;
       isroot = sqrt(_count);
    } while (_count != 100000000);  /* yes 8 zeros */

    // permit();
    // multitasking should now be switched back on.

    end = clock(); // the end

    total = (end - start); // time difference

    printf("Doing %ld integer square root ops, took %f secs\n", _count, (total / CLOCKS_PER_SEC));

    exit(0);
}

// if processing broken
void brkfunc(int signo) {

    printf("signo %d occurred\n", signo);

    end = clock(); // the end

    total = (end - start); // time difference

    printf("Doing %ld integer square root ops, took %f secs\n", _count, (total / CLOCKS_PER_SEC));

    exit(1);
}
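
If you want to reuse the timing idea with different iteration counts, the loop can be pulled out into a function. This is only a sketch: the function name is made up, and it uses a volatile store to keep the optimiser from throwing the loop away, where the program above uses the global counter for the same purpose.

```cpp
#include <cmath>
#include <ctime>

// Hypothetical helper: times `iterations` square roots and returns
// the CPU time used, in seconds, as reported by clock().
double time_sqrt_loop(long iterations)
{
    volatile double sink = 0.0;    // volatile store stops the loop being optimised away
    std::clock_t begin = std::clock();

    for (long i = 1; i <= iterations; ++i)
        sink = std::sqrt(static_cast<double>(i));

    std::clock_t finish = std::clock();
    (void)sink;

    return static_cast<double>(finish - begin) / CLOCKS_PER_SEC;
}
```

For example, time_sqrt_loop(100000000) repeats the 8-zeros run; note that clock() measures CPU time, not wall time, so swapping and other waiting will not show up in the figure.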

      
FIN