SMP Multi-thread code basics

Sparrowhawke3D Posts: 102

I want to learn to implement code using SMP threads but I'm having trouble understanding how to achieve this with the SDK functions - especially without some complete sample code. Like most of the SDK it's a baffling puzzle that is only simple to understand if you already know the answer.

Any references I have found and read about multi-threaded code use their own API-specific calls, and the examples are usually in an application context with a main thread. The Carrara SDK functions which I can find are different and almost seem incomplete.

I'm aware that it is a complex and more dangerous area of programming. Perhaps somebody can fill in the gaps for me or give me a kick in the right direction...

1. Creating and launching a thread is clear enough but without a 'WaitForThread' or 'WaitForMultipleObjects' how do we know when its ::Work() is done ? Must all the synchronizing be achieved through IShSemaphore and if so exactly how ?

2. How are large amounts of data in TMCArray(s) accessed and processed by a thread in ::Work() ? We can't use globals and obviously we don't want to copy them. Does all the large data have to use the LocalStorage ? How are TMCSMPArray and TMCSMPArrayRequest used ?

To present an example: how can I move this section of code, in the context of a Deformer call, out into a thread ?

class MYTHREAD: public TBasicSmpThread
{
public:
virtual MCCOMErr MCCOMAPI Work();
};


MCCOMErr MyDeformer::DeformFacetMesh(real lod,FacetMesh* MeshIn,FacetMesh** MeshOut)
{
...
TMCArray<TVector3> OffsetVertices;
TVector3 vOffset;
uint32 uVertex,uTotalVertices;
vOffset.x=0.0;
vOffset.y=1.0;
vOffset.z=2.0;
uTotalVertices=MeshIn->VerticesNbr();
OffsetVertices.SetElemCount(uTotalVertices);
for(uVertex=0;uVertex<uTotalVertices;uVertex++) OffsetVertices[uVertex]=MeshIn->fVertices[uVertex]+vOffset;
...
// MYTHREAD MyThread;
int32 iThreadID;
// iThreadID=gShellSMPUtilities->LaunchSMPThread(&MyThread,0,NULL,kHighPriority,&bAbortThread);
// wait for thread to finish...
...
// rebuild the facet offset mesh
}

MCCOMErr MYTHREAD::Work()
{

// ! MOVE THE ABOVE CODE TO HERE !

// signal that the work is done...
return MC_S_OK;
}

If that isn't too hard then the next example could be how to launch more than one thread for the iterations of the loop, get a few running simultaneously and wait for them to finish. Another example to clear things up might be to have three different threads, each processing one of x, y and z separately.
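For comparison, here is roughly what "split the loop across a few threads and wait for them" looks like in standard C++ (std::thread), not in the Carrara SDK. It is only a sketch: `Vec3` and `OffsetVertices` are made-up stand-ins for TVector3 and the deformer loop above, and nothing here says how the SDK's own threads should be used.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the SDK's TVector3.
struct Vec3 { float x, y, z; };

// Adds `offset` to every input vertex, giving each thread one contiguous
// slice of the index range. Every output element is written by exactly one
// thread, so no locking is needed.
std::vector<Vec3> OffsetVertices(const std::vector<Vec3>& in, Vec3 offset,
                                 unsigned threadCount)
{
    std::vector<Vec3> out(in.size());
    if (threadCount == 0) threadCount = 1;
    const std::size_t chunk = (in.size() + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(in.size(), t * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&in, &out, offset, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = Vec3{in[i].x + offset.x, in[i].y + offset.y,
                              in[i].z + offset.z};
        });
    }
    // The "wait for thread to finish" step: join blocks until each one is done.
    for (std::thread& w : workers) w.join();
    return out;
}
```

Calling `OffsetVertices(vertices, Vec3{0.0f, 1.0f, 2.0f}, 4)` returns only after all four slices are finished, which is the behaviour the question is asking the SDK to provide.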

Comments

  • (...a few years later...)

    I needed more speed in a plugin and figured it was worth coming back (a bit wiser) to work this out by trial and error, comparing the Carrara SDK against how SMP is normally implemented, and I got it to work.  I was reluctant to use OS-specific multi-threading because I didn't know how it would interact with the rest of the SDK classes, but that is what I would have tried next.

    I'll go over what I did and share the methods.  If anyone knows better please correct me.  Some plugins don't need their own multi-threading: shaders, for example, are already spread across the CPUs because each tile is handled by whichever CPU is available.

     

    In a deformer plugin I have a loop that checks, for every deformed vertex, whether its move would pass through any facet of that same mesh.  The position of each deformed vertex has already been stored in outMesh before I get to the loop.  By cloning the input mesh again I can test against the original facets and vertices.

    MCCOMErr MyPlugin::DeformFacetMesh(real lod,FacetMesh* in, FacetMesh** outMesh)
    ...
    TMCCountedPtr<FacetMesh> OutputMesh;
    // outMesh is already a FacetMesh**, so it is passed as it is
    in->Clone(outMesh);
    OutputMesh=*(outMesh);
    ...
    // first I deform the vertices
    ...
    TMCCountedPtr<FacetMesh> InputMesh;
    in->Clone(&InputMesh);
    uint32 uVertexCount,uVertex;
    uint32 uFacetCount,uFacet;
    TVector3 v3VertexIn,v3VertexOut;
    TVector3 pIntersection;
    Triangle triFacet;
    // get the counters for the loops
    uVertexCount=InputMesh->VerticesNbr();
    uFacetCount=InputMesh->FacetsNbr();
    // process all of the vertices in the mesh
    for(uVertex=0;uVertex<uVertexCount;uVertex++)
      {
      // check this vertex move against all of the facets
      // the line segment is from the proposed moved vertex above
      // and the input facet mesh vertex of the same index
      // the collision test is against every facet of the input mesh
      v3VertexIn=InputMesh->fVertices[uVertex];
      v3VertexOut=OutputMesh->fVertices[uVertex];
      for(uFacet=0;uFacet<uFacetCount;uFacet++)
        {
        // get the 3 vertex indices of the facet triangle
        triFacet=InputMesh->fFacets[uFacet];
        // test for a hit using my own function
        // input the 3 points of the facet (indices 0, 1 and 2) and two points of the line segment
        if( CheckIfLineSegmentIntersectsFacet(InputMesh->fVertices[triFacet[0]],InputMesh->fVertices[triFacet[1]],
              InputMesh->fVertices[triFacet[2]],v3VertexIn,v3VertexOut,pIntersection) )
          {
          // if the vertex has passed through any facet prevent that moving and quit testing
          OutputMesh->fVertices[uVertex]=InputMesh->fVertices[uVertex];
          break;
          }
        } // end for uFacet<uFacetCount
      } // end for uVertex<uVertexCount
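    The body of CheckIfLineSegmentIntersectsFacet isn't shown in the post. For anyone following along, a segment/triangle test of that kind could look something like the sketch below, which uses the well-known Möller-Trumbore approach restricted to a segment. This is an assumption about what such a helper might do, not the code used in the plugin; `Vec3` and `SegmentHitsTriangle` are hypothetical names and the argument order just mirrors the call above.

```cpp
#include <cmath>

// Hypothetical stand-in for the SDK's TVector3.
struct Vec3 { double x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
static double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true when the segment p0->p1 crosses triangle (a, b, c) and
// stores the crossing point in `hit`.
bool SegmentHitsTriangle(Vec3 a, Vec3 b, Vec3 c, Vec3 p0, Vec3 p1, Vec3& hit)
{
    const double kEpsilon = 1e-12;
    const Vec3 dir = Sub(p1, p0);
    const Vec3 edge1 = Sub(b, a);
    const Vec3 edge2 = Sub(c, a);
    const Vec3 h = Cross(dir, edge2);
    const double det = Dot(edge1, h);
    if (std::fabs(det) < kEpsilon) return false;  // segment parallel to the facet

    const double invDet = 1.0 / det;
    const Vec3 s = Sub(p0, a);
    const double u = Dot(s, h) * invDet;          // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;

    const Vec3 q = Cross(s, edge1);
    const double v = Dot(dir, q) * invDet;        // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;

    const double t = Dot(edge2, q) * invDet;      // position along the segment
    if (t < 0.0 || t > 1.0) return false;         // crossing lies outside p0..p1

    hit = Vec3{p0.x + t * dir.x, p0.y + t * dir.y, p0.z + t * dir.z};
    return true;
}
```

    The test only reads its inputs, so calling it from several threads at once is safe as long as the mesh it reads from is not being modified at the same time.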

    So now to use all the available CPU power I can put the inner loop into threads.  While the first iteration is being tested the second one could proceed in another thread.  The thread will need to be passed pointers to any of the data in the loop once.  The counter will need to be changed when the thread has finished processing one iteration of the loop.  

    #include "BasicSmpThread.h"
    #include "MCSMPArray.h"
    #include "IShSMP.h"

    class VERTEXTHREAD: public TBasicSmpThread
      {
      public :
      static void Create(VERTEXTHREAD** out)
        {
        TMCCountedCreateHelper<VERTEXTHREAD>helper(out);
        helper=new VERTEXTHREAD;
        }
      virtual MCCOMErr MCCOMAPI Work();
      // a semaphore is needed for thread management
      TMCCountedPtr<IShSemaphore> pSemaphore;
      // flags are needed to control the thread
      boolean bRunning; // true when the thread is running
      boolean bReady; // false when thread has no new data
      // the thread will need to be passed pointers to the facet meshes
      // these will not change during the loop
      TMCCountedPtr<FacetMesh> InputMesh;
      TMCCountedPtr<FacetMesh> OutputMesh;
      uint32 uVertexCount,uFacetCount;
      // this value from the unthreaded loop will change constantly
      uint32 uVertex;
      };

    Before the loop begins the threads will be created and launched.  The number of threads varies from system to system so they need to be in an array.  There is no point in having any more threads than there are CPUs.  A semaphore, signalled by a thread when it finishes a vertex, tells the main thread that some thread is free.  In the initial state all of the threads will be free but not ready to process any data so the flags are used to control that.  An abort flag is needed and must be false for the thread to run.  Even once the thread has finished execution it will still remain in existence, and we will also want to check for a user abort in the loop.

    uint32 uCPUcount,uThread;
    uCPUcount=gShellSMPUtilities->GetNumberOfCPUs();
    // all the threads will be managed by one semaphore here in the main thread
    TMCCountedPtr<IShSemaphore> pSemaphore;
    // create a semaphore set to have the maximum CPUs and initially all are available
    gShellSMPUtilities->CreateSemaphore(&pSemaphore,uCPUcount,uCPUcount);
    // an array is needed for the unknown number of threads
    TMCCountedPtrArray<VERTEXTHREAD> Threads;
    TMCCountedPtr<VERTEXTHREAD> pThread;
    boolean bAbortAllThreads=false;
    // create and initialize each thread
    for(uThread=0;uThread<uCPUcount;uThread++)
      {
      VERTEXTHREAD::Create(&pThread);
      pThread->InputMesh=InputMesh;
      pThread->OutputMesh=OutputMesh;
      pThread->uVertexCount=uVertexCount;
      pThread->uFacetCount=uFacetCount;
      // each thread needs to signal the semaphore
      pThread->pSemaphore=pSemaphore;
      // the thread will be set to run but not ready for data
      pThread->bRunning=true;
      pThread->bReady=false;
      Threads.AddElem(pThread);
      // launch the thread, all can share the same abort flag
      gShellSMPUtilities->LaunchSMPThread(pThread,0,NULL,kNormalPriority,&bAbortAllThreads);
      }
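    To make the semaphore idea concrete, here is a small counting semaphore written in standard C++ with a mutex and a condition variable. It assumes IShSemaphore behaves like an ordinary counting semaphore (a count of free slots, where waiting takes one and signalling gives one back); the SDK's actual behaviour isn't documented in this thread, and `CountingSemaphore` with its method names is purely illustrative.

```cpp
#include <condition_variable>
#include <mutex>

// Illustrative counting semaphore: `count_` is the number of free slots.
class CountingSemaphore {
public:
    explicit CountingSemaphore(unsigned initialCount) : count_(initialCount) {}

    // Give one slot back and wake one waiter, if any.
    void Signal()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        available_.notify_one();
    }

    // Block until a slot is free, then take it.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Take a slot only if one is free right now; never blocks.
    bool TryWait()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    unsigned count_;
};
```

    Created with an initial count equal to the CPU count, such a semaphore lets the main thread take one slot per vertex it hands out and blocks it once every worker is busy, until one of them calls Signal().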

    Now the code from the loop needs to go into the thread's ::Work() function.  Left as it is, Work() would begin processing as soon as the thread is launched and then return, but we don't want that.  I'm putting the body into a while loop so the thread keeps running as long as it's needed, but it won't process any data until it is ready.  In the initial state the ready flag is false so the thread will not enter the inner block.  As soon as the work is done the thread resets the ready flag and signals the semaphore.  When multiple threads are running and changing the same data, the critical section class is used to prevent race conditions.  In this loop the original facet mesh is not changing and each iteration changes only one vertex.  In the loop that deformed the vertices (which I haven't shown) I used a critical section whenever a triangle's vertices were being read, in case they were changed part of the way through, and whenever a vertex was being written.

    MCCOMErr VERTEXTHREAD::Work()
      {
      uint32 uFacet;
      TVector3 v3VertexIn,v3VertexOut;
      TVector3 pIntersection;
      Triangle triFacet;
      TMCCriticalSection *theCS;
      theCS=NewCS();
      while(bRunning)
        {
        if(bReady)
          {
          v3VertexIn=InputMesh->fVertices[uVertex];
          v3VertexOut=OutputMesh->fVertices[uVertex];
          for(uFacet=0;uFacet<uFacetCount;uFacet++)
            {
            triFacet=InputMesh->fFacets[uFacet];
            // the three corners of the facet are indices 0, 1 and 2
            if( CheckIfLineSegmentIntersectsFacet(InputMesh->fVertices[triFacet[0]],InputMesh->fVertices[triFacet[1]],
                  InputMesh->fVertices[triFacet[2]],v3VertexIn,v3VertexOut,pIntersection) )
              {
              OutputMesh->fVertices[uVertex]=InputMesh->fVertices[uVertex];
              break;
              }
            } // end for uFacet<uFacetCount
            {
            CWhileInCS cs(theCS);
            // tell the main thread this thread is ready for another vertex
            bReady=false;
            }
          // signal the semaphore to let the main thread know a thread is free
          pSemaphore->Signal();
          } // end when ready
        } // end while running
      DeleteCS(theCS);
      return MC_S_OK;
      }
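    One thing to be aware of with a loop like `while(bRunning) { if(bReady) ... }` is that the thread keeps polling the flag even when it has nothing to do. A possible alternative, sketched here in standard C++ rather than with the SDK classes, is to let the worker sleep on a condition variable until a vertex is posted. `VertexWorker`, `Post` and `WaitIdle` are hypothetical names; the `ready_` and `running_` members play the same roles as bReady and bRunning above.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One worker thread that sleeps until it is handed a vertex index.
class VertexWorker {
public:
    explicit VertexWorker(std::function<void(unsigned)> body)
        : body_(std::move(body)), thread_([this] { Run(); }) {}

    // Ask the worker to stop, let it finish any pending vertex, then join it.
    ~VertexWorker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        changed_.notify_all();
        thread_.join();
    }

    // Hand over the next vertex; blocks while the previous one is in progress.
    void Post(unsigned vertex)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !ready_; });
        vertex_ = vertex;
        ready_ = true;
        changed_.notify_all();
    }

    // Block until the worker has finished whatever it was given.
    void WaitIdle()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !ready_; });
    }

private:
    void Run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until there is work or we have been told to stop.
            changed_.wait(lock, [this] { return ready_ || !running_; });
            if (!ready_) return;           // stopping and nothing pending
            const unsigned vertex = vertex_;
            lock.unlock();                 // do the real work without the lock
            body_(vertex);
            lock.lock();
            ready_ = false;                // same meaning as bReady=false above
            changed_.notify_all();
        }
    }

    std::function<void(unsigned)> body_;
    std::mutex mutex_;
    std::condition_variable changed_;
    bool running_ = true;
    bool ready_ = false;
    unsigned vertex_ = 0;
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

    Because both flags are only ever read or written while the mutex is held, there is no question about which accesses need protecting, and the destructor doubles as the "wait for it to finish, then stop it" step.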

    In the loop the semaphore is used to halt at that point and wait until a thread is free.  The semaphore doesn't know which actual thread is free so by checking the ready flag in each of them that can be determined.  A free thread can then be given the next iteration counter of the loop and any data.

    // the semaphore wants a boolean which must be true to wait
    boolean bWait=true;
    for(uVertex=0;uVertex<uVertexCount;uVertex++)
      {
      if(gShellUtilities->CheckForUserCancel()) break;
      pSemaphore->WaitForSignal(bWait);
        {
        CWhileInCS cs(theCS);
        // find out which thread is free
        for(uThread=0;uThread<uCPUcount;uThread++)
          {
          pThread=Threads[uThread];
          if(!pThread->bReady) break;
          }
        // check that the thread is okay
        if( (uThread<uCPUcount) && (pThread) )
          {
          pThread->uVertex=uVertex;
          // set the ready flag and the already running thread will process it
          pThread->bReady=true;
          }
        } // end critical section
      } // end for uVertex<uVertexCount
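    A different way to share out the loop, shown here only as a standard C++ sketch and not as something the Carrara SDK provides, is to let each worker claim the next vertex index itself from an atomic counter. The main thread then has no hand-out loop at all and just waits for the workers to finish. `ParallelForEachIndex` and its parameters are hypothetical; the `abortFlag` argument plays the part of the shared abort flag.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs body(i) once for every i in [0, count), spread over `threadCount` threads.
void ParallelForEachIndex(std::size_t count, unsigned threadCount,
                          const std::function<void(std::size_t)>& body,
                          const std::atomic<bool>& abortFlag)
{
    std::atomic<std::size_t> next{0};

    auto worker = [&next, &body, &abortFlag, count] {
        for (;;) {
            if (abortFlag.load()) return;               // user cancelled
            const std::size_t i = next.fetch_add(1);    // claim one index
            if (i >= count) return;                     // nothing left
            body(i);
        }
    };

    std::vector<std::thread> threads;
    const unsigned n = std::max(1u, threadCount);
    for (unsigned t = 0; t < n; ++t) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();            // wait for all of them
}
```

    `fetch_add` hands every index to exactly one thread, so as long as body(i) only writes data belonging to index i (one output vertex, as in the collision loop) no critical section is needed around it.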

    So that takes the code from within the for loop and puts it into threads to run on all the available CPUs.  When the next iteration comes around it waits for a free thread to be available then gives it the next vertex in the loop to process.  I'm not totally clear about what really needs to go into a critical section yet.  I know that using too many will slow things down so I used a few as a precaution when setting those thread control flags if more than one variable was changing.  Once a thread is flagged as being ready another might have signalled.
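    On the question of what needs a critical section, the usual rule in standard C++ is that a lock is only needed around data that more than one thread can touch while at least one of them is writing. The sketch below is standard C++, not SDK code, and `ClampAndCount` is a made-up example: each thread writes only its own slice of the array without locking, and takes the lock just once to add to a shared total.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Clamps every value above `limit` and returns how many values were clamped.
std::size_t ClampAndCount(std::vector<float>& values, float limit,
                          unsigned threadCount)
{
    if (threadCount == 0) threadCount = 1;
    const std::size_t chunk = (values.size() + threadCount - 1) / threadCount;
    std::size_t clampedTotal = 0;   // shared between threads: needs the lock
    std::mutex totalMutex;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(values.size(), t * chunk);
        const std::size_t end = std::min(values.size(), begin + chunk);
        if (begin == end) break;
        workers.emplace_back([&values, &clampedTotal, &totalMutex, limit, begin, end] {
            std::size_t clampedHere = 0;   // private to this thread: no lock
            for (std::size_t i = begin; i < end; ++i) {
                if (values[i] > limit) {   // element i belongs to this thread only
                    values[i] = limit;
                    ++clampedHere;
                }
            }
            // One short locked section per thread instead of one per element.
            std::lock_guard<std::mutex> lock(totalMutex);
            clampedTotal += clampedHere;
        });
    }
    for (std::thread& w : workers) w.join();
    return clampedTotal;
}
```

    Keeping the locked part down to a single addition per thread is one way to get the safety of a critical section without the slowdown that comes from entering one on every iteration.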

    When the main thread loop has finished, the thread functions need to exit and be aborted or they will remain active.  For this I wait until every thread has cleared its ready flag, meaning it has finished its last vertex, and then set the running flags to false so that each thread leaves its outer while loop.  After that the flag to abort all threads is set to true to tell the system to end them.

    uThread=0;
    while(uThread<uCPUcount)
      {
      if(gShellUtilities->CheckForUserCancel()) break;
      pThread=Threads[uThread];
      // wait until this thread has finished processing
      if(!pThread->bReady) uThread++;
      }
    for(uThread=0;uThread<uCPUcount;uThread++)
      {
      pThread=Threads[uThread];
      pThread->bRunning=false;
      }
    bAbortAllThreads=true;
    DeleteCS(theCS);

    When I run this and check the Windows Task Manager the CPU usage goes to 100%, the application thread count increases and the plugin works significantly faster.

    What I did have trouble with was the progress bar: when I tried to bring one up and increment it inside the loop, two of them would appear.  One would work but the upper one would hang and crash Carrara.  A progress bar is always better than a spinning circle when the processing takes more than a few seconds.

     

     
