Summary of timing experiments with RHexLib (all averages were taken over 60000 cycles ~ 1 minute)

Conclusions:

- There is significant variability in the computation time. All modules, even the simplest ones, tend to have maximum execution times higher than 70us. My belief is that this is due to poor cache performance on our current CPU board. The execution times go up as the executable size increases.

- The database access functions have a large number of internal copy operations, which most likely result in significant CPU cycles wasted. There are also a few places where there may be memory leaks, which could cause unwanted mallocs happening, leading to wasted CPU. These problems can in principle be fixed with some reasonable effort.

- The basic motion control functionality, including rHio reads and writes, encoder reads and PD control computations take an average of 302us. With only the basic modules linked and activated, the decomposition is as follows:

  • encoderreader: 59.5us (for all 6 axes)
  • positioncontrol: 73.2us (for all 6 axes)
  • rhioreader: 67.1us
  • rhiowriter: 45.1us
  • timingtester: 7.4us

I am assuming the reason why these do not add up to 302us is because of the profiling and module manager overhead. Also, note that these includes a few very trivial optimizations I have made along the way, which improved the final figure by about 50us compared to our latest SWRI code.

- When the communications, database, logserver and the gyro, together with their threads are activated, the average loop time goes up to 351us, even though the updates of these modules do not do anything significant. They only start their corresponding threads. My impression is that this is again due to the cache performance and possibly bus related issues due to the serial gyro access. The timing decomposition becomes as follows:

  • encoderreader: 62.7us (for all 6 axes)
  • positioncontrol: 85.3us (for all 6 axes)
  • rhioreader: 75.5us
  • rhiowriter: 48.6us
  • logserver: 2.7us
  • dbmodule: 2.7us
  • timingtester: 8.4us

Note that the resolution of timing is around 2us, so the small values are not really reliable. Changing the priority of the gyro thread does not effect the results above, which are obtained with the gyro thread priority (27) above the main thread (25)

- The rest of the components are a lot harder to profile because they are network related and are less deterministic. The next step was to add the state proxy as well as all the associated state estimator modules. In particular, the heatsink, battery SOC monitor and the average power were activated. The most important conclusion is that, the stateproxy::update() function takes about 100us, which I suspect is due to inefficiencies and/or bugs in the underlying database access functions. This needs to be sorted out before the timing data is reliable. The summary in this case is as follows:

  • stateproxy: 101.6us (period is 10us)
  • heatsinkmodule: 4.9us
  • battermonitor: 22.6us (includes some complex math functions)
  • avgpower: 8.6us
  • encoderreader: 59.3us (for all 6 axes)
  • positioncontrol: 79.6us (for all 6 axes)
  • rhioreader: 82.6us
  • rhiowriter: 47.0us
  • logserver: 3.2us
  • dbmodule: 2.9us
  • timingtester: 9.4us

The average loop time is 403us, but keep in mind that the state proxy has a period of 10, so it only contributes to the average with 0.1 weight. So, a loop time of 490us would be a more accurate picture.

- The next experiment involves the basic walking machine with inclination compensation (i.e. what gets run from WalkMode). The execution time for the WalkMachine is hard to summarize due to irregular recomputations of thr Beuhler clock profile at transition points. However, the following list summarizes the timings when new turning commands are issued to RHexWalker at every cycle.

  • stateproxy: 91.5us (period is 10us)
  • heatsinkmodule: 5.1us
  • battermonitor: 22.2us (includes some complex math functions)
  • avgpower: 8.4us
  • rhexwalker: 3.4us
  • walkmachine: 30.3us (maxtime:256us. cache?.)
  • walkpitchfilter: 32.9us (maxtime:116us)
  • slopeestimator: 3.2us (maxtime:46us. almost empty update. weird!)
  • encoderreader: 59.7us (for all 6 axes)
  • positioncontrol: 68.0us (for all 6 axes)
  • rhioreader: 82.2us (maxtime:347us cache or sync prob?)
  • rhiowriter: 46.5us
  • logserver: 3.2us
  • dbmodule: 3.2us
  • timingtester: 7.8us

The avg loop time becomes 445us, which is reasonable. However, the maximum loop time is at 815us, which is a little distressing. Note that the max time is not the sum of max times for all modules because they may occur at different cycles. In fact, the sum of max times comes up to be 2.3s, which is much larger. Final note, the reason the position controller timing is smaller is probably because motor drives were disabled and the legs were not moving, reducing computation on the PD loop.

- Testing modes is significantly harder because of the nondeterminism of the user interface. As a consequence, I have done a much more limited version, where the mode supervisor is added and activated. Calibration was then performed manually, followed by a forced transition to OdometryMode, which supposedly has a lot of state updates. At this point, the profiling was started and run for 60000 cycles. A number of additional modules are necessarily activated. The following list gives a summary.

  • modeSupervisor: 13.2us
  • stateproxy: 100.7us (period is 10us)
  • heatsinkmodule: 6.0us
  • battermonitor: 24.6us (includes some complex math functions)
  • avgpower: 8.9us
  • odometrymodule: 5.3us
  • localizermodule: 90.9us
  • rhexstatemodule: 5.7us
  • rhexrunner: 2.4us
  • walkmachine: 29.8us (maxtime:254us. cache?.)
  • rotationint: 65.7us (maxtime:160us)
  • encoderreader: 61.2us (for all 6 axes)
  • positioncontrol: 69.6us (for all 6 axes)
  • rhioreader: 75.4us
  • rhiowriter: 50.4us
  • logserver: 3.0us
  • dbmodule: 2.9us
  • timingtester: 6.0us

The average loop timing was 636us, but cycles were skipped a number of times, with a maximum loop time of 995us (which forces cycle skip to let the comm thread run). It looks to me like, overall, the code is fairly well optimized (except the database copies), but the memory architecture on the Geode is killing us.

-- UlucSaranli? - 07 Aug 2003

 
This site is powered by the TWiki collaboration platformCopyright &© by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback