Reducing the leakage power of embedded systems is essential, as leakage constitutes an increasing fraction of the total power consumption in modern embedded processors. Power gating of functional units has been shown to be an effective technique for reducing leakage, and its implementations can be categorized into compiler-based and hardware-based approaches. Hardware-only designs rely on dedicated circuits and microarchitectural support to monitor instruction execution and determine when to power-gate functional units, whereas compiler-based methods exploit global program information and let the compiler embed special instructions that turn functional units on and off. This paper compares the efficiency of hardware and software techniques for power gating of functional units. Experimental results for the DSPstone benchmarks on Wattch show that the hardware-only approach is generally effective in reducing leakage, while the compiler-based approach occasionally performs better because the global program knowledge gathered by the compiler avoids excessive power-gating on/off activity. This outcome suggests a better scheme: a hardware-based technique is deployed as the default power-gating mechanism, and the compiler intervenes only when its analysis indicates that the default method is inferior for a particular application program.