1. Memory built into the CPU (cache) operates much faster than standard RAM; L1 cache in particular runs at or close to the CPU's own speed. So number crunching done on data held there is faster. L1 and L2 are usually assigned to each core, but L3 is normally shared between all cores. This is a compromise allowing greater flexibility, but in general the more L1, L2 and L3 cache the better.
2. I believe it can, but that depends on the application being able to take advantage of multi-core CPUs, and many aren't able to do this.
Processes don't use GHz; they use CPU cycles, shown as a percentage of the CPU's processing power, so they can't exceed that figure. If an application is using a high percentage of a single core, then, if the application is designed for it, it can split its work into multiple threads and share them across more than one core.
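To illustrate what "designed to split its work over more than one core" can look like, here is a minimal sketch in Go (my choice; the post doesn't name a language). The function names `sumSquares` and `parallelSumSquares` are hypothetical, and summing squares just stands in for any CPU-bound number crunching. The range is cut into one chunk per worker and each chunk runs in its own goroutine. Whether the chunks really land on separate cores is up to the Go runtime and the OS scheduler (Go uses all logical CPUs by default via `GOMAXPROCS`), so treat this as a sketch rather than a guarantee.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares adds i*i for every i in [lo, hi). Hypothetical stand-in for
// any CPU-bound "number crunching" task.
func sumSquares(lo, hi int) int {
	total := 0
	for i := lo; i < hi; i++ {
		total += i * i
	}
	return total
}

// parallelSumSquares splits [0, n) into one chunk per worker and runs each
// chunk in its own goroutine. The Go runtime maps goroutines onto OS
// threads, so with workers > 1 the chunks can run on different cores.
func parallelSumSquares(n, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers) // one slot per worker, so no locking is needed
	chunk := (n + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		hi := lo + chunk
		if hi > n {
			hi = n
		}
		if lo >= hi {
			continue // more workers than items: nothing left for this one
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			partial[w] = sumSquares(lo, hi)
		}(w, lo, hi)
	}
	wg.Wait() // block until every chunk has finished
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	workers := runtime.NumCPU() // one worker per logical CPU
	fmt.Println("workers:", workers)
	fmt.Println("sum:", parallelSumSquares(1_000_000, workers))
}
```

An application written like the single-threaded `sumSquares(0, n)` can only ever max out one core, however many the CPU has; the parallel version gives the OS separate pieces of work it can spread across cores.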