If clock speed is what matters, then why is the A7 faster than processors with more than double the clock speed? Also, the A7 is only 0.2 GHz faster than the A6, yet it benchmarks better and in a lot of cases performs better than the A6. Care to explain that, smarty?
You're confusing two ideas. There's obviously more to a processor than purely clock speed. I was explaining from within the context of your analogy. If you don't understand the difference between clock speed and bits, there's no point getting into more complex systems. I'm not attempting to suggest clock speed is the only measure of a processor's efficiency. I'm just not overstating the value of a 64-bit processor compared to its 32-bit counterpart.
If everything else remains equal, doubling a processor's clock speed doubles its ability to compute data. If you take a 1GHz processor and the only change is to increase it to a 2GHz processor, tasks take half the time.
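To put numbers on that, here's a rough sketch. The 3-billion-cycle workload is a made-up figure and the function name is mine; it only models the idealized "everything else remains equal" case, not a real chip.

```python
def run_time_seconds(cycles: int, clock_hz: float) -> float:
    """Time to execute a fixed number of cycles at a given clock rate."""
    return cycles / clock_hz


# Hypothetical workload: the same 3 billion cycles on both processors.
WORK_CYCLES = 3_000_000_000

t_1ghz = run_time_seconds(WORK_CYCLES, 1e9)  # 1 GHz clock
t_2ghz = run_time_seconds(WORK_CYCLES, 2e9)  # 2 GHz clock, nothing else changed

print(f"1 GHz: {t_1ghz} s, 2 GHz: {t_2ghz} s")
```

Same cycle count, twice the clock, half the time. The moment the cycle count itself changes (different architecture, different instruction set), this simple ratio stops applying.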
If everything else remains equal, doubling the bits from 32 to 64 increases the processing speed of some calculations while providing absolutely no benefit to others. Take the idea of a Multiply/Accumulate (MAC) calculation. This is something like y = y + a*x. If the processor is built to include an opcode just for the MAC, it's possible to make this a single-cycle calculation. If not, it's going to take however long the processor requires to process both the multiply and the add. It doesn't matter if the processor is 32 or 64 bits. Why? Until a*x is multiplied, the result can't be added. Sure, you can do more additions at a time by packing several smaller values into the wider registers. But you wouldn't in this case. Practically, it's not possible to pipeline every instruction you give in such a way as to gain twice the processing power from the extended size as you suggest.
The three most common ways to leverage a 64-bit device to gain processing power are: more space for streamlined op-codes, more simultaneous calculations, more RAM addressability.
We've already addressed simultaneous calculations to some extent. The gain is very specific to the application being used to benchmark. In some, it may appear to be a huge difference. In others, you're unlikely to see a change. Different applications have different degrees of pipelining used in their design. As an example of pipelining, let's say you have a collection of savings account information and it's time to compute interest. Every account balance needs to be multiplied by 1.03. If the multiply takes two cycles, you can start computing on the first account during the first cycle. In the second cycle, you finish the first account while starting on the second account. This organization means you can do 1,000,000 accounts in 1,000,001 cycles instead of 2,000,000. Depending on the registers being used to do this, it's possible to leverage the 64-bit processor to decrease the time. But that's not keeping all things the same. That's making additional hardware decisions. It's likely these decisions would be included. But they'd STILL only see a benefit in these types of cases. When you're not doing a large number of similar calculations immediately following each other, you're not going to see the benefit.
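If you want to check the cycle counts from the savings account example yourself, here's a minimal sketch. The function names are mine, and it assumes an ideal pipeline with no stalls, where a new item can enter every cycle.

```python
def unpipelined_cycles(items: int, stages: int) -> int:
    """Each item must fully finish before the next one starts."""
    return items * stages


def pipelined_cycles(items: int, stages: int) -> int:
    """Ideal pipeline: the first item takes `stages` cycles to come out,
    then one more item completes on every following cycle."""
    if items == 0:
        return 0
    return stages + (items - 1)


ACCOUNTS = 1_000_000
MULTIPLY_STAGES = 2  # the two-cycle multiply from the example

print(unpipelined_cycles(ACCOUNTS, MULTIPLY_STAGES))  # 2000000
print(pipelined_cycles(ACCOUNTS, MULTIPLY_STAGES))    # 1000001
```

Notice the benefit comes entirely from having a long run of identical, independent operations. With a single account, both functions give 2 cycles.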
In terms of op-codes, let's look at the most basic architecture. We'll use an 8-bit architecture and we can expand the idea from there. With each instruction being a byte in size, we have 8 bit positions to include all of the information the processor needs to handle the instruction. Let's use 3 of those bits for op-codes and the remaining 5 for register addressing. This is an arbitrary decision. You can choose however you'd like. With only 8 bits, we're severely restricted on space. 3 bits for op-codes mean we can handle 8 different instructions. That's not a very advanced system. With 5 bits for addressing, we'd likely split the space into 3 sections: 1 bit to determine which register to save the result in, and 2 bits for each of the two registers used in the calculation. With only 8 possible commands, we likely don't have room for all of add, subtract, multiply, and divide. That would use half of our available commands. If we also include save/load, we're at 6. This means we can't include all of AND, OR, and NOT. We could get away with just AND and NOT, since OR can be built from those two, but that's it. We also only have one type of save/load, so it's not very efficient. It's easy to see how the 8-bit space restricts us. If we expand to 16 bits, we can extend the op-code space to 4 bits, double our op-codes, and increase the efficiency of our program by opening up a great deal more possibilities for placement of data. We can further extend this to 32 bits. At this point, we see a lot of architectures using the idea of byte addressability. We also have to weigh the value of having EVERY instruction maintain the same length against letting the length vary. If I want to OR two locations, I need the op-code space, the two locations, and a place to store the result. That could require more bits to explain than something like an unconditional branch. The branch just needs to say "go to this address no matter what." It's the op-code and an offset. Letting instruction lengths vary allows you to pack more instructions into the same space.
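Here's a rough sketch of that toy 8-bit instruction format, in case it helps to see the bits. The field order (op-code in the top 3 bits, then the destination bit, then the two source registers) and the particular op-code list are my own assumptions for illustration; the description above doesn't fix either.

```python
# Hypothetical op-code table: 3 bits gives exactly 8 slots.
OPCODES = ["ADD", "SUB", "MUL", "DIV", "LOAD", "STORE", "AND", "NOT"]


def encode(op: str, dest: int, src_a: int, src_b: int) -> int:
    """Pack one instruction into a byte: [op:3][dest:1][src_a:2][src_b:2]."""
    code = OPCODES.index(op)  # raises ValueError for an unknown op-code
    if not (0 <= dest < 2 and 0 <= src_a < 4 and 0 <= src_b < 4):
        raise ValueError("register number does not fit in its field")
    return (code << 5) | (dest << 4) | (src_a << 2) | src_b


def decode(byte: int) -> tuple[str, int, int, int]:
    """Unpack a byte back into (op, dest, src_a, src_b)."""
    return (
        OPCODES[(byte >> 5) & 0b111],
        (byte >> 4) & 0b1,
        (byte >> 2) & 0b11,
        byte & 0b11,
    )


word = encode("MUL", 1, 2, 3)
print(f"{word:08b}")  # 01011011
print(decode(word))   # ('MUL', 1, 2, 3)
```

All 8 slots in the table are taken, so adding OR or a second kind of load means giving up something else or widening the instruction. That's the squeeze described above.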
At this point, we already have a large number of op-codes available. While the 64-bit processor can easily spare a single bit to double the op-codes, this also requires building streamlined operations for each. That's a HUGE investment in research and development for negligible performance improvement. It's not practical at this degree.
Most users understand that more RAM offers faster potential processing. The reason it offers quicker processing relates to the time it takes to get information. Let's use the analogy of cooking preparation. You're going to make beef stew. You'll need to cut some beef, carrots, potatoes, and celery. You'll also need to mix spices and other ingredients into the pot. It takes less time to cut the carrots if they're already on the table in front of you than if you have to walk across the kitchen to get them out of the fridge. RAM works a lot like this. It keeps the data closer to the processor. As a result, it speeds up computation. More bits means you can have more ingredients on the table, but only if you have enough table space to accommodate the added ingredients. Going from 32-bit to 64-bit without increasing your RAM completely ignores this benefit. You can have more ingredients, but your table is the same size.
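To make the table-size point concrete, here's a small sketch. It assumes byte addressing and that every address bit is actually usable, which is a simplification (real 64-bit chips typically wire up fewer than 64 address bits). The function names and the 1 GiB of installed RAM are hypothetical.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """How many distinct byte addresses fit in an address of this width."""
    return 2 ** address_bits


def usable_ram(address_bits: int, installed_bytes: int) -> int:
    """You can only use what is both installed and addressable."""
    return min(installed_bytes, addressable_bytes(address_bits))


installed = 1 * GIB  # hypothetical device with 1 GiB of RAM

print(addressable_bytes(32) // GIB)        # 4 (GiB)
print(usable_ram(32, installed) // GIB)    # 1
print(usable_ram(64, installed) // GIB)    # 1, the table didn't get any bigger
```

With 1 GiB installed, the 32-bit and 64-bit cases come out the same. The wider address only matters once the installed RAM goes past what 32 bits can reach.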
Many processors use these ideas to specialize. A processor used primarily for Digital Signal Processing will want to include measures to make the FFT faster to compute. It makes sense to set aside registers, create butterfly architecture, and use op-codes to reduce the time it takes to compute each butterfly. If the end user isn't likely to be building these types of applications, they may not use the FFT as often. As such, it doesn't make sense to devote this much attention to the calculation. We could benchmark two processors side by side and use the FFT as our application and find one processor to be far superior to the other. But that doesn't mean it processes everything that much faster than the other. There are countless decisions like this one that go into designing a processor. If I were to make the B1 processor for DSP and skip the FFT streamlining, I could make a HUGE improvement to the B2 by adding this streamlining. This is true even if I don't change the bits, clock speed, or anything else. You can't simply ask "why does this processor have a lower clock rate and still perform better?" The answer isn't that simple.
It's adorable you're STILL acting condescending when it's beyond apparent you have absolutely NO clue what the discussion is about. Do you have any more questions that you'd like to ask with a condescending tone while simultaneously highlighting your ignorance in this topic?