No, Apple didn’t pay for the iPhone to benchmark better

Geekbench 6 is the latest benchmark in the series.

AppleInsider may earn an affiliate commission on purchases made through links on our site.

Accusations that Apple paid benchmark developers so that its iPhone can beat Samsung’s latest models are unfounded, and based on tribalism. Here’s why.

Social media complaints that the Samsung Galaxy S23 Ultra performs worse than the iPhone under Geekbench 6 have led to accusations of bias in favor of Apple. In reality, the problem is that benchmarks are treated as the full measure of a smartphone.

Since the introduction of Geekbench 6 in February, Samsung and Android fans have taken to Twitter and other public forums to complain about its results. Specifically, the beef on the internet is about how the Samsung Galaxy S23 Ultra fares against the iPhone 14 Pro lineup.

A roundup of the complaints by PhoneArena identifies the core grievance: the results have drifted further apart with the introduction of the new Geekbench 6.

Under Geekbench 5, the Galaxy S23 Ultra scores around 1,600 in single-core and 5,000 in multi-core, in the ballpark of the iPhone 14 Pro's 1,900 and 5,500 points.

Ballpark numbers for results under Geekbench 5


When tested using Geekbench 6, the Galaxy S23 Ultra manages around 1,900 in the single-core test and 5,100 in multi-core, while the iPhone 14 Pro scores 2,500 and 6,500, respectively.

Note the larger gap in scores under Geekbench 6.


Put concretely, the iPhone leads Samsung by roughly 19% in single-core and 10% in multi-core under Geekbench 5. Under Geekbench 6, those leads grow to about 32% and 27%, respectively.
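Those leads follow directly from the ballpark figures quoted above. A quick sketch of the arithmetic (the scores are the article's approximations, not lab measurements, so rounding can shift the percentages by a point or so):

```python
# Ballpark Geekbench scores quoted in the article: (single-core, multi-core).
scores = {
    "Geekbench 5": {"iPhone 14 Pro": (1900, 5500), "Galaxy S23 Ultra": (1600, 5000)},
    "Geekbench 6": {"iPhone 14 Pro": (2500, 6500), "Galaxy S23 Ultra": (1900, 5100)},
}

for bench, devices in scores.items():
    iphone = devices["iPhone 14 Pro"]
    galaxy = devices["Galaxy S23 Ultra"]
    # Percentage lead = how far the iPhone score sits above the Galaxy score.
    single = (iphone[0] / galaxy[0] - 1) * 100
    multi = (iphone[1] / galaxy[1] - 1) * 100
    print(f"{bench}: single-core lead {single:.0f}%, multi-core lead {multi:.0f}%")
```

The same hardware gap looks roughly half as wide under Geekbench 5 as under Geekbench 6, purely because the two suites run different workloads.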

Samsung Galaxy S23 Ultra


Social media denizens claim this change in scores must mean some kind of pro-Apple bias is at play. A reasonably close race in Geekbench 5, the argument goes, should surely be equally close in Geekbench 6.

To these people, the shift proves pro-Apple bias. As is almost always the case, someone has already accused Apple of paying Geekbench's developer to inflate the results.

The game has changed

The first thing to consider is what goes into the benchmark itself. A synthetic benchmark runs a variety of tests, with the results aggregated into the final score.

These tests do not change throughout the lifetime of the benchmark generation. So, there is a level of consistency in testing between devices over a long period of time.

However, benchmarking tools need to be updated every now and then to match trends in hardware specifications and the kind of tasks a user can expect to do with their device.

The release of Geekbench 6 has done just that, making tweaks to existing tests and introducing new ones to best match what’s possible with a modern device. This includes new tests that focus on machine learning and augmented reality, which are big growth areas in computing.

“These tests are meticulously designed to ensure that results are representative of real-world use cases and workloads,” the Geekbench 6 description reads.

Machine learning is an area of growth and potential to innovate

Machine learning is a growth field and is capable of creating “art,” so shifting the focus of the benchmark in this direction makes sense.

Think of it as a race between a runner and someone who is into parkour. The race might normally be like the 100-meter dash, which the runner is used to, but changing to something like the Tough Mudder obstacle course will likely end in a different outcome.

If you take nothing else from this piece, here’s the main point. If you change what is being tested, of course the results will be different.

It's no different than comparing your Geekbench 5 scores against those from other benchmark suites. Because the tests differ and each is weighted differently in the final score, the relative performance of devices will also differ between benchmarking tools.

Once you treat Geekbench 6 as a completely different benchmarking tool from Geekbench 5, the differences in scores become much easier to understand.

Yes, changing the weighting to make some areas count for more than others can shift the scores. But as long as that doesn't affect the ability to directly compare scores within the same generation of the app, it isn't really a problem.
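To see why re-weighting alone can widen or narrow a gap, consider two hypothetical devices where one has much stronger machine-learning hardware. Every name and number below is invented for illustration, and a weighted arithmetic mean stands in for whatever aggregation a real suite uses:

```python
# Hypothetical subtest scores for two devices (not real measurements).
device_a = {"crypto": 2600, "integer": 2400, "ml": 2900}  # strong ML hardware
device_b = {"crypto": 2700, "integer": 2500, "ml": 1900}  # weak ML hardware


def weighted_score(subtests, weights):
    """Weighted arithmetic mean of subtest scores."""
    total = sum(subtests[name] * w for name, w in weights.items())
    return total / sum(weights.values())


old_weights = {"crypto": 1.0, "integer": 1.0, "ml": 0.5}  # ML barely counts
new_weights = {"crypto": 1.0, "integer": 1.0, "ml": 1.5}  # ML emphasized

for label, weights in (("old", old_weights), ("new", new_weights)):
    a = weighted_score(device_a, weights)
    b = weighted_score(device_b, weights)
    print(f"{label} weighting: A={a:.0f}, B={b:.0f}, A leads by {(a / b - 1) * 100:.0f}%")
```

Neither device changed at all, yet device A's lead grows substantially once machine learning counts for more. That is exactly the kind of shift a new benchmark generation can produce without any foul play.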

The need for trust

Benchmarking tools occupy a unique position, in that they rely entirely on users trusting the results they produce. The developers promise that a set of known tests will be executed by the tool, the same way, every time.

In general, benchmarks thrive on this reliability, as there is no vendor-specific bias at play. The results they produce are taken as legitimate.

Hypothetically speaking, if a benchmark developer were offered a huge bag of money to skew results in a manufacturer's favor, it would be achievable. Except a sudden divergence from the rest of the benchmarking industry would quickly lead users to question the results the tool produces.

Such a situation would break confidence in the benchmarking tool, because all of its other results would be called into question too.

Benchmark developers therefore need to minimize any bias in their test results and be as accurate as possible, to maintain the credibility and trust they have built.

Wait a hot moment or two

That credibility takes time to form, which can be a problem for a benchmark early in its life.

Over years of operation, a tool like Geekbench builds up a body of results that users can refer to. With Geekbench 5 heavily used by the media and enthusiasts alike, that collection became very important.

However, as we discussed, Geekbench 6 is not Geekbench 5, and it has only been out for a few weeks. It has not yet built up a results catalog large enough to enable comparisons across a wide range of devices.

Over time, Geekbench 6 will catch up with the size of Geekbench 5's results catalog.


Unfortunately, this means people will keep comparing Geekbench 6 results against Geekbench 5 results until the new catalog is fleshed out enough to stand on its own.

This is a problem that won't be fixed right away, as it relies on results collected from millions of runs of the tool. That could take months, certainly longer than the two weeks that have passed since the release of Geekbench 6 itself.

Wait a few months, then take a look at the benchmarks. If Geekbench 6 is to be trusted, you’ll see the same kind of trend across all the devices tested by it.

A warning from history

Since benchmarks are a major way to compare one device to another, this may lead some to believe they are the ultimate arbiter of the best smartphone you can buy.

As we just pointed out, benchmarking should only be a small part of your overall purchasing decision, not the entirety of it. Prioritizing benchmarks as the "most important thing" has indeed led to strange situations in the past.

Take, for example, the reports from March 2022, when Samsung was caught tweaking how its devices work specifically with benchmarks in mind.

Samsung's Galaxy S21 lineup was caught up in a throttling scandal over benchmarks.


To keep smartphones running smoothly and without issues, manufacturers can choose to limit the processing power of their devices. This makes some sense: a smartphone that runs too hot is undesirable to consumers, as is one that drains its battery too quickly.

At the time, it was discovered that Samsung had placed a long list of apps under performance limits, throttling them for exactly that reason. Except benchmark apps like Geekbench 5 and AnTuTu weren't throttled at all, and ran without limitations.

To the end user, benchmarks suggested the device would perform at one level, while in actual use many normal applications ran at a much lower level than expected.

This effectively short-changes the end user, making them believe the device runs faster than it actually does, at least under benchmarks.

Benchmarks are not the real world

The whole point of benchmarking is that it gives you a unified way to compare one device to another, and generally see the difference in performance. The key is standardization, and like many areas of life, that won’t necessarily lead to a true reflection of something’s capabilities.

This even comes down to which benchmark you pick: Geekbench is general-purpose, while others are built with specific audiences in mind.

For example, many gamers rely on in-game benchmarks such as the one in Rise of the Tomb Raider. This makes sense, since as an actual game it better tests the hardware elements that matter to a player.

Meanwhile, Cinebench's rendering-focused tests are chiefly useful to those who work in 3D, since it caters to that field rather than to general needs.

There are browser-based benchmarks as well, but while they are useful for those who work in Internet-centric fields, they won’t be as useful for those who work in 3D or are gamers.

Ideally, users should choose performance measurement tools that meet their needs. Geekbench is a simple, generalized testing suite; while it isn't the best fit for specialized scenarios, its ease of use and general-purpose nature make it ideal for mass-market testing, such as in publications.

However, no matter which benchmark you use, you will not get a complete picture of your specific needs. You'll get a signal, but not certainty.

A runner may be great at short-distance races, but they probably aren't very good at paying taxes, or at finding the eggs at the supermarket. Knowing how they fare in a race won't help you get your math done faster, but you'll at least know they're physically fit.

Likewise, a smartphone can do quite well at the specific tasks in a benchmark, but those remain an approximation of what you actually want to do with the device. You might care more about how quickly a biometric unlock happens, for instance, or about the camera's image quality.

A benchmarking tool only provides a general guide to how one smartphone compares to another under specific circumstances. It won't tell you how well the device fits into your life.

