Olympus E-M1X Q&A: A closer look at the amazing tech underlying the new OM-D series flagship

by Dave Etchells

posted Thursday, January 24, 2019 at 2:00 AM EDT


As we've described elsewhere, Olympus held a limited press-only briefing and shooting experience with its flagship Olympus E-M1X back in early December, subject to a non-disclosure agreement. During that time, we were able to ask questions of senior technical staff, who attended from Olympus' Hachioji R&D headquarters, just outside Tokyo.

The discussion was fairly informal and unstructured, so rather than trying to wrestle it into an intelligible transcript, I've chosen to package it in summary form, with digests of our questions and Olympus' answers. (The first question is from a separate executive briefing, but I included it here because I felt it was an important question that many users will be asking.)

Why the E-M1X now? What about the long-awaited E-M5 III?

We, along with a lot of our readers, wondered why Olympus chose to release the E-M1X just now, rather than updating the long-in-the-tooth E-M5 Mark II. What's up with that, and when might that product be updated?


Olympus's answer was that they wanted to fully flesh out their E-M product series with a truly professional model before they returned to updating the E-M5 model line. The E-M1 Mark II came close to that point, but they wanted to deliver the performance and ruggedness they knew they could with the E-M1X, to round out the top end of their line. Now that that's done, the E-M5 III will presumably be coming at some point, but they didn't reveal any specifics on its timing.

Lots more processor power(!) What do they use it for?

The E-M1X has twice the CPU power of the E-M1 Mark II; a total of eight CPU cores. We asked Olympus what the added processing power does, and whether there was also a similar increase in the front-end LSI that handles the lowest-level processing.


The eight CPU cores are general-purpose processors, twice as many as the E-M1 Mark II has, but the front-end LSI is the same. The increased processing power enables the handheld High-Res and Live ND modes, as well as the AI-based autofocus algorithms. (See below for further discussion; the engineers referred to this as "intelligent subject detection.") The processor chips are designed to perform standard image processing functions as found in other cameras, but they can also be used for executing deep learning algorithms.

I found it very interesting that the E-M1X doesn't have dedicated hardware for doing its AI-based subject recognition. Apparently, having eight powerful image-processing cores gives them enough horsepower to run deep learning processing at high enough frame rates to deliver real-time subject recognition.


Olympus explained that AI technology in general and their subject-detection algorithms in particular are evolving so rapidly that committing to a specific hardware architecture for them too early could hamper development. With general-purpose processors, even major algorithm changes are just a matter of loading new firmware.

As to the front-end LSI circuitry, that's integrated into the basic processor chip. (There are two chips, and each has four cores on it). It sounded, though, like the raw image data comes through the front-end circuitry of just one of the processor chips, with the other chip just providing the additional CPU power. It's likely that there wouldn't be any advantage in splitting the front-end processing between two chips, because at that low level, the overall throughput is limited by the sensor readout speed.


Two chips means two UHS-II memory channels

Each processor chip has circuitry on it to support a single UHS-II data stream, which is why only one of the two SD card slots on the E-M1 II was UHS-II capable. With two chips, though, the E-M1X can support two UHS-II cards. This will mean faster buffer clearing, although as I write this, we're waiting to see final firmware before we officially test the E-M1X's performance.

Same sensor, but better image processing

The E-M1X uses the same sensor as the E-M1 Mark II, but the E-M1X has better image processing, so high-ISO noise will be lower in JPEGs. (Raw files should look the same, though.) The improved processing apparently doesn't just mean lower noise levels, but also smoother gradations in skin tones, and improved color rendering.


It had sounded to me like the extra processing power was part of the reason for the improved color rendering, but the engineers told me that it wasn't; the change was just a matter of improved algorithms and feedback they'd received from professional photographers. (Although it doesn't require the dual processors, I was told that it isn't possible to implement the E-M1X's color rendering changes on the E-M1 II.)

Most interesting was a comment that when you enable image-quality priority mode (aka low-ISO detail priority), "the camera can do the noise reduction process twice". This almost certainly doesn't mean just running the same noise reduction algorithms a second time; more likely, the camera is running a two-stage noise reduction system, with the second stage doing some different, higher-level processing.

Of particular relevance to real-world photographers is that the engineers said that the results of the two-stage noise reduction approach will be most visible from ISO 800 to 1600.


A heat pipe for cool running

Olympus's presentation for the E-M1X mentions an internal heat pipe. This was the first we'd ever heard of a heat pipe in a camera, so I asked the engineers about it. Is it a true heat pipe? Where does it move the heat from and to? And will the feature eventually make its way to lower-end cameras?

(For reference, a heat pipe is a heat-transfer device often used in CPU coolers in computers. See Wikipedia for a full description, but the short story is that they're hollow tubes with a special liquid and wick inside. The liquid evaporates at the hot end and condenses at the cool end, whereupon the condensed liquid wicks back to the hot end again. The evaporation and condensation means it can move a lot of heat quickly, much more than a solid block of metal could.)


The E-M1X does have a true heat pipe that draws heat from the depths of the camera around the processor chips, and transfers it to the top of the large body casting, where it can be easily radiated into the surrounding environment. We've yet to test this aspect much, but it should mean that the E-M1X will be able to operate and do things like record 4K video in higher-temperature environments without having to shut down due to over-temperature. (As well as to use its eight cores to run the AI-based Intelligent Subject Detection autofocus system.)

As to the question of whether heat pipe technology will appear in lower-end cameras, it's entirely a matter of whether it's needed or not. If the dual-chip eight-core processor system migrates to lower-cost bodies, we'll likely see heat pipes in them as well.

AI-based autofocus?!

Olympus was actually the first company I heard mention AI technology explicitly, when I visited their R&D headquarters in Hachioji, Japan back in early March of 2018, although other companies have made reference to it since then. They mentioned it when I asked them what the next frontier was for camera technology, and they brought it up in the context of autofocus.

As you'll read in our overview of the camera, the E-M1X can intelligently recognize three different types of subjects anywhere in the frame and then put the autofocus point directly over the critical area. The three subject types are motorsports (including both cars and motorcycles) plus trains and airplanes. This is an absolutely remarkable, unprecedented capability. Eye-detect autofocus is a simplistic version of this kind of subject recognition, but the tech that Olympus has deployed in the E-M1X goes way, way beyond that.

So I had a lot of questions :-)

Q: Is there special hardware for the deep learning/AI subject detection?

A: No, that's why we have eight CPU cores in this camera (actually mentioned above, but included here for the sake of having all the AI-related stuff in one place)

Q: How on earth did you do the training for the deep learning algorithms? That must have taken thousands of manually-tagged reference images!

A: It actually took tens of thousands of images for each subject type.


[I figured it would have taken a lot of different images to "train" the AI algorithms on what the right focus point was for each category of subjects, but it turns out it took tens of thousands of reference images for each subject type. And note that each of the training images had to be manually processed by a human, to identify the subject and the correct focus point. For example, for the motorsports images, someone had to go through all the training images to say "Here's the car/motorcycle's body, here's the rider's helmet," etc, etc. These aren't images you can just pull out of a standard image-processing library, either; Olympus had to create the raw image database they were working from.

I do expect, though, that they could use any given image multiple times, by identifying the subject and then scaling it and moving it around in the frame. Still, this is an absolutely enormous amount of work, for each subject type. Of course, the engineers couldn't comment on the specifics of how they did this, or how they might have been able to re-use images with scaling, etc.]

How long has Olympus been working on AI-based autofocus?

For more than three years! :-0

Initially, the Olympus camera group partnered with the central R&D group (that's shared across multiple Olympus divisions) to determine what might be feasible, in terms of using deep learning technology to recognize subjects within images. After about six months of work, they determined that it would be possible, and established the basic algorithmic approach that they'd use.

That initial six months of analysis showed that it would be feasible to do what they wanted, but it then took an additional two and a half years or more until they had the final algorithms reduced to practice and implemented in the E-M1X prototypes.


Are these unique Olympus algorithms, or just a general deep learning approach?

The general approach uses somewhat standard "deep learning" techniques, but the particular algorithm is unique to Olympus. And the training data set was obviously very specifically developed by Olympus.

[So-called "deep learning" is a very generic name for an approach to AI processing, but there's a very wide range of ways the neural networks can be set up and applied. A key component for Olympus was that the results had to be available in nearly real-time. (That is, almost instantly.) It wouldn't do any good if the algorithm and hardware could recognize a subject after several seconds of processing. To be useful for focus determination, a delay of even a substantial fraction of a second would be unworkable. The engineers wouldn't say just how quickly the subject-recognition algorithm can update, but in observing and using it in real-world situations, it was very quick indeed, and clearly up to the task of finding the subject fast enough to set the focus where it needed to be.]


Does the Intelligent Subject Recognition use any phase-detection data?

We wondered whether the AI-based subject recognition made any use of distance data obtained from the phase-detection pixels on the image sensor. (It seemed to us that it would make sense for it to use distance data to help tell what was the subject and what was the background.) As it turns out, though, the subject-recognition algorithms only look at the actual image data itself; they recognize the subjects just based on how they appear in the frame, with no input from the phase-detect AF system.

This surprised me a little, as I would have thought that using the distance information would make the subject recognition easier. Thinking about it, though, situations in which these AF modes would be the most useful are often characterized by very confusing depth data. (Think about motorsports in particular: There's frequently a myriad of objects in the frame, with varying distances. When faced with a cluttered foreground, it would be easy for a camera's AF system to consider the actual subjects to be background objects.) It ultimately comes down to what the subjects look like, an easy task for humans, but a very difficult one for computers.


Does the AI system look at all the pixels in the image, or just a subset?

It actually looks at all the pixels in the image! This is an enormous amount of processing, to scan and digest all the pixels in the image (perhaps 10 times per second, although Olympus didn't specify the rate), and determine what's the subject and what isn't.

I mean, this is just a crazy amount of image-crunching. In a past lifetime, I worked on image-processing algorithms, and can attest to what a flood of data 20 million pixels streaming past at 5-10 frames/second represents. Modern deep learning approaches are very different from what I had to work with back then, but still, this is an insane amount of processing.


And to be so general! Watching NASCAR cars scream around the track, it didn't matter what color or shape the cars were or how large or small they were in the frame; the camera had a very clear sense of what a car looked like and where all of them were in the frame. And it wasn't just cars; the same "motorsports" setting was equally able to recognize motorcycles and their riders, and put the AF point on the rider's helmet more often than not.

At the risk of sounding overly hyperbolic, this is just nuts. And it works very well, even surprisingly so.


Can the range of AI-recognized subjects be expanded with firmware updates?

Yes! This was definitely good news. I asked if the range of subjects the E-M1X's intelligent subject recognition could process could be expanded via firmware updates, and the answer was unequivocally yes. Olympus is already planning to add subject types going forward via that mechanism.

My immediate followup question was how much space there was in the camera's firmware storage to hold the data clouds for additional subject types, and the answer was deliberately vague. The engineers declined to say either how many megabytes each definition took up or how much free space remained in the E-M1X's EPROMs, the chips in which the firmware is stored.


My impression was that the data clouds associated with subject descriptions are quite large, perhaps on the order of a gigabyte or so. (Although it's important to note that they didn't specifically confirm that number.) They did say that the data was very compressed, compared to what its natural, uncompressed size would be.

What else might AI be used for?

I also asked what other functions might be performed by AI, and the engineers said perhaps image editing. I didn't get the sense that this was a major direction they're pursuing, though; it felt more like a response to my asking specifically if it could be used for other things besides autofocus.

7.5 stops of IS?! How??

Back when Olympus announced their 300mm f/4 super-telephoto, which has a focal length equivalent to 600mm on a full frame camera, they touted the fact that when paired with the E-M1 II's in-body image stabilization, the combined system could deliver 6.5 stops of IS improvement relative to unaided hand-holding. In an interview at the time, they told me that the limit that kept them from going beyond 6.5 stops of stabilization was the rotation of the Earth.

So how did they end up with a spec of 7.5 stops for the E-M1X with appropriate lenses or seven stops for the body on its own? Are they perhaps using a combination of GPS and compass data to compensate for the Earth's rotation?


They answered that they weren't using GPS and compass data, but rather that they'd improved the underlying performance to the extent that the overall result was a roughly one-stop higher reduction in camera shake. Apparently, the 6.5 stop "limit" was at least somewhat influenced by the Earth's rotation, but improvements in gyro (rotation sensor) technology meant that the overall system performance was improved by a factor of two.
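To put rough numbers on the "one stop equals a factor of two" point and on the Earth-rotation limit, here is a minimal Python sketch. It is only back-of-the-envelope arithmetic, not anything Olympus described: the 1/focal-length handholding rule of thumb, the 3.3 micron pixel pitch (roughly a 17.3mm-wide sensor divided by about 5,200 pixels), and all function names are assumptions made for illustration.

```python
import math

# Earth turns once per sidereal day (about 86,164 seconds).
EARTH_ROTATION_RAD_PER_S = 2 * math.pi / 86164.1

def stops_to_factor(stops):
    """Each stop of stabilization doubles the usable exposure time."""
    return 2.0 ** stops

def stabilized_shutter_s(focal_length_equiv_mm, stops):
    """Longest handheld exposure, starting from the assumed
    1/(equivalent focal length) rule of thumb."""
    return stops_to_factor(stops) / focal_length_equiv_mm

def earth_rotation_blur_px(focal_length_mm, exposure_s, pixel_pitch_um):
    """Worst-case image drift on the sensor, in pixels, if the gyros
    'correct' for the Earth's rotation during the exposure."""
    drift_mm = focal_length_mm * EARTH_ROTATION_RAD_PER_S * exposure_s
    return drift_mm * 1000.0 / pixel_pitch_um

# 300mm actual focal length (600mm equivalent), assumed 3.3 um pixels.
for stops in (6.5, 7.5):
    t = stabilized_shutter_s(600, stops)
    blur = earth_rotation_blur_px(300, t, 3.3)
    print(f"{stops} stops: {t:.3f} s exposure, ~{blur:.1f} px of drift")
```

Under these assumptions, the drift caused by the Earth's rotation comes out at around one pixel at 6.5 stops and doubles with the extra stop, which is at least consistent with the earlier comment that the planet's rotation was the limiting factor.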

A co-developed gyro sensor

The key to getting to this new level of performance was that Olympus and the gyro manufacturer cooperated to jointly develop the new level of technology. It was apparently an iterative process that initially began with a set of specs from Olympus stating what they needed. From there, the gyro maker would produce some more advanced units, Olympus would test them in camera/lens systems and report the results back. The gyro company would then refine the underlying tech and send another batch to Olympus for testing, and the cycle would repeat. Eventually they ended up with sensors capable of the remarkable 7.5 stops of IS improvement seen in the E-M1X.


I'm a little unclear at this point whether the technology developed is exclusive to Olympus cameras. It was a cooperative venture, and Olympus is currently the only camera manufacturer that has it available in a production-level camera, but I'm not sure whether the sensor vendor is able to sell these advanced devices to other customers. In any case, given the normal camera development cycle, I wouldn't expect to see similar capabilities in competing cameras for at least 18-24 months, even if the vendor were able to sell the new sensors starting immediately.

(BTW, Olympus did reveal that the gyro manufacturer in question is Epson.)

The details of how Handheld High-Res mode works

I was also intrigued by how the new handheld high-resolution (pixel-shift) exposure mode worked. At first blush, you might think that the camera would need to be held so still during the multiple exposures that even the most incredible IS system imaginable wouldn't have been up to the task. (I mean, if it required sub-pixel stability and sensor motion control over the space of however long it took to capture the needed number of exposures, how could you ever do that?)

The solution isn't to hold the image stationary on the camera's sensor for that long, but rather to take advantage of the natural camera movement resulting from hand-holding. In handheld high-res mode, the E-M1X lets the image move on the focal plane between exposures, then uses its ample processor horsepower to micro-align the 16 separate images with each other. (!) The camera turns on the IS system during each of the 16 individual exposures (so each individual image will be sharp), but turns it off in between them.

It then closely examines all 16 images, and mathematically shifts each of them as needed to render all 16 in perfect alignment. That is, shifted so that the R, G, and B sub-pixels align with each other as needed to create a 50-megapixel final super-resolution image. [And yes, Ricoh does something similar with its Handheld Pixel Shift Resolution function in the Pentax K-1 II DSLR, but that uses only four separate exposures to create each image, rather than 16 of them as in the E-M1X.]
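The align-then-combine idea can be illustrated with a toy Python sketch. To be clear, this is not Olympus' algorithm, which the engineers didn't detail: it only finds whole-pixel shifts by brute force on a tiny grayscale grid, whereas the real processing has to work at sub-pixel precision on color data and cope with rotation. The function names and the random test scene are made up for the example.

```python
import random

def estimate_shift(ref, img, max_shift=3):
    """Return (dy, dx) such that img[y + dy][x + dx] best matches ref[y][x],
    judged by mean squared difference over the overlapping region."""
    h, w = len(ref), len(ref[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0.0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    d = ref[y][x] - img[y + dy][x + dx]
                    err += d * d
                    n += 1
            if n and err / n < best_err:
                best, best_err = (dy, dx), err / n
    return best

def align_and_average(frames, max_shift=3):
    """Align every frame to the first one, then average where they overlap."""
    ref = frames[0]
    h, w = len(ref), len(ref[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for img in frames:
        dy, dx = estimate_shift(ref, img, max_shift)
        for y in range(h):
            for x in range(w):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    total[y][x] += img[yy][xx]
                    count[y][x] += 1
    return [[total[y][x] / count[y][x] for x in range(w)] for y in range(h)]

def crop(scene, top, left, size):
    """Cut a size-by-size window out of the scene, like a slightly moved camera."""
    return [row[left:left + size] for row in scene[top:top + size]]

# Toy scene: two handheld "exposures" are crops taken at different offsets.
rng = random.Random(0)
scene = [[rng.random() for _ in range(16)] for _ in range(16)]
ref = crop(scene, 3, 3, 10)
moved = crop(scene, 2, 5, 10)

shift = estimate_shift(ref, moved)         # recovers the offset between frames
merged = align_and_average([ref, moved])   # combined, aligned result
print(shift)
```

The sketch recovers the offset between the two crops and merges them without smearing. The point is simply that camera movement between exposures isn't a problem as long as the processor can measure it after the fact.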

What's most amazing about all of this is that it seems to actually work! :-0 Samples we shot in this mode are quite surprisingly sharp and crisp, to a degree we never would have imagined would be possible. (Oh, and note that there's still a tripod-based high-resolution mode that produces 80-megapixel final images, but only with the camera locked down solidly on a tripod.)

Does the "Virtual ND Filter" mode work the same way?

Actually, no.

In the case of the Virtual ND Filter mode, the camera leaves the IS system turned on throughout the process, snapping multiple shorter exposures and then combining them to produce the final image with moving elements appearing blurred. Rather than micro-aligning the individual images, Virtual ND Filter mode relies on the E-M1X's incredible 7-7.5 stop IS ability to keep the overall image stationary, and then combines the multiple shorter-exposure images to provide the desired motion blur. Because it doesn't require the sub-pixel shifts of Handheld High-Res mode, the camera can simply count on the IS system to keep the multiple images aligned, and so can simply combine them without worrying about possible movement from one to the next.
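A minimal Python sketch of the combining step, under simplifying assumptions: the frames are already aligned (the job the IS system does in the camera), each frame is a one-dimensional row of brightness values, and the camera's actual blending method, which Olympus didn't describe, is replaced here by a plain average. The names are hypothetical.

```python
import math

def composite_exposures(frames):
    """Average equal-length short exposures into one simulated long exposure."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]

def equivalent_nd_stops(frame_count):
    """Averaging N frames spans N times the exposure time, i.e. log2(N) stops."""
    return math.log2(frame_count)

def short_exposure(width, position):
    """Toy frame: a static background of 0.2 with a moving highlight of 1.0."""
    frame = [0.2] * width
    frame[position] = 1.0
    return frame

# The highlight moves one pixel per frame across an 8-pixel-wide scene.
frames = [short_exposure(8, p) for p in range(8)]
blended = composite_exposures(frames)

print([round(v, 3) for v in blended])  # highlight smeared evenly along its path
print(equivalent_nd_stops(len(frames)))
```

Anything that stays put keeps its brightness in the average, while anything that moves gets spread along its path, which is the same look a physical ND filter and a long shutter speed would produce.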

Can both Handheld High-Res and Virtual ND Filter modes handle camera rotation?

Particularly in the case of Handheld High-Res mode, I wondered whether the process could handle small rotations about the lens axis. It's one thing to simply make small shifts in the X/Y directions to align two images, but quite another to take into account rotations of one image vs another.

It turns out that this isn't a problem, and that both Handheld High-Res and Virtual ND Filter modes handle camera rotation just fine. (The latter isn't unexpected, thanks to the E-M1X's five-axis IS system, but I was surprised to learn that the processing of Handheld High-Res mode could handle rotation as well.)


How does USB power delivery work? Can it both operate the camera and charge the batteries?

Yes, just not at the same time.

A big feature of the E-M1X is that it can take advantage of USB-C Power Delivery to power the camera and charge its batteries. We were curious as to the details on how this worked, and were a little confused by the early specs.

It turns out that the camera can operate indefinitely from a USB-C power source, and can also charge its batteries from one, just not at the same time. If the camera is operating from USB-C power, it will do so for as long as the power source can provide juice, without discharging its internal batteries at all. (But it does apparently need at least one internal battery with at least a tiny amount of charge left, in order to turn on and recognize the USB-C power source.)


It won't charge its batteries while running from USB-C power, though; it just won't discharge them either. The camera will charge its batteries if it's plugged into a USB-C source while turned off. When it's doing that, the batteries seem to charge about as quickly as they would in the external charger that ships with the camera: I timed it charging a pair of batteries drained to 10% each, and it took about two hours to bring them to 90% charge.

The ability to power the camera from a USB-C power brick will be a huge boon to people wanting to shoot video for extended periods or to those doing time-lapse photography. It's also great to be able to top-off the batteries when not shooting, whether on a break, driving in the car or whatever.

Does the E-M1X change the phase-detect AF "baseline", based on the maximum lens aperture?

This is a bit of a complicated point, needing a bit of background explanation. Phase-detect AF systems look at light rays coming from opposite sides of the lens, and correlate what they see to determine how far in or out of focus a subject is. Correlation means the system shifts the image it sees coming from each side of the lens back and forth until the two align. If the subject is in focus, no shifting is needed to maximize the alignment. If the lens is focused in front of or behind the subject, though, the images from the two sides of the lens will need to be shifted one way or another to align with each other.
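The shift-until-they-align step can be sketched in a few lines of Python. This is a toy one-dimensional version, assuming the two views are simple lists of brightness samples and using the sum of absolute differences as the match score; the actual correlation method in any given camera isn't something the engineers described, and the function name is hypothetical.

```python
def best_alignment_shift(left, right, max_shift):
    """Return the shift s (in samples) at which right[i + s] best matches
    left[i], using the mean absolute difference over the overlap."""
    n = len(left)
    best_shift, best_score = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        err, count = 0.0, 0
        for i in range(n):
            j = i + s
            if 0 <= j < n:
                err += abs(left[i] - right[j])
                count += 1
        if count and err / count < best_score:
            best_shift, best_score = s, err / count
    return best_shift

# The same detail as seen from the two sides of the lens. When the subject
# is out of focus, one view is displaced relative to the other.
left_view  = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0]
right_view = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0]

print(best_alignment_shift(left_view, right_view, 4))  # nonzero: out of focus
print(best_alignment_shift(left_view, left_view, 4))   # zero: in focus
```

The size of the shift indicates how far out of focus the lens is, and its sign indicates which way the lens needs to move, which is why phase detection can drive the lens directly to the right position instead of hunting.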

The sensitivity of the AF system in measuring out-of-focus amounts depends on how wide an area it is looking across for each point that it's determining the focal distance for. The distance across which the camera looks to evaluate alignment is called the "baseline" of the phase detect system.


The distance that can be "looked across" is a function of the maximum aperture of the lens. Large lens apertures mean the AF system can see light rays arriving from greater angles, and so can look across greater distances to determine the alignment. Smaller apertures mean the camera can only consider smaller distances when comparing alignment for the sake of determining focal distance.

My question here was whether the E-M1X's AF system changes the length of its phase-detect baseline depending on the aperture of the lens being used. (Based on my understanding, some mirrorless systems do this, using longer baselines for larger-aperture lenses, and shorter baselines for smaller-aperture ones.)


The answer surprised me: The E-M1X evaluates focal distance for all points across the entire sensor simultaneously, rather than processing the individual AF points separately. It sounds like it does indeed process all the phase-detect data depending on the maximum aperture of the lens in use, but the most interesting bit of information to me was that the E-M1X essentially develops a depth map across the entire image area, feeding that information to higher levels of subject detection and area-of-interest processing. [But not, as noted previously, to the AI subject recognition system; rather, this applies to normal AF processing only.]

This was the first time I'd heard that a camera calculates a full-image depth map as part of routine AF processing.

How do they achieve EV -6 low light AF sensitivity?

I was running out of time in the interview at this point, but wondered how the E-M1X manages to focus in a light level of EV -6. That's really, really dim, a level corresponding to dim moonlight.


It turns out that (like other mirrorless cameras capable of extremely low-light AF), the E-M1X switches to contrast-detect AF at the very darkest light levels. This wasn't unexpected, and frankly makes sense. If you're shooting under such dim conditions, your subject isn't going to be moving much, or it will be hopelessly blurred during the required exposure time. So the longer time required for contrast-detect AF to work (especially under such dim conditions) isn't typically an issue: You just want the camera to focus, never mind if it takes a few seconds.
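For readers wondering why contrast-detect AF takes longer, here is a minimal Python sketch of the general idea: the camera has to capture and measure an image at a series of lens positions and keep the one with the most contrast. The blur model, the contrast measure and the names are all assumptions made for illustration, not a description of the E-M1X's actual routine.

```python
def image_contrast(pixels):
    """Simple sharpness score: sum of squared differences between neighbors."""
    return sum((b - a) ** 2 for a, b in zip(pixels, pixels[1:]))

def contrast_detect_focus(capture_at, positions):
    """Try each lens position, measure contrast, and return the sharpest one."""
    best_pos, best_contrast = None, -1.0
    for pos in positions:
        contrast = image_contrast(capture_at(pos))
        if contrast > best_contrast:
            best_pos, best_contrast = pos, contrast
    return best_pos

def toy_capture(pos, true_focus=7):
    """Hypothetical lens: a hard edge, blurred more the farther the lens
    position is from the (unknown to the camera) true focus position."""
    edge = [0.0] * 10 + [1.0] * 10
    radius = abs(pos - true_focus)
    n = len(edge)
    blurred = []
    for i in range(n):
        window = [edge[min(max(k, 0), n - 1)]
                  for k in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred

print(contrast_detect_focus(toy_capture, range(15)))
```

Each step in the sweep costs a full exposure and readout, and in very dim light each of those exposures is itself slow, which is why this approach can take a few seconds at the darkest light levels.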


So that's some of the story behind the amazing tech in the Olympus E-M1X. We have a hands-on review of it elsewhere, based on preproduction firmware, and will be updating that now that we've received a final firmware version.