Trooper Dan, on 06 March 2019 - 12:39 PM, said:
I think that would be a wasted effort. Chances are, the process will be improved in the future (AI upscales are a relatively new thing) and then all of the clean up work will have gone to waste.
It would make much more sense to clean up the original low-res sprites (as Jimmy suggested in a different thread). Then any upscales made from them would show the improvements every time.
My logic here is this:
1. I do believe that AI upscaling will probably not surpass human artists in the foreseeable future. There are many details in low-res video game graphics that are kinda "there", but they basically have to be guessed by human perception from a handful of pixels. The straightforward type of AI upscaling we've dealt with so far can only enlarge those small pixels; it is rather limited when it comes to adding or enhancing this detail.
An AI network may have a semantic level that allows it to replicate the guessing process of human perception, and I think high-profile networks that boast a "hallucination" feature can do something similar, but this is mainly designed for photographs featuring real-life objects, whereas video games are full of imaginary objects that have no real-life counterparts. It may be very difficult to get a correct "guess" even if the network has a semantic level of image identification involved in the upscaling process.
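Just to make that first point concrete, here's a rough sketch (Python with Pillow, purely illustrative; the sprite file name is made up) of what conventional resampling actually does: it spreads the existing pixels over a larger grid and nothing more.

```python
from PIL import Image

# Hypothetical low-res sprite, used only for illustration.
sprite = Image.open("duke_sprite.png")

# 4x enlargement with three conventional filters: nearest keeps the blocky
# pixels, bilinear/bicubic only blur them. None of them adds new detail.
# (Image.Resampling requires Pillow 9.1+; older versions use Image.NEAREST etc.)
for name, resample in [("nearest", Image.Resampling.NEAREST),
                       ("bilinear", Image.Resampling.BILINEAR),
                       ("bicubic", Image.Resampling.BICUBIC)]:
    big = sprite.resize((sprite.width * 4, sprite.height * 4), resample)
    big.save(f"duke_sprite_4x_{name}.png")
```

An AI upscaler can do better than these filters, of course, but the question is whether it can invent the *right* detail rather than plausible-looking noise.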
2. AI upscaling/Single-Image Super Resolution (SISR) tools are currently developed primarily for photographic images, which are in my opinion fundamentally different from low-res video game graphics:
- when a photograph is scaled down using any conventional sampling method, there's an objective loss of data due to the lower resolution; low-resolution video game graphics, on the other hand, are not actually characterised by loss of image data: the amount of data in the image is simply limited by the low resolution. In fact, high-quality graphics assets have as much detail as possible within these limits, which is achieved by the artist's hand
- low-res video game graphics, even when derived from higher-resolution originals (scanned hand-drawn art, 3D models or digitised photographs), are usually (post-)processed so that the resulting image fits the target resolution: details are enhanced, noise is removed. This is completely unlike a straightforwardly downsampled photograph, which is the primary subject matter of ESRGAN and similar tools. We know for a fact that the sprites in Duke3D and other Build games were edited after being scaled down from the rendered angles of Chuck Jones' original models
- there's a difference in scale. Most photographic images, be they portraits or landscapes, have both large features and small details. E.g. if you scale down a portrait you will still see the face; if you scale down a landscape you'll still discern major features like houses, trees etc., while small details will in many cases be reduced to incoherent pixels or disappear completely. Conversely, in low-res video game art both the large features and the small details are theoretically intact, unless we're dealing with images produced by downsampling without post-processing
- if we're talking about art in VGA-mode games, the art there was supposed to be viewed at roughly 2x its original size on the screen, and that, too, affected how the art was drawn and/or edited. There's no similarity to scaled-down photographs in this respect, since those are always meant to be viewed at their original resolution
- finally, an important thing to remember is that video game art from the era we're interested in is limited to an indexed palette, which inevitably entails sharper contrasts and a generally different colour arrangement (people who understand this topic better may correct me or provide the proper terms) than a true-colour photograph. Both this and the downsampling point are sketched in code right after this list
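Here's the rough sketch I mentioned (Python with Pillow again; the file names are placeholders, not actual assets):

```python
from PIL import Image

# (a) Downsampling a true-colour photo is an objective loss of data:
# shrink it, then blow it back up, and the fine detail is gone for good.
photo = Image.open("photo.jpg").convert("RGB")
small = photo.resize((photo.width // 8, photo.height // 8),
                     Image.Resampling.LANCZOS)
restored = small.resize(photo.size, Image.Resampling.BICUBIC)
# 'restored' has the original dimensions again, but not the original detail.

# (b) VGA-era game art is constrained to an indexed palette from the start:
# every pixel is one of at most 256 palette entries, which is part of why
# the contrasts and colour ramps look so different from a photograph.
sprite = Image.open("sprite.png")  # placeholder for a paletted sprite
if sprite.mode == "P":
    print("palette entries used by the sprite:",
          len(sprite.getcolors(maxcolors=256)))
print("distinct colours in the photo:",
      len(photo.getcolors(maxcolors=photo.width * photo.height)))
```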
3. As said above, the creation of video game art involves post-processing and hand editing. I'm fairly certain that sebabdukeboss20's new monsters that were created from models received at least some degree of manual refining after the animation frames were produced from the high-res models, right? The same is true for the original monster sprites in Duke3D, Shadow Warrior, etc. This post-processing is necessary to adapt each image to its current resolution.
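For illustration, a loose sketch of that kind of downscale-and-clean-up pipeline might look like the following (Python with Pillow; the file names and filter settings are my assumptions, not the actual process used for those sprites):

```python
from PIL import Image, ImageFilter

# Hypothetical high-res render of one animation frame.
render = Image.open("monster_render.png").convert("RGB")

# Scale the render down to sprite size.
sprite = render.resize((render.width // 8, render.height // 8),
                       Image.Resampling.LANCZOS)

# Post-process so the image still reads well at the new resolution,
# e.g. an unsharp mask to bring back edge contrast lost in the downscale.
sprite = sprite.filter(ImageFilter.UnsharpMask(radius=1, percent=120, threshold=2))

# Lock it to a 256-colour indexed palette, as the engine expects.
sprite = sprite.quantize(colors=256)
sprite.save("monster_sprite_base.png")

# From here on a human artist still touches up pixels by hand, and that is
# exactly the step an upscaling network would have to replace.
```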
Now, why should this be different if we're going in the opposite direction and scaling sprites up? The fundamental change is the same as with scaling down: the image is converted to a different resolution, which allows a different level of detail. I may be wrong, but I don't believe that any AI network alone will be able to accomplish this task of adjusting the image to the proper resolution as well as a human artist can, at least in the foreseeable future.