Duke4.net Forums: How about porting Duke 3D to my new engine?


How about porting Duke 3D to my new engine?  "Make Duke levels playable in a game powered by my Brahma engine"

#31

Jan Satcitananda, on 10 September 2018 - 08:26 AM, said:

In multithreaded mode, it will handle everything smoothly. Also there's a "realtime adaptive detail" feature that will automatically adjust visual quality based on rendering time, so the game will never slow down and you'll see a perfectly stable framerate. My goal is to develop a technology from scratch that is internally unique down to the pixel level, not relying on specific hardware. There are a lot of GPU-powered engines, and all of them look pretty much the same, using polygon meshes, BSP and bilinear texture filters. This is not what I want. I want a powerful but nonconventional engine that I fully understand. Perhaps mine will use the GPU for physics or AI.

You're conflating art styles with tech. It's alright to say "I like raycasted 8-bit game visuals", but you're hiding behind nonsense by saying the CPU is better for doing vector math for graphics than the GPU... because that's just WRONG!

EDIT:
Do you remember NVIDIA trying to launch dedicated physics boards? That product died when CPUs added more than one core on the die, because processing that work on the CPU was faster than NVIDIA's physics boards. Stop putting work on the wrong hardware!

This post has been edited by icecoldduke: 10 September 2018 - 08:59 AM

0

Phredreeke

#32

icecoldduke, on 10 September 2018 - 06:30 AM, said:

Multithreading the "software renderer" would be a waste of time. The "software renderer" would benefit from parallelization, but not on the CPU. As it turns out, the GPU can run a lot more work in parallel than the CPU ever could (in terms of vector math processing), so you could easily get a huge speed-up by moving the "software renderer" over to GPU compute.


I'd love to see this - ClassicNG when? :rolleyes:
1

#33

icecoldduke, on 10 September 2018 - 08:51 AM, said:

You're conflating art styles with tech. It's alright to say "I like raycasted 8-bit game visuals", but you're hiding behind nonsense by saying the CPU is better for doing vector math for graphics than the GPU... because that's just WRONG!

Vector math is not everything needed for graphics. There are various sorting steps, lots of lookup tables, span buffers and other nontrivial things in use. Besides, I don't quite know how to couple software with hardware acceleration effectively. I did some OpenGL programming, but that was on simple 2D graphics. I couldn't get any further. And I don't want to go into a debate that I shouldn't work on my engine and should learn Unreal or Unity instead. I've tried once, and it wasn't what I need for my games.
1

#34

Phredreeke, on 10 September 2018 - 09:02 AM, said:

I'd love to see this - ClassicNG when? :P

I will literally do ClassicNG to win this argument, it's really not that much work. Pettiness is a huge driving factor behind me doing stuff :rolleyes:, and stopping this nonsense from spreading is good enough for me. How about we compare my GPU-accelerated raycaster vs this nonsense? I'll let you guys know when it's complete.

This post has been edited by icecoldduke: 10 September 2018 - 09:15 AM

-5

#35

Jan Satcitananda, on 10 September 2018 - 09:04 AM, said:

Vector math is not everything needed for graphics. There are various sorting steps, lots of lookup tables, span buffers and other nontrivial things in use. Besides, I don't quite know how to couple software with hardware acceleration effectively. I did some OpenGL programming, but that was on simple 2D graphics. I couldn't get any further. And I don't want to go into a debate that I shouldn't work on my engine and should learn Unreal or Unity instead. I've tried once, and it wasn't what I need for my games.

I think you're conflating technique with pass implementation. It's alright to say you don't know how to do something, but blindly telling people "the CPU is better for classic rendering than the GPU" is wrong. I would never tell you to stop working on your engine. However, I will tell you when the ideas behind your implementations are garbage, especially when people on here start believing your rationale.

This post has been edited by icecoldduke: 10 September 2018 - 09:21 AM

0

#36

icecoldduke, on 10 September 2018 - 09:08 AM, said:

I will literally do ClassicNG to win this argument, it's really not that much work. Pettiness is a huge driving factor behind me doing stuff :rolleyes:, and stopping this nonsense from spreading is good enough for me. How about we compare my GPU-accelerated raycaster vs this nonsense? I'll let you guys know when it's complete.

You can compare performance or feature set, but you can't compare the spirit of the engine. Your aggressive manner of arguing makes no sense.
4

#37

Jan Satcitananda, on 10 September 2018 - 09:38 AM, said:

You can compare performance or feature set, but you can't compare the spirit of the engine. Your aggressive manner of arguing makes no sense.

I really like that you're passionate about your project. I kinda feel terrible for coming on as strong as I did, but what irritated me was you making assertions that are factually incorrect. If all you had said was "Here is my awesome engine, it's doing classic rendering without GPU acceleration", I wouldn't have said anything. I guess you didn't understand that you can use the GPU for rendering without doing rasterization. I've had an SDF (signed distance field) based renderer for a while now, which doesn't use standard rasterization at all. You really need to learn compute; however, I will do a classic-style renderer on the GPU in compute (without polygons, using OpenGL only for swapchain stuff), and maybe that will be a great learning tool for you.

This post has been edited by icecoldduke: 10 September 2018 - 10:05 AM

1
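For reference, the "SDF based renderer" idea mentioned above boils down to sphere tracing: instead of rasterizing polygons, each ray is stepped forward by the value of a signed distance function until it reaches a surface. A minimal CPU-side sketch of that core loop follows; the function names and the single-sphere scene are purely illustrative, not anything from icecoldduke's renderer.

```c
#include <math.h>

/* Signed distance to a sphere of radius r centered at the origin. */
static float sdf_sphere(float x, float y, float z, float r)
{
    return sqrtf(x*x + y*y + z*z) - r;
}

/* Sphere tracing: march a ray from (ox,oy,oz) along the unit direction
 * (dx,dy,dz), stepping by the distance-field value until we are within
 * EPS of a surface or give up. Returns the hit distance, or -1.0f. */
static float raymarch(float ox, float oy, float oz,
                      float dx, float dy, float dz)
{
    const float EPS = 1e-3f;
    float t = 0.0f;
    for (int i = 0; i < 128; ++i) {
        float d = sdf_sphere(ox + dx*t, oy + dy*t, oz + dz*t, 1.0f);
        if (d < EPS)
            return t;       /* surface hit */
        t += d;             /* safe step: nothing is closer than d */
        if (t > 100.0f)
            break;          /* ray escaped the scene */
    }
    return -1.0f;
}
```

On a GPU the same loop would simply run once per pixel in a compute or fragment shader, which is why this style of renderer needs no rasterization at all.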

#38

icecoldduke, on 10 September 2018 - 09:54 AM, said:

I really like that you're passionate about your project. I kinda feel terrible for coming on as strong as I did, but what irritated me was you making assertions that are factually incorrect. If all you had said was "Here is my awesome engine, it's doing classic rendering without GPU acceleration", I wouldn't have said anything. I guess you didn't understand that you can use the GPU for rendering without doing rasterization. I've had an SDF (signed distance field) based renderer for a while now, which doesn't use standard rasterization at all. You really need to learn compute; however, I will do a classic-style renderer on the GPU in compute (without polygons, using OpenGL only for swapchain stuff), and maybe that will be a great learning tool for you.

Okay. It would certainly be pretty interesting to learn about OpenGL rendering without rasterization and implement something. The current state of the engine is just a working prototype; I'll change a lot of things in later iterations, which should support true color and native 6-DOF, be faster, and support more cool things.
2

#39

Jan Satcitananda, on 10 September 2018 - 10:17 AM, said:

Okay. It would certainly be pretty interesting to learn about OpenGL rendering without rasterization and implement something. The current state of the engine is just a working prototype; I'll change a lot of things in later iterations, which should support true color and native 6-DOF, be faster, and support more cool things.

Before worrying about DOF and such, figure out what kind of game you want to make on top of your tech, then make the tech fit the game. You already have an 8-bit renderer that runs without a dedicated GPU. There is a market for retro-style games; tap into that market and let your game sell your engine.
0

#40

icecoldduke, on 10 September 2018 - 11:22 AM, said:

Before worrying about DOF and such, figure out what kind of game you want to make on top of your tech, then make the tech fit the game. You already have an 8-bit renderer that runs without a dedicated GPU. There is a market for retro-style games; tap into that market and let your game sell your engine.

I have most of the visual part, but currently lack any physics and collision detection; even the hitscan routine is unfinished. Besides that, the level editor is rudimentary. Also, I want my games to support the binaural 3D sound I plan to implement. I have a vision of what games I want to make, and I'm making the tech specifically fit them.
1

#41

Jan Satcitananda, on 10 September 2018 - 12:15 PM, said:

I have most of the visual part, but currently lack any physics and collision detection; even the hitscan routine is unfinished. Besides that, the level editor is rudimentary. Also, I want my games to support the binaural 3D sound I plan to implement. I have a vision of what games I want to make, and I'm making the tech specifically fit them.

How good is your vector math knowledge?
0

#42

icecoldduke, on 10 September 2018 - 12:44 PM, said:

How good is your vector math knowledge?

Sufficient to make 3D games, I think. I can multiply vectors/matrices, invert matrices, and work with fixed-point arithmetic. I rarely use floating point, for a reason.

This post has been edited by Jan Satcitananda: 10 September 2018 - 01:02 PM

0


#44

Jan Satcitananda, on 10 September 2018 - 01:00 PM, said:

Sufficient to make 3D games, I think. I can add vectors/matrices, invert matrices, and work with fixed-point arithmetic. I rarely use floating point, for a reason.

You need to use standard vector math with floats, rather than using fixed-point stuff. Fixed-point math was good at a time when most CPUs didn't have an FPU and floating-point math was emulated (and really slow). Nowadays you have dedicated instruction sets on the CPU (such as SSE) that make vector/matrix math a breeze and a lot faster. ALU cost aside (how much time it takes the CPU to actually do the floating-point math), the CPU can better predict what memory to bring into the caches when utilizing SSE for vector math operations.

This post has been edited by icecoldduke: 10 September 2018 - 01:12 PM

0
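To illustrate the SSE point above, here is a hedged sketch of four dot products computed at once with SSE1 intrinsics, assuming the vectors are stored structure-of-arrays; the function name and data layout are illustrative only, not anyone's actual engine code.

```c
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Dot products of four 3D vector pairs at once, stored structure-of-arrays:
 * ax[i],ay[i],az[i] against bx[i],by[i],bz[i]. One SSE lane per pair. */
static void dot4(const float *ax, const float *ay, const float *az,
                 const float *bx, const float *by, const float *bz,
                 float *out)
{
    __m128 x = _mm_mul_ps(_mm_loadu_ps(ax), _mm_loadu_ps(bx));
    __m128 y = _mm_mul_ps(_mm_loadu_ps(ay), _mm_loadu_ps(by));
    __m128 z = _mm_mul_ps(_mm_loadu_ps(az), _mm_loadu_ps(bz));
    _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(x, y), z));
}
```

Each `_mm_mul_ps`/`_mm_add_ps` handles four lanes per instruction, which is where the speed-up over scalar code (fixed point or float) comes from.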

Hendricks266

#45

icecoldduke, on 10 September 2018 - 01:10 PM, said:

You need to use standard vector math with floats, rather than using fixed-point stuff. Fixed-point math was good at a time when most CPUs didn't have an FPU and floating-point math was emulated (and really slow). Nowadays you have dedicated instruction sets on the CPU (such as SSE) that make vector/matrix math a breeze and a lot faster. ALU cost aside (how much time it takes the CPU to actually do the floating-point math), the CPU can better predict what memory to bring into the caches when utilizing SSE for vector math operations.

I have not worked in depth with SIMD instructions but since fixed-point math overlays nicely on integers, SIMD instructions should support them. For example, PMULUDQ followed by a right-shift performs a fixed-point multiply using only integers. This leaves you with no NaNs and no denormalized numbers, both of which slow down floating-point computation.
0
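A small sketch of the PMULUDQ-plus-shift idea Hendricks266 describes, using the corresponding SSE2 intrinsic; the 16.16 format and the function name are assumptions for illustration.

```c
#include <emmintrin.h>  /* SSE2: PMULUDQ is _mm_mul_epu32 */

/* 16.16 fixed-point multiply using only integer SIMD. PMULUDQ multiplies the
 * low 32-bit element of each 64-bit lane (packed lanes 0 and 2) into a full
 * 64-bit product; shifting right by 16 renormalizes back to 16.16. Covering
 * all four 32-bit lanes takes a second call on shuffled inputs, and the
 * multiply is unsigned. */
static __m128i fixmul_16_16_lo(__m128i a, __m128i b)
{
    __m128i prod = _mm_mul_epu32(a, b);   /* two 64-bit products (lanes 0, 2) */
    return _mm_srli_epi64(prod, 16);      /* >> 16: back to 16.16 */
}
```

The scalar equivalent is `(int32_t)(((int64_t)a * b) >> 16)`: the same widen-then-shift pattern, with no NaNs or denormals to worry about.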

#46

Hendricks266, on 10 September 2018 - 01:21 PM, said:

I have not worked in depth with SIMD instructions but since fixed-point math overlays nicely on integers, SIMD instructions should support them. For example, PMULUDQ followed by a right-shift performs a fixed-point multiply using only integers. This leaves you with no NaNs and no denormalized numbers, both of which slow down floating-point computation.

With floating point you get the benefit of having operations directly provided in the instruction set, rather than doing the same operation in multiple instructions with fixed point. For example, you can write functions for sqrt, etc. in fixed point with SIMD, but that's going to be slower than using the instruction set that provides those functions in floating point only. Generally speaking, fixed-point math requires more instructions for simple vector math operations like "what side of the line does a point reside on?". So can you use SIMD for fixed-point math? Yes, but in this case sticking with fixed point is going to be slower than a properly optimized FP-based raycaster.

Also switching over to floating point opens the door for him to use some great 3rd party math libraries that have been optimized for speed.

This post has been edited by icecoldduke: 10 September 2018 - 02:00 PM

0
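As a concrete instance of the "which side of the line" example: the test is the sign of a 2D cross product, and the sketch below contrasts the float version with a 16.16 fixed-point version, where the products have to be widened to 64 bits (the extra work being argued about). Names and formats are illustrative only.

```c
#include <stdint.h>

/* Which side of the directed line A->B does point P lie on?
 * Sign of the 2D cross product (B-A) x (P-A): >0 left, <0 right, 0 on line. */

/* Float version: a handful of subtracts, two multiplies, one subtract. */
static float side_f(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

/* 16.16 fixed-point version: same math, but each product must be widened to
 * 64 bits to avoid overflow. (For a pure sign test the renormalizing >>16
 * can be skipped, but the widening cannot.) */
static int64_t side_fx(int32_t ax, int32_t ay, int32_t bx, int32_t by,
                       int32_t px, int32_t py)
{
    int64_t c1 = (int64_t)(bx - ax) * (py - ay);
    int64_t c2 = (int64_t)(by - ay) * (px - ax);
    return c1 - c2;   /* only the sign is needed */
}
```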

Hendricks266

#47

Modern CPU architectures allot varying numbers of clock cycles depending on the instruction. Is there a measurable wallclock time difference between two integer instructions and one floating point instruction?
0

#48

Hendricks266, on 10 September 2018 - 01:54 PM, said:

Modern CPU architectures allot varying numbers of clock cycles depending on the instruction. Is there a measurable wallclock time difference between two integer instructions and one floating point instruction?

In my experience the performance delta between two integer instructions vs one floating-point instruction is really small (with the single instruction being slightly ahead, and negligible for most apps). Since he's doing a traditional software renderer on the CPU, I think the delta between the two will add up, especially if he wants to hit 4K res (or higher), but that's assuming his engine is properly optimized. To your point, I think he'll have to deal with things like cache line misses before he can tackle this issue. At the very least it's one less headache for him if he uses the properly optimized math libraries that are all over the internet, which would require him to use floating-point math.

This post has been edited by icecoldduke: 10 September 2018 - 02:08 PM

0

#49

icecoldduke, on 10 September 2018 - 02:07 PM, said:

In my experience the performance delta between two integer instructions vs one floating-point instruction is really small (with the single instruction being slightly ahead, and negligible for most apps). Since he's doing a traditional software renderer on the CPU, I think the delta between the two will add up, especially if he wants to hit 4K res (or higher), but that's assuming his engine is properly optimized. To your point, I think he'll have to deal with things like cache line misses before he can tackle this issue. At the very least it's one less headache for him if he uses the properly optimized math libraries that are all over the internet, which would require him to use floating-point math.

The reason behind integer/fixed-point math is the speed of calculations with the aid of lookup tables, and because the game world should have a uniform metric scale everywhere (1 meter is 262144 units by default, which should be sufficient for most FPS games), it implies using integer vectors. Also it's the spirit of the 90's, when not all computers had coprocessors. SIMD has supported fixed-point math since AVX2; I use it sometimes, but failed to gain any speed in rendering textured lines. My record was 3-4 clock cycles per pixel for horizontal lines and arbitrary-size textures, with each CLUT adding one clock cycle. It's strange, but a loop with three multiplications per pixel worked faster than traditional bit shifting.

So far I'm facing mostly memory access speed limitations (say, filling a 1024-pixel-wide image with vertical lines is slower than at other resolutions because of cache peculiarities; ideally, one should fill several lines at once). Also, very large textures will cause cache misses unless they are stored in 'swizzled' mode, which optimizes caching a lot.

This post has been edited by Jan Satcitananda: 10 September 2018 - 11:42 PM

0
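For context on the "3-4 clock cycles per pixel" figure: a typical fixed-point inner loop for one textured horizontal span looks something like the sketch below, with u and v stepping in 16.16 fixed point and power-of-two texture sizes so that each pixel costs a couple of shifts, masks and adds. This is a generic illustration, not Jan's actual code.

```c
#include <stdint.h>

/* Draw `count` pixels of one horizontal span from an 8-bit paletted texture.
 * u, v, du, dv are 16.16 fixed point; the texture is (mask_u+1) x (mask_v+1)
 * texels with power-of-two dimensions, so wrapping is a bitwise AND. */
static void draw_span(uint8_t *dst, int count,
                      const uint8_t *tex, int tex_w_log2,
                      uint32_t mask_u, uint32_t mask_v,
                      uint32_t u, uint32_t v, uint32_t du, uint32_t dv)
{
    while (count--) {
        uint32_t tu = (u >> 16) & mask_u;
        uint32_t tv = (v >> 16) & mask_v;
        *dst++ = tex[(tv << tex_w_log2) + tu];  /* palette index; a shading CLUT would add one more table lookup here */
        u += du;
        v += dv;
    }
}
```

The 'swizzled' storage mentioned in the second paragraph reorders texels so that nearby (u, v) samples land in the same cache lines, which is what keeps a loop like this from stalling on very large textures.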

#50

Jan Satcitananda, on 10 September 2018 - 11:35 PM, said:

The reason behind integer/fixed-point math is the speed of calculations with the aid of lookup tables.

Saving ALU with lookup tables isn't going to give you any performance gains; your cache misses are probably going to kill your ALU gains. By the sounds of things, you're not designing the renderer to be cache friendly.

This post has been edited by icecoldduke: 11 September 2018 - 05:03 AM

0

#51

icecoldduke, on 11 September 2018 - 05:01 AM, said:

Saving ALU with lookup tables isn't going to give you any performance gains; your cache misses are probably going to kill your ALU gains. By the sounds of things, you're not designing the renderer to be cache friendly.

I'm aware of caching efficiency. I have a special format which allows, for example, using 32768x32768 textures with consistent caching, but it's not yet supported by the Brahma renderer. What I might do at some point is replace my 65536-point sine table (eating 128k) with a 512-point LUT plus linear interpolation, but so far I don't face serious caching performance issues. On maps imported from Build, my renderer prototype has slightly better quality and is only a bit slower than the EDuke32 classic renderer, and with multithreading enabled it can easily beat Polymost in some cases.
0
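The LUT shrink Jan mentions would look roughly like this: a 512-entry sine table (plus one guard entry) indexed by the top bits of a 16-bit binary angle, with the bottom bits used for linear interpolation. The entry format and scale here are assumptions for illustration.

```c
#include <math.h>
#include <stdint.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define SINE_LUT_BITS  9                     /* 512 entries per full turn */
#define SINE_LUT_SIZE  (1 << SINE_LUT_BITS)

static int16_t sine_lut[SINE_LUT_SIZE + 1];  /* signed, scaled to +/-32767; ~1 KB */

static void sine_lut_init(void)
{
    for (int i = 0; i <= SINE_LUT_SIZE; ++i)
        sine_lut[i] = (int16_t)lrint(sin(2.0 * M_PI * i / SINE_LUT_SIZE) * 32767.0);
}

/* angle is a 16-bit binary angle (0..65535 = one full turn), as with the
 * original 65536-entry table; returns sin(angle) scaled to +/-32767. */
static int32_t sine_fix(uint32_t angle)
{
    uint32_t idx  = (angle & 0xFFFF) >> (16 - SINE_LUT_BITS);        /* 0..511 */
    int32_t  frac = (int32_t)(angle & ((1 << (16 - SINE_LUT_BITS)) - 1));
    int32_t  a = sine_lut[idx];
    int32_t  b = sine_lut[idx + 1];
    return a + (((b - a) * frac) >> (16 - SINE_LUT_BITS));           /* lerp */
}
```

The whole table is about 1 KB instead of 128 KB, so it can stay resident in L1 between calls.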

#52

Jan Satcitananda, on 11 September 2018 - 05:49 AM, said:

I'm aware of caching efficiency. I have a special format which allows, for example, using 32768x32768 textures with consistent caching, but it's not yet supported by the Brahma renderer. What I might do at some point is replace my 65536-point sine table (eating 128k) with a 512-point LUT plus linear interpolation, but so far I don't face serious caching performance issues. On maps imported from Build, my renderer prototype has slightly better quality and is only a bit slower than the EDuke32 classic renderer, and with multithreading enabled it can easily beat Polymost in some cases.

Why are you focused on creating a system that supports 32k x 32k textures when you don't have collision or physics working? Creating a system that renders megatextures is really easy; creating a development tool that can make a megatexture of value is really difficult and really not worth it.
0

#53

icecoldduke, on 11 September 2018 - 06:35 AM, said:

Why are you focused on creating a system that supports 32k x 32k textures when you don't have collision or physics working? Creating a system that renders megatextures is really easy; creating a development tool that can make a megatexture of value is really difficult and really not worth it.

I'm focused on what I feel inspiration for. I think this is the fastest, most enjoyable and right way to do things. I'll get to pixel-precise collision detection soon.

Modern games usually look complex and very detailed, but the player physics are very simplified and feel the same almost everywhere. My principle is more "what you see is what you collide with", with aesthetics over excess detail.
3

Romulus

#54

icecoldduke, on 10 September 2018 - 08:51 AM, said:

EDIT:
Do you remember NVIDIA trying to launch dedicated physics boards? That product died when CPUs added more than one core on the die, because processing that work on the CPU was faster than NVIDIA's physics boards. Stop putting work on the wrong hardware!


I am sorry, but most of the time when you try talking hardware, you provide misinformation. The original PhysX calculation chip was developed by a company named NovodeX, which was later acquired by AGEIA. AGEIA also acquired Meqon, the same Meqon Physics that supposedly got built into DNF's Unreal Engine implementation. AGEIA had dedicated PhysX cards that worked with your discrete graphics card regardless of brand. But after NVIDIA acquired AGEIA, NVIDIA started using GPGPU, aka CUDA, to accelerate AGEIA's PhysX SDK.

Processing PhysX on the CPU even to this date is not faster than processing it on hardware that supports it natively. Even on an 8700K, if you play Borderlands with PhysX turned on, it takes a significant performance hit. And PhysX settings can't be set to high on the CPU because NVIDIA wanted this to be proprietary, and judging by the lackluster performance CPUs provided even with the PhysX settings on low, it's justified.
0

#55

Romulus, on 11 September 2018 - 08:29 AM, said:

I am sorry, but most of the time when you try talking hardware, you provide misinformation. The original PhysX calculation chip was developed by a company named NovodeX, which was later acquired by AGEIA. AGEIA also acquired Meqon, the same Meqon Physics that supposedly got built into DNF's Unreal Engine implementation. AGEIA had dedicated PhysX cards that worked with your discrete graphics card regardless of brand. But after NVIDIA acquired AGEIA, NVIDIA started using GPGPU, aka CUDA, to accelerate AGEIA's PhysX SDK.

Processing PhysX on the CPU even to this date is not faster than processing it on hardware that supports it natively. Even on an 8700K, if you play Borderlands with PhysX turned on, it takes a significant performance hit. And PhysX settings can't be set to high on the CPU because NVIDIA wanted this to be proprietary, and judging by the lackluster performance CPUs provided even with the PhysX settings on low, it's justified.

I could have sworn NVIDIA made their own NVIDIA-branded physics boards at one point, but I'll stand corrected on that single point. On a couple of games I've worked on we tried offloading PhysX work over to compute shaders, and the performance gains were insignificant (and sometimes it was slower) compared to the same work properly multithreaded on the CPU. So unless you're running a giant universe simulation (like the Universe VR game), you don't need physics on dedicated hardware for the level of physics required for most games.

EDIT:
I can't speak for Borderlands (it's possible that title wasn't well optimized for PhysX), but there are only a few use cases for wanting the physics sim on the hardware. From a raw computation perspective, you could optimize the sim to run in compute, but at least on PC you have to send the data down to the hardware, wait for it, and read it back; the gains you get are simply lost during that process. However, if you have something like a cloth sim and you want to deform the vertices against the environment in real time, then you might have something, but the use case for stuff like that is one-off cinematics, and you can simply bake the deformations, so the GPU time wasted on the sim isn't worth it. On pretty much every game I've worked on, the CPU is more starved for work than the GPU is, and having physics on the CPU is well worth it, which leaves the GPU to do other things. So I admire Jan for wanting to put more shit on the CPU, but he's going about it all wrong IMO :rolleyes:.

This post has been edited by icecoldduke: 11 September 2018 - 09:07 AM

0

#56

Have you made any more progress?
0

#57

Just trying to make voxel sprites look normal (without holes between voxels and excess overdraw), and employ back-face culling. I don't want to waste resources rendering voxels as real cubes, but there will be a faster 2D approximation of a cube. Then I'll enable voxel mipmaps, add code for mirrored cases, and I'm done with KVX support.

Otherwise, the Brahma voxel system is a lot more powerful and flexible than Build's. As I foresee it, voxels can be flipped, mirrored, sliced and skewed vertically, receive dynamic light according to normal vectors, and be dynamically modified, with each voxel sprite instance remaining unique. But no free 6-DOF rotation so far.
3
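On the back-face culling mentioned above: for axis-aligned voxel boxes the test is cheap, since at most three of the six faces can ever face the viewer, and a face is a candidate only when the camera sits on its outward side. A generic sketch, not Brahma's actual representation:

```c
#include <stdbool.h>

/* For an axis-aligned voxel cell spanning [x0,x1] x [y0,y1] x [z0,z1], a face
 * can be visible only when the camera lies on its outward side, so the other
 * faces are culled before any per-face drawing work. */
typedef struct { bool neg_x, pos_x, neg_y, pos_y, neg_z, pos_z; } VisibleFaces;

static VisibleFaces cull_voxel_faces(float cam_x, float cam_y, float cam_z,
                                     float x0, float y0, float z0,
                                     float x1, float y1, float z1)
{
    VisibleFaces v;
    v.neg_x = cam_x < x0;  v.pos_x = cam_x > x1;
    v.neg_y = cam_y < y0;  v.pos_y = cam_y > y1;
    v.neg_z = cam_z < z0;  v.pos_z = cam_z > z1;
    return v;
}
```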

#58

Recently refined my voxel rendering routine, here are the before/after screenshots. Also added LODs and simpler, more streamlined drawing methods for distant voxels with less quality demand.
[Attached images: CAPT0039.PNG, CAPT0041.PNG]

The current rendering algorithm is a bit slow for highly detailed, high-resolution voxels. To stress-test my engine, I've used the old voxel trees I made back in 2004 to introduce into my now-abandoned Duke mod. These trees are converted 3DS models and are very tough to render, so in my test map I don't get smooth framerates even in multithreaded mode. Actually, the boost is only 2x on 8 threads; maybe I should use better clipping to make multithreading more efficient. The way my threading works is that each thread renders its own vertical band of the image independently. Maybe not optimal, but it's a simple and reliable way.
[Attached image: CAPT0043.PNG]
5
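For reference, the vertical-band scheme described above splits the screen into one column range per thread, so each thread writes only its own part of the framebuffer and no locking is needed while drawing. A minimal pthreads sketch under those assumptions; the job fields and the render_column call are placeholders, not Brahma's API.

```c
#include <pthread.h>

#define NUM_THREADS 8

typedef struct {
    int x_start, x_end;   /* this thread's vertical band: columns [x_start, x_end) */
    /* ... camera, framebuffer and map pointers would go here ... */
} BandJob;

/* Hypothetical per-thread entry point: render only the columns of one band. */
static void *render_band(void *arg)
{
    BandJob *job = (BandJob *)arg;
    for (int x = job->x_start; x < job->x_end; ++x) {
        /* render_column(x, ...);  -- cast rays / draw walls for column x */
        (void)x;
    }
    return NULL;
}

/* Split the screen into NUM_THREADS equal vertical bands and render them
 * in parallel, in the spirit of the scheme described above. */
static void render_frame_mt(int screen_width)
{
    pthread_t threads[NUM_THREADS];
    BandJob jobs[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; ++i) {
        jobs[i].x_start = screen_width *  i      / NUM_THREADS;
        jobs[i].x_end   = screen_width * (i + 1) / NUM_THREADS;
        pthread_create(&threads[i], NULL, render_band, &jobs[i]);
    }
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
}
```

A roughly 2x boost on 8 threads would be consistent with very uneven band workloads (for example, the detailed trees concentrated in a few bands); splitting the screen into narrower chunks handed out dynamically is a common next step.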

#59

Jan Satcitananda, on 14 September 2018 - 12:00 PM, said:

Recently refined my voxel rendering routine, here are the before/after screenshots. Also added LODs and simpler, more streamlined drawing methods for distant voxels with less quality demand.
[Attachments: CAPT0039.PNG, CAPT0041.PNG]

The current rendering algorithm is a bit slow for highly detailed, high-resolution voxels. To stress-test my engine, I've used the old voxel trees I made back in 2004 to introduce into my now-abandoned Duke mod. These trees are converted 3DS models and are very tough to render, so in my test map I don't get smooth framerates even in multithreaded mode. Actually, the boost is only 2x on 8 threads; maybe I should use better clipping to make multithreading more efficient. The way my threading works is that each thread renders its own vertical band of the image independently. Maybe not optimal, but it's a simple and reliable way.
[Attachment: CAPT0043.PNG]

Have you profiled your renderer and checked to see where the bottlenecks are?
0

#60

icecoldduke, on 14 September 2018 - 02:33 PM, said:

Have you profiled your renderer and checked to see where the bottlenecks are?

Yes, I always use profiling to search for places to optimize. Slab column rendering eats up most of the processor time :rolleyes:
0
