rehsd

Troubleshooting FreeDOS Boot on my 286 System - Hangs on execrh() Call in Kernel

Updated: Jun 5, 2023

As I continue to work towards getting FreeDOS running on my 286 homebrew build, I am encountering a new issue. During boot, the FreeDOS kernel makes a call to execrh(), and the system hangs at that point. The most likely cause is something in my system that FreeDOS expects to be present but that I have not yet provided.


I will update this post as I receive suggestions, test changes, and learn more.


This issue has been resolved!

See bottom of post for additional details.


During startup of the kernel, the final steps prior to the system hanging are as follows (as best I can tell):

  1. The truename() function [newstuff.c] is called.

  2. The truename function then calls media_check() [fatfs.c].

  3. The media_check() function calls rqblockio() in the same file.

  4. The rqblockio() function then calls execrh() [execrh.asm].

  5. I am speculating that the execrh() routine is jumping to FL_DISKCHANGED in floppy.asm, based on the parameters. If correct, FL_DISKCHANGED should then raise interrupt 13H, function 16H (read change status type). My 286 system logging is not seeing this interrupt making it through.

  6. Scratch #5. It appears the execrh() call should be going to routines in dsk.c. I'm digging deeper into this.

The execrh() function makes a call. The kernel execution stops at this point.
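
For context, execrh.asm is documented as a request handler for calling device drivers, so the call it makes is driven by a request header that the kernel fills in beforehand. The following is a minimal Python sketch of what a media-check request might look like in memory, assuming the standard DOS block-device request header layout (a 13-byte header followed by command-specific fields). The field names and the build_media_check helper are illustrative and are not taken from the FreeDOS source.

```python
import struct

# Assumed layout of a DOS block-device "media check" request (command code 1).
# "<" means little-endian with no padding between fields.
#   B  length            total size of the request in bytes
#   B  unit              drive unit within the driver
#   B  command           1 = media check
#   H  status            filled in by the driver
#   8s reserved
#   B  media_descriptor  e.g. 0xF8 for a fixed disk
#   b  change_code       driver's answer: -1 changed, 0 unknown, 1 unchanged
#   I  volume_id_ptr     far pointer to the previous volume label
MEDIA_CHECK_FMT = "<BBBH8sBbI"
C_MEDIACHK = 0x01

# Offset of the 16-bit status word: three single-byte fields precede it.
STATUS_OFFSET = struct.calcsize("<BBB")


def build_media_check(unit, media_descriptor):
    """Return the bytes of a media-check request with the reply fields zeroed."""
    length = struct.calcsize(MEDIA_CHECK_FMT)
    return struct.pack(MEDIA_CHECK_FMT, length, unit, C_MEDIACHK,
                       0, bytes(8), media_descriptor, 0, 0)


request = build_media_check(unit=0, media_descriptor=0xF8)
print(len(request), STATUS_OFFSET, request.hex())
```

In this layout the status word sits at an odd offset. If the header starts on an even address, a 16-bit store to that field is a misaligned access, which the 286 splits into two byte-wide bus cycles. A compiler that pads the struct, or hardware that mishandles one byte lane, could therefore show up in exactly this kind of field.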


I built a debug version of the kernel and also added some additional logging. Here are screen captures of my system booting. See the later video for updated boot output.

I have not been able to figure out where the execrh() function is calling into or why the system hangs at this point.


If I place the drive (actually, a CF Card) in a standard Dell PC with a PCIe to ISA to CF Card adapter, it seems to boot fine, so the problem must lie in the BIOS or hardware of my 286 build.




5/28/2023 Update


I submitted the following to the FreeDOS developer listserv: "[Freedos-devel] struct alignment issue?" (The FreeDOS Project, sourceforge.net).



I have resolved the even (low) byte write issue. I had to run the BHE# signal through a latch before going to RAM. I will need to modify the PCB to latch this signal out of the processor before going to RAM, ROM, and the ISA bus.
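
To illustrate why the BHE# level at the RAM matters, here is a toy Python model of a 16-bit memory built from two byte-wide banks. It assumes the usual 286 convention that A0 low selects the even (low) bank and BHE# low selects the odd (high) bank. The BankedRam class and the "wrong BHE#" scenario are illustrative assumptions, not a reconstruction of the actual fault on this board.

```python
class BankedRam:
    """Toy 16-bit memory: one byte-wide bank for even addresses, one for odd."""

    def __init__(self, size_bytes):
        self.low = bytearray(size_bytes // 2)   # even addresses, data lines D0-D7
        self.high = bytearray(size_bytes // 2)  # odd addresses, data lines D8-D15

    def bus_write(self, address, data16, bhe_n):
        """One bus write cycle. bhe_n is the active-low BHE# level seen by the RAM."""
        index = address >> 1
        if (address & 1) == 0:      # A0 low enables the low bank
            self.low[index] = data16 & 0xFF
        if bhe_n == 0:              # BHE# low enables the high bank
            self.high[index] = (data16 >> 8) & 0xFF

    def read_byte(self, address):
        bank = self.high if address & 1 else self.low
        return bank[address >> 1]


def byte_write_to_even_address(bhe_n_at_ram):
    """Write one byte to an even address and report it plus its odd neighbour."""
    ram = BankedRam(0x2000)
    ram.bus_write(0x1235, 0xAA00, bhe_n=0)  # preload the odd neighbour with 0xAA
    # Byte write of 0x55 to even address 0x1234. D8-D15 carry no valid data
    # during this cycle, modelled here as 0xFF.
    ram.bus_write(0x1234, 0xFF55, bhe_n=bhe_n_at_ram)
    return ram.read_byte(0x1234), ram.read_byte(0x1235)


print(byte_write_to_even_address(bhe_n_at_ram=1))  # BHE# correctly high at the RAM
print(byte_write_to_even_address(bhe_n_at_ram=0))  # BHE# wrongly low at the RAM
```

In the second call the odd neighbour is overwritten even though only one byte was meant to be written. The model only shows how a wrong BHE# level at the RAM can corrupt an adjacent byte; it does not model bus timing or the latch itself.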





Configuration

  • 80286 with 640 KB of RAM, 128 KB of I/O which includes VGA, 256 KB of ROM BIOS. No high memory (HMA) is configured in the system.

  • BIOS built with NASM 2.16.01 on Windows 11 development PC. BIOS installed on high/low byte flash ICs on 286 system board.

  • FreeDOS virtual machine running on Windows 11 development PC. Used for building FreeDOS for 286 system.

  • FreeDOS kernel source 2.43

  • NASM 2.16.01

  • Watcom 1.9

  • FreeDOS bootloader and kernel are on a CF Card, using 8086 with FAT32 options. The CF Card is connected to an ISA IDE adapter in the 286 system.

Additional Details

  • My BIOS currently supports the following INT13H functions (disk.asm):

    • 0x00: reset disk system

    • 0x02: read disk sectors

    • 0x08: get current drive params

    • 0x15: read DASD type

    • 0x16: disk change status

    • 0x41: check extensions present

    • 0x42: extended read sectors

    • 0x48: extended read drive params

  • I have not yet added write sector functions to the BIOS. I am not seeing any calls to write functions in the kernel boot process.

  • I log all interrupt calls through my 286 serial debugger. I can also catch any interrupt calls where support is not present in my BIOS. The execrh() function does not appear to be calling any BIOS interrupts (or at least no interrupt calls are making it to the BIOS).

  • My BIOS supports CHS and LBA. Due to the size of the drive, FreeDOS appears to stay in CHS mode. In the logged output, I see LBA not enabled for drive C:. In the source code for initdisk.c, I see this comment: /* Turn of LBA if not forced and the partition is within 1023 cyls and of the right type */.

  • For this stage of development, I have dropped my system bus clock speed down to 8MHz, resulting in a 4MHz internal processor clock.

  • I do not have DMA support in my system, and I am using programmed I/O only.

  • Additional information about the 286 system: 286 Build - Six Months In.
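
The CHS versus LBA point above comes down to simple arithmetic. Here is a minimal Python sketch of the standard conversions and of how a cylinder/head/sector triple is packed into registers for INT 13h function 0x02. The geometry (16 heads, 63 sectors per track) and the function names are assumptions for illustration and are not read from the CF Card used here.

```python
HEADS = 16               # assumed heads per cylinder
SECTORS_PER_TRACK = 63   # assumed sectors per track


def chs_to_lba(cylinder, head, sector):
    """Sectors are 1-based in CHS, so subtract one."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)


def lba_to_chs(lba):
    cylinder, remainder = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector_index = divmod(remainder, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1


def pack_int13_chs(cylinder, head, sector):
    """Return (CH, CL, DH) as used by INT 13h function 0x02.

    CH holds the low 8 bits of the cylinder, CL holds the sector in bits 0-5
    and cylinder bits 8-9 in bits 6-7, DH holds the head.
    """
    if not (0 <= cylinder <= 1023 and 0 <= head <= 255 and 1 <= sector <= 63):
        raise ValueError("value does not fit the INT 13h CHS encoding")
    ch = cylinder & 0xFF
    cl = ((cylinder >> 2) & 0xC0) | sector
    return ch, cl, head


print(chs_to_lba(0, 1, 1))
print(lba_to_chs(chs_to_lba(1023, 15, 63)))
print(pack_int13_chs(1023, 15, 63))
```

The 10-bit cylinder field is where the 1023-cylinder limit in the initdisk.c comment comes from: a partition that ends beyond cylinder 1023 cannot be addressed with this encoding.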


BIOS Source Code

I am far from a skilled x86 assembly or C developer, so please be gentle. :) Here is a link to my current BIOS source: /WorkingCode/20230528_alignmentissue_questionmark at main · rehsd/x86 · GitHub.


Possible Causes ???

  • Some disk, partition, or file system information is not being properly populated somewhere (e.g., standard IBM PC memory location, a variable, etc.).

  • Earlier in the call chain, possibly cds, dpb, dpbp, or mediareqhdr aren't being properly populated.

  • Corrupted memory due to improper writes.

  • Alignment issue.

  • ...?
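
One example of a "standard IBM PC memory location" is the BIOS Data Area (BDA) at segment 0x0040. The sketch below builds a BDA image in Python with three commonly documented fields filled in. The offsets are the conventional PC ones; which of them FreeDOS actually reads is not established here, and the build_bda helper is hypothetical.

```python
import struct

BDA_SEGMENT = 0x0040

# Conventional BDA offsets, relative to 0040:0000.
EQUIPMENT_WORD = 0x10    # 16-bit equipment list
MEMORY_SIZE_KB = 0x13    # 16-bit base memory size in KB
HARD_DISK_COUNT = 0x75   # 8-bit number of hard disks


def linear_address(segment, offset):
    """Real-mode segment:offset to 20-bit linear address."""
    return (segment << 4) + offset


def build_bda(memory_kb, hard_disks, equipment=0):
    """Return a 256-byte BDA image with a few fields populated."""
    bda = bytearray(256)
    struct.pack_into("<H", bda, EQUIPMENT_WORD, equipment)
    struct.pack_into("<H", bda, MEMORY_SIZE_KB, memory_kb)
    bda[HARD_DISK_COUNT] = hard_disks
    return bda


bda = build_bda(memory_kb=640, hard_disks=1)
print(hex(linear_address(BDA_SEGMENT, MEMORY_SIZE_KB)), bda[MEMORY_SIZE_KB:MEMORY_SIZE_KB + 2].hex())
```

A BIOS that leaves these bytes uninitialized hands whatever was in RAM at power-up to any software that reads them, which is one way "information not being properly populated" could surface.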

Suggestions?

If anyone has suggestions, please let me know (thank you!). I plan to research IDE initialization on IBM PC systems to see if I am missing something critical on boot, such as populating some memory structures with system information (such as disk information). I also need to learn more about how FreeDOS handles IDE, FAT, and device drivers (as execrh.asm is documented as a "request handler for calling device drivers"). I will update this page as I learn more.


Things to Do / Test

As I receive suggestions of things to try, I will queue them up here and post updates as I work through them.

  • Force LBA. Complete, this did not change the behavior.

  • Inject logging/debugging code in kernel/floppy.asm. In process...

  • Test disk I/O outside of the FreeDOS kernel (e.g., a simple bootloader application). So far, all testing seems to indicate CHS and LBA reading is working fine. I can load the MBR, then load the boot sector, then load kernel.sys without issue.

  • Review all code in my BIOS to make sure I'm not updating incorrect memory locations or trashing registers, the stack, etc. In process...

  • Significantly improve error handling in INT13 disk services BIOS code.

  • Work backwards in the FreeDOS code from execrh() and see where pointers/variables might not be getting populated correctly (e.g., dpbp, dpb, cds).

  • Reduce FreeDOS's kernel down to very simple file access to see if I can get that to work.

  • ...
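
For the "test disk I/O outside of the kernel" item, one quick check is to decode the MBR that was read and confirm it looks sane. Below is a minimal Python sketch that parses the standard MBR partition table from a 512-byte sector. The sample sector is synthetic and the parse_mbr helper is illustrative; nothing here is read from the actual CF Card.

```python
import struct

PARTITION_TABLE_OFFSET = 446
# status, CHS start (3 bytes), type, CHS end (3 bytes), LBA start, sector count
ENTRY_FMT = "<B3sB3sII"
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)  # 16 bytes per entry
FAT32_TYPES = {0x0B: "FAT32 (CHS)", 0x0C: "FAT32 (LBA)"}


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xAA":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        offset = PARTITION_TABLE_OFFSET + i * ENTRY_SIZE
        status, _, ptype, _, lba_start, count = struct.unpack_from(ENTRY_FMT, sector, offset)
        if ptype != 0:
            entries.append({
                "bootable": status == 0x80,
                "type": ptype,
                "type_name": FAT32_TYPES.get(ptype, "other"),
                "lba_start": lba_start,
                "sectors": count,
            })
    return entries


# Synthetic MBR with one bootable FAT32 (CHS) partition starting at LBA 63.
sample = bytearray(512)
struct.pack_into(ENTRY_FMT, sample, PARTITION_TABLE_OFFSET,
                 0x80, bytes(3), 0x0B, bytes(3), 63, 1000)
sample[510:512] = b"\x55\xAA"
partitions = parse_mbr(bytes(sample))
print(partitions)
```

Running the same parse on a sector dumped over the serial debugger and on the same card read in a known-good PC would show whether the two reads agree byte for byte.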


Resolution Details

There was no single root cause, but rather a series of contributing factors. In order of impact:

  1. Hardware bug where BHE# was directly connected from the 286 CPU to the RAM. This signal requires going through a latch, similar to the address signals on the bus.

  2. Moving BIOS-related memory usage out of memory regions where FreeDOS may load.

  3. Clean-up of all BIOS code to ensure no accidental writes to incorrect locations. This was especially a concern when an interrupt handler assumed the data segment was 0x0 but FreeDOS had entered the interrupt with a different value.

  4. Stack setup to support returning CPU flags from interrupts.

  5. Setup of BIOS Data Area (BDA) for use by FreeDOS.
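
On item 4: BIOS disk services report success or failure through the carry flag, but IRET restores the caller's saved FLAGS from the stack, so a flag set inside the handler is lost unless the handler takes steps to return it. The toy Python model below shows one common approach, patching the saved FLAGS image on the stack before returning. It is a sketch of the general mechanism, assuming this is what "returning CPU flags" refers to, and is not the actual BIOS code.

```python
CF = 0x0001  # carry flag bit in the FLAGS register


def software_interrupt(caller_flags, handler):
    """Model INT followed by IRET: push FLAGS, CS, IP; run handler; pop IP, CS, FLAGS."""
    stack = [caller_flags, 0x0000, 0x0000]  # FLAGS, CS, IP (CS:IP are placeholders)
    handler(stack)
    stack.pop()         # IRET pops IP
    stack.pop()         # then CS
    return stack.pop()  # then FLAGS, which is what the caller sees


def naive_handler(stack):
    # Sets carry only in the handler's live flags (like STC) and leaves the
    # saved image untouched, so IRET discards the result.
    pass


def patching_handler(stack):
    # ORs carry into the saved FLAGS image so IRET hands it back to the caller.
    stack[0] |= CF


caller_flags = 0x0202  # interrupts enabled, carry clear
print(hex(software_interrupt(caller_flags, naive_handler)))
print(hex(software_interrupt(caller_flags, patching_handler)))
```

An alternative seen in some BIOS code is to return with RETF 2 instead of IRET, which skips the saved FLAGS and leaves the handler's live flags in effect.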


2 Comments


Unknown member
May 10, 2023

In FreeDOS, the `execrh` macro is used to execute a program in real mode. The parameters passed to this macro are used to set up the environment for the program, including its entry point and memory allocation.


The `contruct` strategy address is a pointer to a data structure that describes the memory layout and other properties of the program being executed. This data structure is commonly referred to as the program's control block, or PCB.


The PCB typically includes information such as the program's entry point, the amount of memory it requires, and any command-line arguments that were passed to it. By passing a pointer to the PCB as a parameter to `execrh`, FreeDOS is able to set up the…


rehsd
May 10, 2023

Maarten, this is super helpful. Thank you!! I will work on reviewing the values of the PCB (and the structure of it, in general) that are passed into execrh().
