Partitioning GPT Disks on UEFI systems

Tom’s Hardware (December 30, 2009 by Patrick Schmid and Achim Roos)

“You need a 64-bit version of Windows XP Professional (with the same core operating system version as Windows Server 2003 SP1), Vista, or Windows 7, to get full GPT support for drives that offer more than 2TB capacity. GPT offers the following features:

  • Maximum raw partition capacity of 18 exabytes
  • Windows NTFS file systems are limited to 256TB each, on a disk of up to 2^64 logical blocks in length; reaching that limit requires a cluster size of 65,536 bytes (formatting with a 64 KiB allocation unit is required!)
  • Implementation restriction of 128 partitions per disk (limited by the amount of space reserved for partition entries in the GPT)
  • Primary and backup partition tables available for redundancy
  • Well-defined and self-identifying partition format
  • Each partition has a unique ID to avoid identifier collisions (“GUID” partition table)
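
The capacity figures in the list above can be checked with a little arithmetic. The sketch below rests on two assumptions that the quoted text does not spell out: that the NTFS limit comes from roughly 2^32 addressable clusters per volume, and that the "18 exabytes" figure corresponds to 2^64 bytes. The function name is made up for illustration.

```python
# Assumptions (not stated in the quoted text): NTFS addresses about 2**32
# clusters per volume, and "18 exabytes" corresponds to 2**64 bytes.
TIB = 2 ** 40
MAX_CLUSTERS = 2 ** 32  # assumed NTFS cluster-count limit


def ntfs_max_volume_bytes(cluster_bytes=64 * 1024, max_clusters=MAX_CLUSTERS):
    """Largest volume reachable with a given allocation unit size."""
    return cluster_bytes * max_clusters


print(ntfs_max_volume_bytes() // TIB, "TiB with 64 KiB clusters")     # 256
print(ntfs_max_volume_bytes(4096) // TIB, "TiB with 4 KiB clusters")  # 16
print(round(2 ** 64 / 10 ** 18, 1), "EB raw")
```

Under these assumptions the default 4 KiB allocation unit tops out at 16 TiB, which is why the 64 KiB allocation unit is needed to reach the 256TB figure.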

Here is an overview of your possible partitioning options and actions on GPT and MBR:

         32-bit Windows     64-bit Windows
         GPT     MBR        GPT     MBR
Boot     No      Yes        Yes     No
Read     No      Yes        Yes     Yes
Write    No      Yes        Yes     Yes

UEFI support becomes relevant for drives >2TB. It must be present to enable booting from such large partitions provided the other prerequisites (64-bit Windows and GPT) are met.

GPT Details on Windows x64
GPT will automatically install the EFI System Partition (ESP), which contains the boot loader, EFI drivers, and all other necessary files for booting the system (such as boot.ini, HAL, and NT Loader). It utilizes the GUID Partition Table rather than the master boot record. The ESP is 1% of the drive capacity, or a minimum size of 100MB and a maximum size of 1,000MB.

GPT systems will also be equipped with an MSR partition, which stands for Microsoft Reserved. Since GPT partitions don’t allow hidden sectors, Windows utilizes this reserved space for operating system use. If you decide to convert a basic disk into a dynamic disk, Windows will use the MSR partition, shrink its size, and create the dynamic disk database using available space. On drives smaller than 16GB, the MSR partition will be only 32MB. For larger disks, it will consume 128MB.”
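
The sizing rules quoted above for the ESP and MSR partitions can be sketched as two small functions. This is only an illustration of the stated rules, assuming 1 GB = 1,000 MB and treating a disk of exactly 16GB as "larger" (the quoted text does not say which side of the boundary it falls on). The function names are invented here.

```python
def esp_size_mb(disk_size_mb: int) -> int:
    """ESP size: 1% of the drive, clamped to the 100..1,000 MB range."""
    one_percent = disk_size_mb // 100
    return max(100, min(one_percent, 1000))


def msr_size_mb(disk_size_mb: int) -> int:
    """MSR size: 32 MB below 16 GB, otherwise 128 MB (boundary is assumed)."""
    return 32 if disk_size_mb < 16 * 1000 else 128


for size_mb in (8_000, 50_000, 2_000_000):
    print(size_mb, "MB disk -> ESP", esp_size_mb(size_mb),
          "MB, MSR", msr_size_mb(size_mb), "MB")
```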

Chatwin says: Even if your system is GPT-compliant and created using the necessary partition, you will still need UEFI support to be able to boot from a large volume (UEFI = Unified Extensible Firmware Interface).

Second, support for 48-bit LBA also requires Advanced Format CDB block re-alignment, enabled by an internal PCI-E Host Bus Adapter card (using only the native Windows AHCI drivers), your RAID controller and/or HDD device drivers. This is where Long LBA addressing comes in…

Long LBA addressing increases the number of bytes used to define the LBA in the Command Descriptor Block (CDB). The system must support at least 16-byte CDBs (or 32-byte CDBs, for Protection Information commands only) and Dynamic Sense Data to access LBA counts above 2.1TB. Eight LBA address bytes are provided in 16- and 32-byte CDBs, double the 4 bytes in 10-byte commands (http://www.seagate.com/docs/pdf/whit…_readiness.pdf).
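
To see where the roughly 2TB ceiling comes from, the number of LBA bytes can be turned into a maximum addressable capacity. The sketch below assumes 512-byte logical sectors, which the paragraph above does not state explicitly; the function name is illustrative.

```python
def max_capacity_bytes(lba_bytes: int, sector_bytes: int = 512) -> int:
    """Capacity addressable with an LBA field of the given width."""
    addressable_sectors = 2 ** (8 * lba_bytes)
    return addressable_sectors * sector_bytes


# 4 LBA bytes (10-byte CDB) versus 8 LBA bytes (16- and 32-byte CDBs)
print(max_capacity_bytes(4))  # 2199023255552 bytes, about 2.2 TB (2 TiB)
print(max_capacity_bytes(8))
```

With 4 LBA bytes and 512-byte sectors the limit is 2^41 bytes, which matches the 2.1TB to 2.2TB boundary mentioned throughout this post.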

Installation requirements
1. Windows XP Professional x64 Edition with SP2 (the service pack it shares with Windows Server 2003 64-bit), Vista 64-bit, or Windows 7 64-bit for GPT.
2. UEFI booting support since Windows Vista with Service Pack 1 (64-bit only).
3. Motherboard with UEFI 2.x implementation for boot drives on a partition larger than 2.2 TB.
4. Windows 7 (SP1) is supposed to understand AF drives and to align partitions correctly. Without the updates in a hotfix, Windows can’t detect the drive’s physical sector size, and Windows Update (and some other features) fail. Microsoft explanation and hotfix link: An update that improves the compatibility of Windows 7 and Windows Server 2008 R2 with Advanced Format Disks is available.
5. SP1 with the hotfix still doesn’t update the storage driver correctly, so the driver can’t report the sector size to Windows. The clue here is that fsutil still reports “bytes per physical sector” as “<Not Supported>”. You can fix this issue by manually installing the latest Intel driver (9.6 or later): Intel® Rapid Storage Technology.
6. Alignment instructions for installing XP64:

  • Use LSoft Active Boot Disk Suite (based on WinPE 3.1) to create a partition with an offset of 2048KB, format with 4K sectors and reboot.
  • Replace with XP64 SP2 installation DVD and install without altering the partition parameters.
  • After final installation reboot and use Paragon Partition Manager Server (WinPE 3.0 recovery environment) to convert cluster size from 4KB to 64KB.
  • Shut down and start from the primary HDD partition.

Paragon has just introduced a new solution: Migrate to UEFI.
“It’s a simple four-step Copy Hard Disk Wizard that transfers a 64-bit Windows Vista/7 initially installed on an MBR disk in the BIOS mode to a GPT disk and the UEFI mode, thus opening up all benefits of the GPT+UEFI configuration, including support of a high-capacity 2.2TB+ hard drive for use as the primary system HDD.

Home and small-office users who try to move their 64-bit Windows Vista or Windows 7 software environments from the dated MBR/BIOS configuration to a modern GPT/UEFI platform find that you can’t just change the partitioning scheme and enable UEFI without reinstalling the operating system from scratch. The problem is that Windows Disk Management supports conversion to GPT for empty drives only, so you’ve got no way of converting the system drive to GPT without losing data. Besides, Windows management tools do not provide an option to adjust boot files for UEFI, making it impossible for users to transfer established data or system components within a single computer directly from a BIOS boot mode to a UEFI-based mode.

Paragon Migrate to UEFI can help you tackle both issues with minimal effort. Some important benefits:

  • Automatically adjusts Windows OS to start up in the UEFI mode
  • Automatically creates a GPT partitioning scheme during migration
  • Supports AFD (Advanced Format Drive) disks (mixed 512B/4K sector scheme only, including virtual)
  • The source system remains intact and can be loaded in the BIOS mode at any time
  • Copy operation is run without the system restart by employing Microsoft VSS technology”

Table for Operating System – File Cache Alignment

Geeking Out on SSD Hardware Developments (Linux Magazine, February 9th, 2010)

Intel and Smaller Cells
“According to this link the theoretical lower limit on NAND cells appears to be around 20nm. One of the big reasons for this limitation is that as cells shrink in size they are, naturally, closer together. However, the voltage required to program a cell remains about the same (typically 12V). This means that the power density increases (amount of power in a given area) increasing the probability that the voltage will disturb the neighboring cells causing data corruption. Consequently, increasing the density of cells can be a dicey proposition, hence the lower limit.

Recently, IMFT (Intel-Micron Flash Technologies) LLC announced that it had begun sampling 2 bits per cell (MLC) NAND flash chips that have been manufactured using a 25nm process. This announcement is significant because of the increased density and how close the density is getting to the theoretical limit. Plus the fact that one of the best performing SSD drives is from Intel, one of the participating companies in the LLC, and we can see a fairly significant shift in technology.

IMFT is a joint project by Intel and Micron to develop new technologies around Flash storage (primarily NAND). The project started with production several years ago with a 72nm process. They then moved to a 50nm process in 2008 followed by a further reduction to a 34nm process in 2009. It is the 34nm process that current Intel SSDs utilize (Intel X25-M G2). The 34nm process produces a 4GB MLC NAND chip with a die size of 172 mm². The new 25nm process is targeted for a first product which is an 8GB MLC NAND flash chip with a die size of 167 mm². So going from 34nm to 25nm doubles the die density.

In addition to the doubling of die density the new chips will have some other changes. The current 34nm generation of chips have a page size of 4KB and 128 pages per block resulting in a block size of 512KB. The new chip will have a page size of 8KB and 256 pages per block. This means that the new block size is 8KB * 256 = 2,048KB (2MB). This change in block size can have a significant impact on performance.
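
The block-size arithmetic in the paragraph above is simple enough to check in a few lines. The dictionary layout and the function name below are illustrative only; the page sizes and page counts are the ones quoted.

```python
def erase_block_kb(page_kb: int, pages_per_block: int) -> int:
    """Erase block size = page size x pages per block."""
    return page_kb * pages_per_block


GENERATIONS = {
    "34nm": {"page_kb": 4, "pages_per_block": 128},
    "25nm": {"page_kb": 8, "pages_per_block": 256},
}

for name, geometry in GENERATIONS.items():
    print(name, erase_block_kb(**geometry), "KB per erase block")
# 34nm -> 512 KB, 25nm -> 2048 KB
```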

Recall that a block is the smallest amount of storage that goes through an erase/write cycle when any single bit of the block is changed. For example, if any bit within the block is changed then the entire block has to first have the unchanged data copied from the block to cache and then the block is erased. Finally the updated information is merged with the cache data (unchanged data) and the entire block is written to the erased block. This process can take a great deal of time to complete and also uses a rewrite cycle for all of the cells in the block (remember that NAND cells have a limited number of rewrite cycles before they can no longer hold data).

The new 25nm chip will switch from 512KB blocks to 2MB blocks (2,048KB) increasing the amount of data that has to go through the read/erase/write cycle compared to the 34nm chips. To adjust to this change, Intel will have to make adjustments to the firmware to better handle the larger blocks. In addition, it is likely that Intel will have to at least double the cache size to accommodate the larger block sizes. It may have to increase the number of spare pages as well since a single bit change could cause a greater number of blocks to be tagged for updating. However, on the plus side, with larger pages, the controller can do much more optimization for writes including a much greater amount of write coalescing. But this too could increase the amount of cache needed.”
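
The read/erase/write cycle described above can be modelled in a few lines to show why the larger block matters. This is a deliberately naive sketch: it assumes the controller rewrites the whole block for every one-page change, with no write coalescing or spare-block remapping, which real firmware would use to soften the effect.

```python
def rewrite_block(block, page_index, new_page):
    """Model the read/erase/write cycle for a change to a single page.

    Returns the rewritten block and the number of pages physically written.
    """
    cache = list(block)             # 1. copy the block's data to cache
    erased = [None] * len(block)    # 2. erase the entire block
    cache[page_index] = new_page    # 3. merge the update with the cached data
    erased[:] = cache               # 4. write the entire block back
    return erased, len(erased)


PAGE_KB = {"34nm": 4, "25nm": 8}
PAGES_PER_BLOCK = {"34nm": 128, "25nm": 256}

for process in ("34nm", "25nm"):
    block = ["old"] * PAGES_PER_BLOCK[process]
    _, pages_written = rewrite_block(block, 0, "new")
    print(process, pages_written * PAGE_KB[process],
          "KB rewritten for one changed page")
```

In this simplified model a one-page change costs 512 KB of writes on the 34nm geometry and 2,048 KB on the 25nm geometry.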

Chatwin’s Table

                              Conventional Allocation    Improved Allocation
Allocation unit               4 KiB                      64 KiB
Alignment                     1 MB                       2 MB

OS – 32bit, File Level Cache
  Page                        32 K                       64 K
  Size                        128 MB                     256 MB

OS – 64bit, File Level Cache
  Page                        64 K                       128 K
  Size                        256 MB                     512 MB

OS – 32bit, Storage Level Cache
  Page                        64 K                       128 K
  Size                        256 MB                     512 MB

OS – 64bit, Storage Level Cache
  Page                        128 K                      256 K stripe unit
  Size                        512 MB                     1 GB

                              Windows XP x32 (8*4)       Windows XP x64 (8*8)
Cluster Unit Aligned          256 Kibit (32 KB)          512 Kibit (64 KB)
System RAM                    2 GB                       4 GB
Physical Page Size            32 Kibit (4 KB)            64 Kibit (8 KB)
Logical Block Size            128 KBytes                 256 KBytes
Sector Size                   512 bytes                  1024 bytes

Partition Alignment for SSDs

Microsoft Support: Article ID: 929491 – Last Review: June 8, 2009 – Revision: 4.0 (Summary)

“Disk performance may be slower than expected when you use multiple disks in Microsoft Windows Server 2003, in Microsoft Windows XP, and in Microsoft Windows 2000. For example, performance may slow when you use a hardware-based redundant array of independent disks (RAID) or a software-based RAID.

This issue may occur if the starting location of the partition is not aligned with a stripe unit boundary in the disk partition that is created on the RAID. A volume cluster may be created over a stripe unit boundary instead of next to the stripe unit boundary. This is because Windows uses a factor of 512 bytes to create volume clusters. This behavior causes a misaligned partition. Two disk groups are accessed when a single volume cluster is updated on a misaligned partition.

To verify that an existing partition is aligned, divide the starting offset of the RAID disk group by the size of the stripe unit. Use the following syntax:

((Partition offset) * (Disk sector size)) / (Stripe unit size)

Example of alignment calculations in bytes for a 256-KB stripe unit size:

(63 * 512) / 262144 = 0.123046875
(64 * 512) / 262144 = 0.125
(128 * 512) / 262144 = 0.25
(256 * 512) / 262144 = 0.5
(512 * 512) / 262144 = 1

Example of alignment calculations in kilobytes for a 256-KB stripe unit size:

(63 * .5) / 256 = 0.123046875
(64 * .5) / 256 = 0.125
(128 * .5) / 256 = 0.25
(256 * .5) / 256 = 0.5
(512 * .5) / 256 = 1

These examples show that the partition is not aligned correctly for a 256-KB stripe unit size until the partition is created by using an offset of 512 sectors (512 bytes per sector).”
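
The alignment calculation from the Microsoft article translates directly into code. The sketch below reproduces the byte-based examples for a 256-KB stripe unit; the function names are invented for illustration, and the "aligned" test (offset is a whole multiple of the stripe unit) is my reading of the article's conclusion.

```python
def alignment_ratio(offset_sectors, sector_bytes=512, stripe_bytes=256 * 1024):
    """((Partition offset) * (Disk sector size)) / (Stripe unit size)"""
    return offset_sectors * sector_bytes / stripe_bytes


def is_aligned(offset_sectors, sector_bytes=512, stripe_bytes=256 * 1024):
    """True when the partition starts on a stripe unit boundary."""
    return (offset_sectors * sector_bytes) % stripe_bytes == 0


for offset in (63, 64, 128, 256, 512):
    print(offset, alignment_ratio(offset), is_aligned(offset))
```

Only the 512-sector offset yields a whole number (1.0) and passes the check; the legacy 63-sector offset gives 0.123046875.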

SMART Modular Technologies (author: Esther Spanjer), April 2010

“NAND flash devices are divided into erasable blocks composed of multiple pages (up to 256 pages per block, and up to 8KB per page). A flash block must be fully erased prior to re-writing, and a single-block erase process can take up to several milliseconds. The write speed may suffer a great deal if the SSD controller has to perform unnecessary block erase operations due to partition misalignment. Proper partition alignment is one of the most critical attributes that can greatly boost the I/O performance of an SSD due to reduced read-modify-write operations.

Windows XP or Windows Server 2000/2003 start partition offset at 31.5KB (32,256 bytes). Due to this misalignment, clusters of data are spread across physical memory block boundaries, incurring read-modify-write penalty. As a result, the host ends up writing at least 2X more I/O for every write as illustrated in this figure:

[Figure: Alignment]
When choosing a partition starting offset, SMART Modular recommends system integrators to correlate the partition offset with the RAID stripe size and cluster size to achieve the optimal SSD I/O performance. The figure shows an example of a misaligned partition offset and an example of an aligned partition offset for Windows Server.”
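
The "at least 2X" penalty of the 31.5KB offset can be illustrated by counting how many physical units a single cluster write touches. The sketch assumes 4 KB clusters and a 4 KB physical unit (for example a flash page); those sizes are chosen for illustration and are not taken from the SMART Modular text.

```python
def units_touched(offset_bytes, cluster_bytes, unit_bytes, cluster_index=0):
    """Number of physical units that one cluster overlaps."""
    start = offset_bytes + cluster_index * cluster_bytes
    end = start + cluster_bytes - 1
    return end // unit_bytes - start // unit_bytes + 1


LEGACY_OFFSET = 32_256        # 63 sectors * 512 bytes = 31.5KB
ALIGNED_OFFSET = 1024 * 1024  # 1 MB

print(units_touched(LEGACY_OFFSET, 4096, 4096))   # 2: every cluster straddles a boundary
print(units_touched(ALIGNED_OFFSET, 4096, 4096))  # 1: clusters line up with the units
```

Because 32,256 is not a multiple of 4,096, every cluster on the misaligned partition spans two units, which doubles the work for each write under these assumptions.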

Chatwin says: Based on the conclusions as mentioned above, I would recommend the following alignment and cluster settings for NAND flash devices (unit = 512 bytes):

((Partition offset / Flash unit) * (Disk sector size)) / (Erase block)

((2048 / 2) * .5) / 512K = 1
((4096 / 4) * .5) / 512K = 1

With a flash unit of 2 for SLC flash (aka. 1K, 1 byte/cell group [32×32]) and 4 for MLC flash (aka. 2K, 2 bytes/cell group [32×32]), this results in an offset of 1 MB (2048) for SSDs with Single-Level Cell memory, and 2 MB (4096) for Multi-Level Cell memory, when you want to align your partitions on erase block boundaries. The best cluster size (allocation unit) can be calculated with this formula:

((Single erase block / Flash unit) * (Disk sector size)) = Cluster Size

((128 / 2) * .5) = 32K (SLC),
((512 / 4) * .5) = 64K (MLC, “The Force”)

With 8K as the ideal sector size for flash memory (4K SLC), this is in my opinion the best choice when you want the fastest performance for your SSD (Indilinx, Samsung, JMicron controllers), unless you’re more concerned about maximizing your storage capacity at any price. And not to forget: a longer lifespan, because it reduces a lot of overhead and fragmentation compared to the standard NTFS allocation unit of 4096 bytes. Keep in mind that this default dates from a time when hard disks were limited to 2-4 GB, instead of the 2 TB of this day and age.
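
For anyone who wants to replay the two formulas above with other values, here is a small sketch. It only reproduces the arithmetic as written (offsets in 512-byte sectors, erase block in KB, sector size of .5 KB); the function names are invented here, and the formulas remain my own rule of thumb rather than a vendor specification.

```python
def alignment_result(offset_sectors, flash_unit, erase_block_kb=512, sector_kb=0.5):
    """((Partition offset / Flash unit) * (Disk sector size)) / (Erase block)"""
    return (offset_sectors / flash_unit) * sector_kb / erase_block_kb


def cluster_size_kb(single_erase_block, flash_unit, sector_kb=0.5):
    """((Single erase block / Flash unit) * (Disk sector size)) = Cluster Size"""
    return (single_erase_block / flash_unit) * sector_kb


print(alignment_result(2048, 2), alignment_result(4096, 4))  # 1.0 1.0
print(cluster_size_kb(128, 2), "K (SLC)")                    # 32.0
print(cluster_size_kb(512, 4), "K (MLC)")                    # 64.0
```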
