Linux Kernel 5.2 will allow case-insensitive lookups on Ext4


Ted Ts'o, the maintainer of the ext2/ext3/ext4 file systems, has accepted into the linux-next branch, on which the Linux kernel 5.2 release will be based, a set of changes that implement support for case-insensitive operations in the Ext4 file system.

The patches also add support for UTF-8 characters in file names. Case-insensitive mode is optional and is enabled per directory using the new "+F" attribute (EXT4_CASEFOLD_FL).

Case-insensitive for ext4

When this attribute is set on a directory, all operations on the files and subdirectories inside it are case-insensitive; in particular, case is ignored when searching for and opening files (for example, Test.txt, test.txt and test.TXT in such a directory are treated as the same file).

That is, a lookup matches a directory entry even if the name passed by user space is not a byte-for-byte match of the name on disk, as long as it is a case-insensitive equivalent of the same Unicode string.

This operation is called a case-insensitive file name lookup. The feature is configured as an inode attribute applied to directories and inherited by their children.
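The matching rule described above can be illustrated with a small sketch. It is only an approximation: the helper names `fold_name` and `names_match` are hypothetical, and Python's `unicodedata` and `str.casefold` stand in for the kernel's own UTF-8 normalization tables, which may behave differently in edge cases.

```python
import unicodedata


def fold_name(name: str) -> str:
    """Approximate case-folding: decompose the string (NFD), then fold case.

    Assumption: this only mimics the idea of a case-insensitive
    comparison; it is not the kernel's implementation.
    """
    return unicodedata.normalize("NFD", name).casefold()


def names_match(requested: str, on_disk: str) -> bool:
    """Return True when two names are equal after folding."""
    return fold_name(requested) == fold_name(on_disk)


if __name__ == "__main__":
    # The three spellings from the article are all compared to one stored name.
    for candidate in ("Test.txt", "test.txt", "test.TXT"):
        print(candidate, names_match(candidate, "Test.txt"))
```

With this comparison, the three spellings from the example are all treated as the same name, while names that differ in anything other than case still do not match.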

This attribute can only be enabled on empty directories, on file systems that support the encoding feature, which avoids collisions between file names that differ only in case.
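To see why the empty-directory restriction matters, here is a hedged sketch that scans an existing list of names for entries that would become indistinguishable under case-insensitive matching. The function name `find_case_collisions` is hypothetical, and `str.casefold` is used as a simple stand-in for the file system's folding rules.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by case.

    Returns a list of groups (each a list of original names) that would
    collapse into a single entry under case-insensitive lookup.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(find_case_collisions(["README", "readme", "Makefile"]))
```

A directory that already holds both `README` and `readme` has no sensible case-insensitive interpretation, which is the situation the empty-directory rule rules out in advance.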

By default, the file system remains case-sensitive everywhere except in directories that have the "+F" attribute. A modified set of e2fsprogs utilities is provided to control whether case-insensitive mode is enabled.

This patch implements actual support for case-insensitive filename lookups in ext4, based on the feature bit and the encoding stored in the superblock.

A feature that took a long time to arrive

The patches were prepared by Gabriel Krisman Bertazi of Collabora and were accepted on the seventh attempt, after three years of development and of addressing review comments.

The implementation does not change the on-disk storage format and works exclusively by changing the name-comparison logic in the ext4_lookup() function and replacing the hash in the dcache (Directory Name Lookup Cache) structure.

The value of the "+F" attribute is stored in the inode of each individual directory and applies to all files and subdirectories nested inside it. The encoding information is stored in the superblock.
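Because the attribute lives in the directory's inode flags, it can in principle be inspected from user space. The following is a rough sketch, not a tested tool: the helper names are hypothetical, the value `0x40000000` for EXT4_CASEFOLD_FL is taken from the kernel's ext4 headers, and the ioctl request number is built on the assumption of the common Linux ioctl encoding (x86, ARM), which does not hold on every architecture.

```python
import fcntl
import os
import struct

# Value of EXT4_CASEFOLD_FL in the kernel's ext4 headers.
EXT4_CASEFOLD_FL = 0x40000000

# FS_IOC_GETFLAGS is defined as _IOR('f', 1, long).
# Assumption: the common ioctl number layout (direction << 30,
# size << 16, type << 8, nr); some architectures encode it differently.
FS_IOC_GETFLAGS = (2 << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | 1


def has_casefold(flags: int) -> bool:
    """Return True when the casefold bit is set in an inode flags word."""
    return bool(flags & EXT4_CASEFOLD_FL)


def read_inode_flags(path: str) -> int:
    """Read the inode flags of `path` with the FS_IOC_GETFLAGS ioctl.

    Raises OSError if the underlying file system does not support it.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
    finally:
        os.close(fd)
    return struct.unpack("I", raw)[0]
```

A call such as `has_casefold(read_inode_flags(some_directory))`, where `some_directory` is a placeholder for a path on a suitable Ext4 file system, would then report whether the directory is case-insensitive.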

For now, negative lookups are not inserted into the dcache, since they would have to be invalidated anyway, because cached entries for missing files cannot be trusted.

This is bad for performance, but fixing it requires some work in the VFS layer.

We can live without it for now, just like everyone else.

To avoid collisions with the names of existing files, the "+F" attribute can only be set on empty directories, and only on file systems in which Unicode support for file and directory names has been enabled.

The names of entries in directories with the "+F" attribute are automatically converted to a case-folded (lowercase) form and reflected in that form in the dcache, but they are stored on disk exactly as the user originally wrote them.

The new on-disk hashes are computed from the full case-folded string, rather than from the string directly.

That is, although names are processed without regard to case, they are displayed and saved without losing information about the case of their characters (but the system will not let you create a second file whose name has the same characters in a different case).
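The combination of case-insensitive matching and case-preserving storage can be pictured with a toy in-memory index. This is a sketch of the behaviour described above, not of the Ext4 data structures; the class `CasefoldDirectory` and its methods are hypothetical.

```python
class CasefoldDirectory:
    """Toy directory index: folded name -> name as originally written."""

    def __init__(self):
        self._entries = {}

    def create(self, name: str) -> None:
        """Add a name, refusing one that differs from an existing name only in case."""
        key = name.casefold()
        if key in self._entries:
            raise FileExistsError(
                f"{name!r} collides with existing entry {self._entries[key]!r}"
            )
        self._entries[key] = name

    def lookup(self, name: str) -> str:
        """Return the stored spelling, whatever case the caller used."""
        return self._entries[name.casefold()]

    def listing(self):
        """Return names exactly as they were created."""
        return sorted(self._entries.values())


if __name__ == "__main__":
    directory = CasefoldDirectory()
    directory.create("Test.txt")
    print(directory.lookup("TEST.TXT"))  # the original spelling is returned
    print(directory.listing())
```

The folded form is used only as the lookup key; the listing always shows the spelling chosen when the entry was created.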

It also allows the VFS code to quickly find the correct entry in the cache even when an equivalent but not identical string was used in a previous lookup.
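Hashing the folded name instead of the raw name is what makes equivalent spellings land on the same entry. The sketch below illustrates the idea only: `zlib.crc32` is a stand-in chosen for convenience, not the hash function Ext4 or the dcache actually uses, and both function names are hypothetical.

```python
import zlib


def raw_hash(name: str) -> int:
    """Hash the name exactly as given (case-sensitive behaviour)."""
    return zlib.crc32(name.encode("utf-8"))


def folded_hash(name: str) -> int:
    """Hash the case-folded name, so equivalent spellings share one hash.

    Assumption: crc32 and str.casefold are stand-ins for the real
    directory hash and the kernel's folding rules.
    """
    return zlib.crc32(name.casefold().encode("utf-8"))


if __name__ == "__main__":
    for name in ("Test.txt", "TEST.TXT"):
        print(name, hex(raw_hash(name)), hex(folded_hash(name)))
```

Running it shows different raw hashes for the two spellings but an identical folded hash, so a lookup by either spelling can go straight to the same bucket.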

