HIVE-29241: Distinguish the default location of the database by catalog#6267
HIVE-29241: Distinguish the default location of the database by catalog#6267zhangbutao wants to merge 1 commit intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces catalog-aware warehouse paths for Hive metastore, allowing databases and tables under non-default catalogs to have distinct storage locations. The changes ensure that catalog locations are properly isolated while maintaining backward compatibility for the default Hive catalog.
Changes:
- Added new configuration parameters (
WAREHOUSE_CATALOGandWAREHOUSE_CATALOG_EXTERNAL) for catalog-specific warehouse directories - Modified catalog creation to require a 'type' parameter and made location optional with automatic defaults
- Updated database path resolution throughout the codebase to use catalog-aware logic via new
Warehousemethods that accept catalog names
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| MetastoreConf.java | Added WAREHOUSE_CATALOG and WAREHOUSE_CATALOG_EXTERNAL configuration variables |
| HiveConf.java | Added corresponding Hive configuration variables for catalog warehouses |
| Warehouse.java | Enhanced path resolution methods to accept catalog names; added deprecated markers for old methods |
| CatalogUtil.java | New utility class for catalog type validation (NATIVE, ICEBERG) |
| CreateCatalogAnalyzer.java | Updated to validate catalog type property and make location optional |
| CreateCatalogOperation.java | Modified to generate default location from WAREHOUSE_CATALOG config |
| CreateDatabaseAnalyzer.java | Added validation to check if catalog supports database creation |
| CreateDatabaseOperation.java | Updated path building logic to include catalog names for non-default catalogs |
| DDLUtils.java | Added helper method to check catalog support for database creation |
| HiveParser.g | Modified grammar to make catalog location optional and properties required |
| EximUtil.java | Updated to use catalog-aware warehouse paths for replication |
| Multiple test files | Updated test queries to include required 'type' property in CREATE CATALOG statements |
| HMSHandler.java | Updated default catalog and database creation to use catalog-aware paths |
| MetastoreDefaultTransformer.java | Updated to pass Database object instead of just name |
| Migration/Authorization files | Updated to use new Database-based path methods |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...one-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
Outdated
Show resolved
Hide resolved
...one-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
Show resolved
Hide resolved
...one-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
Outdated
Show resolved
Hide resolved
ql/src/java/org/apache/hadoop/hive/ql/ddl/catalog/create/CreateCatalogAnalyzer.java
Outdated
Show resolved
Hide resolved
...one-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
Show resolved
Hide resolved
...e-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/CatalogUtil.java
Show resolved
Hide resolved
ql/src/java/org/apache/hadoop/hive/ql/queryhistory/repository/AbstractRepository.java
Show resolved
Hide resolved
...one-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
Show resolved
Hide resolved
5940a60 to
49c343a
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ql/src/java/org/apache/hadoop/hive/ql/ddl/catalog/create/CreateCatalogAnalyzer.java
Show resolved
Hide resolved
...tore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
Outdated
Show resolved
Hide resolved
ql/src/java/org/apache/hadoop/hive/ql/ddl/database/create/CreateDatabaseAnalyzer.java
Show resolved
Hide resolved
...tore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
Show resolved
Hide resolved
This could make some confusing. For example, we declare a S3 catalog but the location is absent, then this catalog is created with a hdfs location, some databases without explicit location will be put under the hdfs, while some are on the cloud, it might be a problem if we use the FileSystem created by the catalog's properties to operate against these tables or databases. In HMS, we can enforce the location check when creating the catalog, if it's null and the catalog is native, then we should throw the exception. |
I understand what you mean. I think what we're discussing is more about usage habits. Before the catalog capability was available, Hive used databases as a way to isolate data, and when users created a Hive database, they could omit the location, in which case the database would use the default location. If users wanted to set a value different from the default catalog location, they could specify a specific location, such as an S3 location, when creating the catalog. This syntax of omiting the location is more flexible, and I think users might be more accustomed to the usage of having a default catalog location. I would also like to hear other folks' thoughts on the default catalog location. :) |
|
i'm ok with default catalog location for native catalogs if not explicitly provided |
|
hi @zhangbutao, is the PR ready for 2nd round of review and includes the changes we discussed here? |
e29caa4 to
e299310
Compare
Changes have been made to keep the type optional when creating a catalog. Thanks for the advice. |
Let's go :) |
| // Create a fake fs root for local fs | ||
| Path localFsRoot = new Path(path, "localfs"); | ||
| warehousePath = new Path(localFsRoot, "warehouse"); | ||
| warehouseCatPath = new Path(localFsRoot, "catalog"); |
There was a problem hiding this comment.
shouldn't the catalog be under the warehousePath?
Warehouse Root
└── Catalog (default or named)
└── Database (.db)
└── Table
/warehouse/<catalog_name>/<database_name>.db/<table_name>/ # non default catalog
/warehouse/hive/<database_name>.db/<table_name>/ # default catalog
/warehouse/hive/default.db/<table_name>/ # default catalog & database
|
|
||
| // Initialize props if null, and set default type to native if not specified | ||
| if (props == null) { | ||
| props = new HashMap<>(); |
There was a problem hiding this comment.
could we do this in the var initialization?
| } | ||
|
|
||
| private static void checkCatalogType(Map<String, String> props) throws SemanticException { | ||
| String catalogType = props.get("type"); |
There was a problem hiding this comment.
could we please create constant for that
| } | ||
| // Normalize and validate catalog type (fail fast on invalid input) | ||
| try { | ||
| props.put("type", CatalogUtil.normalizeCatalogType(catalogType)); |
There was a problem hiding this comment.
Could we simplify by handling isNullOrEmpty in normalizeCatalogType
| @Override | ||
| public int execute() throws Exception { | ||
| Catalog catalog = new Catalog(desc.getName(), desc.getLocationUri()); | ||
| String catLocationUri = Optional.ofNullable(desc.getLocationUri()) |
There was a problem hiding this comment.
minor. i think cat prefix in local var is redundant here since we are inside CreateCatalog class
| @@ -51,10 +52,9 @@ public CreateDatabaseAnalyzer(QueryState queryState) throws SemanticException { | |||
| @Override | |||
| public void analyzeInternal(ASTNode root) throws SemanticException { | |||
| Pair<String, String> catDbNamePair = DDLUtils.getCatDbNamePair((ASTNode) root.getChild(0)); | |||
There was a problem hiding this comment.
How about record CatalogDb(String catalog, String database) {}
please see #6379 (comment)
| } | ||
|
|
||
| private void makeLocationQualified(Database database) throws HiveException { | ||
| String catalogName = database.getCatalogName().toLowerCase(); |
There was a problem hiding this comment.
Shouldn't the names be normalized here: special chars/encoding lowercase?
| private void makeLocationQualified(Database database) throws HiveException { | ||
| String catalogName = database.getCatalogName().toLowerCase(); | ||
| String dbName = database.getName().toLowerCase(); | ||
| boolean isDefaultCatalog = Warehouse.DEFAULT_CATALOG_NAME.equalsIgnoreCase(catalogName); |
|
|
||
| // managed warehouse root | ||
| String whManagedLocatoion = MetastoreConf.getVar(conf, | ||
| isDefaultCatalog ? MetastoreConf.ConfVars.WAREHOUSE |
| if (db == null) { | ||
| LOG.warn("Database ({}) for query history table hasn't been found, auto-creating one", QUERY_HISTORY_DB_NAME); | ||
| String location = getDatabaseLocation(QUERY_HISTORY_DB_NAME); | ||
| db = new Database(); |
There was a problem hiding this comment.
looks a bit weird creating 2 Database objects in same method
| Path root = null; | ||
| try { | ||
| initWh(); | ||
| // TODO catalog. Need to determine auth root path based on catalog name. |
There was a problem hiding this comment.
could you please add the ticket to the EPIC
cc @saihemanth-cloudera, as he is the most experienced in this area
| set hive.support.concurrency=true; | ||
|
|
||
| CREATE CATALOG testcat LOCATION '/tmp/testcat' COMMENT 'Hive test catalog'; | ||
| CREATE CATALOG testcat LOCATION '/tmp/testcat' COMMENT 'Hive test catalog' PROPERTIES('type'='native'); |
| -- CREATE with comment | ||
| CREATE CATALOG test_cat LOCATION 'hdfs:///tmp/test_cat' COMMENT 'Hive test catalog'; | ||
| -- CREATE with comment and default location | ||
| CREATE CATALOG test_cat COMMENT 'Hive test catalog' PROPERTIES('type'='NATIVE'); |
There was a problem hiding this comment.
same 'type'='NATIVE' is redundant
|
|
||
| -- CREATE INE already exists | ||
| CREATE CATALOG IF NOT EXISTS test_cat LOCATION 'hdfs:///tmp/test_cat'; | ||
| CREATE CATALOG IF NOT EXISTS test_cat LOCATION 'hdfs:///tmp/test_cat' PROPERTIES('type'='native'); |
There was a problem hiding this comment.
do we have tests for external catalog?
|
|
||
| public class CatalogUtil { | ||
| // Catalog type constants | ||
| public static final String NATIVE = "native"; |
There was a problem hiding this comment.
no need for Strings, please use enum values.
| FileSystem cmFs = cmroot.getFileSystem(conf); | ||
| cmFs.setPermission(cmroot, new FsPermission("770")); | ||
| try { | ||
| // TODO catalog. The Repl function is currently only available for the default catalog path ConfVars.WAREHOUSE. |
There was a problem hiding this comment.
should we add the ticket to the EPIC?
| } | ||
|
|
||
| whRootString = getRequiredVar(conf, ConfVars.WAREHOUSE); | ||
| whCatRootString = getRequiredVar(conf, ConfVars.WAREHOUSE_CATALOG); |
There was a problem hiding this comment.
please see the comment on default FS structure
| return getWhRoot(); | ||
| } | ||
| return new Path(getWhRoot(), dbName.toLowerCase() + DATABASE_WAREHOUSE_SUFFIX); | ||
| Database db = new Database(); |
There was a problem hiding this comment.
are you using Database only as a DTO object? instead you could use a record CatalogDb as mentioned earlier
| if (configUri != null && !configUri.isEmpty()) { | ||
| properties.put("uri", configUri); | ||
| } | ||
| // TODO catalog. Consider adding new created native catalog warehouse(MetastoreConf.ConfVars.WAREHOUSE_CATALOG) later. |
There was a problem hiding this comment.
could you please elaborate, i don't quite get what you are suggesting here
|



What changes were proposed in this pull request?
The main objectives of this PR are:
Allow location to be omitted when creating a catalog. At the same time, it is necessary to determine catalogs that do not require a location, such as JDBC catalogs.
For newly created catalogs, the location of their databases will be determined by the two newly added parameters:
metastore.warehouse.catalog.dirandmetastore.warehouse.catalog.external.dir. This helps better ensure that the location of the default Hive catalog's databases and tables remains unaffected.For example, if metastore.warehouse.catalog.dir is hdfs://ns1/testdir, then the location for a newly created catalog named testcat would be hdfs://ns1/testdir/testcat. Consequently, the default path for a database like testdb created under this catalog would be hdfs://ns1/testdir/testcat/testdb.
typeparameter must be specified when creating a catalog to distinguish its type. Based on this type, it will be determined whether the catalog's databases and tables require a location, or whether the catalog type supports creating databases and tables.CREATE CATALOG test_cat COMMENT 'Hive test catalog' PROPERTIES('type'='NATIVE');Why are the changes needed?
The paths of databases and tables created under a non-default catalog may interfere with those under the default Hive catalog. We need to introduce parameters to distinguish the paths for databases and tables under non-default catalogs, ensuring that the paths of the default Hive catalog remain unaffected.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests.