fix: add pandas to filesystem source dependencies #3887
Bhagwat-45 wants to merge 1 commit into dlt-hub:devel
zilto left a comment:
Thanks for the contribution. pandas should be an optional dependency for the filesystem source because not all users of filesystem are using read_csv().
Your approach of importing pandas inside `_read_csv()`, wrapped in a try/except block, is correct.
However, we have a convention for this: you can simply do `from dlt.common.libs.pandas import pandas`. The file `dlt/common/libs/pandas.py` already includes the try/except block.
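For readers unfamiliar with this convention, here is a minimal sketch of the general idea: keep the try/except in one place and raise a uniform, actionable error from it. This is not dlt's actual code; `MissingDependencyError` and `require_module` are hypothetical names used only for illustration, and the message format is an assumption.

```python
import importlib
from types import ModuleType
from typing import List


class MissingDependencyError(ImportError):
    """Hypothetical stand-in for a project-wide missing-dependency exception."""

    def __init__(self, caller: str, dependencies: List[str]) -> None:
        self.caller = caller
        self.dependencies = dependencies
        deps = " ".join(dependencies)
        super().__init__(
            f"{caller} requires missing packages. Install them with: pip install {deps}"
        )


def require_module(name: str, caller: str) -> ModuleType:
    """Import `name`, or raise one consistent error that names the caller."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Chain the original error so the root cause stays visible in tracebacks.
        raise MissingDependencyError(caller, [name]) from exc
```

With a helper like this, call sites need a single line instead of repeating the try/except, and every optional dependency fails with the same style of message.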
@@ -1,5 +1,5 @@
from typing import TYPE_CHECKING, Any, Iterable, Iterator, Optional

from dlt.common.exceptions import MissingDependencyException
try:
    import pandas as pd
except ImportError:
    raise MissingDependencyException("filesystem reader (read_csv)", ["pandas"])
Remove the try/except. Instead, do `from dlt.common.libs import pandas`.
filesystem = [
    "s3fs>=2022.4.0",
    "botocore>=1.28",
    "pandas>=1.3.0",
Remove this dependency and revert the changes to `uv.lock` accordingly.
- Add `pandas>=1.3.0` to the `filesystem` extra in `pyproject.toml` so it is included when running `pip install -r requirements.txt` after `dlt init filesystem`
- Add `MissingDependencyException` in `_read_csv` for a clear, actionable error message if pandas is not installed
- Update `uv.lock` to reflect new pandas dependency

Closes dlt-hub#3876
57ec222 to b6f4b24
Hi @zilto, I've addressed all the requested changes: replaced the try/except block with `from dlt.common.libs import pandas as pd`. Please let me know if anything else needs to be changed!
Description
Running `dlt init filesystem <destination>` followed by `pip install -r requirements.txt` does not install pandas. However, the generated pipeline uses `read_csv()`, which requires `pandas`, causing a confusing `ModuleNotFoundError` at runtime.

Changes:
- Add `pandas>=1.3.0` to the `filesystem` extra in `pyproject.toml` so it is included when running `pip install -r requirements.txt` after `dlt init filesystem`
- Add `MissingDependencyException` in `_read_csv` for a clear, actionable error message if pandas is not installed
- Update `uv.lock` to reflect new pandas dependency

Related Issues
Additional Context
The fix follows the same `MissingDependencyException` pattern already used in `dlt/sources/sql_database/helpers.py` for optional dependencies like `connectorx`.
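As a rough illustration of why the import can live inside the reader rather than at module level, the sketch below defers the import to the first iteration of a generator, so users who never read CSV files are unaffected by a missing package. `read_rows` and its `module_name` parameter are hypothetical and exist only to make the behavior easy to demonstrate; this is not the code from this PR.

```python
import importlib
from typing import Any, Dict, Iterable, Iterator, List


def read_rows(
    paths: Iterable[str], module_name: str = "pandas"
) -> Iterator[List[Dict[str, Any]]]:
    """Yield the rows of each CSV file as a list of dicts.

    Because this is a generator, the import below only runs on the first
    `next()` call, not when the function is called or the module is loaded.
    """
    try:
        pd = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"read_rows requires '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc

    for path in paths:
        # Assumes the imported module exposes pandas' read_csv API.
        yield pd.read_csv(path).to_dict(orient="records")
```

Creating the generator never fails. The error, if any, surfaces only once a caller actually starts reading.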