[SPARK-55132][INFRA] Upgrade numpy version on lint image

gaogaotiantian · zhengruifeng · commit 94cfa3d0e431 · 2026-01-23T21:58:31.000+08:00
### What changes were proposed in this pull request?

Upgrade the numpy version on the lint image and fix some minor lint failures.

### Why are the changes needed?

When we run `pip install -r dev/requirements.txt` locally, we normally get the latest version of `numpy`. This creates a gap between the local dev environment and CI. We should keep the two as close as possible so that we can rely on local mypy results.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

mypy passed locally.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53913 from gaogaotiantian/upgrade-lint-numpy.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
diff --git a/dev/spark-test-image/lint/Dockerfile b/dev/spark-test-image/lint/Dockerfile
@@ -91,7 +91,7 @@ RUN python3.11 -m pip install \
     'jinja2' \
     'matplotlib' \
     'mypy==1.8.0' \
-    'numpy==2.0.2' \
+    'numpy==2.4.1' \
     'numpydoc' \
     'pandas' \
     'pandas-stubs' \
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
@@ -11293,7 +11293,7 @@ def _bool_column_labels(self, column_labels: List[Label]) -> List[Label]:
         """
         # Rely on dtype rather than spark type because columns that consist of bools and
         # Nones should be excluded if bool_only is True
-        return [label for label in column_labels if is_bool_dtype(self._psser_for(label))]  # type: ignore[arg-type]
+        return [label for label in column_labels if is_bool_dtype(self._psser_for(label))]
 
     def _result_aggregated(
         self, column_labels: List[Label], scols: Sequence[PySparkColumn]
diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
@@ -1205,10 +1205,10 @@ def map(
                 else:
                     current = current.when(self.spark.column == F.lit(to_replace), value)
 
-            if hasattr(arg, "__missing__"):
-                tmp_val = arg[np._NoValue]  # type: ignore[attr-defined]
+            if isinstance(arg, dict) and hasattr(arg, "__missing__"):
+                tmp_val = arg[np._NoValue]
                 # Remove in case it's set in defaultdict.
-                del arg[np._NoValue]  # type: ignore[attr-defined]
+                del arg[np._NoValue]
                 current = current.otherwise(F.lit(tmp_val))
             else:
                 current = current.otherwise(F.lit(None).cast(self.spark.data_type))