Faster way to load parquet files from S3

TIL that the pandas read_parquet function can read all the parquet files in a folder at once. This way, there is no need to load each file separately and concatenate them afterwards.

Also, it seems the engine and use_threads parameters are supposed to speed things up. However, I didn't find that they changed much.

pd.read_parquet(folder_path, engine="pyarrow", use_threads=True)

aws s3 pandas



Copyright © 2019 - 2024 Johnny Li. All contents licensed under CC BY-NC-SA 4.0; attribution is required when reposting.
Please read the LICENSE file for specific language governing permissions and limitations under the License.

Page last modified: Apr 16 2024 at 09:36 PM.
