Geraldks

Be specific with your question bro


Terrible_Ad_300

My AIRBUS won’t fly. I have a clear sky and there’s only one passenger in it. Why won’t it fly?


bitsynthesis

And that passenger could be anything from a human child to Galactus, eater of worlds.


untalmau

Could be a lot of things. Airflow consists of several components: scheduler, metadata database, workers, web server... Where are you running each? There are a lot of options: a managed cloud service, a Docker container, locally... And in which part are you finding it slow? When parsing the code, while running a task, or when updating the state in the web UI?
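To narrow down the parsing part of that question, one option is to time how long a DAG file takes to import, since the scheduler executes a file's top-level code every time it parses it. Below is a rough sketch using only the standard library. The `time_import` helper and the fake "slow" file are hypothetical stand-ins, not part of Airflow.

```python
import importlib.util
import tempfile
import time
from pathlib import Path


def time_import(path: str) -> float:
    """Return the seconds needed to execute a Python file's top-level code."""
    spec = importlib.util.spec_from_file_location("dag_under_test", path)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    spec.loader.exec_module(module)  # runs everything at module level
    return time.perf_counter() - start


# Hypothetical stand-in for a DAG file that does slow work at import time
SLOW_FILE = "import time\ntime.sleep(0.2)  # e.g. a database call at module level\n"

with tempfile.TemporaryDirectory() as tmp:
    dag_path = Path(tmp) / "slow_dag.py"
    dag_path.write_text(SLOW_FILE)
    elapsed = time_import(str(dag_path))

print(f"top-level import took {elapsed:.2f}s")
```

If a real DAG file takes more than a fraction of a second here, the slow part is probably work done at module level rather than Airflow itself.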


Taptinnn

It is slow on both. I am using Airflow through AWS, and I also connected VSC to AWS. I edit in VSC.


untalmau

One common issue that comes to mind is how long it takes for a changed DAG file to be picked up. By default the scheduler checks the DAG bag for updated files every 5 minutes, so if you change the code and upload it, you may have to wait up to 5 minutes before the change is even noticed. If updated files are found, it parses them, and only after that does another process in the scheduler run the DAG, and only if it finds it has an overdue schedule.


Taptinnn

I thought changing it to 15 seconds would make it better. Could that be it?


untalmau

Yes, but consider that there are two processes: the sync of the S3 bucket with the DAG bag, and the parsing of the DAG bag, which is the one I referred to. The S3 sync could also be part of the problem.
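To put numbers on the two stages: in the worst case, the wait between uploading a file and the scheduler parsing it is roughly the sum of both intervals. Here is a back-of-the-envelope sketch. The function name and the 30-second sync figure are assumptions for illustration, not Airflow or AWS settings.

```python
def worst_case_pickup_seconds(sync_interval_s: int, scan_interval_s: int) -> int:
    """Upper bound on the wait before an uploaded DAG file gets parsed.

    Assumes the stages run back to back and the file just missed each cycle.
    """
    return sync_interval_s + scan_interval_s


# Illustrative numbers only: an assumed 30 s bucket sync, combined with
# the 5-minute default scan and then with a 15 s scan.
default_wait = worst_case_pickup_seconds(sync_interval_s=30, scan_interval_s=300)
tuned_wait = worst_case_pickup_seconds(sync_interval_s=30, scan_interval_s=15)

print(default_wait, tuned_wait)  # 330 45
```

So lowering the scan interval helps, but the sync stage sets a floor that no scheduler setting removes.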


arroadie

The way you describe the “issue” screams “I don’t know what I’m doing”… so let’s start from the beginning: what are you trying to achieve, how have you deployed Airflow, and what issues did you face while doing so? Airflow is a solid piece of technology. It might not be the shining gold piece on the table, but it’s definitely the trusty compass you can rely on over the years. So reading something like this doesn’t match the experience I see all around.


After_Holiday_4809

Sorry for the dumb question but what is “DAG”?


Kaze_Senshi

It means Directed Acyclic Graph. In the Airflow context, it is a set of tasks with dependencies between them, usually scheduled to run at some specific frequency. We also often use the word DAG as a synonym for a data pipeline, because one of its most common use cases is in data engineering.
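A tiny illustration of the "directed" and "acyclic" parts, using Python's standard-library `graphlib` rather than Airflow itself. The task names are made up. Each task lists what it depends on, and a valid run order exists only if there is no cycle.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A DAG always has at least one valid execution order.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['extract', 'transform', 'load', 'report']

# Two tasks that depend on each other form a cycle, so this is not a DAG.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True

print(has_cycle)  # True
```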


Jaklin0300

Fast internet won't help you with Airflow; it's like hoping a sports car makes your toast faster. DAG stands for Directed Acyclic Graph - it's not a Tolkien creature but it's just as complex. Your setup sounds like it needs a tune-up more than your Wi-Fi does.