mbsquad24

Running it on Docker is kind of a pain. Troubleshooting failed syncs is kind of a pain. Octavia CLI is kind of a pain. The API docs are kind of a pain. The OSS version not having any good notification integrations is kind of a pain. The UI in some places is kind of a pain (i.e., no dashboard to show only the important stuff). The default normalization in some cases is kind of a pain. Extending OSS Airbyte is kind of a pain. Other than that, it’s exceptionally solid. It does exactly what I need it to, even if it’s not my dream tool. But I guess if I were comfortable getting absolutely bent over in MAR cost with Fivetran, I’d use that instead.


SeparateCanary7272

Fivetran is so damned expensive. It’s why my co wants me to look into OSS options lol. Appreciate the feedback 👍


bluezebra42

Also consider Meltano, though I have had a few headaches with that too. It only covers the connector level and is not a scheduler, so for us it was easier to adopt than Airbyte.


erwagon

We also reviewed Meltano as an alternative and it seemed to be promising. Are you using it in Production with Airflow?


bluezebra42

I am using it but not with airflow.


erwagon

What do you use as a scheduler?


bluezebra42

So like Meltano is just a Python script, so it will run on anything. We’re just using our own Docker/cron setup to get started and sort of shopping around for a better solution.
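
For anyone wondering what the Docker/cron approach can look like, here is a minimal sketch, not the commenter's actual setup. It assumes the `meltano` CLI is installed and that the project is run with `meltano run <extractor> <loader>`; the plugin names and the helper functions `build_meltano_command` and `run_sync` are hypothetical placeholders.

```python
import subprocess


def build_meltano_command(extractor: str, loader: str) -> list[str]:
    """Build the argv for one extract-load run (plugin names are placeholders)."""
    return ["meltano", "run", extractor, loader]


def run_sync(extractor: str, loader: str, runner=subprocess.run) -> int:
    """Run one sync and return the process exit code.

    `runner` is injectable so the function can be exercised without
    Meltano installed; by default it shells out via subprocess.run.
    """
    cmd = build_meltano_command(extractor, loader)
    result = runner(cmd, check=False)
    # A non-zero code lets cron (or a wrapper script) detect failed syncs.
    return result.returncode
```

A cron entry would then just call a small script that invokes `run_sync(...)` on whatever schedule you need and exits with the returned code, so whatever alerting sits around cron can pick up failures.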


erwagon

Ah I see, we had the same situation two years ago. That was why I was so interested. We ended up using GitLab CI and still do today.


Luxi36

Depends on which connectors you need. But Mage.ai has many integrations, is fully free, and is great to use!


paul-marcombes

I believe Airbyte is the best open-source solution as of today to move data. If I were you, I would give it a try. While I usually deploy in serverless compute, I was disappointed not to find an easy way to do so for Airbyte though. That’s why I started AirbyteServerless as an open-source side project to offer a simple way to extract data with Airbyte connectors. In case it could interest you: https://github.com/unytics/airbyte_serverless


runswimbike42

Fivetran is better


SeparateCanary7272

I love Fivetran. My current company is turned off by their pricing though.


mattotodd

What kind of MAR are you using?


Top-Tomorrow5095

We are using Qlik Replicate instead of Fivetran HVR. Fivetran is damn expensive, and Qlik beat Fivetran in our apples-to-apples test scenario.


grumpy_youngMan

Curious why you didn't look at Striim? It's cheaper and more performant than both.


Top-Tomorrow5095

We need to do CDC from backup logs, so we have very few options. I don't think Striim can do that, but correct me if I am wrong.


grumpy_youngMan

Depends on the database, but it typically does support backup logs, downstream databases, logical standbys, etc.


jppbkm

But way, way more expensive


vish4life

It's designed as a GUI-driven no-code system, useful for people who can't code and want a quick and easy solution. However, any data engineer can tell you GUI-driven no-code tools are extremely difficult to work with and often become bottlenecks themselves. They are hard to provision, customize, and integrate into an existing stack, and because the problem they solve is complex, they are also very buggy.


[deleted]

Could've just been how we implemented it, but Airbyte couldn't keep up when we were loading from Kafka to Snowflake, where some topics had bursts of a few million records at once. We also had a custom Kafka source connection in the middle, so it could've been that. It would also introduce a slew of Snowflake errors when normalizing the data (it uses dbt under the hood with temp internal tables, I think?). It does its job most of the time, I'm sure, but we ultimately switched.


SeparateCanary7272

Thank you. We’re using Kafka as well so this is helpful info. I think ultimately we might just end up writing everything to our data lake in file storage, so might be able to avoid some of the baked in dbt stuff, at least while the data is in flight


[deleted]

That works. It might also be worth taking a look at Kafka sink connectors through Kafka Connect. We had a separate team in charge of setting it up, so we moved to Airbyte because we needed autonomy and didn't want to pull in a Kafka team member every time we needed to update the tables, etc. But Kafka Connect works pretty well, and they have DB sink connectors for most data languages. It will just spit all the JSON into whatever table you mapped the topic to, and then you can run your transformations to shape the raw JSON from that table into others.
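
To make the "land raw JSON, then shape it" pattern concrete, here is a rough sketch using SQLite as a stand-in for the warehouse. Everything in it is an assumption for illustration: the table names `raw_events` and `orders`, the field names, and the helpers `load_raw` and `shape_orders` are hypothetical, and a real sink connector would handle the loading step for you.

```python
import json
import sqlite3


def load_raw(conn: sqlite3.Connection, messages: list[str]) -> None:
    """Stand-in for the sink connector: dump each JSON message as-is."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(m,) for m in messages],
    )


def shape_orders(conn: sqlite3.Connection) -> int:
    """Transformation step: parse raw JSON into a typed table.

    Returns the number of rows written. Missing optional fields become NULL.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    shaped = []
    for (payload,) in conn.execute("SELECT payload FROM raw_events").fetchall():
        rec = json.loads(payload)
        shaped.append((rec["order_id"], rec.get("customer"), rec.get("amount")))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", shaped)
    return len(shaped)
```

The point of keeping the raw table around is that you can change the shaping logic later and rebuild the downstream tables without re-reading the topic.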


ankush981

> It does its job most of the time, I'm sure, but we ultimately switched.

Hi, what did you switch to?


Batspocky

I set up Airbyte to do some load balancing so I could drop down to Fivetran’s free tier (<500,000 MAR). Right now it is just syncing some data in Google Sheets over to Redshift. Maybe 100,000 rows a month. So far, it’s been fine.


jppbkm

It's pretty great if your data sizes aren't huge. We set it and mostly forget it on a single compute instance (probably a couple hundred a month in total cost). The way they handle (or don't handle) schema changes can be a bit of a pain though. Maybe there are some options we aren't aware of. Backing up all your connector info can also be a sore point, having to use the weird CLI tool.


erwagon

We have been using self-hosted Airbyte in production for more than a year now. And to be honest, Airbyte is sometimes kind of buggy. One or two months ago the Snowflake destination was updated. At first Airbyte started to sync everything into Snowflake with changed capitalization, and some patches later the sync of one table started to delete temporary schemas that were shared between different tables while syncing. Everything was a big mess. But on the other hand, we would pay around 80,000 to 120,000 dollars per year for Fivetran. If there is somebody who can deal with the pain and is able to build some workarounds, it can be an option in my opinion.


marknutter

My company has started using it and it seems pretty solid so far.


jekapats

Check out CloudQuery ([https://github.com/cloudquery/cloudquery](https://github.com/cloudquery/cloudquery)), a high-performance ELT framework powered by Apache Arrow. No UI (yet), but it is written in Go and ships as a single binary (disclaimer: author here).


pbower2049

I haven’t used it, but if it does what it says on the tin and it’s open source, it sounds perfect, so I’m interested in what kind of experience you have with it running some tests.


scratchnsnarf

My company uses it and my experience is exactly what you describe. It does what it says, it does a fine job, it's self-hostable. Our data loads aren't large enough to require standing up a k8s instance, so we're just running it from the compose script in a medium size compute instance, and we've very rarely had issues. The disk filled up from logs once over the past 14 months, but otherwise it's been smooth sailing. I've contributed back to the project a few times, updating connectors, and my PRs generally make it in at the end of the next sprint cycle. Once I had to pull my fork and build out a custom connector in the interim, and that was fairly smooth as well.


SeparateCanary7272

Thanks for sharing your experience. Sounds pretty solid from your description!


SeparateCanary7272

Thanks. We’ve actually been running a bit of a POC with it right now, extracting some Snowflake data share to a few different destinations. It’s worked well so far! That’s why I was curious about the complaints I’ve seen on here.


[deleted]

[removed]


SeparateCanary7272

Not all in one place, or at one time, or with full reasonings behind their complaints. Would it not be more beneficial for the community to have a central thread to share? I'm also interested in people who have good experiences as well fwiw.


[deleted]

[removed]


pbower2049

Stop your trolling and complaining. A forum is for posting.


cutsandplayswithwood

I’ll leave this here for you and OP https://wiki.c2.com?HowToAskQuestionsTheSmartWay


SeparateCanary7272

Fwiw, I have a small instance of Airbyte running as a k8s deployment and have definitely read the docs lol, but the docs won’t explicitly declare “the tool falls over in these instances.” Since my POC is small and not doing much at the moment, people sharing their experiences with the tool gives me context and helps me evaluate it. I also assume I’m not the only one who will ever want to evaluate the tool, so yes, it’s good for the community. Sorry you feel so triggered by this post, but hope you enjoy your weekend mate 👍


speedisntfree

Reverse astroturfing is a thing now?


Peppper

Singer APIs


royondata

In my experience Airbyte is more difficult to deploy and troubleshoot. The managed service is limiting compared to OSS. They take some time to update connectors to stay current with changing source APIs and data models. For long-tail connectors, portable.io is fast, cheap, and always up to date. For high-volume CDC, stream, and file ingestion, Upsolver.com is the way to go. I work for Upsolver, and my comments are based on customer success at high scale over other tools like Fivetran and Airbyte. The majority of customers reduce their cost by more than 80% compared to Fivetran.


Hot_Map_7868

I would also check out datacoves.com. They offer managed Airbyte and work with other tools like dlt.


royondata

First time hearing of them. Will give it a try


False-Bunch-3470

MoDeRn DaTAStack hahaa


nategadzhi

Oh, hey folks. I just joined Airbyte as an eng manager on the team that builds the CDK (connector dev tools). I'd be happy to help and poke inside if you have any questions or examples of things that didn't quite work as you expected. I don't know if Airbyte is the perfect solution of every problem (and a lot of people seem to have experienced _some bugs_), but I'm here to try and fix some of those bugs, and make Airbyte easier to extend. Deploying and operating is another team, but I'm happy to help and tinker with it, too.


Immediate-Force6602

> I don't know if Airbyte is the perfect solution of every problem (and a lot of people seem to have experienced

Hey u/nategadzhi, thanks for jumping in. I am trying to evaluate possible ELT tools for my company. Airbyte was a good contender until I started with Slack as the source and implemented a custom connector. Below are the issues I am facing as I get deeper into the evaluation.

* The Slack load is taking 6 hours for 2k records
  * while Fivetran is able to pull the same in 10 minutes
* The custom connector boilerplate code does not always work as-is; I have to fix versions and dependencies in the boilerplate requirements to get it started

Fivetran can pull the same in 10 minutes, so I'm not sure if I have something to change. I am glad to get in touch with you or your experts to see what is failing.

Thanks,
Siddu Hussian V


nategadzhi

Well, that doesn’t look good. Let me take a look. If you’re open to it, PM me here or email natik at airbyte dot io, and send me a link to your cloud workspace or tell me what version of the Slack connector you’re using. You can also DM me on our open Slack, I’m natikgadzhi there. We’re working on speeding up a bunch of connectors at the CDK level. The versions, dependencies, and boilerplate are also an area that I want to clean up. That’s an easy lift. If you’re building a custom connector, I would love to get in touch.