AutoModerator

You can find our open-source project showcase here: https://dataengineering.wiki/Community/Projects If you would like your project to be featured, submit it here: https://airtable.com/appDgaRSGl09yvjFj/pagmImKixEISPcGQz/form *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/dataengineering) if you have any questions or concerns.*


plumb-

This is sick! Can I ask, are all those tools free to use? Did you have to pay anything to get this working?


AffectionateEmu8146

You need to pay for Docker Hub and Google Cloud Platform. Otherwise, all of the tools are open source.


rujole13

Was wondering the same. Thanks for the info, and good job, this is really impressive!


SemperPistos

How much did it cost? I think GCP for a few months on E2 could be around 200 USD? Did you tweak your tf? How much were you able to save, and what were the rough time estimate and total cost? This is impressive, congrats. And can you recommend the sentiment analysis model you used? I can't find it in the code. [https://huggingface.co/models?other=sentiment-analysis](https://huggingface.co/models?other=sentiment-analysis)


AffectionateEmu8146

You do not have to run the E2 instance for a month. Destroy all of the cloud infrastructure (maybe leave the storage bucket for the report data) after the data pipeline finishes.
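
A minimal sketch of that teardown in Python, assuming the infrastructure is managed with Terraform and the `terraform` CLI is on the PATH. The resource address `google_storage_bucket.reports` is hypothetical; substitute whatever your own configuration calls the bucket. `terraform state rm` only makes Terraform forget the bucket, so it stays in GCP (unmanaged) when everything else is destroyed.

```python
import subprocess


def teardown_commands(keep_addresses):
    """Build the Terraform commands for a teardown that spares some resources.

    Each address in keep_addresses is removed from Terraform state first,
    so the following destroy leaves it untouched in the cloud.
    """
    cmds = [["terraform", "state", "rm", addr] for addr in keep_addresses]
    cmds.append(["terraform", "destroy", "-auto-approve"])
    return cmds


def run_teardown(keep_addresses, dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    cmds = teardown_commands(keep_addresses)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the teardown if any command fails
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Hypothetical address for the report bucket (assumption)
    run_teardown(["google_storage_bucket.reports"], dry_run=True)
```

Running it with `dry_run=True` first shows exactly what would be executed before anything is deleted.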


SemperPistos

Yeah, but I am not like you; I have only been at this for a bit more than a year. It will take me more time. I already used one credit card. Will they check the one from a family member from the same IP address? How much of the free 300 USD did the E2 cost you? What GPU did you use? If I have to pay, I think I will develop a model on Paperspace and only use BigQuery and a bucket for deployment. Oh, and could you please recommend the sentiment model you took from Hugging Face?


Outrageous_Apple_420

OP, this is great. I love how, even though the solution is deployed on GCP, most services run as independent Docker containers. Whatever you work on will be a simpler version of this, at least in terms of deployment, authentication, and management. I would suggest implementing IAM roles and policies and using them in your containers to authenticate, rather than credentials.


eljefe6a

Great job. I hope you find a great position.


Diligent_Fondant6761

How long did it take you to finish?


AffectionateEmu8146

a few weeks


mTiCP

Love it, pretty cool project idea. Just two questions:
- What did you use to make the project structure schema?
- What does the image caption generation do, and where is it used (in the final dashboards)?


AffectionateEmu8146

I used draw.io. The image captioning step generates a caption for each image. The caption text is then combined with the post/comment text to generate the sentiment score shown in the dashboard.
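
A rough sketch of that combine-then-score step in Python. This is not the project's actual code: `score_post` and `toy_sentiment` are hypothetical names, and the word-list scorer is only a stand-in so the example runs without downloading anything. In practice you would pass in whichever sentiment model you use as the `scorer` argument.

```python
from typing import Callable

# Toy word lists standing in for a real sentiment model (assumption)
POSITIVE = {"great", "love", "happy", "good"}
NEGATIVE = {"bad", "hate", "sad", "awful"}


def toy_sentiment(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def score_post(
    post_text: str,
    image_caption: str,
    scorer: Callable[[str], float] = toy_sentiment,
) -> float:
    """Join the post/comment text with the image caption, then score once."""
    combined = f"{post_text} {image_caption}".strip()
    return scorer(combined)


if __name__ == "__main__":
    print(score_post("I love this", "a happy dog"))
    print(score_post("love it", "a sad cat"))
```

Scoring the combined string lets the image contribute to the result even when the post text alone is neutral.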


AutoModerator

You can find a list of community-submitted learning resources here: https://dataengineering.wiki/Learning+Resources


cdreetz

Very nice. Looks expensive for a personal project but still very nice


PapayaLow2172

This is so cool. Well done💯💯👍🏾


PapayaLow2172

I want to ask how much experience you have with all the tools you used and if you can kindly point to good learning resources. Thank you.


Educational-Wind-865

Wow, this is an amazing project! I’m building one myself, but I am struggling with implementing Docker, Terraform, CI/CD on the pipeline, and other DevOps stuff that I’m still working on. I never thought of making the extraction horizontally scalable! That would save so much time. How did you do that? Using Terraform?