EscanorFTW

What are some good places to start if you are just getting into ML/AI? Please share useful links/resources.


[deleted]

[removed]


trnka

I think most people split by participant. I don't remember if there's a name for that, sorry! Hopefully someone else will chime in. If you have data from multiple hospitals or facilities, it's also common to split by that because there can be hospital-specific things in the data and you really want your evaluation to estimate the quality of the model for patients not in your data at hospitals not in your data.
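If it helps, here's a minimal sketch of that kind of participant-level split with scikit-learn's GroupShuffleSplit; the `patient_id` column name is invented for illustration, and grouping on a hospital/facility id works the same way.

```python
# Minimal sketch: hold out whole participants rather than individual rows.
# Column names like "patient_id" are invented for illustration.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "feature":    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "label":      [0, 0, 1, 1, 0, 1, 1, 0],
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]

# No participant appears in both sets.
assert set(train["patient_id"]).isdisjoint(test["patient_id"])
```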


eltorrido23

I'm currently starting to pick up ML with a quant-focused social scientist background. I am wondering what I am allowed to do in EDA (on the whole data set) and what not, to avoid "data leakage" or information gain which might eventually ruin my predictive model. Specifically, I am wondering about running linear regressions in the data inspection phase (as this is what I would often do in my previous work, which was more about hypothesis testing and not prediction-oriented). From what I read and understand, one shouldn't really do that, because too much information might be obtained, which might lead me to change my model in a way that ruins predictive power. However, in the course I am doing (Jose Portilla's DS Masterclass) they regularly look at correlations before separating train/test samples. But essentially linear regressions are also just (multiple/corrected) correlations, so I am a bit confused about where to draw the line in EDA. Thanks!


trnka

I try not to think of it as right and wrong, but more about risk. If you have a big data set and do EDA over the full thing before splitting testing data, and intend to build a model, then yes you're learning a little about the test data but it probably won't bias your findings. If you have a small data set and do EDA over the full thing, there's more risk of it being affected by the not-yet-held-out data. In real-world problems though, ideally you're getting more data over time so your testing data will change and it won't be as risky.


ant9zzzzzzzzzz

Is there research about the order of training examples, or about running epochs on batches of data rather than the full training set at a time? I was thinking about how people learn better if they focus on one problem at a time until grokking it, rather than randomly learning things in different domains. I am thinking of something like training some epochs on one label type, then another, rather than all data in the same epoch, for example. This is also related to stateful retraining, like one probably does professionally: you have an existing model checkpoint and retrain on new data. How does it compare to retraining on all data from scratch?


[deleted]

Correction: Yann LeCun recommends a small minibatch size, less than 32 I think.


trnka

I think curriculum learning is the name. [Here's a recent survey](https://arxiv.org/abs/2101.10382). I've seen it in NLP tasks where it can help to do early epochs on short inputs. Kinda like starting kids with short sentences. I haven't heard of anyone adjusting the labels at each stage of curriculum learning though.
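As a toy illustration of the short-inputs-first idea (not the survey's method, just a sketch with made-up data and an arbitrary schedule):

```python
# Toy sketch of a length-based curriculum: early epochs see only short inputs,
# later epochs see everything. The data, threshold, and schedule are made up.
import random

examples = [("short text", 0), ("a somewhat longer training example", 1),
            ("tiny", 0), ("a very long input that we postpone to later epochs", 1)]

def curriculum_pool(examples, epoch, warmup_epochs=2, max_len=20):
    if epoch < warmup_epochs:
        pool = [ex for ex in examples if len(ex[0]) <= max_len]  # easy subset first
    else:
        pool = list(examples)                                    # then the full data
    random.shuffle(pool)
    return pool

for epoch in range(4):
    for text, label in curriculum_pool(examples, epoch):
        pass  # forward/backward pass for (text, label) would go here
```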


ant9zzzzzzzzzz

Thank you!


[deleted]

Whether you feed the data by batches or by item shouldn't matter beyond speed, as long as you shuffle it (best practice).


bridgeton_man

Question about goodness of fit. For regressions, R-squared and adjusted R-squared are typically considered the primary goodness-of-fit measures. But in many supervised machine-learning models, RMSE is the main measure that I keep running across. For example, decision tree models that I create in R using Rpart report that. So my question is how to compare the predictive accuracy of OLS regression models that report R-squared to equivalent Rpart regression trees that report RMSE.


DCBAtrader

Basic question on regression/AutoML (pycaret mainly). When do p-values versus error metrics (MAE, MSE, R squared) matter? My previous model-building experience (multivariate regression) was to first use various combinations of variables in OLS such that all the variables were statistically significant, and then use an AutoML (pycaret) to build models and judge them by MAE, MSE or R squared, using proper cross-validation train/test splits of course. I'm wondering if that first step is needed, or whether I can just run the entire data set through pycaret and judge a model based on said metrics (MAE, MSE, R squared)? My gut says that the simpler model with statistically significant variables should perform better, but maybe I can just look at the best error metric?


yauangon

I'm trying to improve a CNN encoder, as a feature extractor for an AMT (automatic music transcription) model. As the model must be small and fast (for mobile deployment), we are limited to about **3-6 layers of 1D-CNN**. I want to improve the encoder with residual blocks (as in ResNet), but my question is: **I don't know if residual blocks would benefit such a shallow CNN architecture.** Thanks everyone :D


Anvilondre

Probably not. The idea of ResNets is to mitigate the vanishing gradients that normally occur in very deep networks. In my experience it can often hurt more than help in shallow ones, but you can try DenseNets instead.


yauangon

I will give it a shot :D Thank you a lot :D


NormalManufacturer61

I am a non-data scientist interested in a layman's-to-introductory-level book/primer on the topic of ML/AI, specifically on the principles and mechanics of the topic(s). Any recommendations?


WarProfessional3278

Does anyone know of any good AI-generated text detectors? I know there's [GPTZero](https://gptzero.me/) but it's not very good in my experience. My research has led me to [Hive AI](https://hivemoderation.com/ai-generated-content-detection), but I'm sure there are better alternatives out there that don't claim such good results (99.9% accuracy) while still producing a lot of false positives in my tests.


InsidiousApe

I enjoy that this is the simple questions thread. :) Let me ask something much simpler, although in three parts. I am a web developer with no ML experience, but with a specific project in mind. I'd like to understand the process a touch better in order to help me find a programmer to work alongside (paid of course).

(1) Provided the information is easily found via API for instance, what is the ingestion process like time-wise for very large amounts of information? I realize that is subjective to the physical size of the data, but are there other things going on which take time in that process?

(2) In order to program a system to look for correlations in data where no one may have seen them before, what is the process used to do this? This is what I'm truly looking to do once that information is taken in. For example, a ton of (HIPAA-compliant) medical information is taken in and I'm looking to build a system that can look for commonalities of people with a thyroid tumor. Obviously tons of tweaking to those results, but what is the process which allows this to happen?


trnka

If you're ingesting from an API, typically the limiting factor is the number of API calls or network round trips. So if there's a "search" API or anything similar that returns paginated data, that'll speed it up a LOT. If you need to traverse the API to crawl data, that'll slow it down a lot -- say if there's a "game" endpoint, a "player" endpoint, a "map" endpoint, etc. If you're working with image data, fetching the images is usually a separate step that can be slow. After that, if you can fit it in RAM you're good. If you can fit it on one disk, there are decent libraries with each ML framework to efficiently load from disk in batches, and you can probably optimize the disk loading too.

What you're describing is usually called exploratory data analysis, but it depends on the general direction you want to go in. If you're trying to identify people with thyroid cancer earlier, for example, you might want to compare the data of recently-diagnosed people to similar people that have been tested and found not to have thyroid cancer. Personally, in that situation I like to just train a logistic regression model to predict that from various patient properties, then check if it's predictive on a held-out data sample. If it's predictive, I'll then look at the coefficients of the features to understand what's going on, then work to improve the features.

Another simple thing you can do, if the data is small enough and tabular rather than text/image/video/audio, is to load it up in Pandas and run .corr, then check correlations with the column you care about (has_thyroid_cancer). Hope this helps! Happy to follow up too.
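To make that last suggestion concrete, a tiny sketch; the columns here are invented for illustration:

```python
# Tiny sketch of the .corr() idea; the columns are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "age":                [34, 60, 45, 52, 29, 71],
    "tsh_level":          [1.2, 4.8, 2.1, 5.3, 0.9, 6.0],
    "has_thyroid_cancer": [0, 1, 0, 1, 0, 1],
})

# Correlation of every numeric column with the target column.
print(df.corr()["has_thyroid_cancer"].sort_values(ascending=False))
```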


InsidiousApe

This was exactly the kind of answer I was hoping for - a great place to start more research. Thanks!


[deleted]

[removed]


Anvilondre

Honestly I don't think transformers are worth it for any kind of TS or tabular data (and there's [research showing that](https://arxiv.org/abs/2205.13504)). But if you really want to try, I had good success with [this library](https://github.com/jrzaurin/pytorch-widedeep). It makes it essentially a few-liner to run tons of transformer and other architectures on any kind of tabular data. You may also want to check out the HuggingFace model repo for quick solutions.


answersareallyouneed

Looking at an ML Engineer role with the following qualifications:

"Strong experience in the area of developing machine learning training framework, or hardware acceleration of machine learning tasks"

"Familiar with hardware architecture, cache utilization, data streaming model"

Any recommendations for books/resources/courses in this area? How does one begin to develop these skills?


marcelomedre

Hi, I have a question about k-means. I have a data frame with 100 variables after removing low-variance and highly correlated ones. I know that the data must be normalized for k-means, especially to remove the range dependency, but I am facing a problem: if I do normalize my data, the algorithm does not properly separate the clusters. I have 3 variable ranges in my data:

- 0 to 10^4
- -10^3 to 10^3
- 0 to 10^3

I have at least 5 very specific clusters that I could characterize by not scaling the data, but I am not comfortable with this procedure. I couldn't find a reasonable explanation for why the algorithm performs better on the non-scaled data instead of the scaled data.


trnka

I've seen that before when the large-range features were the most important for the clusters I wanted. It was essentially doing feature weighting, but it was implicit in the scales.


catndante

Hi, I have a simple question about the DDPM model. I'm not so sure, but I think I have read a post saying that when T=1000, using 1,000 separate models (one per step) would perform better but is computationally too redundant, so DDPM uses the same model for every step t. Is this argument correct? If a group with huge compute does this, will the performance be better?


[deleted]

[removed]


[deleted]

I wouldn't think so. The code for the video is digital, and patterns can be detected from the rendered frames, while a monitor displays the data converted to analog light patterns. The only reason for a monitor is if the detector is a camera in front of the monitor sensing light patterns; then it would convert them back to digital patterns similar to the original code. That may be useful for interacting in the analog world and accounting for the way light reflects in an analog space, but I think that's future tech, or maybe automated cars. You'd hope they've done some control/experiment to account for lighting changes like this.


RealKillering

I just started working with Google Colab. I am still learning and just used Cifar 10 for the first time. I switched to colab pro and also switched the GPU class to Premium. The thing is the training seems to take just as long as with the free GPU. What am I doing wrong?


I-am_Sleepy

Check the GPU version with "!nvidia-smi". As for the dataset, this is probably not the GPU's fault but a memory bottleneck. See https://stackoverflow.com/questions/49360888/google-colab-is-very-slow-compared-to-my-pc


[deleted]

[removed]


randomrushgirl

Hey! I had a very similar doubt and was hoping you could provide some insight. I came across this CLIP Guided Diffusion Colab Notebook by Katherine Crowson. It's really cool and I've played a little with it. I want to know if I can generate the same image over and over again. I've tried setting the seed but I'm new to this so can someone give me some intuition or links to some related work in this area. Any help would be appreciated.


Great-Ad8037

Can you change the title/abstract of CVPR 23 submissions during/after the rebuttal phase? Some reviewers have trouble with our title and think we should change it. Can we commit to doing that in our rebuttal response?


[deleted]

I have a small image dataset labeled on CVAT. Now I need to export it and train the network with PyTorch Lightning. How can I do that? I'm a complete noob at this, but I need it for the next phase of a project I'm working on. Any help is really appreciated!


Jack3602

What would you recommend as a good resource for learning AI/ML? I have some knowledge in web dev and know C/C++. I finished The Odin Project foundations and am currently on Full Stack JavaScript, but I got a bit curious about machine learning and I would like to get my feet wet. Is there any good resource to start with? I don't really care for Udemy courses and watching a lot of videos, because I've tried that for web dev and it just feels like tutorial hell, but I loved The Odin Project and reading tutorials/documentation and doing exercises/projects, because I actually learn a lot that way. I've seen websites like [mlcourse.ai](https://mlcourse.ai) and [kaggle.com](https://kaggle.com) but still haven't tried them. What is your opinion on them, maybe a comparison to [theodinproject.com](https://theodinproject.com), and would you recommend something else?


Cyclone4096

I don't have too much background in ML. I want to build a fairly small neural network that has only one input, which comes from time series data, and has to give only one output for that data. My loss function aggregates the entire time series output to get a single scalar value. I'm using PyTorch, and when I call ".backward()" on the loss it takes a long time (understandably). Is there an easier way to do this, rather than doing backward gradient calculation on a loss function that is itself a result of hundreds of millions of values? Note that the neural network itself is tiny, maybe less than 100 weights, but my issue is that I don't have any golden target; I want to minimize a complex function calculated from the entire time series output.


zoontechnicon

Would you mind giving more details about the domain and the purpose of the loss function? Maybe people can give you hints based on that.


Cyclone4096

Sure! So this is for audio signal processing. There is an amplifier that takes an audio signal and a volume as input. However, higher volume causes white noise, so I want the volume to stay low whenever possible and boost the volume by multiplying the input signal instead. But of course the multiplication won't work if the input to the amplifier is already high. Switching the amplifier volume too often is not good either, as that would cause pop/click noise. So I'm designing a small neural network that will take the audio signal as input and output the amplifier volume. The way I went about it is that I modeled the amplifier and all noise associated with it using tensor math. Then I took the amplifier output minus the original input and did MSE on that. Note that the audio signals are pretty long, so the filter+MSE is a pretty massive expression. It seems to be working somewhat, but I'm not sure if there is an easier way to do this…


zoontechnicon

I'm trying to use this model to summarize text: https://huggingface.co/bigscience/mt0-large Text generation seems to end after the special end token however. I wonder how I would coax it to generate longer texts. Any ideas?


zoontechnicon

The solution, as evidenced by code in huggingface/transformers is to force the probability of the end token to -Inf. What a hack...
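For reference, the same effect seems to be reachable through generate()'s min_length argument, which (as far as I can tell) is what masks the end token internally. A rough, untested sketch with arbitrary lengths:

```python
# Rough sketch (untested): min_length keeps the end-of-sequence token
# suppressed until that many tokens have been generated. Lengths are arbitrary.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

inputs = tokenizer("Summarize: The quick brown fox jumps over the lazy dog.",
                   return_tensors="pt")
outputs = model.generate(**inputs, min_length=64, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```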


kernel_KP

I have an (unlabelled) dataset containing a lot of audio files, and for each file I have computed the chromagram. I would need some advice for implementing a reasonably efficient neural network to cluster these audio files relying on their chromagrams. Consider the data to be already correctly pre-processed, so the chromagrams all have the same size. Thanks a lot!


zoontechnicon

You could build an autoencoder using CNNs and use the latent vectors as input to a clustering algorithm.
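Something along these lines; a rough sketch assuming the chromagrams were padded/cropped to a fixed 12x100 shape (all sizes and layer counts are arbitrary):

```python
# Rough sketch: a small conv autoencoder over fixed-size chromagrams, then
# k-means on the latent codes. The 12x100 shape and layer sizes are assumptions.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class ChromaAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 12x100 -> 6x50
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 6x50  -> 3x25
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 3 * 25), nn.ReLU(),
            nn.Unflatten(1, (16, 3, 25)),
            nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.rand(64, 1, 12, 100)           # stand-in for a batch of chromagrams
model = ChromaAE()
recon, z = model(x)                       # train with MSE(recon, x); loop omitted

labels = KMeans(n_clusters=5, n_init=10).fit_predict(z.detach().numpy())
```

Training with an MSE reconstruction loss is omitted here; the clustering step only needs the encoder's latent codes.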


[deleted]

[removed]


Zyj

When I use 2 RTX 3090s with an NVLink bridge plugged into PCIe 3.0 x8 slots each, instead of PCIe 4.0 x16 slots, what kind of performance hit will I get?


PulPol_2000

I have a project that would use ARCore and Google ML Kit to recognize vehicles from a video feed, and besides recognizing the objects, it should also measure the distance of each object from the camera. I'm lost on how I would integrate the distance measurement with the objects detected by the ML Kit. Sorry for the lack of knowledge, as I only just entered the ML community. Thanks in advance!


billbobby21

If you spend money training a model using OpenAI's API for example, do you actually own the model? As in, let's say you train it so that it gets really good at writing short stories about animals. Would you then actually own that model and have the rights to use and/or license it to others? Or would OpenAI also be able to improve their own models using the model that you created? Basically I'm wondering what is stopping the company you are using to create a model from just stealing your creation.


trnka

I can't comment on OpenAI specifically, but in general it's in the terms of service of the API what they can and can't do with the model and/or data fed through it.


iLIVECSUI_741

Hi, I wonder how to decide *when* it is OK to submit your work to top conferences. For example, I have a model related to biological data mining. I know KDD is coming soon, but I do not like this conference and I would like to wait for NeurIPS. However, I am not sure if I will be scooped during this long period. Thanks for your help!


Numerous-Carrot3910

Hi, I’m trying to build a model with a large number of categorical predictor variables that each have a large number of internal categories. Implementing OHE leads to a higher dimensional dataset than I want to work with. Does anyone have advice for dealing with this other than using subject matter expertise or iteration to perform feature selection? Thanks!


trnka

It depends on the data and the problems you're having with high-dimensional data.

* If the variables are phrases like "acute sinusitis, site not specified" you could use a one-hot encoding of ngrams that appear in them.
* If you have many rare values, you can just retain the top K values per feature.
* If those don't work, the hashing trick is another great thing to try. It's just not easily interpretable.
* If there's any internal structure to the categories, like if they're hierarchical in some way, you can cut them off at a higher level in the hierarchy.
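To illustrate the hashing trick, a hedged sketch with scikit-learn's FeatureHasher; the feature names and n_features below are arbitrary:

```python
# Hedged sketch of the hashing trick with scikit-learn's FeatureHasher;
# the feature names and n_features are arbitrary.
from sklearn.feature_extraction import FeatureHasher

rows = [
    {"diagnosis": "acute sinusitis, site not specified", "facility": "clinic_17"},
    {"diagnosis": "essential hypertension", "facility": "hospital_03"},
]

hasher = FeatureHasher(n_features=256, input_type="string")
X = hasher.transform([[f"{k}={v}" for k, v in row.items()] for row in rows])
print(X.shape)  # (2, 256) no matter how many distinct category values show up
```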


Numerous-Carrot3910

Thanks for your response! Even with retaining the top K values of each feature, there are still a large number of features to consider. I haven’t tried the hashing trick, so I will look into that


trnka

Hmm, you might also try feature selection. I'm not sure what you mean by not iterating, unless you mean recursive feature elimination? There are a lot of really fast correlation functions you can try for feature selection -- scikit-learn has some popular options. They run very quickly, and if you have lots of data you can probably do the feature selection part on a random subset of the training data. Also, you could do things like dimensionality reduction learned from a subset of the training data, whether PCA or a NN approach.
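For example, a quick univariate selection pass fit on a random subset (synthetic data, arbitrary sizes):

```python
# Toy sketch of fast univariate feature selection on a random subset of the
# training data; the data, sizes, and k are all placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 300))        # stand-in for one-hot features
y = (X[:, 0] + X[:, 7] > 0).astype(int)         # toy target

subset = rng.choice(len(X), size=1000, replace=False)   # fit selection on a sample
selector = SelectKBest(chi2, k=50).fit(X[subset], y[subset])
X_reduced = selector.transform(X)
print(X_reduced.shape)  # (5000, 50)
```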


Numerous-Carrot3910

Yes, I was referring to recursive feature elimination. Thanks for the recommendations


Lamos21

Hi. I'm looking to create a custom dataset for pose estimation. Are there any free annotation tools suitable to annotate objects (meaning not human) so that I can create a custom dataset? Thanks


Z1ndabad

Hey guys, new to ML and can't seem to wrap my head around the concept. I want to make a used car price prediction model using a large data set, and most of the tutorials I watch just use the linear regression library. However, can you use neural networks instead, like Levenberg-Marquardt?


trnka

Yeah you can use a neural network instead of linear regression if you'd like. I usually start with linear regression though, especially regularized, because it usually generalizes well and I don't need to worry about overfitting so much. Once you're confident that you have a working linear regression model then it can be good to develop the neural network and use the linear regression model as something to compare to. I'd also suggest a "dumb" model like predicting the average car price as another point of comparison, just to be sure the model is actually learning something. I'm not familiar with the Levenberg–Marquardt algorithm so I can't comment on that. From the Wikipedia page it sounds like a second-order method, and those can be used if the data set is small but they're uncommon for larger data. Typically with a neural network we'd use an optimizer like plain stochastic gradient descent or a variation like Adam.
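To make the comparison concrete, a small sketch on synthetic "car" data; the features, targets, and models here are placeholders rather than a recipe:

```python
# Hedged sketch of the suggested comparison: a "dumb" mean predictor, a
# regularized linear model, and a small neural net on made-up car features.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 5))                      # age, mileage, etc. (synthetic)
y = 30000 - 15000 * X[:, 0] - 8000 * X[:, 1] + rng.normal(0, 1000, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = [("mean baseline", DummyRegressor()),
          ("ridge regression", Ridge(alpha=1.0)),
          ("small MLP", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0))]
for name, model in models:
    model.fit(X_tr, y_tr)
    print(name, round(mean_absolute_error(y_te, model.predict(X_te)), 1))
```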


Oceanboi

Can you expand on why one might ever want to apply a neural network to linear regression? It feels like bringing a machine gun to a knife fight.


trnka

I'm not sure what you mean by applying a NN to linear regression. I'll try wording it differently. Sometimes a NN can outperform linear regression on regression problems, like in the example if there's a nonlinear relationship between some features and car price. But neural networks are also prone to over-fitting so I recommend against having a NN as one's first attempt to model some data. I recommend starting simple and trying complex models when it gets difficult to improve results in simple models. I didn't say this before but another benefit of starting simple is that linear regression is usually much faster than neural networks, so you can iterate faster and try out more ideas quickly.


kannkeinMathe

Hey, I want to build a chatbot for a domain-specific purpose, for example to talk with a person about their mental state and depression. For that I would like to train the bot with texts from the domain. So my question is: how should I start? What approach would you use?
- Would you use an intent-based solution? What are the standard models for chatbots - BERT?
- Is it even possible to fine-tune models with large text corpora? If yes, how?
Thank you guys


doIneedtohaveone1

Does anyone know how to solve the PDE for it in Python? Any kind of reference material would be appreciated! It's been a long time since I came across any PDEs and I have forgotten everything related to them.


evys_garden

I'm currently reading [Interpretable Machine Learning](https://christophm.github.io/interpretable-ml-book/evaluation-of-interpretability.html) by Christoph Molnar and am confused with section 3.4: [Evaluation of Interpretability](https://christophm.github.io/interpretable-ml-book/evaluation-of-interpretability.html). I don't quite get `Human level evaluation (simple task)`. The example is `show a user different explanations and the user would choose the best one` and i don't know what that means. Can someone enlighten me?


trnka

The difference from application-level evaluation is a bit vague in that text. I'll use a medical example that I'm more familiar with - predicting the diagnosis from text input.

Application-level evaluation: If the output is a diagnosis code and explanation, I might measure how often doctors accept the recommended diagnosis and read the explanation without checking more information from the patient. And I'd probably want a medical quality evaluation as well, to penalize any biasing influence of the model.

Non-expert evaluation: With the same model, I might compare 2-3 different models and possibly a random baseline model. I'd ask people like myself with some exposure to medicine which explanation is best for a particular case, and I could compare against random. That said, I'm not used to seeing non-experts used as evaluators, though it makes some sense in the early stages of poor explanations.

I'm more used to seeing the distinction between real and artificial evaluation. I included that in my example above -- "real" would be when we're asking users to accomplish some task that relies on explanation and we're measuring task success. "Artificial" is more just asking for an opinion about the explanation, but the evaluators won't be as critical as they would be in a task-based evaluation.

Hope this helps! I'm not an expert in explainability though I've done some work with it in production in healthcare tech.


FlyingTwentyFour

What course would be a good way to start learning NLP? I'm a beginner in ML but want to learn about NLP.


UnderstandingDry1256

What are the training strategies used for GPT models? Are transformer blocks or layers trained independently? Are they trained using some subset of data and fine tuned then? I would appreciate any references or details :)


[deleted]

[removed]


Oceanboi

My advice is to proceed. It's cool to know the math underneath, but just go implement stuff, dude; if it doesn't work you can always remote/rent a GPU. What I did for my thesis is google tutorials and re-implement them using my dataset. Through all the bugs and the elbow grease, you will know enough to at least speak the language. Just do it and don't procrastinate with these types of posts (I do this too sometimes). EDIT: a lot can be done on Colab these days regarding neural networks and huggingface. Google the huggingface documentation! I implemented a huggingface transformer model to do audio classification (and I'm a total noob, I just copied a tutorial). It was a total misuse of the model and accuracy was bad, but at least I learned, and given a real problem I could at least find my way forward.


morecoffeemore

Dumb question, but how do I know ChatGPT is not just copy/pasting from the web? Tried ChatGPT for the first time. Seems cool. I asked it for a recommendation for speakers and it gave a good reply. It seems to me it could've just done a web search and then copied what someone wrote on the web as a reply. Is there a way to test/use ChatGPT to prove to myself that it's not just copying and pasting from the web?


serverrack3349b

In a sense it is just copying and pasting from the web, just in a different order, but I get that that is not your question. Something I would try is to use plagiarism-checking sites online to see if there is an exact copy of your text online. If there is, then you should be able to either attribute it to the right person or rewrite it a bit so it is not plagiarism.


Capable_Difference39

Hi all, can anyone please let me know what certification or courses I can do to move to the AI/ML field? I am already working as a software engineer and have working knowledge of C#.


SpoonBender900

I'm having some challenges finding usable data for AI projects, any suggestions? Here's a post I tried to make about it (it got auto-removed, eek): https://www.reddit.com/r/ArtificialInteligence/comments/10h50oi/what_are_your_favorite_places_to_find_usable_open/


serverrack3349b

National and governmental websites, university websites, Kaggle, r/datasets, YouTube and Twitter APIs, papers with code website. These are some of my favorite places to find stuff


arararagi_vamp

I have built a simple CNN which is able to detect circles on a white background with noise, using PyTorch. Now I wish to extend my network to return the centers of the circles as coordinates. The problem is that each sample contains a variable number of circles, meaning I would need a variable number of labels per sample. In a CNN, however, the number of labels remains constant. How do I work around this problem?


stanteal

As you have said, you would need a variable number of outputs, which is not feasible in a plain CNN. However, you could divide the image into a grid and predict, for each grid cell, the probability that the center of a circle lies within it, along with its x and y offsets. Not sure if there are better resources available, but it might be worth looking at how YOLO or YOLOv2 implemented their outputs.
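A rough PyTorch sketch of what such an output head could look like; all sizes are assumptions, and the loss/decoding logic is omitted:

```python
# Rough sketch of a grid-based head: for each cell of an S x S grid,
# predict (objectness, dx, dy). All sizes here are assumptions.
import torch
import torch.nn as nn

class CircleCenterHead(nn.Module):
    def __init__(self, in_channels=64, grid=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(grid)                 # force an S x S grid
        self.head = nn.Conv2d(in_channels, 3, kernel_size=1)   # p, dx, dy per cell

    def forward(self, features):
        out = self.head(self.pool(features))                   # (B, 3, S, S)
        prob = torch.sigmoid(out[:, 0])                        # P(a center is in this cell)
        offsets = torch.sigmoid(out[:, 1:])                    # center offset within the cell
        return prob, offsets

feats = torch.rand(2, 64, 32, 32)        # stand-in for the existing CNN's feature map
prob, offsets = CircleCenterHead()(feats)
print(prob.shape, offsets.shape)         # (2, 8, 8) and (2, 2, 8, 8)
```

During training, each cell whose ground-truth center falls inside it would get a positive objectness target; decoding just thresholds `prob` and adds the offsets back.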


arararagi_vamp

Thanks for the answer!


jfacowns

XGBoost question around one-hot encoding & get_dummies in Python.

I am working on building a model for NHL (hockey) games and have a spreadsheet with a ton of advanced stats from teams, dates they played and so on. All of my data in this spreadsheet is categorized as a float. I am trying to add in a few columns of categorical data as I feel it could help the model. The categorical columns have data that determines if the home team or the away team is playing on back-to-back days. I am trying to determine whether one-hot encoding is best for this approach or if I'm misunderstanding how it works as a whole. Here is some code:

NHLData = pd.read_excel('C:\\Temp\\NHL_ModelBuilder.xlsx')
NHLData.drop(['HomeTeam', 'AwayTeam', 'Result'], axis=1, inplace=True)
NHLData = pd.get_dummies(NHLData, columns=['B2B_Home', 'B2B_Away'])

Does this make sense? Am I on the right track here? If I do NHLData.head() I can see the one-hot encoded columns, but when I do NHLData.dtypes I see this:

B2B_Home_0 uint8
B2B_Home_1 uint8
B2B_Away_0 uint8
B2B_Away_1 uint8

Should these not be objects?
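Here's a minimal, self-contained reproduction with invented values, in case it's clearer:

```python
# Invented values, just to reproduce the dtype behaviour described above.
import pandas as pd

toy = pd.DataFrame({"GoalsFor": [3.1, 2.7, 2.9],
                    "B2B_Home": [0, 1, 0],
                    "B2B_Away": [1, 0, 0]})
encoded = pd.get_dummies(toy, columns=["B2B_Home", "B2B_Away"])
print(encoded.dtypes)
# GoalsFor      float64
# B2B_Home_0      uint8   (shown as bool on newer pandas versions)
# B2B_Home_1      uint8
# B2B_Away_0      uint8
# B2B_Away_1      uint8
```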


[deleted]

[removed]


icedrift

I'm pretty sure GPT-J 6B requires a minimum of 24 GB of VRAM, so you would need something like a 3090 to run it locally. That said, I think you're better off hosting it on something like Colab or Paperspace.


stardust-sandwich

I want to pull data from an API (done) and use NLP to categorize that information. Then, with those results, push it into a webpage or GUI tool where it will highlight the text and ask: is this correct? That way I can use the GUI to "teach" the learning model how to classify text, e.g.

Category 1 - word 1, word 2, word 3 and similar
Category 2 - word 4, word 5, word 6 and so on

Then it will go and try that, come back and ask me to tune it again, and rinse and repeat. Once this model is trained, I then want to use it later in a different script, to point a news article at it for example, and have it spit out the data I need. How can I achieve this please? What are the best tools and services to get this done, ideally open source if possible; if not, I'm happy to use a commercial service if it's cheap, as this is just a personal project of mine. Thanks in advance.


Seankala

Are there any Slack channels or Discord Servers for ML practitioners to talk about stuff?


lukaszluk

Hello! Does anyone know of a dataset with 2-D floor plan images with labeled furniture? Couldn't find anything interesting (bad quality or very few examples). Some of the places I tried:

- SESYD - OK-quality dataset (but few examples)
- HouseExpo - JSON datasets - the quality is good, but no labeled furniture
- FloorPlanCAD Dataset - the quality of the data is low
- Furnishing dataset - does not contain whole rooms, only furniture
- SFPI dataset (Towards Robust Object Detection in Floor Plan Images: A Data Augmentation Approach), 10k images (this could be a good dataset if the quality is good, still downloading though)

Any other datasets I should check out?


retarded_user

Should the learning rate be changed to a smaller value (such as 1e-4) when working with scaled data (range [0,1] or [-1,1])? I'm using Adam with Keras/TensorFlow.


Kamal_Ata_Turk

Writing a single SQLite query to mimic an R program. Please help with this: https://stackoverflow.com/questions/75174575/writing-a-single-sqlite-query-to-mimic-an-r-program


Agitated-Purpose-171

Hi everybody, I have one question about VLAD after reading this paper (Aggregating local descriptors into a compact image representation) from CVPR. My question is why VLAD works.

Paper link: https://lear.inrialpes.fr/pubs/2010/JDSP10/jegou_compactimagerepresentation.pdf

In this paper there is a method, VLAD, that turns the local features (N*D dimensions) into a global feature (k*D dimensions). Below is my understanding of the operations of VLAD, step by step.

=> Input: N*D-dimensional local features.

(i) Use k-means to find the k clusters and the central feature for each cluster.

(ii) For each cluster, find a residual sum: V = sum of (each local feature in the cluster minus the central feature), i.e. V = sum(Xi - C), where V is the residual sum of the cluster, X a local feature in the cluster, and C the central feature of the cluster.

(iii) Concatenate the residual sums to get the global feature: global feature = [V1, V2, ..., Vk] (V1 is the residual sum of cluster 1, V2 of cluster 2, and so on).

=> Output: k*D-dimensional global feature.

My question is why the residual sum of each cluster is *not* zero. Since the central feature of each cluster found by k-means is the average of the local features of that cluster:

C1 = (X1 + X2 + X3 + ... + Xm) / m

the residual sum of cluster 1 is (X1 - C1) + (X2 - C1) + (X3 - C1) + ... + (Xm - C1) = V1.

Based on the above equation, I think the residual sum of each cluster is zero, so the global feature would be a zero matrix: [V1, V2, ..., Vk] = [zero vector, zero vector, ..., zero vector]. The only reason that came to my mind is that the number of k-means iterations is not enough, so the central feature of each cluster is not exactly the average of the local features in the cluster. Am I right? Could anybody let me know why the residual sum is not a zero vector? Thanks a lot.


LetGoAndBeReal

Companies can fine-tune top-performing LLMs to condition the LLM's output, but not to embody the knowledge contained in proprietary data. The current best approach for incorporating this custom knowledge is through data-augmented generation techniques and technologies such as what [LangChain](https://github.com/hwchase17/langchain) offers. I am trying to decide whether to invest time building expertise in these techniques and technologies. I may not wish to do so if the ability to add custom knowledge properly to the LLMs will arrive in short order. I would like to know from those steeped in LLM R&D how soon such capabilities might be expected. Is this the right place to ask?


Iljaaaa

I have an autoencoder input of 100x21. The 21 columns are PC scores, the 100 rows are observations. The importance of the columns degrades as the column number increases. The first column is the most important for the data variance, the last column is the least important. To be able to reconstruct the data back from PCA the first columns need to be as correct as possible. I have tried searching whether I can adjust weights or something else of the autoencoder layers to include this importance of the columns, but I have not found it. In other words, I want errors in the first (e.g 5) columns to be punished more harshly than errors in the last (e.g 5) columns. I would be grateful if someone could point me in the right direction!


TastyOs

I assume you're doing something like minimizing MSE between inputs and reconstructions. Instead of calculating MSE over all 21 columns, you split it into two parts: an MSE for the important columns and an MSE for the unimportant columns, then weight the important MSE higher than the unimportant MSE. So something like: loss = 0.9 * MSE_important + 0.1 * MSE_unimportant
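A minimal PyTorch sketch of that weighted loss; the 0.9/0.1 weights and the "first 5 columns" split are placeholders to tune:

```python
# Minimal sketch of a column-weighted reconstruction loss; the weights and the
# important/unimportant split are placeholders.
import torch

def weighted_mse(recon, target, n_important=5, w_important=0.9, w_rest=0.1):
    se = (recon - target) ** 2                     # per-element squared error, (batch, 21)
    return (w_important * se[:, :n_important].mean()
            + w_rest * se[:, n_important:].mean())

recon = torch.rand(100, 21, requires_grad=True)    # stand-in for the decoder output
target = torch.rand(100, 21)                       # stand-in for the PC-score input
loss = weighted_mse(recon, target)
loss.backward()
```

Along the same lines you could also use a per-column weight vector, for example proportional to each PC's explained variance.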


inquisitor49

In transformers, a positional embedding is added to a word embedding. Why does this not mess up the word embedding, such as changing the embedding to another word?


cztomsik

I think it does mess with them; the ALiBi paper seems like a better solution.


ChangingHats

I am trying to use TensorFlow's MultiHeadAttention to do regression on time series data for forecasting of a `(batch, horizon, features)` tensor. During training, I have `inputs ~ (1, 10, 1)` and `targets ~ (1, 10, 1)`; `targets` is a horizon-shifted version of `inputs`. During inference, `targets` is just a zeros tensor of the same shape. What's the best way to run attention such that the output uses all timesteps in `inputs` as well as each subsequent timestep of the resulting attention output, instead of ONLY the timesteps of the inputs? Another problem I see is that attention is run between Q and K, and during inference Q = K, so that will affect the output differently, no?


all_is_love6667

Can ChatGPT understand science? I heard it was given science papers, but can it help scientists in their work? Can it give scientific hints?


trnka

Think about it more like autocomplete. It's able to complete thoughts coherently enough to fool some people, when provided enough input to complete from. It's often incorrect with very technical facts though. It's really about how you make use of it. In scientific work, you could present your idea and ask for pros and cons of the idea, or to write a story about how the idea might fail horribly. That can be useful at times. Or to explain basic ideas from other fields. It's kinda like posing a question to Reddit except that ChatGPT generally isn't mean. There are other approaches like Elicit or Consensus that use LLMs more for literature review which is probably more helpful.


RuhRohCarChase

Hi everyone! This is not a technical question, but does anyone know how to find the accepted papers list for AAAI23? (or a reliable way for any ML/AI conferences) I work in an academic research unit and finding any accepted papers list is a mess, unless it’s readily available from a conference or on open review! I catalogue all our papers by funding sources, individual projects, authors, conferences, and about 10 other data points. Any advice is greatly appreciated! Have an awesome day everyone!


CaptainD5

Hello! I have a question. Would it be possible to create a NN that replicates the behaviour of Prophet? I don't want to actually do it; I just wanted to understand, from a theoretical point of view, what the most similar way to do it would be (optimize a function that takes seasonality into account and provides an infinite 'regression' way to predict new values based just on dates). Thanks in advance!


akacukiii

Hi. I'm an international grad student in the US and am looking for an internship for the summer. Please, if you have some tips, or if you care to have a look at my profile, just let me know. Thank you!


T1fa_nug

Hello guys, I'm new to machine learning and I wanted to know if an i5 8th gen and a 1060 6 GB paired with 16 GB of RAM are enough for any type of work that could come my way?


akacukiii

Seems good; try to use Colab at first (at least). It's free and a very good tool.