Go+: Go designed for data science (goplus.org)
199 points by angrymouse on March 27, 2021 | 170 comments


Go is a great language, but it seems terribly suited to data science. The popular data science languages are Python, R, Julia, and to a lesser extent Scala. They’re all extremely flexible languages, where you can easily write high level abstractions/DSLs, and they all have very strong functional programming support, because data science tends to be extremely functional. They also tend to be very concise languages.

Go is at the complete opposite end of the spectrum - not flexible at all, it’s purposefully difficult and awkward to write high level abstractions/DSLs, there’s very poor functional programming support, and it’s very verbose. There are great reasons for these restrictions, they’re intentional design decisions, but they also make it a very poor fit for data science IMO.


Not trying to start anything, but what's functional about Python? It doesn't have/support tail recursion, a strong type system, pattern matching, immutability-by-default for lists and dictionaries.

From where I'm standing, python has some features that kinda look like functional programming concepts, but overall is an OO imperative language, like Ruby and many others.

My understanding is that the DS community prefers it mostly for its library support in that domain.


> strong type system, pattern matching, immutability-by-default for lists and dictionaries

As a side note, it's really interesting just how much the popular conception of "functional" has changed. 10 years ago, I don't think anyone would have listed any of those as being important or suggestive of functional programming. Nowadays, "functional" means "like Haskell" instead of "like Lisp." I think we need to be careful when we talk about functional programming because so many ideas have jumped the paradigm and it means so many different things to different people.


Scheme and Standard ML standardized these very features in the 70s and 80s as a part of the “functional programming” paradigm.


Scheme doesn't have a "strong type system", "pattern matching", or "immutability-by-default".

Scheme didn't standardise those very features in the 70s and 80s – it still doesn't have them.

Some of those features are available in add-on libraries or as extensions in some specific Scheme implementations, but they are thus far absent from the standardised language.


That's true, but python isn't really functional in the "like lisp" way either. Things like ifs, loops, etc. are statements, not expressions. Lambdas are pretty limited (they can only be one line). There is no tail recursion.

Functions are first-class objects, and it supports higher-order functions and has closures (even if in any non-trivial case you need a full nested def), which were less common features when Python was first introduced, and that is probably why Python was labeled as "functional." But now those are standard features in almost every modern language, so using that as a criterion for "functional" languages is not a very useful distinction.


No, Python's lambdas can have as many lines of code as you please. They are not limited to one line.

I'm not sure where this myth comes from, but I see it a lot. Maybe some people think that "lines of code" == "statements", but these are not remotely the same thing, even if they happen to coincide in simple cases.

Python's lambdas are limited to one expression in the implied return statement, but not allowing multiple statements in lambdas is no real limitation when programming in the functional style, as the true functional languages have no statements to speak of, only expressions, and their lambdas work exactly the same way Python's do. A single expression is all that a functional programming language's lambda needs.

Multiline lambdas are considered poor style in Python ("Why not use a `def`?" they'd say.), so you may not see them much, but they do work. The Hissp compiler, for example, relies on this feature. (I am the author of Hissp BTW.)
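A minimal sketch of the lines-versus-statements distinction (the name `describe` is made up for illustration): the lambda below spans several lines, yet its body is still a single expression.

```python
# A lambda body is one expression, but that expression may span
# several lines when it is wrapped in parentheses.
# `describe` is a hypothetical name used only for this example.
describe = lambda n: (
    "negative" if n < 0
    else "zero" if n == 0
    else "positive"
)

print(describe(-3))  # negative
print(describe(0))   # zero
print(describe(7))   # positive
```

Putting a statement such as `x = 1` or `return` inside the lambda body would be a syntax error, which is the actual restriction.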


Python is not fully a functional programming language but it supports some functional patterns. There's a nice mini-ebook by David Mertz on functional programming in Python, and it used to be freely available but I can't find it at the moment. However, he wrote an article version here: https://developer.ibm.com/languages/python/articles/l-prog/

Also, pattern matching is coming to python in 3.10. You can read about it here: https://www.python.org/dev/peps/pep-0634/


The dry-python returns library gets pretty close to feeling like Scala cats. Unproductive diversions and all.


> a strong type system

It's a myth that dynamic languages can't have strong types. Python aborts almost immediately whenever it can. For instance, adding a number to a string? Exception. Accessing undefined properties? Also an exception.
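A small sketch of those two cases; `raises` and `Point` are helper names made up here, not part of any library.

```python
def raises(exc_type, thunk):
    """Return True if calling thunk() raises exc_type."""
    try:
        thunk()
    except exc_type:
        return True
    return False

class Point:
    def __init__(self, x):
        self.x = x

# No implicit coercion between int and str.
print(raises(TypeError, lambda: 1 + "1"))          # True
# Reading an attribute that was never set fails too.
print(raises(AttributeError, lambda: Point(1).y))  # True
```

Both checks happen at runtime, which is the distinction between strong and static typing.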

Furthermore there's a language-standard static type checker, mypy.

> pattern matching

We have that in Python 3.10.

> immutability-by-default for lists and dictionaries

We do have tuples and frozendict.

Arguably its implementations of functional features are much weaker than "truly" functional ones such as Lisp, Haskell, OCaml or F#.


> Python aborts almost immediately whenever it can

doesn't sound very strong


Runtime type checking is most definitely not what people mean when they talk about strong type systems.


Strong != static.


I love python, and I don't know if the GP's points are good, but your answer is really disingenuous.

> > pattern matching

> We have that in Python 3.10.

> > immutability-by-default for lists and dictionaries

> We do have tuples and frozendict.

3.10 like the version that is not released yet?

Tuples and frozendicts, so precisely the non-default lists and dicts?


The tools are there but you don't like their names.


Neither tuples nor frozen dicts offer efficient (logarithmic or constant time) updates, like lists or balanced trees or HAMT do. You can't really write a program with only immutable structures in python, unless you accept it will be unbearably slow, even for python. Clojure, erlang, elixir, these are dynamically typed and functional.
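To make the cost concrete, here is a sketch of what an "update" on a tuple has to do; `tuple_set` is a hypothetical helper, not a standard function.

```python
def tuple_set(t, index, value):
    """Return a new tuple with t[index] replaced.

    Every element is copied, so this is O(n) per update, unlike the
    structure-sharing updates of a persistent tree or HAMT.
    """
    return t[:index] + (value,) + t[index + 1:]

original = (1, 2, 3, 4)
updated = tuple_set(original, 2, 99)

print(original)  # (1, 2, 3, 4)
print(updated)   # (1, 2, 99, 4)
```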


I think the appeal is with Jupyter [1] notebooks. Python is not about performance. Usually it's numpy (or other libraries) that does the heavy lifting in another language anyway.

But having the Jupyter notebooks allows for interactivity with the data. Make changes, and see how it affects every step after it.

[1] https://jupyter.org/


- map/reduce/filter/for-comps in the standard library. Go doesn't support this style of programming, and because of the lack of generics, you can't write generic data structures with these types of methods either. It's all loops and mutation in Go

- first class functions. Go does have these

- concise lambda syntax, that makes them nice/easy to use. Go has first class functions, but a very verbose/awkward lambda syntax

- can easily create your own generic data structures with functional interfaces (can't do this in Go b/c no generics)

- Python is pretty strongly typed, and if you meant statically typed, there's now optional static type checking in Python, similar to TypeScript (not as robust/well implemented though)

- Python has decent immutability support. For example, dataclasses (https://docs.python.org/3/library/dataclasses.html) with frozen=True are a lot like immutable classes in more purely functional languages (i.e. case classes in Scala). Tuples and named tuples. There are libs out there for frozen (a.k.a. immutable) dicts, lists, etc.

- Python is about to get pattern matching in 3.10

- functools (https://docs.python.org/3/library/functools.html)

- etc.

You can absolutely use Python in a very mutable-OO style, but it also has pretty good functional programming support. If you look at most Python data science code, it's written pretty functionally.
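As a sketch of the style the list above describes, using only the standard library (the `Reading` class and its data are invented for illustration):

```python
from dataclasses import dataclass, replace, FrozenInstanceError
from functools import reduce

@dataclass(frozen=True)
class Reading:
    sensor: str
    value: float

readings = (Reading("a", 1.5), Reading("b", 2.5), Reading("a", 3.0))

# filter / map / reduce without mutating anything.
total_a = reduce(
    lambda acc, v: acc + v,
    map(lambda r: r.value, filter(lambda r: r.sensor == "a", readings)),
    0.0,
)
print(total_a)  # 4.5

# "Updating" a frozen instance means building a new one.
scaled = replace(readings[0], value=readings[0].value * 2)
print(scaled)  # Reading(sensor='a', value=3.0)

# Direct assignment to a field is rejected.
try:
    readings[0].value = 0.0
except FrozenInstanceError:
    print("frozen instances reject assignment")
```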

I'd say most important for data science applications is the ability to create generic data structures with functional interfaces - you can't do this in Go, makes it really awkward to write a lot of the foundational vector, data frame, etc. libraries, that basically all higher level data science libs depend on.


Functional languages don't need a strong type system


IDK if it's Go's problem, honestly. Data modeling is hard. It's hard for a reason. If a language like Python makes it seem easy, it's still hard, but your perception and attitude towards it have changed because some of the busy work has been taken out of it, possibly in a way that costs you down the road.

Let's be honest, programming languages are the punching bags of developers.


There are mainly two types of data scientists, A and B [1].

The type B data scientists probably want to use Go for building data analytics pipelines similar to Pachyderm [2]. If you want to go the way of a compiled language for data science and numerical analysis, the best bet now is probably Fortran. The fact that the Swift for TensorFlow project was started and then recently terminated shows that there is a need for a proper, modern compiled language for data science and numerical analysis.

There is, however, a dark horse in the race among programming languages for data science and numerical analysis that perhaps can satisfy both type A and type B data scientists. The dark horse is the D language. It supports functional and object-oriented programming, a borrow checker, inline assembler, a REPL, metaprogramming, CTFE, and open and multi-methods, just to name several modern features suitable for data science and numerical analysis, but admittedly the ecosystem is rather poor as of now (e.g. no library for Arrow). It is also very fast to compile and run even with GC (the GC is also configurable), and you can selectively opt out of GC inside the same code base if blazing speed is your thing.

But glimpses of what it is capable of are already there, albeit still in their infancy compared to mature languages like Matlab, R or Fortran [3][4]. But hey, Rome was not built in a day.

[1]https://www.quora.com/What-is-data-science/answer/Michael-Ho...

[2]https://www.pachyderm.com/

[3]https://tech.nextroll.com/blog/data/2014/11/17/d-is-for-data...

[4]http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...


That need is fulfilled by languages like Fortran, which is quite modern, with OOP and generics; the age of punch cards is long gone.

Or HPC languages like Chapel.

Not only are they compiled, they offer first-class support for distributed HPC and GPGPU computing.

Go is nowhere close to offering such capabilities.


Why not Julia?


Please check this post mortem on Julia[1].

Granted, this is probably a premature assessment of Julia.

Coincidentally, the topmost comments lament Google's missed opportunity with the Swift for TensorFlow project (mentioned in my original comment), arguing that if it had been done in Julia, the project would have been a success ¯\_(ツ)_/¯

[1]https://news.ycombinator.com/item?id=26384133


> Go is at the complete opposite end of the spectrum - not flexible at all,

You must be kidding. Go is the most flexible (not merely one of the most flexible) of the popular static languages. It is even more flexible than many dynamic languages. It supports function types as first-class citizens, closures, value methods as functions, type methods as functions, type deduction, .... IMHO, the main selling point of Go is not simplicity, but overall balance and flexibility: https://github.com/go101/go101/wiki/The-main-sell-point-of-G...

> there’s very poor functional programming support,

This is true currently, but this is not caused by lack of flexibility, it is caused by lack of custom generics instead.


Fair enough, flexible is an extremely loose term. I was referring mostly to a language being flexible enough to let library/tool authors create their own very high level abstractions and DSLs. In Go, lack of custom generics often makes this very difficult. If you look at the kind of APIs offered by mega-popular data science toolkits like pandas and Spark, it's really hard to offer something similar in Go. You end up with a lot of interface{} types everywhere, vectors/series/whatever carrying their type in a struct field, etc.


The first things I would look for in a data science language are multidimensional arrays, linear algebra packages, data frame and time series libraries ... none of which feature on this page.


Yeah, I'm confused. The only "data science" I can see here is the title.

How is list comprehension a data science primitive? How did this get over 4,000 stars on GitHub with a glaring lack of basic data science functionality? Is this used by actual practitioners?


GitHub stars are bookmarks for me, not an indicator of usefulness.

It does say it’s under heavy development.

Maybe 4.3k+ GitHub users just want to make sure they get updates?


I wish HN had a way to save a story without upvoting it or showing it publicly on your profile (the "favorite" feature implemented right now), like Reddit's "Save". Many times I'm interested in something to check out later but it's not something worth upvoting (like this story, based on other comments) and I want my interests to stay private.


This issue is better solved externally - using a bookmark manager. It would allow you to have all your "read later" links in one place rather than being scattered over different websites. Personally I use Safari's reading list feature for that.


Quite a few people were unhappy when twitter renamed "favorite" to "like" because they had used it as a bookmark and did not want to imply advocacy. Seems like both intents could be supported fairly easily.


I recommend Instapaper or Pocket. They’re cheap, but worth it


Right next to Star is Watch, which would be much more suitable towards that, no?


No, Watch emails you a bunch. Stars just show up in a list so you can find it later. That being said, public bookmarks always seemed weird to me. Why not just actually bookmark it with your browser? Not that it matters.


If you are using github app, it's less friction to star it than open the page in the browser window and bookmark it.


What's the benefit of using the app?


The app provides better UI and UX for mobile devices. It's faster, easier to navigate (bottom navigation), and some features work in it which don't on the mobile site (I can't remember which views force desktop view).


GitHub added "custom events" for Watch. For example, you can watch only new releases. You should maybe check it out!


That's still not the same as stars, it still emails you.


Why would it be the same? The purpose of "Watch" is to have a notification.


I know, that's what I'm saying. We were talking about stars and then you said I should look into Watches instead, but what I was saying is that I don't like watches because you get emails. Customizing which emails doesn't help with that. Apologies if I was confused about something.


If you've ever tried to use Watch as a bookmark, I feel like it's obvious why that is not a good solution


You may want to revise your judgement, GitHub added support to watch on "custom events", such as: new issues, new PRs, new releases, etc. You might want to try again.


Chinese based github project, the stars mostly are hyped.


Thank you! I've seen this language/extension/library pop up a few times and I don't see, even remotely, how it could displace the Python data science stack. The biggest competitor to Python in this space, IMO, is Julia. Go+ seems light-years behind, and heading in the wrong direction entirely.


R is the competitor; actually, many of the things in the Python data stack are directly copied from R: seaborn's ~ operator, dataframe, ...


My point is that I enjoy the Python stack, and I'm seriously considering Julia on future projects; I'm not giving R the same consideration. Python vs. R is almost a matter of taste IMO. I vastly prefer Python to R for data science. That's not to throw shade at R. Like you suggested, the Python stack owes R everything.


Apart from the Gonum [1] numerical libraries, I haven't found many data-science-specific Go libraries, compared to the Python ecosystem, while searching for some hobby projects.

Interestingly, Prose [2], a Go library for text processing, yielded better results for named-entity extraction than NLTK in my tests, in terms of accuracy and, obviously, performance.

Perhaps Go is not being applied enough in data science/ML, and for the fields where it is applied (networking), the math in the standard library seems to be sufficient.

[1] https://github.com/gonum/gonum

[2] https://github.com/jdkato/prose


Yea, my list would also be:

- ndim arrays with broadcasting

- time series

- plotting

- linalg: blas/mkl

- storage - hdf5, zarr, arrow, parquet, netcdf

I don't see any of those either in go+.


Seems like Julia could do all of those things.


I might contribute a feature towards this, specifically a time series lib.


I write a lot of Go, and I spend most of my time doing analysis (usually in R, occasionally in python). I'm interested to understand whether there was a specific motivating example that drove the creation of this new go-like language.

This is Hacker News, so there definitely doesn't need to be anything beyond "I could, so I did." But if this actually solves some problem better than existing solutions, it would be cool to read about. Edit: Without a motivating example, it's hard to imagine that people will want to pick up a Go-like (but not exactly Go) language for data science.


> Without a motivating example, it's hard to imagine that people will want to pick up a Go-like (but not exactly Go) language for data science

Exactly. I use Python almost exclusively (including for data science, or ML really). I've been wanting an excuse to learn Go by doing a project with it. But learning some third Go-like language would be a tougher sell for me, unless there is really something it does better than Python, because it still doesn't give me the benefit of learning Go.

But like someone else said, "because you can" is usually a good enough reason to build or learn a new language, so I'm sure it's still worth it for many.


If you are looking to learn a language specific to data science, Julia is fairly mature.


It seems it usually compiles down to golang? As in the README they run

```
gop go tutorial/ # Convert all Go+ packages in tutorial/ into Go packages
go install ./...
```

It's more like typescript for javascript than a completely separate language.


Go has a ton of potential in the data science space.

A basic DataFrame library would go a long way. Doesn't have to be as full featured as Pandas. Just something that's maintainable and portable.

I wrote a blog post a few months ago on the current Go DataFrame libraries (gota, qframe, dataframe-go): https://mungingdata.com/go/dataframes-gota-qframe/. None of the current offerings are integrated with Arrow.

An Arrow-backed Go DataFrame library that can read / write Parquet files could really jumpstart data science in Go (really data engineering in Go, which is where they should probably focus first).


Maybe a high-concurrency experiment runner or data flow engine, but Go would probably be the last "modern" language I think of as being good for data science.

All of the features that make it great for writing high-concurrency web applications would make it painful for writing tabular data processing, array manipulation & linear algebra, and plotting.

Nim seems a lot more practical; it's easy to bind to existing data science libraries, and you can use the macro system to build more expressive DSLs. That said, since Julia already does pretty much anything I would need to do (and will hopefully one day have a fast start up times and/or AOT compilation), I'm not sure why you would want to use Nim either. Maybe use it to write some kind of "mid-level" library code that binds to something like Torch, which you could then use from an even higher-level interactive language.

Apart from the incumbents -- Julia, Python (grandfathered in + you can use Hy/Hissp/Coconut), and R -- maybe you could have a good time doing data science in Common Lisp or Racket. Again: good CFFI story, macros for expressive DSLs, flexibility to run in interpreted and compiled modes, dynamic/gradual typing for easy iteration, etc.

Hell, I would sooner take Lua for data science over Go.

That said, I am an "Arrow maximalist", because the beauty of it is that you should be able to use data frames even in Go if you really want to, without reinventing the CSV parsing and memory layout wheels.


> data science in Common Lisp or Racket

Similarly, Chibi or Gambit Scheme.

> I would sooner take Lua for data science

Which provides for a low level language like Terra or a Lisp via Fennel or Urn.


Incidentally, Lua has DS history, as it was used by Yann LeCun for torch, which was a Lua library.

There were a whole bunch of goodies in the surrounding ecosystem, as I recall.

Then Yann got acquired by FB, and it all got re-written in Python (hence pytorch, as opposed to torch which was in Lua).


> Go has a ton of potential in the data science space.

Does it? I'm not familiar with Go data science applications but the design of the language, tooling and runtime, eg low latency garbage collector, errors thrown for unused imports, do not, to me, seem to fit well with the needs of data science. I'm interested in hearing what advantages Go brings.


I guess the best thing would be the lightweight and simple concurrency model of go when it comes to data science applications. But other than that, I can't really think of a good reason why go should have so much potential.


How do unused imports relate to a language's suitability for data science? Your Python IDE adds and removes imports as you use them. Your Go IDE adds and removes imports as you use them. Unless you're using "ed" as your editor, it shouldn't even be something you see or ever think about.


> errors thrown for unused imports

You're doing something wrong if it doesn't get cleaned up automatically.


Data science is an experimental activity, whereas golang is explicitly a production platform. The amount of friction this will introduce is too high for practical use.

For example, in golang you will get a compilation error if you have an unused variable, leading to significant extra work when exploring code-level alternatives.


I can see a lot of potential in Go for data engineering specifically, yeah. Those would probably be some very stable and performant ETLs. And the concurrency and network primitives would make it easy to develop libraries like Prefect/Airflow.


Yep, agreed. Go is a great language for AWS Lambda type workflows.

Python isn't as great (Python Lambda Layers built on Macs don't always work). AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler) provides pre-built layers, which is a work around, but something that's as portable as Go would be the best solution.


Love awswrangler. I use that over boto whenever I have the opportunity.


We use Go for our ETL, with some Python too. We are in the process of transitioning to Argo Workflows from a K8s CronJob/Job setup which has been pretty stable itself.


The biggest hurdle for Go in this realm is honestly the Go—>C FFI latency. It severely limits acceleration


> Go has a ton of potential in the data science space.

I don't think that a language where you can't write generic map/fold/reduce and typed DataFrames (such as Spark's DataSet) has "a ton of potential".

Go is worse than nearly any dynamic or static language I know in that regards. Even Java has way more potential than Go.


In the same GitHub organization, there is https://github.com/goplus/pandas, but it seems to not have progressed past a README.


The env gop run shebang line is not posix-compliant; posix only requires support for a single argument in the shebang and this one has two arguments (gop run).

</irrelevant unix nerd mumbling>


POSIX is thirty three years old.

Can we please consider certain modest improvements?


The latest revision of the POSIX standard is only 4 years old.


And yet it is still all about writing CLI and server daemons, stuck in the early 80's timesharing computing world.


That is an odd comment to put using HTTP onto a process listening to port 443 so it can be stored by way of sending certain bytes to a different process listening to a port.


Except that the application that is able to display and understand what those magical HTTP contents mean isn't part of POSIX.


There is env -S that supports multiple arguments. This was always an extension available in BSD I think, and it is available now in recent versions of GNU's env.


GNU Coreutils env supports the -S option as of v8.30. Ubuntu 18.04 LTS appears to be on v8.28, but 19.04 supports it. (https://stackoverflow.com/q/4303128)

(Also, I didn't think the shebang was specified by POSIX at all? Am I wrong?)


If anyone is looking for an alternative to R or python, there's Julia already.


If this project can maintain Go's fast compile times and ability to make reliable, concise binaries, those would be two big pluses in areas where Julia is currently weak. That would make this a good choice in projects where those are high priorities.


I think what people want is all the great things Go brings to the table but just geared a little more towards data work.

Julia offers a lot in the data world and not much in the engineering world.


Also, Rust has a good dataframes library now, polars. Not as mature an ecosystem as Julia, but hopefully it improves in the future.


I find Rust's borrow checker too clunky for exploratory work. It breaks my flow and imposes higher cognitive load. The slow compiler doesn't help either.


That is fair. I have not done enough exploratory work in Rust to comment. Maybe there are analysis patterns that can avoid bumping up against the borrow checker.


Yeah Rust is really close to what I want but I agree the borrow checker adds a bit too much overhead when dealing with data science.

It seems like reference counting is probably the move here


I tend to think the innately slow compiler is basically a fatal mistake. Rust will never be able to be used for large projects. It's not so obvious now because everything it's used for is tiny.


Hopefully some of the work out of cranelift, gccrs, and/or rust-analyzer can be used to speed up compilation.


Seriously, one hopes. I think of projects in C++ that take half an hour to compile; in Rust they would take half a day or longer.


What supports that statement exactly? If you split your big projects into crates it would recompile pretty fast. C++ can also take ages to compile (eg. compiling Firefox from scratch). Keeping your code modular to get decent compile times seems like a win win.


I never compile C++ projects from scratch like I am forced to do with Rust.

The only code I compile from scratch in C++ is the code I write myself, everything else is available as binary libraries, something that cargo doesn't do, and it is not part of the near future roadmap, if ever.

Then, after compiled, most of the stuff lands on the VC++ metadata files, so incremental compilation and linking cuts even more time from the usual edit-compile-debug workflow.


Interesting, is this common on Windows? On Linux I've never seen precompiled C++ libraries (at least not those with templates) back when I compiled stuff more often (read: back when I used Gentoo). Do g++ and clang++ support precompiled libraries in the general case? I suppose C++ modules might make it more common anyway, but I don't see why Rust couldn't do it if they ever prioritize it.


The distro takes care of it and you just yum / apt-get / whatever the lib and then compile the code you typed in. Template libs will slow down your compile times but there is still a lib boost.so etc sitting around.


I used to do a lot of machine learning code in go and think it has great potential as a compiled, static language with similar ease of development to python.

However, it is hard to get around the lack of operator overloading and (to a lesser extent, at least to me) generics. I love the simplicity of the language and understand their feeling that operator overloading is too often abused, but at the same time not being able to use algebraic operators for matrix and tensor libraries makes them really hard to use.

The compacting garbage collector can also make it hard to pass pointers to memory to non go libraries which is key in data science.

If this project could address those things I think it could have real potential


> With similar ease of development to Python

Isn't the goal of general typed languages like Go or Rust to run (not build) the scripts of software in data science, for example? I wouldn't compare Python and Go; it's a different use case to me.

While Go looks to be in the middle, Rust is at the opposite end from Python, and it must be a good choice for building data software that runs data scripts.

> The [Go] lack of operator overloading => https://doc.rust-lang.org/rust-by-example/trait/ops.html

> The [Go] lack of generics => https://doc.rust-lang.org/book/ch10-01-syntax.html

> not being able to use algebraic operators for matrix and tensor libraries https://tensorflow.github.io/rust/tensorflow/struct.Tensor.h...


One of the original intents of Go was to make a static, compiled language that felt familiar to Python/Ruby programmers. This manifests as a really concise syntax (type inference via := etc) and a tight development loop enabled by fast compilation times (enabled by being strict about unused dependencies etc).

I was for a time optimistic you could use it as your scripting language without much downside and get all the upside of compiled static types. Rust looks cool and I want to do a project in it at some point but at the moment I'm most optimistic about python with optional type annotations that are understood by compilers and alternative runtimes.


At Google, Go is mostly used for stuff that they would have used Python for in the past. Idk about the rest of the world.


Currently working as a backend dev in a mid-sized company. Current directive is a gradual migration to Go for backend services that used to be written in Python/Django.


Why?


Go is a more restrictive language, which makes it slightly harder to create horrible codebases. It's also faster and a bit cheaper to deploy.


> which makes it slightly harder to create horrible codebases

Going to have to strongly disagree. It forces you to make horrible codebases with endless boilerplate code and increased complexity introduced by workarounds for abstractions you can suddenly no longer make due to questionable language limitations. You will get improved performance, however.


I've seen people complain about that, but I've been using golang for over two years, and I haven't really had to face that pain, yet. I used python for twenty years prior to that, and love sophisticated programming constructions (did a lot of work with clojure, learnt haskell, went through On Lisp), so it's not as if I don't know what I'm missing.


Any abstraction possible in Python can be expressed in Go just via the interface{} type, as the type of everything in Python is just interface{}.


No, that's not true at all. Just try to create an OrderedMap that supports the same abstract interface as Go's built-in map type, or try to implement a decimal floating point type that supports the same operators as the built-in binary floating point type. It's not possible.


Whether something can technically be done and whether it is good/easy/simple/etc. are totally different conversations. I'm pretty sure you can't implement a min function that works on both strings and ints in Go by using the interface{} abstraction.


Interesting, I wouldn’t have thought of Go for ML. But I do share the enjoyment of static languages for ML/data science. You might give Nim a look as it’s pretty practical for wrapping C++ code!


Re: operator overloading. As someone not used to Python who had to read a simple numpy script last week, I was stumped for a while on this line of code: X[y==1,0]. Just that. I first thought, what would X[False,0] be? Since y was a vector, it obviously wouldn't be equal to one. Okay, but extracting that part, it looks like y==1 takes my vector and replaces it with an array of the same size, with true or false for each element. Basically == is overridden to run a predicate over all elements. Okay, but then what does X[[True,False,False..],0] mean? Looks like numpy has overridden the [] so one can pass an array of booleans in addition to a normal index, and then it only keeps those elements corresponding to True indexes.

Clever and useful when done daily I guess, but damn it was hard to understand those 9 characters as someone not well-versed in this domain.
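
For readers in the same position, here is a sketch of that reading of X[y==1,0] written out as an explicit loop in Go. The function name maskedColumn and its signature are made up for this example, and it assumes y has one entry per row of X.

```go
package main

import "fmt"

// maskedColumn spells out the reading of X[y == 1, 0] described above:
// keep column col of every row i where y[i] == want.
// The name and signature are hypothetical; len(y) is assumed to equal len(X).
func maskedColumn(X [][]float64, y []int, want, col int) []float64 {
	var out []float64
	for i, row := range X {
		if y[i] == want { // the element-wise "y == 1" predicate
			out = append(out, row[col]) // the ", 0" column selection
		}
	}
	return out
}

func main() {
	X := [][]float64{{1.5, 10}, {2.5, 20}, {3.5, 30}}
	y := []int{1, 0, 1}
	fmt.Println(maskedColumn(X, y, 1, 0)) // rows 0 and 2, column 0
}
```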


I never understand why operator overloading is said to make things more readable.

If the meaning of an operator can change wildly with the operands then that's just confusing - you can't assume that '==' means what you think it means and you have to go find out what it means.

In comparison, having an actual function name to clue me in on what something does is useful. Like, how is "X[y==1,0]" more readable in this case than something like "filterElements(arrayToFilter, arrayOfBools)"? (if I've understood what the original was trying to do, which I'm not sure I have).

People seem to confuse "less typing" with "simpler", and that's not true. One of the great strengths of Go is that it rejects this and embraces true simplicity.


> I never understand why operator overloading is said to make things more readable.

Because, used properly, it does.

> If the meaning of an operator can change wildly with the operands then that's just confusing

Yes, irresponsible use of operator overloading makes things confusing.

Overloading enables preserving existing semantics with new types that have similar semantic roles, it also enables natural, concise, domain specific notation which may sometimes have different semantics than the standard use (while wild, unpredictable semantic swings hurt readability, humans are naturally quite good at incorporating context into interpretation of symbols/language, and avoiding context sensitivity for naive simplicity does not aid readability.)

Verbosity can be quite bad for the ability to quickly grasp the meaning of things.

> People seem to confuse "less typing" with "simpler"

Conciseness (not mere terseness, but clarity and terseness together) greatly aids readability. Verbosity is not zero-cost.


> Conciseness (not mere terseness, but clarity and terseness together) greatly aids readability. Verbosity is not zero-cost.

I've been coding for 40-ish years. I've never found this to be true. Simple expressions are (in my experience) more readable.

I understand it like this: to understand a complex expression you have to unpack it in your head to a simpler version in order to grok it. This is an operation you don't need to do if the expression is in the simpler, more verbose, version in the first place.

This is a known thing in writing, btw - complex sentences are harder to read. If you want your audience to understand you, write more, simpler, sentences.


> I've been coding for 40-ish years.

Good for you, I've only been coding for 38 years.

> Simple expressions are (in my experience) more readable.

Simple is not the inverse of concise; there may be times when simpler expressions are more verbose, but that's not even approximately generally the case. “x²+1” and “x**2+1” and “add(pow(x,2),1)” and “x raised to the second power plus one” are equally simple (or, at least, the later ones are not more simple), but they are progressively less concise.

(It's true that expanding the space of concise expressions may require more complex notation, and when the notation is unfamiliar, that creates a learning curve for learning the notation, but there's a reason people familiar with domains develop notations that support more concise expressions.)

> I understand it like this: to understand a complex expression you have to unpack it in your head to a simpler version in order to grok it.

That's true of complexity of expressions, but again that's not the issue here. And concise notation expands the kind of expressions that can be grokked by pattern recognition rather than unpacking.


I think for terser expressions to be more readable, the reader has to be more context-aware and generally more immersed in the paradigm. There's an understanding of the language that needs to be acquired.

Less terse language relies less on shared context, and thus is easier on newbies. There is less assumed knowledge, more things made explicit.

> And concise notation expands the kind of expressions that can be grokked by pattern recognition rather than unpacking.

I have this totally the other way. After years of coding in Go, I can parse "if err != nil" subconsciously and only ever deal with it if it's not that (e.g. if err == nil). It's not concise, but it is very, very easy to read.
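
For anyone who has not written Go, this is roughly what the pattern under discussion looks like. It is a minimal sketch: sumStrings is a made-up function, and strconv.Atoi is only used as a convenient standard-library call that returns (value, error).

```go
package main

import (
	"fmt"
	"strconv"
)

// sumStrings is a hypothetical example: parse two decimal strings and add them.
// Each fallible call is followed by the familiar three-line check.
func sumStrings(a, b string) (int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, fmt.Errorf("first operand: %w", err)
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, fmt.Errorf("second operand: %w", err)
	}
	return x + y, nil
}

func main() {
	fmt.Println(sumStrings("2", "40"))
	fmt.Println(sumStrings("2", "forty"))
}
```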


I can comfortably display maybe sixty lines on my screen. “if err != nil” wastes three of them, every time I do anything. I don’t want to explicitly bail out on an error for the same reason I don’t want to explicitly set up a stack frame or interpolate values into a string. I only want to deal with how this program is different than other programs, not the mechanics of how f(g(x), h(y)) is orchestrated.

Any worthwhile tool is going to be used for years, and you’re only going to be newbie for a small fraction of the time. It’s better to invest time learning a good notation than to force all the expensive experts to slog through a bad notation forever.


dude, scrolling the page is literally a finger on the mouse wheel. I don't think "I need to see my entire program in one 60-line screen" is a good dynamic for coding.

Explicitly handling errors is one of those things that you get used to, for really, really, good reasons, when learning Go.

> Any worthwhile tool is going to be used for years, and you’re only going to be newbie for a small fraction of the time. It’s better to invest time learning a good notation than to force all the expensive experts to slog through a bad notation forever.

No, because assuming the next developer knows as much as you is probably wrong. Because reading code you wrote 6 months ago is like reading an alien script. And because Go (for very, very good reasons) optimises readability over terseness.


Long expressions with matrix operations are a pretty standard example. When people talk about operator overloading in data science, they usually mean “standard operations on various arrays of numbers,” which are defined in common libraries or the programming language. Not “I need to define my own ad hoc equalities.”


yeah, I get this. If there are standard definitions of operations that everyone understands, that's fine.

But I always think that maybe we should be using new operators for this, instead of overloading existing ones that have other, different, meanings in different contexts.


In a data science context, the key operations are math, so overloading makes a lot of sense and is massively helpful in implementing algorithms and equations. I go back and forth on the wisdom of some of the other common uses — filtering, etc. In addition to the problems that have been mentioned, there are often hidden and infrequent but painful performance issues.


I often think that maths could use the same slap around the chops. Less arcane operators and symbols, more explicit function names please!


I consider it kind of important that the notation for expressions like “A²” doesn’t depend on whether A is an integer, real number, complex number, matrix, random variable, etc., (even if the results do) or what the specific domain is, but if you feel like it’s important to embed all of that context in the exponent operator... give it a try :)

(And whether “2” is integer, real, rational, complex, etc)


Yeah, but operator overloading doesn't say any of that. You have no idea what the "^" operator does, depending on the operands


Ehhh, it would seem that way, but the compactness of the syntax functions to get out of the way and help you understand the overall structure. Having longer function names ends up getting in your way more often than not in my experience.


It's not so much a matter of reduced typing as that if you're invoking an operation many times, developing a concise notation for it can cut down on the noise it creates for a reader. It should be used very sparingly and heavily documented, though, for exactly the reason you outline.

It really comes down to who you're writing the code for. For something like numpy, whose users will mostly be familiar with matrix notations, operator overloading enables a huge improvement.


> I never understand why operator overloading is said to make things more readable.

OCaml doesn't overload even the arithmetic operators, so for integers you write

    1 + t * A

and for floating point

    1 +. t *. A

and for matrices you would make something like

    scal_mat_add(1, scal_mat_mul(t, A))

Do you really prefer these three over writing

    1 + t * A

for all cases?


> how is "X[y==1,0]" more readable in this case than something like "filterElements(arrayToFilter, arrayOfBools)"?

Just the same way that a[i] *= b[j] is more readable than a.IndexElement(firstIndex).MultiplyByFloat(b.IndexElement(secondIndex))


yeah, sorry, but I didn't understand the first one at all, and totally understood the second one. Your definition of "readable" and mine differ ;)


I don't follow. You have been coding for 40 years and you do not understand a[i] *= b[j] at all but you understood the expression I made up?


Bah, it's the same as Maths: notations 'compress' the formulas but at the cost of having to learn these notations..


yes, this. Completely.

Is that more readable or less?


For who? Beginners or experts?


Experts. Optimising your whole notation for beginners is pre-emptively putting up a skill ceiling. Beginners stop being beginners (at which point they’ll outgrow the beginner oriented syntax) but experts will remain experts.

Instead, optimise for teaching/learning the skills better rather than capping everyone’s skills. The presence of a learning curve is not an inherently bad thing.

Edit: re-reading your previous comments, I think you and I are in furious agreement haha


That example is using “logical vectors”, which you’d come across in more data-science languages like Matlab, Octave, R, etc. Julia[1] has a more modern take on y==1, by having explicit syntax for element-wise operations, so it uses y.==1 instead.

What I’m really saying is that there’s quite a bit of precedent for that syntax, but it comes from a more specialised field so it is easy to have not come across it before.

[1] https://docs.julialang.org/en/v1/manual/functions/#man-vecto...


MATLAB introduced automatic broadcasting of operators over n-dimensional arrays and logical indexing nearly 40 years ago, and it is still the primary learning language for applied mathematicians, engineers, and scientists, and also a popular prototyping language for numerical algorithm developers. And it provides a great interactive REPL with built-in plotting for exploratory data analysis.

Since then, the idea and basic syntax have been adopted by GNU Octave, S, R, and now NumPy and Matplotlib, which did it to make it easier for statisticians, engineers, and scientists to adopt Python. Specifically targeting these groups with familiar syntax is exactly why Python is so popular for data science, because data scientists tend to be recruited from the hard engineering and science disciplines. It's a lot easier to teach basic programming to someone with a great background in applied math, experimental design, and research methods than it is to teach all those things to programmers.

This is an area in which languages with operator overloading shine, creating DSLs that mimic the syntax and semantics of other languages. You might have a lot to learn because you're used to == only being defined for scalar data types and arrays only being indexed by natural numbers, but the people the language is designed for are used to broadcasted operators and logical array indexing.


I find this is common in python: there are nice shorthand things you can do that are definitely powerful, but they are not easy to understand nor to remember. Particularly with conditions applied to arrays / series this is a problem. "Truth of a series is ambiguous" is one of my most frequent errors.

That said, the overall ecosystem still makes python the most practical general data science language in my view.


The lack of operator overloading and the Go->C FFI are pretty big hindrances.

Go just wasn’t designed for this kind of work. Which is unfortunate because it brings a lot of great things to the table.

Vlang is probably the closest spiritual successor that would work, or someone just needs to write a new language


There are quite a few languages I would like to use before Go. Especially F# seems very interesting for DS.


I've even been wanting for some time to check out the F# R Type Provider: http://bluemountaincapital.github.io/FSharpRProvider/ Unfortunately it appears to be Windows-only, and my curiosity hasn't yet reached the point that I'd boot into Windows.


Sadly, it also doesn't work with .net core yet. Otherwise this would be a pretty convincing point for F#.


It also looks like it's been unmaintained for several years, and it talks about supporting R 2.x. R 4.0 has enough changes to the language internals that I'm not sure it would work with recent versions.


Yeah, that as well. Unfortunate.


Why can't you just build libraries to make Go a better language for data science? There's already Go support for a Jupyter Notebooks kernel: https://github.com/gopherdata/gophernotes


We could build such libraries, and people have built some.

However, the task at its heart is a vast duplication of work, and while Go has a lot of things going for it, it doesn't seem enough to sway many data scientists into reinventing their wheels in Go.

I don't blame them. Rewrites being difficult to justify or motivate when you already have a compelling implementation is part of the reason why we have significant amounts of FORTRAN77 code still kicking around today. It is also why for many things we opt to just write wrappers around existing C libraries to call them from other languages.

It has many shortcomings, but overall I prefer the sharing of a library across languages, each with its own bindings that can attempt to make it more idiomatic to that specific language. The Go culture/community doesn't favor this approach; the Python community embraces it.


I just barely picked up Go, and my first impression is that it's very... opinionated.

It wants me to do if/else guards a certain way, you have to capitalize first letters of "exported" functions, it won't let me import `fmt` unless I use it, etc. I'm not sure I like it.


The opinions are by design. By removing flexibility, you can increase uniformity. Instead of having 12 different styles of code, there can be 1. It removes cognitive load, so you can spend your mental energy on solving problems.


And that's fine if go were the only language I ever used, but it's jarring going from unopinionated languages to a highly opinionated one. I have to have a special set of "go rules" in my mind to be sure to follow when using go which has the effect of increasing cognitive load. "oh right, go wants me to compress my if else clauses and put the brackets a certain way"


I don’t think that appreciates the value enough. Yes, there is a cognitive load to learning the opinions of Go (though gofmt keeps you from having to learn a lot of them, as do compiler errors) but that is true of any language. I also think the number of opinions you need to learn is far smaller, as you don’t have as big of a surface area to navigate when making design decisions about daily coding. I think they are net very positive.


I hope you have the presence of mind to realize the amazing benefit this has for the entire Go codebase in existence.


Nobody in data science wants fragmentation. Therefore, any aspiring new platform would need to bring some serious benefits to the table. I'm not sure what they are here.


I wish them all the best in this but it seems like an uphill battle, and doesn't seem to have a clear use case to me. For lightweight to medium projects R and Python are so well supported it's hard to reject them as the null. If you're doing exploratory stuff and want visuals, it's the same story with Rmd and Jupyter. For more behind-the-scenes production pipeline stuff there is already Scala which has inroads with Spark. If you really want to use something new, Julia is starting to mature and has all sort of plotting and linear algebra support. To me it seems Go would aim more to compete with Scala I suppose? I suppose then it might come down to plotting.

In terms of being a general-purpose DS language, I can't imagine using anything that doesn't have a clear strategy to A) get a dataset into a DataFrame or similar, B) get my collaborators a plot in a way that is quick and easy, and C) to a lesser extent, some kind of notebook/reporting tool.

They do say there is a lot of development going on but it seems like a space with a lot of great incumbents and a rapidly maturing up-and-comer in Julia.

edit: typo


Seems like there's a potential trademark risk if Google decides it wants to protect the Go trademark.

https://news.ycombinator.com/item?id=20023137


There are a lot of numerical structures missing from this. Not sure if you can really advertise it as for data science without some kind of dataframe structure.


DataFrames? It doesn't even seem to have specialized array primitives like Series or NumPy. What the heck?


Agree with all the comments like "where are the nd arrays?"

BUT, they have list comprehensions!! One of the main things I miss coming from Python.


Once generics get introduced into the language, you can write a generic map function which can take a []T and a func(T) U to return a []U. While it's not as elegant as a list comprehension, it's nicer than writing a for loop every time. Although, I can't remember what the performance impact of closures is in Go, so this might not be a cheap operation.
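
A minimal sketch of such a function, assuming the type-parameter syntax that eventually shipped in Go 1.18 (which did not exist when this comment was written). The name Map is chosen here for illustration; this is not a standard library function.

```go
package main

import "fmt"

// Map applies f to every element of in and returns the results in order.
// Hypothetical helper written for this sketch, not part of the standard library.
func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

func main() {
	lengths := Map([]string{"go", "plus"}, func(s string) int { return len(s) })
	fmt.Println(lengths)
}
```

The function literal at the call site is most of the line, which may be part of why some people would still reach for the plain for loop.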


In practice, no one will do this, unless there happens to be a function with the correct signature already available. The lambda syntax is so verbose that it's easier to just write the for loop.

Another problem is that tons of Go functions return (value, error), and it's not clear how such functions should interact with a "map" function. Return all the errors in a separate slice? Stop at the first error? What if you only want to stop when the error is io.EOF? etc.

I think we'll only see map/filter/reduce if the language is changed to specifically accommodate them. I've experimented with doing this myself, which people tend to view as heresy: https://twitter.com/lukechampine/status/1367279449302007809?...



Why not something like a channel map? Give it a channel and a function and you get another channel with a goroutine running in the background.
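
A rough sketch of what that might look like, again assuming Go 1.18+ type parameters; ChanMap is a made-up name. Note that it has no cancellation: if the consumer stops reading before the input channel is closed and drained, the background goroutine blocks forever.

```go
package main

import "fmt"

// ChanMap starts a goroutine that reads from in, applies f, and sends the
// results on the returned channel, which is closed once in is closed and
// drained. Hypothetical helper with no cancellation or error handling.
func ChanMap[T, U any](in <-chan T, f func(T) U) <-chan U {
	out := make(chan U)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

func main() {
	in := make(chan int)
	go func() {
		defer close(in)
		for i := 1; i <= 3; i++ {
			in <- i
		}
	}()
	for sq := range ChanMap(in, func(n int) int { return n * n }) {
		fmt.Println(sq)
	}
}
```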


Oh jesus christ fuck no


Really cool thing! When will it be ready for production use?


What's the point?


not even half baked. it's a webpage with a single feature that is broken.


Many of these features should be part of the upcoming Go 2


When I realised I couldn't divide a time period by a number or integer, the penny dropped that different languages excel at different things.


Are you talking about `time.Duration`? Because you can definitely do that.


Has that changed? I couldn't before!


You have been able to divide a time.Duration by an integer since before go 1.0. As you can see in the stdlib, time.Duration is just an int64 (https://golang.org/pkg/time/#Duration). You do have to cast the integer to a time.Duration sometimes.

Here's a playground showing cases where it works and cases where it requires a cast: https://play.golang.org/p/6Pbqrz8ZZ3t


So I'm not wrong then? You're needing to cast an Int into a time duration in order to divide a time duration?


Go requires explicit type conversions. If that was your original point, it was very poorly worded.


A time unit can only be divided by a time unit (or integer converted to a time unit)? That doesn't make sense to an engineer or mathematician.


But don't try to use BC dates with time.Duration, because they don't work!



